Re: Corrupted subjects on the archive website
From | Stefan Kaltenbrunner |
---|---|
Subject | Re: Corrupted subjects on the archive website |
Date | |
Msg-id | 5602E1C8.4030706@kaltenbrunner.cc |
In response to | Corrupted subjects on the archive website (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses | Re: Corrupted subjects on the archive website |
List | pgsql-www |
On 09/23/2015 06:59 AM, Thomas Munro wrote:
> Hi
>
> Why do some message display with corrupted subjects on the mailing
> list archives site? The replies to the message below, but not the
> message itself, are displayed with a corrupted subject. They appear
> fine in my mail client though.
>
> http://www.postgresql.org/message-id/20150922134404.5050.75087@wrigleys.postgresql.org
>
> The website shows "Re: [BUGS] BUG #13632: violation de l'intégrité rQ1|ɕѥ".
> My mail client shows "Re: [BUGS] BUG #13632: violation de l'intégrité
> référentielle".
>
> The original message that displays correctly has the following raw header:
>
> Subject: BUG #13632: violation de l'intégrité ré
> férentielle
>
> The reply that doesn't display correctly has the following raw header:
>
> Subject: Re: [BUGS] BUG #13632: violation de l'intégrité r
> éférentielle
>
> A wise denizen of #postgresql pointed out that 'UTF-8' decoded as
> base64 produces 'Q1\377' of which we see at least the 'Q1' in the
> corrupted string.

I looked a bit at the code and did some testing. The difference between the original mail (which is stored and displayed correctly in the archives database) and the two replies that are corrupted is how the line wrapping of the Subject is done: basically linebreak + space in the working version and linebreak + tab in the broken one.

We use decode_header() from the Python email package to parse headers, and it is actually capable of correctly decoding both variants. However, there is a special hack in our importer code, citing http://bugs.python.org/issue504152, that removes \n\t unconditionally from the raw string. I don't know the details of why that was put in originally, but it surely must be wrong in general, because it removes the separation between adjacent header words that RFC 2047 requires in the form of linear whitespace (in this case it leaves no separation at all, causing decode_header() to go haywire).
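To illustrate the point about folding, here is a minimal sketch, not the importer's actual code. It builds a two-word encoded Subject by hand, folds it once with a space and once with a tab, and compares stripping "\n\t" outright against unfolding that only drops the line break. The helper names (encoded_word, decode_subject, unfold) and the base64 "B" encoding are assumptions made for the example; only decode_header() comes from the message above.

```python
import base64
import re
from email.header import decode_header

SUBJECT = "Re: [BUGS] BUG #13632: violation de l'intégrité référentielle"
FIRST, SECOND = SUBJECT[:-12], SUBJECT[-12:]  # split inside "référentielle"


def encoded_word(text):
    """Build an RFC 2047 encoded-word (hypothetical helper, base64 variant)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="


def decode_subject(raw):
    """Decode a raw header value into a str using email.header.decode_header()."""
    parts = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            value = value.decode(charset or "ascii")
        parts.append(value)
    return "".join(parts)


def unfold(raw):
    """Drop only the line break of a fold and keep the whitespace after it."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw)


# The same header, folded once with a space and once with a tab.
FOLDED_WITH_SPACE = encoded_word(FIRST) + "\n " + encoded_word(SECOND)
FOLDED_WITH_TAB = encoded_word(FIRST) + "\n\t" + encoded_word(SECOND)

# decode_header() copes with both folding styles when given the raw value.
print(decode_subject(FOLDED_WITH_SPACE) == SUBJECT)
print(decode_subject(FOLDED_WITH_TAB) == SUBJECT)

# Removing "\n\t" outright glues the two encoded-words together ("?==?"),
# which is the missing separation described above.
print("?==?" in FOLDED_WITH_TAB.replace("\n\t", ""))

# Unfolding keeps the tab, so the words stay separated and decode cleanly.
print(decode_subject(unfold(FOLDED_WITH_TAB)) == SUBJECT)
```

How a decoder reacts to the glued form depends on the implementation and its version, so the sketch only checks that the separator is gone rather than asserting a particular corrupted output.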
I think it was Magnus who put that special case in, so maybe he can shed some light on the issue this change was targeted at?

Stefan