Re: Differences in UTF8 between 8.0 and 8.1
From | Andrew - Supernews |
---|---|
Subject | Re: Differences in UTF8 between 8.0 and 8.1 |
Date | |
Msg-id | slrndm1g2i.g61.andrew+nonews@trinity.supernews.net |
In reply to | Differences in UTF8 between 8.0 and 8.1 (Paul Lindner <lindner@inuus.com>) |
List | pgsql-hackers |
On 2005-10-27, Paul Lindner <lindner@inuus.com> wrote:
> On Mon, Oct 24, 2005 at 05:07:40AM -0000, Andrew - Supernews wrote:
>> I'm inclined to suspect that the whole sequence c1 f9 d4 c2 d0 c7 d2 b9
>> was never actually a valid utf-8 string, and that the d2 b9 is only valid
>> by coincidence (it's a Cyrillic letter from Azerbaijani). I know the 8.0
>> utf-8 check was broken, but I didn't realize it was quite so bad.
>
> Looking at the data it appears that it is a sequence of latin1
> characters. They all have the eighth bit set and all seem to pass the
> check.

In latin1 it comes out as total gibberish, so I think you'll find it is
actually in something else. Some googling suggests it is most likely in
a Chinese double-byte charset (GB2312).

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
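
For anyone who wants to poke at the bytes themselves, here is a minimal sketch in Python (not anything from the PostgreSQL source) that decodes the quoted sequence under the three encodings mentioned in the thread. The helper name `is_valid_utf8` and the variable names are made up for illustration, and a clean GB2312 decode only shows that the guess is plausible, not that it is right.

```python
# The eight bytes quoted in the thread.
raw = bytes.fromhex("c1f9d4c2d0c7d2b9")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data passes Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The whole sequence versus only the trailing pair d2 b9.
whole_ok = is_valid_utf8(raw)
tail_ok = is_valid_utf8(raw[-2:])

# latin1 maps every byte to exactly one character, so it can never fail;
# success here says nothing about whether the data really is latin1.
as_latin1 = raw.decode("latin-1")

# GB2312 is a double-byte charset: eight high-bit bytes become four characters.
as_gb2312 = raw.decode("gb2312")

print("whole sequence valid utf-8:", whole_ok)
print("d2 b9 alone valid utf-8:  ", tail_ok, hex(ord(raw[-2:].decode("utf-8"))))
print("as latin1:", as_latin1)
print("as gb2312:", as_gb2312)
```

A strict decoder rejects the sequence at the very first byte, since c1 can never start a well-formed UTF-8 sequence, while d2 b9 taken alone decodes to U+04B9. The latin1 result is eight accented letters and symbols, and the GB2312 result is four Chinese characters.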