Re: Differences in UTF8 between 8.0 and 8.1
From | Andrew - Supernews |
---|---|
Subject | Re: Differences in UTF8 between 8.0 and 8.1 |
Date | |
Msg-id | slrndlor0s.g61.andrew+nonews@trinity.supernews.net |
In reply to | Differences in UTF8 between 8.0 and 8.1 (Paul Lindner <lindner@inuus.com>) |
Responses | Re: Differences in UTF8 between 8.0 and 8.1 |
List | pgsql-hackers |
On 2005-10-24, Paul Lindner <lindner@inuus.com> wrote:
> Here's a cut and paste from emacs hexl-mode:
>
> 00000000: 3530 3833 6335 3038 330a 3c20 5641 4c55  5083c5083.< VALU
> 00000010: 4553 2028 3230 3235 3533 2c20 27c1 f9d4  ES (202553, '...
> 00000020: c2d0 c7d2 b927 2c20 0a2d 2d2d 0a3e 2056  .....', .---.> V
> 00000030: 414c 5545 5320 2832 3032 3535 332c 2027  ALUES (202553, '
> 00000040: d2b9 272c 200a 3136 3939 3432 6331 3639  ..', .169942c169
> 00000050: 3934 320a 3c20 5641 4c55 4553 2028 3833  942.< VALUES (83
> 00000060: 3031 352c 2027 b7ed a8c6 a448 272c 200a  015, '.....H', .
> 00000070: 2d2d 2d0a 3e20 5641 4c55 4553 2028 3833  ---.> VALUES (83
> 00000080: 3031 352c 2027 c6a4 4827 2c20 0a         015, '..H', .
>
> This is of a minimal diff between a UTF8 scrubbed file and the
> original dump.
>
> It appears the offending bytes are:
>
> C1 F9 C2 D0 C7

I'm inclined to suspect that the whole sequence c1 f9 d4 c2 d0 c7 d2 b9
was never actually a valid utf-8 string, and that the d2 b9 is only valid
by coincidence (it's a Cyrillic letter from Azerbaijani). I know the 8.0
utf-8 check was broken, but I didn't realize it was quite so bad.

> and
>
> B7 ED A8

Likewise, that whole sequence b7 ed a8 c6 a4 was probably never valid;
c6 a4 also isn't a character you'd expect to find in common use.

My guess is that this was data in some non-utf-8 charset that managed to
get past the defective checks in 8.0.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
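
The claim that only d2 b9 and c6 a4 are valid by coincidence can be checked mechanically. The following is a minimal sketch that uses Python's strict UTF-8 decoder as a stand-in validator; it is not the check PostgreSQL 8.1 performs, and the helper name `surviving_utf8` is hypothetical. It decodes each full sequence strictly, then drops the undecodable bytes to see what a scrub would leave behind.

```python
def surviving_utf8(data: bytes) -> bytes:
    """Drop every byte that is not part of a well-formed UTF-8 sequence.

    Hypothetical helper for illustration: it relies on Python's decoder
    with errors="ignore", which may differ in detail from other scrubbers.
    """
    return data.decode("utf-8", errors="ignore").encode("utf-8")


# The two byte runs taken from the hex dump in the message.
first = bytes.fromhex("c1f9d4c2d0c7d2b9")
second = bytes.fromhex("b7eda8c6a4")

for seq in (first, second):
    try:
        seq.decode("utf-8")  # strict decoding: raises on any malformed byte
        print(seq.hex(), "decodes as UTF-8")
    except UnicodeDecodeError as exc:
        print(seq.hex(), "is not valid UTF-8; first error at offset", exc.start)
    kept = surviving_utf8(seq)
    # Show which bytes survive and the code points they map to.
    print("  surviving bytes:", kept.hex(),
          [f"U+{ord(ch):04X}" for ch in kept.decode("utf-8")])
```

Under these assumptions, the first run keeps only d2 b9 (U+04B9) and the second only c6 a4 (U+01A4), which matches the scrubbed side of the diff.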