Re: Differences in UTF8 between 8.0 and 8.1
От | Paul Lindner |
---|---|
Тема | Re: Differences in UTF8 between 8.0 and 8.1 |
Дата | |
Msg-id | 20051024034800.GD27646@inuus.com обсуждение исходный текст |
Ответ на | Re: Differences in UTF8 between 8.0 and 8.1 (Andrew - Supernews <andrew+nonews@supernews.com>) |
Ответы |
Re: Differences in UTF8 between 8.0 and 8.1
|
Список | pgsql-hackers |
On Sun, Oct 23, 2005 at 05:56:50AM -0000, Andrew - Supernews wrote: > On 2005-10-22, Paul Lindner <lindner@inuus.com> wrote: > > I've generated dumps using pg_dump from 8.0 and 8.1. Attempting to > > restore these results in > > > > Invalid UNICODE byte sequence detected near byte ... > > What were the exact offending bytes? Here's a cut and paste from emacs hexl-mode: 00000000: 3530 3833 6335 3038 330a 3c20 5641 4c55 5083c5083.< VALU 00000010: 4553 2028 3230 3235 3533 2c20 27c1 f9d4 ES (202553, '... 00000020: c2d0 c7d2 b927 2c20 0a2d 2d2d 0a3e 2056 .....', .---.> V 00000030: 414c 5545 5320 2832 3032 3535 332c 2027 ALUES (202553, ' 00000040: d2b9 272c 200a 3136 3939 3432 6331 3639 ..', .169942c169 00000050: 3934 320a 3c20 5641 4c55 4553 2028 3833 942.< VALUES (83 00000060: 3031 352c 2027 b7ed a8c6 a448 272c 200a 015, '.....H', . 00000070: 2d2d 2d0a 3e20 5641 4c55 4553 2028 3833 ---.> VALUES (83 00000080: 3031 352c 2027 c6a4 4827 2c20 0a 015, '..H', . This is of a minimal diff between a UTF8 scrubbed file and the original dump. It appears the offending bytes are: C1 F9 C2 D0 C7 and B7 ED A8 > > Question: > > > > Does the 8.1 Unicode sanity code accept the full set of characters > > accepted by the 8.0 Unicode sanity code? > > No. 8.0 and before accepted a lot of stuff that it should never have, and > failed to accept stuff that it should have. Thanks go out to John Hansen, he recommended to run the dump through iconv: iconv -c -f UTF8 -t UTF8 -o fixed.sql dump.sql This seems to strip out invalid UTF8 and will allow for a clean import. Someone should add this to the Release Notes/FAQ.. -- Paul Lindner ||||| | | | | | | | | | lindner@inuus.com
В списке pgsql-hackers по дате отправления: