Re: 8.0, UTF8, and CLIENT_ENCODING
From | Hannes Dorbath |
---|---|
Subject | Re: 8.0, UTF8, and CLIENT_ENCODING |
Date | |
Msg-id | 464CC60C.1040300@theendofthetunnel.de |
In reply to | 8.0, UTF8, and CLIENT_ENCODING (Paul Ramsey <pramsey@refractions.net>) |
List | pgsql-general |
Paul Ramsey wrote:
> I have a small database (PgSQL 8.0, database encoding UTF8) that folks
> are inserting into via a web form. The form itself is declared
> ISO-8859-1 and the prior to inserting any data, pg_client_encoding is
> set to LATIN1.
>
> Most of the high-bit characters are correctly translated from LATIN1 to
> UTF8. So for e-accent-egu I see the two-byte UTF8 value in the database.
>
> Sometimes, in their wisdom, people cut'n'paste information out of MSWord
> and put that in the form. Instead of being mapped to 2-byte UTF8
> high-bit equivalents, they are going into the database directly as
> one-byte values > 127. That is, as illegal UTF8 values.
>
> When I try to dump'n'restore this database into PgSQL 8.2, my data can't
> made the transit.
>
> Firstly, is this "kinda sorta" encoding handling expected in 8.0, or did
> I do something wrong?
>
> Secondly, anyone know any useful tools to pipe a stream through to strip
> out illegal UTF8 bytes, so I can pipe my dump through that rather than
> hand editing it?

This is a known issue. Use

    iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql dumpfile.sql

to convert your dumps. I'm not sure whether this is fixed in the 8.0 branch at all.

--
Best regards,
Hannes Dorbath
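Where iconv is not at hand, roughly the same filtering can be sketched in Python. This is only an illustration of the "strip illegal UTF8 bytes" idea from the question, not something from the original thread: the function names `strip_invalid_utf8` and `clean_file` are made up for this example, and, like `iconv -c`, the sketch silently drops the offending bytes rather than converting them to their intended characters.

```python
import codecs
import sys


def strip_invalid_utf8(chunks):
    """Yield UTF-8 bytes from an iterable of byte chunks, dropping invalid sequences.

    An incremental decoder is used so that a multi-byte character split
    across two chunks is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-8")
    # Flush the decoder; an incomplete trailing sequence is dropped.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-8")


def clean_file(src_path, dst_path, chunk_size=65536):
    """Copy src_path to dst_path, removing bytes that are not valid UTF-8."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        reader = iter(lambda: src.read(chunk_size), b"")
        for cleaned in strip_invalid_utf8(reader):
            dst.write(cleaned)


# Hypothetical command-line use: python clean_utf8.py dumpfile.sql cleanfile.sql
if __name__ == "__main__" and len(sys.argv) == 3:
    clean_file(sys.argv[1], sys.argv[2])
```

Because the invalid bytes are discarded, characters pasted from MSWord (for example curly quotes stored as single bytes above 127) disappear from the cleaned dump instead of being repaired, so it may be worth diffing the cleaned file against the original before restoring.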