Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text
From | Johann Zuschlag |
---|---|
Subject | Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text |
Date | |
Msg-id | 442D5DFB.4080501@online.de |
In reply to | Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields (Hiroshi Inoue <inoue@tpf.co.jp>) |
Responses | Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text |
List | pgsql-odbc |
Hiroshi Inoue wrote:
>
> Unicode ODBC drivers handle UCS-2, not UTF-8, even in a European
> environment. Unfortunately PostgreSQL doesn't handle UCS-2
> directly (because it could contain NULL bytes in the string), so the
> unicode driver sets the client_encoding to UTF-8 automatically and
> converts from UCS-2 data to UTF-8 data, which the PostgreSQL backend
> can understand, when sending queries. So what you
> can see in the backend log is UTF-8. Then the backend converts from
> UTF-8 data to the server encoding data. After all, the locale
> (especially LC_COLLATE) setting you need is the one which matches the
> backend encoding.
>

Hmm..., so Windows XP uses UCS-2 or, to be more correct (like Bart mentioned), UTF-16 (which is nearly the same, except for the surrogates). That is converted to UTF-8, sent to the backend and then converted to the proper locale and stored.

I've read about the problems with the NULL bytes on Unix machines.

Let's have two examples:

1.
backend-1 = ISO8859-1
backend-2 = UTF-8

'A' = U+0041 (does Windows use big-endian?)

Win UCS-2: U+0041
ODBC UTF-8: U+41
backend-1 stores = 0x41
backend-2 stores = U+41

2.
'Ä' = U+00C4 (German A-Umlaut)

Win UCS-2: U+00C4
ODBC UTF-8: U+C384
backend-1 stores = 0xC4
backend-2 stores = U+C384

Did I get that right? So I have to be really careful when testing.

Regards,
Johann
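
The two examples above can be checked byte by byte. The following is only an illustrative sketch, not the driver's code: it assumes the Windows side holds little-endian UTF-16 (as on x86) and that the driver's conversion behaves like a plain UTF-16 to UTF-8 transcode. The helper name `byte_forms` is made up for this example. Note that "U+" notation names a code point, while the encoded forms are better written as byte sequences.

```python
# Illustrative sketch: byte sequences for the two example characters.
# Assumptions (not taken from the driver source): little-endian UTF-16
# on the Windows side, and a plain UTF-16 -> UTF-8 transcode in between.

def byte_forms(ch: str) -> dict:
    """Return the code point and the hex bytes of one character per encoding."""
    return {
        "code point": f"U+{ord(ch):04X}",
        "UTF-16LE (Windows wide char)": ch.encode("utf-16-le").hex(" "),
        "UTF-8 (client_encoding)": ch.encode("utf-8").hex(" "),
        "ISO-8859-1 (backend-1)": ch.encode("iso-8859-1").hex(" "),
    }

for ch in ("A", "\u00c4"):  # 'A' and 'Ä'
    print(repr(ch))
    for label, value in byte_forms(ch).items():
        print(f"  {label}: {value}")
```

Under these assumptions, 'A' is `41 00` in UTF-16LE and the single byte `41` in both UTF-8 and ISO-8859-1, while 'Ä' is `c4 00` in UTF-16LE, the two bytes `c3 84` in UTF-8, and the single byte `c4` in ISO-8859-1. The `00` byte in the UTF-16 forms is the NULL byte mentioned in the quoted text.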