Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields
From | Hiroshi Inoue |
---|---|
Subject | Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields |
Date | |
Msg-id | 442C4F10.3090004@tpf.co.jp |
In reply to | Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields (Johann Zuschlag <zuschlag2@online.de>) |
Responses | Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text |
List | pgsql-odbc |
Johann Zuschlag wrote:
> Dave Page wrote:
>
>> If 'ö' is 'ö', then isn't the query above mixing single and a
>> multibyte encoding? Ie. It should all be single byte - e.g.
>>
>> select name from kunde where name >= 'ö' order by name asc;
>>
>> Or all multibyte (displayed byte by byte) whatever that results in:
>>
>> s*e*l*e*c*t* *n*a*m*e* *f*r*o*m* *k*u*n*d*e* *w*h*e*r*e* *n*a*m*e*
>> *>*=* *'*ö'*;*
>>
>> Of course, we all know how well I grok encoding issues :-)
>>
>
> Hi Dave,
>
> I can understand you. This encoding issues drive me also crazy some
> times. :-)
>
> The problem with UTF-8 is that all ASCII characters are represented by
> one byte and all non ASCII characters, e.g. German Umlauts, are
> represented by two bytes. That's why UTF-8 is called a
> "variable-length multibyte encoding". In a pure Unicode world, e.g.
> U+xxxx with two bytes, every character is represented by two bytes
> (fixed-length multibyte encoding). So Unicode is not equal to UTF-8,
> even though the PostgreSQL documentation is stating that.
>
> If you like, see: http://www.utf8-chartable.de/ or some explanation at
> http://czyborra.com/utf/
>
> Windows XP supports ANSI, UTF-8, Unicode and Unicode Big Endian.
> Unfortunately (or fortunately?) Windows seems to use UTF-8 for
> European languages. Hiroshi can you explain that? I guess the Japanese
> edition of Windows XP is using pure 2 byte Unicode.

Unicode ODBC drivers handle UCS-2, not UTF-8, even in a European
environment. Unfortunately PostgreSQL doesn't handle UCS-2 directly
(because the string could contain NULL bytes), so the Unicode driver
sets the client_encoding to UTF-8 automatically and, when sending
queries, converts the UCS-2 data to UTF-8 data, which the PostgreSQL
backend can understand. So what you see in the backend log is UTF-8.
The backend then converts the UTF-8 data to the server encoding.
In the end, the locale setting you need (especially LC_COLLATE) is the
one that matches the backend encoding.
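The byte-level difference discussed above can be sketched in a few lines of Python. This is only an illustration, not the psqlODBC driver's actual code: `ucs2_to_utf8` is a hypothetical helper name, and the sketch assumes little-endian UCS-2 restricted to the Basic Multilingual Plane, where it coincides with UTF-16LE.

```python
# Illustration only: the same query text encoded as UCS-2 (UTF-16LE) and as UTF-8.

def ucs2_to_utf8(ucs2_bytes: bytes) -> bytes:
    """Hypothetical helper: re-encode little-endian UCS-2 data as UTF-8."""
    # For BMP characters, UCS-2 and UTF-16LE have identical byte sequences.
    return ucs2_bytes.decode("utf-16-le").encode("utf-8")

query = "select name from kunde where name >= 'ö'"
ucs2 = query.encode("utf-16-le")   # what a Unicode ODBC application hands over
utf8 = ucs2_to_utf8(ucs2)          # what is sent with client_encoding = UTF-8

# Every ASCII character carries a zero high byte in UCS-2, so the data
# is full of NUL bytes; the UTF-8 form contains none.
print(b"\x00" in ucs2)             # True
print(b"\x00" in utf8)             # False

# 'ö' (U+00F6) is two bytes in both encodings, but different bytes.
print("ö".encode("utf-16-le"))     # b'\xf6\x00'
print("ö".encode("utf-8"))         # b'\xc3\xb6'

# UTF-8 is not limited to two bytes per non-ASCII character.
print(len("€".encode("utf-8")))    # 3

# Reading the UTF-8 bytes as a single-byte encoding shows two characters.
print("ö".encode("utf-8").decode("latin-1"))  # Ã¶
```

The last line shows how a two-byte UTF-8 sequence looks when a log viewer or terminal interprets it as LATIN1, which is one way a single 'ö' can end up displayed as two characters.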
>
> I can't say anything about psql. But the new psqlodbc driver 7.03.26X
> seems to handle that situation very well.
>
> So I suppose the test was valid to a certain extend,

Yes, thanks. I can't test the LATINxx encodings myself.

regards,
Hiroshi Inoue