Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields
From | Johann Zuschlag |
---|---|
Subject | Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields |
Date | |
Msg-id | 442C3452.5020704@online.de |
In reply to | Re: psqlODBC-Driver Test / text fields ("Dave Page" <dpage@vale-housing.co.uk>) |
Responses |
Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields
Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text |
List | pgsql-odbc |
Dave Page wrote:

> If 'ö' is 'ö', then isn't the query above mixing single and a multibyte encoding? Ie. It should all be single byte - e.g.
>
> select name from kunde where name >= 'ö' order by name asc;
>
> Or all multibyte (displayed byte by byte) whatever that results in:
>
> s*e*l*e*c*t* *n*a*m*e* *f*r*o*m* *k*u*n*d*e* *w*h*e*r*e* *n*a*m*e* *>*=* *'*ö'*;*
>
> Of course, we all know how well I grok encoding issues :-)

Hi Dave,

I can understand you. These encoding issues drive me crazy sometimes, too. :-)

The problem with UTF-8 is that all ASCII characters are represented by one byte, while non-ASCII characters are represented by two or more bytes (two for German umlauts, for example). That's why UTF-8 is called a "variable-length multibyte encoding". In a pure two-byte Unicode world (UCS-2, e.g. U+xxxx), every character is represented by two bytes (a fixed-length multibyte encoding). So Unicode is not the same as UTF-8, even though the PostgreSQL documentation states that it is.

If you like, see: http://www.utf8-chartable.de/ or some explanation at http://czyborra.com/utf/

Windows XP supports ANSI, UTF-8, Unicode and Unicode Big Endian. Unfortunately (or fortunately?) Windows seems to use UTF-8 for European languages. Hiroshi, can you explain that? I guess the Japanese edition of Windows XP uses pure two-byte Unicode.

I can't say anything about psql. But the new psqlodbc driver 7.03.26X seems to handle that situation very well. So I suppose the test was valid to a certain extent, since the characters are handled in this mixed way in Win XP.

I still see some funny behaviour with Unicode in psql (even after setting LC_COLLATE correctly :-) ). For my production machines I will use ISO-8859-1 (or ISO-8859-15) anyway. Then the driver will convert all characters to single bytes, avoiding all kinds of problems.

But feel free to ask me for tests... ;-)

Regards,
Johann
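The byte counts discussed in the message above can be checked with a short Python 3 sketch. It is only an illustration: the `byte_lengths` helper and the sample characters are hypothetical choices, and `utf-16-le` is assumed to correspond to what Windows labels "Unicode". It also shows how the two UTF-8 bytes of 'ö' turn into two separate characters when they are read as ISO-8859-1.

```python
# One character, three encodings.
text = "\u00f6"  # 'ö', U+00F6 LATIN SMALL LETTER O WITH DIAERESIS

utf8 = text.encode("utf-8")          # variable-length encoding
utf16 = text.encode("utf-16-le")     # assumed equivalent of Windows "Unicode"
latin1 = text.encode("iso-8859-1")   # single-byte encoding

print(utf8.hex(), utf16.hex(), latin1.hex())  # c3b6 f600 f6

# Decoding the UTF-8 bytes with the wrong (single-byte) codec
# produces two characters instead of one.
mojibake = utf8.decode("iso-8859-1")
print(mojibake)  # Ã¶


def byte_lengths(s):
    """Return the encoded size of s in bytes for two Unicode encodings."""
    return {enc: len(s.encode(enc)) for enc in ("utf-8", "utf-16-le")}


# ASCII letter, umlaut, euro sign, and a character outside the
# Basic Multilingual Plane (hypothetical sample set).
for ch in ("a", "\u00f6", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

The last sample needs four bytes in both encodings, so a strictly two-bytes-per-character view only holds for code points up to U+FFFF.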