Re: Unicode support
From | Greg Stark |
---|---|
Subject | Re: Unicode support |
Date | |
Msg-id | 4136ffa0904131326u5ede7272yadd838cf7426b75a@mail.gmail.com |
In response to | Re: Unicode support (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Unicode support |
List | pgsql-hackers |
On Mon, Apr 13, 2009 at 9:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> This isn't about the number of bytes, but about whether or not we should
>> count characters encoded as two or more combined code points as a single
>> char or not.
>
> It's really about whether we should support non-canonical encodings.
> AFAIK that's a hack to cope with implementations that are restricted
> to UTF-16, and we should Just Say No. Clients that are sending these
> things converted to UTF-8 are in violation of the standard.

Is it really true that canonical encodings never contain any composed characters in them? I thought there were some glyphs which could only be represented by composed characters.

Also, users can construct strings of Unicode code points themselves in SQL using || or other text operators.

That said, my impression is that composed character support is pretty thin on the ground elsewhere as well, but I don't have much first-hand experience.

The original post seemed to be a contrived attempt to say "you should use ICU". If composed character support were a show-stopper and there was no other way to get it then it might be convincing, but I don't know that it is and I don't know that ICU is the only place to get it. And I'm sure it's not the only way to handle multiple encodings in a database.

--
greg
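
As an illustration of the question raised above, here is a minimal Python sketch. It assumes that "canonical encoding" in this thread corresponds to Unicode Normalization Form C (NFC), which the message itself does not state. The helper name `nfc_length` is hypothetical. The sketch shows one sequence that NFC folds into a single code point, and one that stays as two code points because, as far as I know, Unicode has no precomposed "q with tilde".

```python
import unicodedata


def nfc_length(s: str) -> int:
    """Return the number of code points in s after NFC normalization."""
    return len(unicodedata.normalize("NFC", s))


# 'e' followed by COMBINING ACUTE ACCENT: NFC composes this into U+00E9.
e_acute = "e" + "\u0301"

# 'q' followed by COMBINING TILDE: assumed to have no precomposed form,
# so NFC leaves the base letter and the combining mark as separate code points.
q_tilde = "q" + "\u0303"

print(nfc_length(e_acute))
print(nfc_length(q_tilde))
```

If that assumption holds, even normalized text can contain base-plus-combining sequences, so counting code points and counting user-perceived characters would still differ for such strings.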