Re: invalidly encoded strings
From | Andrew Dunstan |
---|---|
Subject | Re: invalidly encoded strings |
Date | |
Msg-id | 46E56C22.6090101@dunslane.net |
In reply to | Re: invalidly encoded strings (Tatsuo Ishii <ishii@postgresql.org>) |
Replies | Re: invalidly encoded strings; Re: invalidly encoded strings |
List | pgsql-hackers |
Tatsuo Ishii wrote:
> I don't understand the whole discussion.
>
> Why do you think that employing the Unicode code point as the chr()
> argument could avoid endianness issues? Are you going to represent
> Unicode code points as UCS-4? Then you have to specify the endianness
> anyway. (See the UCS-4 standard for more details.)

The code point is simply a number. The result of chr() will be a text value one character (not one byte) wide, in the relevant database encoding. U+nnnn maps to the same Unicode character, and hence the same UTF8 byte pattern, regardless of endianness. For example, U+00A9 is the copyright symbol on all machines, so to get this character in a UTF8 database you could call "select chr(169)" and get back the byte pattern \xC2A9.

> Or are you going to represent a Unicode code point as a character string
> such as 'U+0259'? Then representing any encoding as a string could avoid
> endianness issues anyway, and I don't see that the Unicode code point is
> any better than the others.

The argument will be a number, as now.

> Also I'd like to point out that all encodings have their own code point
> systems, as far as I know. For example, EUC-JP has its corresponding
> code point systems: ASCII, JIS X 0208 and JIS X 0212. So I don't see why
> we can't use "code point" as chr()'s argument for other encodings (of
> course we would need an optional parameter specifying which character
> set is intended).

Where can I find the tables that map code points (as opposed to encodings) to characters for these others?

cheers

andrew
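A minimal Python sketch of the endianness argument (this is not PostgreSQL's chr() implementation; the helper name utf8_from_code_point is hypothetical). It derives the UTF-8 bytes arithmetically from the code point value alone, so the result is the same on every machine, whereas serializing the same number as 4-byte UCS-4 yields different byte sequences for big-endian and little-endian order.

```python
import struct


def utf8_from_code_point(cp: int) -> bytes:
    """Hypothetical helper: encode a Unicode code point as UTF-8 by bit arithmetic.

    The output depends only on the numeric value of cp, never on host byte order.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


cp = 169  # U+00A9 COPYRIGHT SIGN
print(utf8_from_code_point(cp).hex())  # c2a9, same on all machines
print(struct.pack(">I", cp).hex())     # 000000a9, UCS-4 big-endian
print(struct.pack("<I", cp).hex())     # a9000000, UCS-4 little-endian
```

The first line matches the \xC2A9 pattern mentioned above; only the fixed-width UCS-4 serialization needs a byte-order decision.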