Re: Bug in UTF8-Validation Code?
From | Mark Dilger |
---|---|
Subject | Re: Bug in UTF8-Validation Code? |
Date | |
Msg-id | 46117D6D.7050705@markdilger.com |
In response to | Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Bug in UTF8-Validation Code?
Re: Bug in UTF8-Validation Code? |
List | pgsql-hackers |
Tom Lane wrote:
> Mark Dilger <pgsql@markdilger.com> writes:
>>> pgsql=# select chr(14989485);
>>>  chr
>>> -----
>>>  中
>>> (1 row)
>
> Is there a principled rationale for this particular behavior as
> opposed to any other?
>
> In particular, in UTF8 land I'd have expected the argument of chr()
> to be interpreted as a Unicode code point, not as actual UTF8 bytes
> with a randomly-chosen endianness.
>
> Not sure what to do in other multibyte encodings.

"Not sure what to do in other multibyte encodings" was pretty much my
rationale for this particular behavior. I standardized on network byte
order because there are only two endiannesses to choose from, and the
other seems to be a more surprising choice. I looked around on the web
for a standard for how to convert an integer into a valid multibyte
character and didn't find anything.

Andrew, Supernews has said upthread that chr() is clearly wrong and
needs to be fixed. If so, we need some clear definition of what "fixed"
means. Any suggestions?

mark
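
For readers following the thread, the two interpretations under discussion can be sketched in a few lines. This is an illustration only, not the PostgreSQL implementation: the function names `chr_as_utf8_bytes` and `chr_as_code_point` are hypothetical, and Python stands in for the server's C code. The first function treats the integer as raw UTF-8 bytes in a chosen byte order (the behavior shown in the example above, with network byte order); the second treats it as a Unicode code point (the behavior Tom Lane says he would have expected).

```python
def chr_as_utf8_bytes(n: int, byteorder: str = "big") -> str:
    """Hypothetical sketch: read n as the raw bytes of one UTF-8 character.

    byteorder="big" corresponds to network byte order.
    """
    if n < 0:
        raise ValueError("argument must be non-negative")
    # Number of bytes needed to hold n (at least one, so that 0 maps to NUL).
    length = max(1, (n.bit_length() + 7) // 8)
    raw = n.to_bytes(length, byteorder)
    # decode() raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = raw.decode("utf-8")
    if len(text) != 1:
        raise ValueError("bytes do not encode exactly one character")
    return text


def chr_as_code_point(n: int) -> str:
    """Hypothetical sketch: read n as a Unicode code point."""
    # Python's built-in chr() raises ValueError above U+10FFFF.
    return chr(n)


# 14989485 == 0xE4B8AD, the UTF-8 encoding of U+4E2D.
print(chr_as_utf8_bytes(14989485))   # byte interpretation, network order
print(chr_as_code_point(0x4E2D))     # code point interpretation, same character
```

Under the code point interpretation, 14989485 is out of range (it exceeds U+10FFFF) and is rejected. Under the byte interpretation with the opposite byte order, the same integer yields the byte sequence AD B8 E4, which is not valid UTF-8, which is one way to see why the choice of endianness matters.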