Re: Bug in UTF8-Validation Code?
From | Mark Dilger |
---|---|
Subject | Re: Bug in UTF8-Validation Code? |
Date | |
Msg-id | 4611814F.1070308@markdilger.com |
In reply to | Re: Bug in UTF8-Validation Code? (Mark Dilger <pgsql@markdilger.com>) |
Responses | Re: Bug in UTF8-Validation Code? |
List | pgsql-hackers |
Mark Dilger wrote:
> Tom Lane wrote:
>> Mark Dilger <pgsql@markdilger.com> writes:
>>>> pgsql=# select chr(14989485);
>>>>  chr
>>>> -----
>>>>  中
>>>> (1 row)
>>
>> Is there a principled rationale for this particular behavior as
>> opposed to any other?
>>
>> In particular, in UTF8 land I'd have expected the argument of chr()
>> to be interpreted as a Unicode code point, not as actual UTF8 bytes
>> with a randomly-chosen endianness.
>>
>> Not sure what to do in other multibyte encodings.
>
> "Not sure what to do in other multibyte encodings" was pretty much my
> rationale for this particular behavior. I standardized on network byte
> order because there are only two endiannesses to choose from, and the
> other seems to be a more surprising choice.
>
> I looked around on the web for a standard for how to convert an integer
> into a valid multibyte character and didn't find anything. Andrew,
> Supernews has said upthread that chr() is clearly wrong and needs to be
> fixed. If so, we need some clear definition of what "fixed" means.
>
> Any suggestions?
>
> mark

Since chr() is defined in oracle_compat.c, I decided to look at what
Oracle might do. See
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96540/functions18a.htm

It looks to me like they are doing the same thing that I did, though I
don't have Oracle installed anywhere to verify that. Is there a difference?

mark
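To make the two readings of chr()'s argument in this thread concrete, here is a small Python sketch. It is not PostgreSQL code (chr() itself lives in C in oracle_compat.c), and the function names chr_as_utf8_bytes and chr_as_code_point are hypothetical. It shows that 14989485 is 0xE4B8AD, whose bytes in network (big-endian) byte order are the UTF-8 encoding of U+4E2D; that the same integer read as a Unicode code point lies beyond U+10FFFF; and that the little-endian byte order does not produce valid UTF-8 at all.

```python
# Sketch only: contrasts the two interpretations of chr(n) discussed above.
# Both function names are hypothetical illustrations, not PostgreSQL APIs.

def chr_as_utf8_bytes(n: int, byteorder: str = "big") -> str:
    """Treat n as the raw UTF-8 bytes of a single character.

    byteorder="big" is network byte order (the behavior described in the
    thread); "little" is the other endianness.
    """
    if n <= 0:
        raise ValueError("argument must be positive")
    length = (n.bit_length() + 7) // 8          # minimal number of bytes
    raw = n.to_bytes(length, byteorder)
    # strict decoding raises UnicodeDecodeError on invalid UTF-8
    return raw.decode("utf-8")


def chr_as_code_point(n: int) -> str:
    """Treat n as a Unicode code point (the reading Tom Lane expected)."""
    if not 0 < n <= 0x10FFFF or 0xD800 <= n <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    return chr(n)


n = 14989485
print(hex(n))                              # 0xe4b8ad
print(chr_as_utf8_bytes(n))                # 中 (bytes E4 B8 AD)
print(hex(ord(chr_as_utf8_bytes(n))))      # 0x4e2d
print(chr_as_code_point(0x4E2D))           # 中

# Little-endian bytes (AD B8 E4) start with a continuation byte, and
# 14989485 is far above U+10FFFF, so both of these are rejected.
for reading in (lambda: chr_as_utf8_bytes(n, "little"),
                lambda: chr_as_code_point(n)):
    try:
        reading()
    except ValueError as exc:  # UnicodeDecodeError subclasses ValueError
        print("rejected:", type(exc).__name__)
```

Under these assumptions, the same character comes out of chr(14989485) under the byte reading and chr(20013) (0x4E2D) under the code-point reading, which is the difference the thread is asking to pin down.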