Re: Bug in UTF8-Validation Code?
From | Mark Dilger |
---|---|
Subject | Re: Bug in UTF8-Validation Code? |
Date | |
Msg-id | 46117E27.9030703@markdilger.com |
In reply to | Re: Bug in UTF8-Validation Code? (Mark Dilger <pgsql@markdilger.com>) |
List | pgsql-hackers |
Mark Dilger wrote:
> Tom Lane wrote:
>> Mark Dilger <pgsql@markdilger.com> writes:
>>>> pgsql=# select chr(14989485);
>>>>  chr
>>>> -----
>>>>  中
>>>> (1 row)
>>
>> Is there a principled rationale for this particular behavior as
>> opposed to any other?
>>
>> In particular, in UTF8 land I'd have expected the argument of chr()
>> to be interpreted as a Unicode code point, not as actual UTF8 bytes
>> with a randomly-chosen endianness.
>>
>> Not sure what to do in other multibyte encodings.
>
> "Not sure what to do in other multibyte encodings" was pretty much my
> rationale for this particular behavior.  I standardized on network byte
> order because there are only two endiannesses to choose from, and the
> other seems to be a more surprising choice.
>
> I looked around on the web for a standard for how to convert an integer
> into a valid multibyte character and didn't find anything.  Andrew,
> Supernews has said upthread that chr() is clearly wrong and needs to be
> fixed.  If so, we need some clear definition of what "fixed" means.
>
> Any suggestions?
>
> mark

Another issue to consider when thinking about the correct definition of
chr() is that ascii(chr(X)) = X should hold.  This gets weird if X is
greater than 255.  If nothing else, the name "ascii" is no longer
appropriate.

mark
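
To make the two readings of chr() discussed above concrete, here is a minimal Python sketch. It is not PostgreSQL code; the helper names (chr_as_bytes, ascii_as_bytes, chr_as_code_point, ascii_as_code_point) are made up for illustration. The first pair treats the integer as the encoded bytes in network (big-endian) byte order, and the second pair treats it as a Unicode code point. Both pairs satisfy ascii(chr(X)) = X for valid X, but they disagree on which X maps to which character.

```python
def chr_as_bytes(n: int, encoding: str = "utf-8") -> str:
    """Hypothetical chr(): read n as encoded bytes, most significant byte first."""
    length = max(1, (n.bit_length() + 7) // 8)
    raw = n.to_bytes(length, "big")
    # Raises UnicodeDecodeError if the bytes are not valid in the encoding.
    return raw.decode(encoding)


def ascii_as_bytes(s: str, encoding: str = "utf-8") -> int:
    """Hypothetical ascii(): inverse of chr_as_bytes for the first character."""
    return int.from_bytes(s[0].encode(encoding), "big")


def chr_as_code_point(n: int) -> str:
    """Hypothetical chr(): read n as a Unicode code point."""
    return chr(n)


def ascii_as_code_point(s: str) -> int:
    """Hypothetical ascii(): inverse of chr_as_code_point for the first character."""
    return ord(s[0])


if __name__ == "__main__":
    x = 14989485  # 0xE4B8AD, the UTF-8 bytes E4 B8 AD
    c = chr_as_bytes(x)
    print(c, ascii_as_bytes(c) == x)

    cp = ascii_as_code_point(c)  # the same character as a code point
    print(hex(cp), chr_as_code_point(cp) == c)
```

Under the byte reading, the argument depends on the server encoding, while under the code-point reading 14989485 is not a valid input at all, since it exceeds the largest code point, U+10FFFF.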