Re: invalidly encoded strings
From | Tatsuo Ishii |
---|---|
Subject | Re: invalidly encoded strings |
Date | |
Msg-id | 20070911.112750.70199461.t-ishii@sraoss.co.jp |
In response to | Re: invalidly encoded strings (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: invalidly encoded strings |
List | pgsql-hackers |
> Tatsuo Ishii <ishii@postgresql.org> writes:
> > If you regard the unicode code point as simply a number, why not
> > regard the multibyte characters as a number too?
>
> Because there's a standard specifying the Unicode code points *as
> numbers*. The mapping from those numbers to UTF8 strings (and other
> representations) is well-defined by the standard.
>
> > Also I'm wondering you what we should do with different
> > backend/frontend encoding combo.
>
> Nothing. chr() has always worked with reference to the database
> encoding, and we should keep it that way.

Where is it documented?

> BTW, it strikes me that there is another hole that we need to plug in
> this area, and that's the convert() function. Being able to create
> a value of type text that is not in the database encoding is simply
> broken. Perhaps we could make it work on bytea instead (providing
> a cast from text to bytea but not vice versa), or maybe we should just
> forbid the whole thing if the database encoding isn't SQL_ASCII.

Please don't do that. It will break a useful use case of convert().

A user has a database encoded in UTF-8. He has English, French,
Chinese and Japanese data in tables. To sort a table in the order
natural to its language, he would do something like this:

    SELECT * FROM japanese_table
    ORDER BY convert(japanese_text using utf8_to_euc_jp);

Without convert(), he gets the rows in an effectively arbitrary order.
This is because Kanji characters are in no meaningful order in UTF-8,
while they are reasonably ordered in EUC_JP.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
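
The ordering argument above can be checked outside the database. The following is a minimal Python sketch, not part of the original message: it sorts the same strings by their UTF-8 bytes and by their EUC-JP bytes, which is roughly what a bytewise comparison of the text before and after convert() would see. The four sample kanji and the variable names are assumptions chosen for illustration, and the sketch ignores locale-aware collation entirely.

```python
# Four sample kanji, chosen for illustration only (not from the original message).
words = ["愛", "阿", "哀", "亜"]

# Bytewise order of the UTF-8 encoding follows Unicode code point order.
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))

# Bytewise order of the EUC-JP encoding follows JIS X 0208 order, in which
# the level-1 kanji are arranged roughly by reading.
by_euc_jp = sorted(words, key=lambda s: s.encode("euc_jp"))

print("UTF-8 byte order: ", by_utf8)
print("EUC-JP byte order:", by_euc_jp)
```

The two lists come out in different orders, which is the effect the ORDER BY convert(...) trick relies on.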