Re: Bug in UTF8-Validation Code?
From | Tatsuo Ishii |
---|---|
Subject | Re: Bug in UTF8-Validation Code? |
Date | |
Msg-id | 20070405.095614.95827390.t-ishii@sraoss.co.jp |
In response to | Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
> > Thinking about this made me realize that there's another, ahem, elephant
> > in the room here: convert().
> > By definition convert() returns text strings which are not valid in the
> > server encoding. How can this be addressed?
>
> Remove convert(). Or at least redefine it as dealing in bytea not text.

That would break some important use cases.

1) A user has a UTF-8 database which contains data in various languages.
Each language has its own table. He wants to sort a SELECT result by
using ORDER BY. Since a locale cannot handle multiple languages, he uses
the C locale and does a SELECT something like this:

    SELECT * FROM french_table ORDER BY convert(t, 'LATIN1');
    SELECT * FROM japanese_table ORDER BY convert(t, 'EUC_JP');

2) A user has a UTF-8 database, but unfortunately his OS's UTF-8 locale
is broken. He decides to use the C locale and wants to sort the result
from a SELECT like this:

    SELECT * FROM japanese_table ORDER BY convert(t, 'EUC_JP');

Note that sorting by UTF-8 physical order would produce random
results. So the following would not help him in this case:

    SELECT * FROM japanese_table ORDER BY t;

Also, I don't understand how this is different from the problem we have
when a message catalogue does not match the encoding.
--
Tatsuo Ishii
SRA OSS, Inc. Japan