Re: Bug in UTF8-Validation Code?

From	Tatsuo Ishii
Subject	Re: Bug in UTF8-Validation Code?
Date	April 4, 2007, 21:55:27
Msg-id	20070405.095614.95827390.t-ishii@sraoss.co.jp
In reply to	Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

> Andrew - Supernews <andrew+nonews@supernews.com> writes:
> > Thinking about this made me realize that there's another, ahem, elephant
> > in the room here: convert().
> > By definition convert() returns text strings which are not valid in the
> > server encoding. How can this be addressed?
> 
> Remove convert().  Or at least redefine it as dealing in bytea not text.

That would break some important use cases. 

1) A user has a UTF-8 database that contains data in various languages. Each language has its own table. He wants to sort
a SELECT result using ORDER BY. Since a locale cannot handle multiple languages, he uses the C locale and runs SELECTs
something like this:

  SELECT * FROM french_table ORDER BY convert(t, 'LATIN1');
  SELECT * FROM japanese_table ORDER BY convert(t, 'EUC_JP');

2) A user has a UTF-8 database, but unfortunately his OS's UTF-8 locale is broken. He decides to use the C locale and
wants to sort the result of a SELECT like this:

  SELECT * FROM japanese_table ORDER BY convert(t, 'EUC_JP');
  Note that sorting by UTF-8 physical order would produce random results, so the following would not help him in this
case:
  SELECT * FROM japanese_table ORDER BY t;
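
To illustrate why the byte order depends on the encoding, here is a minimal sketch in Python rather than SQL. It assumes that Python's "euc_jp" codec produces the same bytes as PostgreSQL's EUC_JP for these characters, and the four sample kanji are chosen for this illustration only; they do not come from the thread. Sorting by encoded bytes stands in for what a C-locale comparison would see.

```python
# Sample data is hypothetical, picked only to show the effect.
words = ["阿", "哀", "亜", "愛"]


def byte_order_sort(items, encoding):
    """Sort strings by the raw bytes of their encoded form,
    approximating a C-locale (memcmp-style) comparison."""
    return sorted(items, key=lambda s: s.encode(encoding))


# UTF-8 byte order follows Unicode code point order.
utf8_order = byte_order_sort(words, "utf-8")

# EUC_JP byte order follows the JIS X 0208 layout instead.
eucjp_order = byte_order_sort(words, "euc_jp")

print("UTF-8 byte order: ", utf8_order)
print("EUC_JP byte order:", eucjp_order)
```

The two lists come out in different orders, which is the point of the ORDER BY convert(t, 'EUC_JP') trick: the JIS X 0208 layout arranges its level 1 kanji roughly by reading, so the EUC_JP byte order is closer to what a Japanese reader expects than the Unicode code point order is.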

Also, I don't understand how this is different from the problem we have
when a message catalogue does not match the encoding.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
