Re: Patch for collation using ICU
From | John Hansen |
---|---|
Subject | Re: Patch for collation using ICU |
Date | |
Msg-id | 5066E5A966339E42AA04BA10BA706AE50A9317@rodrick.geeknet.com.au |
In response to | Patch for collation using ICU (Palle Girgensohn <girgen@pingpong.net>) |
Responses | Re: Patch for collation using ICU |
List | pgsql-hackers |
> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> Sent: Sunday, May 08, 2005 11:08 PM
> To: John Hansen
> Cc: pgman@candle.pha.pa.us; girgen@pingpong.net; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Patch for collation using ICU
>
> > > I don't buy it. If current conversion tables does the right thing,
> > > why we need to replace. Or if conversion tables are not correct, why
> > > don't you fix it? I think the rule of character conversion will not
> > > change frequently, especially for LATIN languages. Thus maintaining
> > > cost is not too high.
> >
> > I never said we need to, but if we're going to implement ICU, then we
> > might as well go all the way.
>
> So you admit there's no benefit using ICU for replacing
> existing conversions?
>
> Besides ICU does not support all existing conversions, I
> think ICU has serious flaw for using conversion. If I
> understand correctly, ICU uses UNICODE internally to do the
> conversion. For example, to implement SJIS->EUC_JP conversion,
> ICU first converts SJIS to UNICODE then converts UNICODE to
> EUC_JP. Problem is these conversion is not round trip
> (conversion between SJIS/EUC_JP and UNICODE will lose some
> information). Thus SJIS->EUC_JP->SJIS conversion using ICU
> does not preserve original text.

Just for the record, I fetched a web page encoded in SJIS and converted it to EUC-JP and back using uconv from ICU 3.2. The round-tripped file is identical to the original.

    uconv -f Shift_JIS -t EUC-JP -o index.html.euc index.html
    uconv -f EUC-JP -t Shift_JIS -o index.html.sjis index.html.euc
    diff index.html index.html.sjis

...

John
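For readers who want to try the same kind of round-trip check without ICU installed, here is a minimal sketch in Python. It relies on Python's built-in `shift_jis` and `euc_jp` codecs, not on ICU's conversion tables, so its result says nothing about ICU itself. It only illustrates the SJIS -> Unicode -> EUC-JP -> Unicode -> SJIS path discussed above. The function names and the sample string are hypothetical, chosen for illustration.

```python
# Round-trip check: Shift_JIS -> Unicode -> EUC-JP -> Unicode -> Shift_JIS.
# Assumption: uses Python's built-in codecs, NOT ICU's conversion tables.


def sjis_to_eucjp(data: bytes) -> bytes:
    """Convert Shift_JIS bytes to EUC-JP bytes, pivoting through Unicode."""
    return data.decode("shift_jis").encode("euc_jp")


def eucjp_to_sjis(data: bytes) -> bytes:
    """Convert EUC-JP bytes back to Shift_JIS bytes, pivoting through Unicode."""
    return data.decode("euc_jp").encode("shift_jis")


def round_trips(original: bytes) -> bool:
    """Return True if SJIS -> EUC-JP -> SJIS reproduces the input byte for byte."""
    return eucjp_to_sjis(sjis_to_eucjp(original)) == original


# Hypothetical sample text; a real test would read the bytes of an SJIS file.
sample = "日本語のテキスト".encode("shift_jis")
print(round_trips(sample))
```

A single sample that survives the round trip does not show that every character does. Characters with more than one plausible Unicode mapping are where conversion tables tend to differ, so a thorough check would cover the full code range of the source encoding.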