Re: Radix tree for character conversion
From | Heikki Linnakangas |
---|---|
Subject | Re: Radix tree for character conversion |
Date | |
Msg-id | af224134-80dc-b18e-54f8-d45504754fc0@iki.fi |
In response to | Radix tree for character conversion (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses | Re: Radix tree for character conversion |
List | pgsql-hackers |
On 10/07/2016 11:36 AM, Kyotaro HORIGUCHI wrote:
> The radix conversion function and map conversion script became
> more generic than the previous state. So I could easily added
> radix conversion of EUC_JP in addition to SjiftJIS.
>
> nm -S said that the size of radix tree data for sjis->utf8
> conversion is 34kB and that for utf8->sjis is 46kB. (eucjp->utf8
> 57kB, utf8->eucjp 93kB) LUmapSJIS and ULmapSJIS was 62kB and
> 59kB, and LUmapEUC_JP and ULmapEUC_JP was 106kB and 105kB. If I'm
> not missing something, radix tree is faster and require less
> memory.

Cool!

> Currently the tree structure is devided into several elements,
> One for 2-byte, other ones for 3-byte and 4-byte codes and output
> table. The other than the last one is logically and technically
> merged into single table but it makes the generator script far
> complex than the current complexity. I no longer want to play
> hide'n seek with complex perl object..

I think that's OK. There isn't really anything to gain by merging them.

> It might be better that combining this as a native feature of the
> core. Currently the helper function is in core but that function
> is given as conv_func on calling LocalToUtf.

Yeah, I think we want to completely replace the current binary-search
based code with this. I would rather maintain just one mechanism.

> Current implement uses *.map files of pg_utf_to_local as
> input. It seems not good but the radix tree files is completely
> uneditable. Provide custom made loading functions for every
> source instead of load_chartable() would be the way to go.
>
> # However, for example utf8_to_sjis.map, it doesn't seem to have
> # generated from the source mentioned in UCS_to_SJIS.pl

Ouch. We should find and document an authoritative source for all the
mappings we have...

I think the next steps here are:

1. Find an authoritative source for all the existing mappings.

2. Generate the radix tree files directly from the authoritative
   sources, instead of the existing *.map files.

3. Completely replace the existing binary-search code with this.

- Heikki
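
For readers following the thread, the difference between the two lookup mechanisms discussed above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the type and function names (map_entry, radix2, radix2_build, radix2_lookup, lookup_bsearch) are made up for this example, it handles only 2-byte codes, and the four SJIS-to-Unicode pairs are a tiny sample rather than a real mapping table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One mapping pair: 2-byte local code -> Unicode code point (hypothetical layout). */
typedef struct { uint16_t code; uint32_t ucs; } map_entry;

/* Tiny sample, sorted by code as binary search requires. */
static const map_entry sample_map[] = {
    {0x8140, 0x3000},   /* ideographic space */
    {0x829F, 0x3041},   /* hiragana small a */
    {0x82A0, 0x3042},   /* hiragana a */
    {0x82A2, 0x3044},   /* hiragana i */
};
#define SAMPLE_MAP_LEN (sizeof(sample_map) / sizeof(sample_map[0]))

/* Binary search over the sorted pair list; returns 0 when there is no mapping. */
static uint32_t lookup_bsearch(const map_entry *map, size_t n, uint16_t code)
{
    size_t lo = 0, hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (map[mid].code == code)
            return map[mid].ucs;
        if (map[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Two-level radix table: lead byte selects a leaf, trail byte indexes into it. */
#define MAX_LEAVES 8
typedef struct
{
    uint8_t  first[256];               /* lead byte -> leaf index, 0 = none */
    uint32_t leaves[MAX_LEAVES][256];  /* leaf 0 stays all-zero ("no mapping") */
    int      nleaves;
} radix2;

/* Fill the table from a pair list; returns -1 if more leaves are needed. */
static int radix2_build(radix2 *t, const map_entry *map, size_t n)
{
    memset(t, 0, sizeof(*t));
    t->nleaves = 1;                    /* reserve leaf 0 */
    for (size_t i = 0; i < n; i++)
    {
        uint8_t hi = (uint8_t) (map[i].code >> 8);
        uint8_t lo = (uint8_t) (map[i].code & 0xFF);

        if (t->first[hi] == 0)
        {
            if (t->nleaves >= MAX_LEAVES)
                return -1;
            t->first[hi] = (uint8_t) t->nleaves++;
        }
        t->leaves[t->first[hi]][lo] = map[i].ucs;
    }
    return 0;
}

/* Lookup is two array indexings, with no comparisons or loop. */
static uint32_t radix2_lookup(const radix2 *t, uint16_t code)
{
    return t->leaves[t->first[code >> 8]][code & 0xFF];
}
```

After radix2_build(&t, sample_map, SAMPLE_MAP_LEN), both radix2_lookup(&t, 0x82A0) and lookup_bsearch(sample_map, SAMPLE_MAP_LEN, 0x82A0) return 0x3042, and both return 0 for an unmapped code such as 0x8141. The sketch builds its table at run time for brevity, whereas the thread is about generating the table data ahead of time with a script.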