Re: Radix tree for character conversion
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Radix tree for character conversion |
Date | |
Msg-id | 20161021.173321.105120238.horiguchi.kyotaro@lab.ntt.co.jp |
In response to | Re: Radix tree for character conversion (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Radix tree for character conversion
|
List | pgsql-hackers |
Hello, this is a new version of the radix charconv patch.

At Sat, 8 Oct 2016 00:37:28 +0300, Heikki Linnakangas <hlinnaka@iki.fi> wrote in <6d85d710-9554-a928-29ff-b2d3b80b01c9@iki.fi>
> What I don't want is that the current *.map files are turned into the
> authoritative source files, that we modify by hand. There are no
> comments in them, for starters, which makes hand-editing
> cumbersome. It seems that we have edited some of them by hand already,
> but we should rectify that.

Agreed. So, I identified the source file of each character for the EUC_JP and SJIS conversions, to clarify what has been done to them.

The SJIS conversion is made from CP932.TXT, plus 8 additional conversions for UTF8->SJIS and none for SJIS->UTF8.

The EUC_JP conversion is made from CP932.TXT and JIS0212.TXT; JIS0201.TXT and JIS0208.TXT are not needed. It adds 83 or 86 conversion entries, depending on the direction.

http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0212.TXT

The generator scripts no longer use the *.map files as their source; instead, they generate the old-style map files as well as the radix tree files.

For convenience, UCS_to_(SJIS|EUC_JP).pl takes the parameters --flat and -v. The former generates the old-style flat map in addition to the radix map file, and adding -v puts a source description on each line of the flat map file.

While working on this, I found that the EUC_JP map lacks some conversions, but that is another issue.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
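
For illustration only, the idea of deriving both a flat map and a radix tree from a mapping text file can be sketched roughly as below. This is a minimal Python sketch, not the code in the patch (the real generators are Perl scripts, and the real radix tree layout is different). The sample lines only imitate the style of the unicode.org mapping files, and the names `SAMPLE`, `parse_mapping`, `build_radix_tree` and `lookup` are hypothetical.

```python
# Hypothetical sketch; the names and data layout are assumptions, not the patch.

# A few lines imitating the unicode.org mapping file style (assumed format):
# local code, tab, Unicode code point, tab, "#" comment.
SAMPLE = """\
# local code <tab> Unicode code point <tab> # comment
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0x81\t      \t#DBCS LEAD BYTE
0x8140\t0x3000\t# IDEOGRAPHIC SPACE
0x82A0\t0x3042\t# HIRAGANA LETTER A
"""


def parse_mapping(text):
    """Return (local_code, ucs) pairs, skipping comments and unmapped lines."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment-only line, or a lead byte with no mapping
        pairs.append((int(fields[0], 16), int(fields[1], 16)))
    return pairs


def build_radix_tree(pairs):
    """Build nested dicts keyed by successive bytes of the local code.

    Assumes no code is both a complete character and a lead byte.
    """
    root = {}
    for code, ucs in pairs:
        nbytes = max(1, (code.bit_length() + 7) // 8)
        key = code.to_bytes(nbytes, "big")
        node = root
        for b in key[:-1]:
            node = node.setdefault(b, {})
        node[key[-1]] = ucs  # leaf: the Unicode code point
    return root


def lookup(tree, key):
    """Walk the tree one byte at a time; None means no conversion."""
    node = tree
    for b in key:
        if not isinstance(node, dict) or b not in node:
            return None
        node = node[b]
    return node if isinstance(node, int) else None


pairs = parse_mapping(SAMPLE)
flat = dict(pairs)              # old-style flat map: code -> code point
tree = build_radix_tree(pairs)  # radix-style map built from the same source
```

Both `flat` and `tree` come from the same parsed pairs, which mirrors the point that the text files, rather than the generated maps, act as the source.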