Re: [HACKERS] Extra Vietnamese unaccent rules
From | Kha Nguyen |
---|---|
Subject | Re: [HACKERS] Extra Vietnamese unaccent rules |
Date | |
Msg-id | 262536FD-F41D-4776-9056-9FBA60DA61EA@gmail.com |
In response to | Re: [HACKERS] Extra Vietnamese unaccent rules (Thomas Munro <thomas.munro@enterprisedb.com>) |
List | pgsql-hackers |
Could you explain to me what this line means:

"1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00E2 0301;;;;N;;;1EA4;;1EA4"

If you could give me an example of adding a rule for the "recursive" case, I can do the rest. I am not familiar with how the unaccent rules file is generated yet.

Thanks
Kha

> On 26 May 2017, at 21.19, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Sat, May 27, 2017 at 5:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wrote:
>>> Nguyen Le Hoang Kha <nlhkha@gmail.com> writes:
>>>> Most of the time in Vietnamese language, there are up to 2 accents in a
>>>> character. These unaccent rules are added to handle such cases (which are
>>>> very common).
>>
>>> I can't see any reason not to add these --- any objections out there?
>>
>> Oh, wait a minute. Patching unaccent.rules directly isn't the way
>> to do this; that file is supposed to be generated by
>> generate_unaccent_rules.py. Can you see how to modify that script
>> to produce these rules?
>
> Looking at one example from this patch:
>
> UTF8: <E1><BA><A5>
> Codepoint: 1EA5
> Name: LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE
>
> In UnicodeData.txt it's this line:
>
> 1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00E2 0301;;;;N;;;1EA4;;1EA4
>
> The problem is that generate_unaccent_rules.py assumes that the
> composing data is a plain letter followed by some number of
> diacritical modifiers. That's true for the characters with a single
> accent, but in this multi-accent case it's the *composed* character 00E2
> (LATIN SMALL LETTER A WITH CIRCUMFLEX) and a diacritical marker 0301
> (COMBINING ACUTE ACCENT). So we need to teach it to be recursive.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
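
To illustrate the "recursive" case discussed above, here is a minimal sketch in Python, the language of generate_unaccent_rules.py. It is not the actual script and not a proposed patch: the names `SAMPLE`, `parse_unicode_data` and `base_letter` are hypothetical. In a UnicodeData.txt record the fields are separated by semicolons; the first is the codepoint, the second the name, the third the general category (`Ll` is a lowercase letter, `Mn` a non-spacing mark), and the sixth the decomposition mapping. Only the 1EA5 record below is copied from the message; the other records are simplified stand-ins in the same layout, with the optional fields left empty.

```python
# Sample records in UnicodeData.txt layout (semicolon-separated fields).
# Only the 1EA5 line comes from the message; the rest are simplified.
SAMPLE = """\
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00E2;LATIN SMALL LETTER A WITH CIRCUMFLEX;Ll;0;L;0061 0302;;;;N;;;00C2;;00C2
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;;;;;
0302;COMBINING CIRCUMFLEX ACCENT;Mn;230;NSM;;;;;N;;;;;
1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00E2 0301;;;;N;;;1EA4;;1EA4
"""


def parse_unicode_data(text):
    """Map codepoint -> (general category, list of decomposition codepoints)."""
    table = {}
    for line in text.splitlines():
        fields = line.split(";")
        if len(fields) < 6:
            continue
        code = int(fields[0], 16)
        mapping = fields[5]
        # Tagged mappings such as "<compat> ..." are ignored in this sketch.
        if mapping.startswith("<"):
            decomposition = []
        else:
            decomposition = [int(c, 16) for c in mapping.split()]
        table[code] = (fields[2], decomposition)
    return table


def base_letter(table, code):
    """Return the plain letter under `code`, or None if there is none.

    The first element of the decomposition may itself be a composed
    character, so it is resolved recursively; every remaining element
    must be a combining mark (general category starting with "M").
    """
    if code not in table:
        return None
    category, decomposition = table[code]
    if not decomposition:
        # Recursion ends at a character with no decomposition.
        return code if category.startswith("L") else None
    first, rest = decomposition[0], decomposition[1:]
    for mark in rest:
        if mark not in table or not table[mark][0].startswith("M"):
            return None
    return base_letter(table, first)


table = parse_unicode_data(SAMPLE)
base = base_letter(table, 0x1EA5)
if base is not None:
    # One source/target pair, separated by whitespace.
    print(chr(0x1EA5) + "\t" + chr(base))
```

Here 1EA5 decomposes to 00E2 plus the mark 0301, and 00E2 in turn decomposes to 0061 plus the mark 0302, so the recursion bottoms out at the plain letter 0061. A non-recursive check that expects the first element to be a plain letter would stop at 00E2 and skip the character.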