Re: [HACKERS] Extra Vietnamese unaccent rules
От | Dang Minh Huong |
---|---|
Тема | Re: [HACKERS] Extra Vietnamese unaccent rules |
Дата | |
Msg-id | 7a813796-80c5-aa93-8772-bbddf2f6a10f@gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Extra Vietnamese unaccent rules (Michael Paquier <michael.paquier@gmail.com>) |
Ответы |
Re: [HACKERS] Extra Vietnamese unaccent rules
|
Список | pgsql-hackers |
On 2017/07/05 15:28, Michael Paquier wrote: > I have finally been able to look at this patch. Thanks for reviewing and the new version of the patch. > (Surprised to see that generate_unaccent_rules.py is inconsistent on > MacOS, runs fine on Linux). > > def get_plain_letter(codepoint, table): > """Return the base codepoint without marks.""" > if is_letter_with_marks(codepoint, table): > - return table[codepoint.combining_ids[0]] > + if len(table[codepoint.combining_ids[0]].combining_ids) > 1: > + # Recursive to find the plain letter > + return get_plain_letter(table[codepoint.combining_ids[0]],table) > + elif is_plain_letter(table[codepoint.combining_ids[0]]): > + return table[codepoint.combining_ids[0]] > + else: > + return None > elif is_plain_letter(codepoint): > return codepoint > else: > - raise "mu" > + return None > The code paths returning None should not be reached, so I would > suggest adding an assertion instead. Callers of get_plain_letter would > blow up on None, still that would make future debugging harder. > > def is_letter_with_marks(codepoint, table): > - """Returns true for plain letters combined with one or more marks.""" > + """Returns true for letters combined with one or more marks.""" > # See http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values > return len(codepoint.combining_ids) > 1 and \ > - is_plain_letter(table[codepoint.combining_ids[0]]) and \ > + (is_plain_letter(table[codepoint.combining_ids[0]]) or\ > + is_letter_with_marks(table[codepoint.combining_ids[0]],table)) > and \ > all(is_mark(table[i]) for i in codepoint.combining_ids[1:] > This was already hard to follow, and this patch makes its harder. I > think that the thing should be refactored with multiple conditions. > > if is_letter_with_marks(codepoint, table): > - charactersSet.add((codepoint.id, > + if get_plain_letter(codepoint, table) <> None: > + charactersSet.add((codepoint.id, > This change is not necessary as a letter with marks is not a plain > character anyway. > > Testing with characters having two accents, the results are produced > as wanted. I am attaching an updated patch with all those > simplifications. Thoughts? Thanks, so pretty. The patch is fine to me. --- Thanks and best regards, Dang Minh Huong --- このEメールはアバスト アンチウイルスによりウイルススキャンされています。 https://www.avast.com/antivirus
В списке pgsql-hackers по дате отправления: