On 2017/07/05 15:28, Michael Paquier wrote:
> I have finally been able to look at this patch.
Thanks for reviewing and the new version of the patch.
> (Surprised to see that generate_unaccent_rules.py is inconsistent on
> MacOS, runs fine on Linux).
>
> def get_plain_letter(codepoint, table):
> """Return the base codepoint without marks."""
> if is_letter_with_marks(codepoint, table):
> - return table[codepoint.combining_ids[0]]
> + if len(table[codepoint.combining_ids[0]].combining_ids) > 1:
> + # Recursive to find the plain letter
> + return get_plain_letter(table[codepoint.combining_ids[0]],table)
> + elif is_plain_letter(table[codepoint.combining_ids[0]]):
> + return table[codepoint.combining_ids[0]]
> + else:
> + return None
> elif is_plain_letter(codepoint):
> return codepoint
> else:
> - raise "mu"
> + return None
> The code paths returning None should not be reached, so I would
> suggest adding an assertion instead. Callers of get_plain_letter would
> blow up on None, still that would make future debugging harder.
>
> def is_letter_with_marks(codepoint, table):
> - """Returns true for plain letters combined with one or more marks."""
> + """Returns true for letters combined with one or more marks."""
> # See http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values
> return len(codepoint.combining_ids) > 1 and \
> - is_plain_letter(table[codepoint.combining_ids[0]]) and \
> + (is_plain_letter(table[codepoint.combining_ids[0]]) or\
> + is_letter_with_marks(table[codepoint.combining_ids[0]],table))
> and \
> all(is_mark(table[i]) for i in codepoint.combining_ids[1:]
> This was already hard to follow, and this patch makes its harder. I
> think that the thing should be refactored with multiple conditions.
>
> if is_letter_with_marks(codepoint, table):
> - charactersSet.add((codepoint.id,
> + if get_plain_letter(codepoint, table) <> None:
> + charactersSet.add((codepoint.id,
> This change is not necessary as a letter with marks is not a plain
> character anyway.
>
> Testing with characters having two accents, the results are produced
> as wanted. I am attaching an updated patch with all those
> simplifications. Thoughts?
Thanks, so pretty. The patch is fine to me.
---
Thanks and best regards,
Dang Minh Huong
---
このEメールはアバスト アンチウイルスによりウイルススキャンされています。
https://www.avast.com/antivirus