Re: BUG #18362: unaccent rules and Old Greek text
From | Michael Paquier |
---|---|
Subject | Re: BUG #18362: unaccent rules and Old Greek text |
Date | |
Msg-id | ZdvLGeJ1BsXRkrdQ@paquier.xyz |
In reply to | Re: BUG #18362: unaccent rules and Old Greek text (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: BUG #18362: unaccent rules and Old Greek text |
List | pgsql-bugs |
On Sun, Feb 25, 2024 at 04:21:36PM +1300, Thomas Munro wrote:
> On Sun, Feb 25, 2024 at 11:14 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> So, there are reasons to keep the current unaccent.rules as it is, but...
>> there are other reasons to add a few lines to it, f.e. after line 955 and
>> insert five greek vowels with Oxia
>> Please add:
>> ά α
>> έ ε
>> ή η
>> ί ι
>> ό ο
>> ύ υ
>> ώ ω

Correct me if I'm wrong of course, but reading a bit on the matter at
[1], letters with Tonos or Oxia are actually equivalent since 1986,
and we only include character with Tonos in our unaccent.rules.

> We don't exactly maintain this list manually, we extract it from
> Unicode source data. Can you see what needs to be adjusted in here to
> achieve that goal?

See commits like e3dd7c06e627 or 59f47fb98dab for some references.
Unfortunately, we've been using as policy to not backpatch any changes
to the in-core rules file, and you can plug in your own file. Saying
that, these additions sound like a natural addition seen from here.

> Perhaps a new range or something like that?

It seems to me that it is a bit more complicated than that, because
Unicode.data decomposes the characters with Oxia as characters with
Tonos, and then characters with Tonos are decomposed with the "base"
alphabet characters + Tonos. We do a recursive lookup at the unicode
table in get_plain_letter() and is_letter_with_marks(), so it seems to
me that we're not missing much, and I suspect that there should be no
need for a new custom range of characters.

Cees, perhaps you would like to get a shot at that?

[1]: https://en.wikipedia.org/wiki/Greek_diacritics#Unicode
--
Michael
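
For readers following along, the two-step decomposition described above (Oxia -> Tonos -> base letter + combining mark) can be checked with a small Python sketch. This is not the code of the in-tree generator script: plain_letter() and OXIA_VOWELS are hypothetical names used only for illustration, and the sketch reads Python's built-in unicodedata tables rather than parsing the Unicode data file itself.

```python
import unicodedata


def plain_letter(ch):
    """Follow canonical decompositions recursively down to the base letter.

    Hypothetical helper for illustration; it only mirrors the idea of a
    recursive lookup, not the actual in-tree implementation.
    """
    decomp = unicodedata.decomposition(ch)
    # No decomposition, or a compatibility one such as "<compat> ...":
    # treat the character as already plain.
    if not decomp or decomp.startswith("<"):
        return ch
    # The first code point of the decomposition is the letter; any
    # following code points are combining marks.
    first = chr(int(decomp.split()[0], 16))
    return plain_letter(first)


# The seven Greek small vowels with Oxia from the extended block,
# written as escapes so that they cannot be confused with the Tonos forms.
OXIA_VOWELS = "\u1f71\u1f73\u1f75\u1f77\u1f79\u1f7b\u1f7d"

if __name__ == "__main__":
    for ch in OXIA_VOWELS:
        step = unicodedata.decomposition(ch)  # singleton: the Tonos form
        base = plain_letter(ch)
        print(f"U+{ord(ch):04X} -> {step} -> U+{ord(base):04X} {base}")
```

Each Oxia vowel first decomposes to a single code point (the Tonos form), which in turn decomposes to the base vowel plus U+0301, so a recursive lookup reaches the plain letter in two steps.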