Re: BUG #18362: unaccent rules and Old Greek text
From | Cees van Zeeland |
---|---|
Subject | Re: BUG #18362: unaccent rules and Old Greek text |
Date | |
Msg-id | d9875ca6-6438-4c52-adc1-5bc2ed28c362@freedom.nl |
In reply to | Re: BUG #18362: unaccent rules and Old Greek text (Michael Paquier <michael@paquier.xyz>) |
Replies | Re: BUG #18362: unaccent rules and Old Greek text |
List | pgsql-bugs |
Thomas Munro <thomas.munro@gmail.com> wrote:
> If I tell the script to follow such "simple" redirections, I
> get over a thousand new mappings, including those. See attached.
> There is probably more correct terminology than I'm using here...
Michael Paquier wrote:
> It seems to me that it is a bit more complicated than that, because
> Unicode.data decomposes the characters with Oxia as characters with
> Tonos, and then characters with Tonos are decomposed into the "base"
> alphabet characters + Tonos. We do a recursive lookup of the Unicode
> table in get_plain_letter() and is_letter_with_marks(), so it seems to
> me that we're not missing much, and I suspect that there should be no
> need for a new custom range of characters.
>
> Cees, perhaps you would like to take a shot at that?
>
> [1]: https://en.wikipedia.org/wiki/Greek_diacritics#Unicode
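The recursive lookup described above can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch, not the code of `get_plain_letter()` or `is_letter_with_marks()` in the actual generation script; the helper names `canonical_decomposition`, `is_simple_redirection` and `plain_letter` are hypothetical. It shows that U+1F71 (alpha with oxia) has a single-code-point ("simple") canonical decomposition to U+03AC (alpha with tonos), which in turn decomposes into U+03B1 plus U+0301 COMBINING ACUTE ACCENT.

```python
import unicodedata


def canonical_decomposition(ch: str) -> list[str]:
    """Return ch's canonical decomposition as a list of characters.

    Returns [] when ch has no decomposition or only a compatibility
    decomposition (those are tagged, e.g. "<compat> 0020 0308").
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return []
    return [chr(int(cp, 16)) for cp in decomp.split()]


def is_simple_redirection(ch: str) -> bool:
    """True when ch canonically decomposes to exactly one other code point
    (a singleton, e.g. a letter with oxia -> the same letter with tonos)."""
    return len(canonical_decomposition(ch)) == 1


def plain_letter(ch: str) -> str:
    """Follow canonical decompositions recursively down to the base letter.

    For letters with marks the base character comes first and the
    combining marks follow, so only the first element is followed.
    """
    parts = canonical_decomposition(ch)
    if not parts:
        return ch
    return plain_letter(parts[0])


if __name__ == "__main__":
    alpha_oxia = "\u1F71"  # GREEK SMALL LETTER ALPHA WITH OXIA
    print([f"U+{ord(c):04X}" for c in canonical_decomposition(alpha_oxia)])  # ['U+03AC']
    print(is_simple_redirection(alpha_oxia))  # True
    print(plain_letter(alpha_oxia))  # α
    # NFD applies the same recursion: U+1F71 -> U+03AC -> U+03B1 U+0301
    print(unicodedata.normalize("NFD", alpha_oxia) == "\u03B1\u0301")  # True
```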
I'm not an expert, but computers obviously distinguish between the two versions of these characters.
We are talking about this series:
U+1F70 - U+1F7D: ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ
Is it possible to somehow limit the redirection in the script to this range?
~
Cees
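Limiting the redirection to U+1F70 to U+1F7D, as asked above, could be prototyped roughly as follows. This is a hedged sketch assuming a standalone Python helper, not a patch to the real generation script; `in_allowed_range`, `plain_letter` and `greek_rules` are hypothetical names. Singleton canonical decompositions are followed only when the source code point lies in the range, while decompositions into base letter plus combining marks are followed everywhere. Within this range only the oxia letters (odd code points such as U+1F71) are singletons; the varia letters (such as U+1F70) already decompose directly into base letter plus U+0300 COMBINING GRAVE ACCENT.

```python
import unicodedata

# Range named in the thread: U+1F70 to U+1F7D (Greek small letters with varia/oxia).
RANGE_START, RANGE_END = 0x1F70, 0x1F7D


def in_allowed_range(ch: str) -> bool:
    """True when ch is a code point whose singleton redirection we follow."""
    return RANGE_START <= ord(ch) <= RANGE_END


def plain_letter(ch: str) -> str:
    """Recursively strip marks via canonical decompositions.

    Decompositions into base + combining marks are always followed; a
    singleton ("simple") redirection is followed only inside the range.
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):  # none, or compatibility-only
        return ch
    parts = [chr(int(cp, 16)) for cp in decomp.split()]
    if len(parts) == 1 and not in_allowed_range(ch):
        return ch  # singleton outside the range: leave unmapped
    return plain_letter(parts[0])


def greek_rules() -> list[tuple[str, str]]:
    """Return (source, target) pairs for every code point in the range."""
    rules = []
    for cp in range(RANGE_START, RANGE_END + 1):
        ch = chr(cp)
        base = plain_letter(ch)
        if base != ch:
            rules.append((ch, base))
    return rules


if __name__ == "__main__":
    # Tab-separated source/target pairs, approximating the unaccent.rules layout.
    for src, dst in greek_rules():
        print(f"{src}\t{dst}")
```

Restricting the singleton rule to a hard-coded range would keep out the thousand-plus extra mappings reported earlier in the thread: for example, U+212B ANGSTROM SIGN also has a singleton decomposition (to U+00C5), but `plain_letter("\u212B")` returns it unchanged in this sketch because it lies outside the range.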