Re: BUG #18362: unaccent rules and Old Greek text

From: Cees van Zeeland
Subject: Re: BUG #18362: unaccent rules and Old Greek text
Msg-id: d9875ca6-6438-4c52-adc1-5bc2ed28c362@freedom.nl
In reply to: Re: BUG #18362: unaccent rules and Old Greek text (Michael Paquier <michael@paquier.xyz>)
Responses: Re: BUG #18362: unaccent rules and Old Greek text (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-bugs
Thomas Munro <thomas.munro@gmail.com> wrote:
> If I tell the script to follow such "simple" redirections, I
> get over a thousand new mappings, including those.  See attached.
> There is probably more correct terminology than I'm using here...

Michael Paquier wrote:
> It seems to me that it is a bit more complicated than that, because
> UnicodeData.txt decomposes the characters with Oxia as characters with
> Tonos, and then characters with Tonos are decomposed with the "base"
> alphabet characters + Tonos.  We do a recursive lookup in the Unicode
> table in get_plain_letter() and is_letter_with_marks(), so it seems to
> me that we're not missing much, and I suspect that there should be no
> need for a new custom range of characters.
>
> Cees, perhaps you would like to get a shot at that?
>
> [1]: https://en.wikipedia.org/wiki/Greek_diacritics#Unicode
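
For illustration, the chain described above (Oxia to Tonos, then Tonos to base letter plus combining mark) can be observed with Python's standard unicodedata module. This is only a minimal sketch, not the code from the script; the helper name plain_letter is hypothetical and merely mimics the idea of a recursive lookup.

```python
import unicodedata


def plain_letter(ch):
    """Follow canonical decompositions until no decomposition remains.

    Hypothetical helper for illustration only; it always follows the
    first code point of each decomposition.
    """
    decomp = unicodedata.decomposition(ch)
    # An empty string means no decomposition; a leading "<" marks a
    # compatibility decomposition, which this sketch does not follow.
    if not decomp or decomp.startswith("<"):
        return ch
    first = chr(int(decomp.split()[0], 16))
    return plain_letter(first)


# U+1F71 GREEK SMALL LETTER ALPHA WITH OXIA
print(unicodedata.decomposition("\u1f71"))   # 03AC
# U+03AC GREEK SMALL LETTER ALPHA WITH TONOS
print(unicodedata.decomposition("\u03ac"))   # 03B1 0301
print(plain_letter("\u1f71") == "\u03b1")    # True
```

The first decomposition is a single code point, which is the kind of "simple" redirection discussed in this thread.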

I'm not an expert, but computers clearly distinguish between the two versions of these characters.
We are talking about this series:
U+1F70 - U+1F7D:    ὰ     ά     ὲ     έ     ὴ     ή     ὶ     ί     ὸ     ό     ὺ     ύ     ὼ     ώ
Is there some way to limit the redirection in the script to this range?
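
To make the question concrete, here is a rough sketch of what such a restriction could look like, using Python's standard unicodedata module. It is not taken from the script: the names GREEK_RANGE and simple_redirections are hypothetical, and "simple redirection" is assumed here to mean a canonical decomposition that consists of exactly one code point.

```python
import unicodedata

# Assumed range of interest: U+1F70 through U+1F7D inclusive.
GREEK_RANGE = range(0x1F70, 0x1F7E)


def simple_redirections(codepoints):
    """Map each code point whose canonical decomposition is a single
    code point to that target (hypothetical helper)."""
    out = {}
    for cp in codepoints:
        decomp = unicodedata.decomposition(chr(cp))
        parts = decomp.split()
        # Skip compatibility decompositions ("<tag> ...") and anything
        # that decomposes into more than one code point.
        if len(parts) == 1 and not decomp.startswith("<"):
            out[cp] = int(parts[0], 16)
    return out


redirs = simple_redirections(GREEK_RANGE)
for src, dst in sorted(redirs.items()):
    print(f"U+{src:04X} {chr(src)} -> U+{dst:04X} {chr(dst)}")
```

Within this range only the seven Oxia characters are picked up, since the Varia characters decompose directly into a base letter plus a combining mark.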

~
Cees


