Re: BUG #18362: unaccent rules and Old Greek text
From | Thomas Munro |
---|---|
Subject | Re: BUG #18362: unaccent rules and Old Greek text |
Date | |
Msg-id | CA+hUKGK7OKZcCpvD92RyJtu6m_b6XRuZRNqSu_5Y3vHDn7KDpA@mail.gmail.com |
In response to | Re: BUG #18362: unaccent rules and Old Greek text (Cees van Zeeland <cees.van.zeeland@freedom.nl>) |
Responses |
Re: BUG #18362: unaccent rules and Old Greek text
Re: BUG #18362: unaccent rules and Old Greek text |
List | pgsql-bugs |
On Tue, Feb 27, 2024 at 1:33 AM Cees van Zeeland <cees.van.zeeland@freedom.nl> wrote:
> I'm not an expert, but obviously computers make a difference between the two versions of the characters.
> We are talking about this series:
> U+1F70 - U+1F7D: ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ
> Is it possible to filter / limit in some way the redirection in the script to this range?

Right, so to get this in we either need to decide that we're OK with adding that many characters, or figure out some systematic way to select just the ones we want.

One hint that might be helpful if someone wants to investigate: I suspect that a lot of those mappings might be marked with <font>, which seems to be for code points for alternative renderings ("mathematical" bold, italic, fraktur etc), so perhaps we could filter them out that way without losing the oxia-marked characters, if that's the way it has to be.

I think the relevant part of the character database file is described here:

https://unicode.org/reports/tr44/#Property_Values

The file we're currently using is 15.1:

https://www.unicode.org/Public/15.1.0/ucd/UnicodeData.txt

I registered this thread as https://commitfest.postgresql.org/47/4873/ .
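
For anyone who wants to try the <font> idea above, here is a rough sketch of how such a filter might look. It is not the logic of the script discussed in this thread. It only assumes the UnicodeData.txt layout (semicolon-separated fields, with the decomposition mapping in the sixth field, optionally prefixed by a tag such as <font>). The names parse_decomposition, decompositions and SAMPLE are hypothetical, and the sample records are written in the UnicodeData.txt format for illustration rather than read from the real file.

```python
import re

# A decomposition field may start with a compatibility tag such as "<font>".
TAG_RE = re.compile(r"^<(\w+)>\s*")


def parse_decomposition(field):
    """Split a decomposition field into (tag, [code points]).

    tag is None when the field carries no <...> prefix.
    """
    m = TAG_RE.match(field)
    tag = m.group(1) if m else None
    rest = field[m.end():] if m else field
    return tag, [int(cp, 16) for cp in rest.split()]


def decompositions(lines, skip_tags=frozenset({"font"})):
    """Yield (code point, tag, decomposition) for each usable record.

    Records without a decomposition, or whose tag is in skip_tags,
    are dropped.
    """
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 6 or not fields[5]:
            continue  # no decomposition mapping on this record
        tag, cps = parse_decomposition(fields[5])
        if tag in skip_tags:
            continue  # e.g. "mathematical" bold/italic/fraktur variants
        yield int(fields[0], 16), tag, cps


# Illustrative records in UnicodeData.txt format (assumption: not read
# from the real file).
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "03AC;GREEK SMALL LETTER ALPHA WITH TONOS;Ll;0;L;03B1 0301;;;;N;;;0386;;0386",
    "1F71;GREEK SMALL LETTER ALPHA WITH OXIA;Ll;0;L;03AC;;;;N;;;1FBB;;1FBB",
    "1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;",
]

if __name__ == "__main__":
    for cp, tag, cps in decompositions(SAMPLE):
        print(f"U+{cp:04X} tag={tag} -> {[f'U+{c:04X}' for c in cps]}")
```

Restricting the output to the range mentioned in the quoted message would then just be an extra check on the first element of each tuple, for example `0x1F70 <= cp <= 0x1F7D`. Whether dropping <font> alone reduces the list enough is something that would need checking against the real file.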