Re: BUG #18362: unaccent rules and Old Greek text
From | Cees van Zeeland |
---|---|
Subject | Re: BUG #18362: unaccent rules and Old Greek text |
Date | |
Msg-id | 63c65b3a-d142-409d-92ec-2a7d1df6f697@freedom.nl |
In reply to | Re: BUG #18362: unaccent rules and Old Greek text (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-bugs |
Hi Thomas,

I found:
https://www.unicode.org/Public/15.1.0/ucd/CompositionExclusions.txt
that might be useful to tackle characters that we are searching for.

Hope this helps.

Cees

On 01/03/2024 02:53, Thomas Munro wrote:
> On Tue, Feb 27, 2024 at 1:33 AM Cees van Zeeland
> <cees.van.zeeland@freedom.nl> wrote:
>> I'm not an expert, but obviously computers make a difference between the two versions of the characters.
>> We are talking about this series:
>> U+1F70 - U+1F7D: ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ
>> Is it possible to filter / limit in some way the redirection in the script to this range?
> Right, so to get this in we either need to decide that we're OK with
> adding that many characters, or figure out some systematic way to
> select just the ones we want. One hint that might be helpful if
> someone wants to investigate: I suspect that a lot of those mappings
> might be marked with <font>, which seems to be for code points for
> alternative renderings ("mathematical" bold, italic, fraktur etc), so
> perhaps we could filter them out that way without losing the
> oxia-marked characters if that's the way it has to be.
>
> I think all the relevant part of the character database file is described here:
>
> https://unicode.org/reports/tr44/#Property_Values
>
> The file we're currently using is 15.1:
>
> https://www.unicode.org/Public/15.1.0/ucd/UnicodeData.txt
>
> I registered this thread as https://commitfest.postgresql.org/47/4873/ .
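To make the `<font>` filtering idea from the quoted message a bit more concrete, here is a minimal Python sketch. It is not the script discussed in the thread and does not read UnicodeData.txt directly; it assumes that Python's standard `unicodedata` module (which ships its own Unicode version, not necessarily 15.1) exposes the same decomposition field. The helper names `decomposition_tag`, `is_font_variant` and `base_letter` are hypothetical, chosen only for this illustration.

```python
import unicodedata


def decomposition_tag(ch):
    """Return the compatibility tag of ch's decomposition mapping.

    '<font>', '<compat>', ... for tagged (compatibility) mappings,
    '' for a canonical mapping, None when ch has no decomposition.
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        return None
    first = decomp.split()[0]
    return first if first.startswith("<") else ""


def is_font_variant(ch):
    """True for code points whose mapping is marked <font> (e.g. mathematical bold)."""
    return decomposition_tag(ch) == "<font>"


def base_letter(ch):
    """Canonically decompose ch (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    # The series mentioned in the thread: U+1F70 - U+1F7D.
    for cp in range(0x1F70, 0x1F7E):
        ch = chr(cp)
        print(f"U+{cp:04X} {ch} tag={decomposition_tag(ch)!r} base={base_letter(ch)}")
    # A <font>-tagged code point for comparison: MATHEMATICAL BOLD CAPITAL A.
    print(is_font_variant("\U0001D400"))
```

In this sketch the oxia-marked characters carry a canonical (untagged) mapping, so a filter that only rejects `<font>` entries would keep them while dropping the mathematical alphabets. Whether that filter is selective enough for the real rules file is the open question in the thread.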