Re: BUG #18362: unaccent rules and Old Greek text
From | Michael Paquier |
---|---|
Subject | Re: BUG #18362: unaccent rules and Old Greek text |
Date | |
Msg-id | ZdvMcEkMYoMqELiG@paquier.xyz |
In response to | Re: BUG #18362: unaccent rules and Old Greek text (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: BUG #18362: unaccent rules and Old Greek text |
List | pgsql-bugs |
On Mon, Feb 26, 2024 at 12:15:57PM +1300, Thomas Munro wrote:
> The Python script is looking for combining sequences that add accents,
> but this one has just "03AC" in the combining sequence field, so it's
> a kind of "simple" redirection that points here:
>
> 03AC;GREEK SMALL LETTER ALPHA WITH TONOS;Ll;0;L;03B1 0301;;;;N;GREEK
> SMALL LETTER ALPHA TONOS;;0386;;0386
>
> That has a normal looking sequence that we can understand (α + an
> accent). If I tell the script to follow such "simple" redirections, I
> get over a thousand new mappings, including those. See attached.
> There is probably more correct terminology that I'm using here...

Ah, you've beaten me to it. Yes, that's pretty much the impression I
was getting while looking at the set of characters in Unicode.txt.

I am not entirely sure if what you are doing is the best way to do it,
but the set of characters generated in unaccent.rules makes sense
here. I am surprised to see that many, TBH.

Perhaps you should add a few characters of these series to
unaccent.sql?
--
Michael
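
For readers following along, here is a rough sketch of what "following simple redirections" could look like. It is not the script from contrib/unaccent and not the attached patch: it uses Python's standard `unicodedata` module instead of parsing the data file, and the function name `base_letter` is made up for illustration. It also assumes that compatibility decompositions (those tagged like `<compat>`) can simply be skipped.

```python
import unicodedata


def base_letter(ch):
    """Return the unaccented base of ch, following single-codepoint
    ("simple") redirections as well as letter + combining-mark sequences.

    Hypothetical helper for illustration only.
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        # No decomposition: the character is already a base character.
        return ch

    parts = decomp.split()
    if parts[0].startswith("<"):
        # Compatibility decomposition (e.g. "<compat> ..."): ignored here.
        return ch

    codepoints = [chr(int(p, 16)) for p in parts]

    if len(codepoints) == 1:
        # "Simple" redirection: the field holds a single codepoint, such as
        # "03AC". Follow it and try again on the target.
        return base_letter(codepoints[0])

    if all(unicodedata.combining(c) for c in codepoints[1:]):
        # Base letter followed only by combining marks: keep the base, and
        # recurse in case the base itself decomposes further.
        return base_letter(codepoints[0])

    # Anything else is left untouched in this sketch.
    return ch


# U+1F71 (GREEK SMALL LETTER ALPHA WITH OXIA) redirects to U+03AC, which
# decomposes to U+03B1 followed by U+0301.
print(base_letter("\u1f71") == "\u03b1")
```

Without the single-codepoint branch, U+1F71 would be left as-is, which matches the behaviour described in the quoted message.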