Re: BUG #15548: Unaccent does not remove combining diacritical characters
From | Daniel Verite |
---|---|
Subject | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date | |
Msg-id | 5d77cc08-d582-4f83-a17f-f2c992d123a9@manitou-mail.org |
In response to | Re: BUG #15548: Unaccent does not remove combining diacritical characters (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
List | pgsql-bugs |

Tom Lane wrote:

> Hm, I thought the OP's proposal was just to make unaccent drop
> combining diacriticals independently of context, which'd avoid the
> combinatorial-growth problem.

In that case, this could be achieved by simply appending the diacriticals themselves to unaccent.rules, since replacement of a string by an empty string is already supported as a rule. It doesn't seem like the current file has any of these, but from https://www.postgresql.org/docs/11/unaccent.html :

"Alternatively, if only one character is given on a line, instances of that character are deleted; this is useful in languages where accents are represented by separate characters"

Incidentally we may want to improve this bit of doc to mention explicitly the Unicode decomposed forms as a use case for removing characters. In fact I wonder if that's not what it's already trying to express, but confusing "languages" with "forms".

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
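
To illustrate the idea of appending the diacriticals themselves to unaccent.rules, here is a minimal Python sketch. It rests on assumptions not stated in the message: that the marks of interest are the Unicode "Combining Diacritical Marks" block (U+0300 to U+036F), and that a line holding one such character alone is read by unaccent as a deletion rule, as the quoted documentation describes. The helper names are hypothetical, and `strip_marks` only emulates the intended effect in Python; it does not call PostgreSQL, so the behavior of the real rules parser is not verified here.

```python
import unicodedata

# Assumption: only the "Combining Diacritical Marks" block is covered.
COMBINING_START = 0x0300
COMBINING_END = 0x036F


def combining_rule_lines():
    """Return one candidate rules-file line per combining mark.

    Each line holds a single character, which the quoted documentation
    describes as "instances of that character are deleted".
    """
    return [chr(cp) for cp in range(COMBINING_START, COMBINING_END + 1)]


def strip_marks(text, marks):
    """Emulate the effect of such deletion rules on a decomposed string."""
    return "".join(ch for ch in text if ch not in marks)


rules = combining_rule_lines()

# NFD splits a precomposed "é" into "e" followed by U+0301.
decomposed = unicodedata.normalize("NFD", "Vérité")
result = strip_marks(decomposed, set(rules))

print(len(rules))  # 112
print(result)      # Verite
```

Joining `rules` with newlines would give the text to append to a copy of the rules file. Because each mark is removed regardless of the base character it follows, the number of rules stays fixed at the size of the block rather than growing with every base-plus-mark combination.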