Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
From | Peter Eisentraut |
---|---|
Subject | Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' |
Date | |
Msg-id | 2c0389dc-a355-4de2-8a70-185b03a4b1e3@eisentraut.org |
In response to | BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' (PG Bug reporting form <noreply@postgresql.org>) |
List | pgsql-bugs |
On 28.11.23 08:15, PG Bug reporting form wrote:
> PostgreSQL's unaccent module does not use Unicode normalisation, but only a
> simple search-and-replace dictionary. The dictionary, unaccent.rules
> (https://github.com/postgres/postgres/blob/master/contrib/unaccent/unaccent.rules)
> , does not contain these Japanese characters, thus its unable to remove
> the diacritic signs. Can someone please guide when we can expect these
> Japanese characters will be added.
>
> Also tried to check with latest versions of Postgresql still the latest
> version does not have support for the Japanese characters.
>
> https://pgpedia.info/u/unaccent.html

As the subsequent discussion shows, it's not quite clear to everybody what the exact mandate of the unaccent extension is. Maybe we'll arrive at some conclusion.

In the meantime, I suggest you also consider solving this with collations. You might find that those have a more principled approach to this problem, and they also have a lot of customization capabilities. The documentation contains examples of accent-insensitive collations (e.g., [0]). Maybe that will work for you, or serve as the basis for customization.

[0]: https://www.postgresql.org/docs/current/collation.html#COLLATION-NONDETERMINISTIC
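
To make the distinction in the quoted report concrete, here is a minimal Python sketch contrasting a search-and-replace dictionary with Unicode normalisation. It is only an illustration, not how unaccent or PostgreSQL collations are implemented: `RULES`, `dictionary_unaccent`, and `strip_marks` are hypothetical names, and the two-entry `RULES` table is a stand-in, not the contents of unaccent.rules.

```python
import unicodedata

# Hypothetical stand-in for a rules dictionary: only characters listed
# here are rewritten, everything else passes through unchanged.
RULES = {"\u00e9": "e", "\u00fc": "u"}  # é -> e, ü -> u


def dictionary_unaccent(text: str) -> str:
    """Search-and-replace approach: look each character up in RULES."""
    return "".join(RULES.get(ch, ch) for ch in text)


def strip_marks(text: str) -> str:
    """Normalisation approach: decompose (NFD), drop combining marks,
    then recompose (NFC) so unrelated characters keep their usual form."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


do = "\u30c9"  # KATAKANA LETTER DO (ド)
to = "\u30c8"  # KATAKANA LETTER TO (ト)

# The dictionary has no entry for ド, so it is returned as-is.
print(dictionary_unaccent(do) == do)
# NFD splits ド into ト plus the combining voiced sound mark U+3099,
# which is then dropped.
print(strip_marks(do) == to)
```

Note that dropping the voiced sound mark turns "do" into "to", a different syllable, so whether this counts as removing an accent is a judgment call in its own right.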