Re: unicode match normal forms
From | Daniel Verite |
---|---|
Subject | Re: unicode match normal forms |
Date | |
Msg-id | 48e7eaab-9403-4d65-8581-cd1e55231d28@manitou-mail.org |
In reply to | unicode match normal forms (hamann.w@t-online.de) |
List | pgsql-general |
Hamann W wrote:

> in unicode letter ä exists in two versions - linux and windows use a
> composite whereas macos prefers
> the decomposed form. Is there any way to make a semi-exact match that
> accepts both variants?

Aside from normalizing the strings into the same normal form
before comparing, non-deterministic ICU collations will recognize them as
identical (they're "canonically equivalent" in Unicode terms).

For instance,

    CREATE COLLATION nd (
       provider = 'icu',
       locale='',
       deterministic = false
    );

    SELECT
     nfc_form,
     nfd_form,
     nfc_form = nfd_form COLLATE nd AS equal1,
     nfc_form = nfd_form COLLATE "C" AS equal2 -- or any deterministic collation
    FROM
      (VALUES
        (E'j\u00E4hrlich',
         E'j\u0061\u0308hrlich'))
      AS s(nfc_form, nfd_form);

     nfc_form | nfd_form | equal1 | equal2
    ----------+----------+--------+--------
     jährlich | jährlich | t      | f
    (1 row)

Normalizing is available as a built-in function since Postgres 13, and
non-deterministic collations appeared in Postgres 12.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
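
The other option mentioned above, normalizing both strings into the same normal form before comparing, might look like the following. This is only a sketch: it assumes Postgres 13 or later and a UTF8 server encoding (the built-in normalize() function requires it), and it reuses the sample values from the query above. The output column names are made up for illustration.

```sql
-- Sketch: compare after bringing both sides to the same normal form (NFC here).
-- Works with a deterministic collation, since the byte sequences become identical.
SELECT
  normalize(nfc_form, NFC) = normalize(nfd_form, NFC) COLLATE "C"
    AS equal_after_normalizing,   -- expected: t
  nfd_form IS NFC NORMALIZED
    AS nfd_form_is_nfc            -- expected: f, the input is decomposed
FROM
  (VALUES
    (E'j\u00E4hrlich',
     E'j\u0061\u0308hrlich'))
  AS s(nfc_form, nfd_form);
```

Normalizing at comparison time keeps ordinary deterministic collations usable, at the cost of calling the function on every comparison; normalizing once when the data is stored is another possibility if the application controls the writes.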