Re: insensitive collations
От | Daniel Verite |
---|---|
Тема | Re: insensitive collations |
Дата | |
Msg-id | 54ef5bfe-043e-4328-9d70-5818789838f4@manitou-mail.org обсуждение исходный текст |
Ответ на | Re: insensitive collations (Jim Finnerty <jfinnert@amazon.com>) |
Ответы |
Re: insensitive collations
|
Список | pgsql-hackers |
Jim Finnerty wrote: > For a UTF8 encoded, case-insensitive (nondeterministic), accent-sensitive > ICU > collation, a LIKE predicate can be used with a small transformation of the > predicate, and the pattern can contain multi-byte characters: > > from: > > SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%'; > -- ERROR: nondeterministic collations are not supported for LIKE > > to: > > SELECT * FROM locations WHERE lower(location) COLLATE "C" LIKE > lower('midi-Pyrené%'); For prefix matching, there's a simpler way with non-deterministic collations based on the advice in [1] The trick is that if an ICU collation is assigned to "location", whether it's deterministic or not, SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%'; is equivalent to: SELECT * FROM locations WHERE location BETWEEN 'midi-Pyrené' AND 'midi-Pyrené' || E'\uFFFF'; and that will use a btree index if available. Also, it works with all features of ND-collations and all encodings, not just case-insensitiveness and UTF-8. Now that doesn't solve LIKE '%midi-Pyrené%', or LIKE '%midi_Pyrené%', but that trick could be a building block for an algorithm implementing LIKE with ND-collations in the future. [1] https://unicode-org.github.io/icu/userguide/collation/architecture.html#generating-bounds-for-a-sort-key-prefix-matching Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
В списке pgsql-hackers по дате отправления: