Re: insensitive collations
От | Jim Finnerty |
---|---|
Тема | Re: insensitive collations |
Дата | |
Msg-id | 1617475666203-0.post@n3.nabble.com обсуждение исходный текст |
Ответ на | Re: insensitive collations (Jim Finnerty <jfinnert@amazon.com>) |
Ответы |
Re: insensitive collations
|
Список | pgsql-hackers |
My previous post had a flaw, but fixing that flaw led me to what may be a bug? Using column COLLATE "C" ILIKE pattern almost does what you'd like, but the single-character wildcard is treated as a single byte with "C", and that won't match a multi-byte character. The fix, I thought, would be to use a deterministic CS_AS ICU collation, since we can handle the per-character advance correctly in that case (we think). Well, maybe not. It looks like single-character wildcards using a deterministic ICU collation match a single byte instead of a single character: This creates a deterministic case-sensitive, accent-sensitive collation in a utf8-encoded database: SET client_encoding = WIN1252; CREATE COLLATION CS_AS ( provider = icu, locale = 'utf8@colStrength=secondary;colCaseLevel=yes', deterministic = true); CREATE TABLE locations (location VARCHAR(255) COLLATE CS_AS); CREATE INDEX location_index ON locations (location); INSERT INTO locations VALUES ('Franche-Comté') , ('Midi-Pyrénées') , ('midi-Pyrénées') , ('midi-Pyrenées') , ('Brian Bruß') , ('Brian Bruss') , ('Steven Sossmix') , ('Provence-Alpes-Côte d Azur'); postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comté'; location ---------------- Franche-Comté (1 row) postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt_'; -- is this a bug? location ---------- (0 rows) postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt__'; -- the wildcard is applied byte by byte instead of character by character, so the 2-byte accented character is matched only by 2 '_'s location ---------------- Franche-Comté (1 row) ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
В списке pgsql-hackers по дате отправления: