Re: Built-in case-insensitive collation pg_unicode_ci
From | Laurenz Albe |
---|---|
Subject | Re: Built-in case-insensitive collation pg_unicode_ci |
Date | |
Msg-id | f3b42d3ccef71f431f3c8ea436422f3b87867527.camel@cybertec.at |
In reply to | Built-in case-insensitive collation pg_unicode_ci (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |
On Fri, 2025-09-19 at 17:21 -0700, Jeff Davis wrote:
> --------
> Proposal
> --------
>
> New builtin case-insensitive collation PG_UNICODE_CI, where the
> ordering semantics are just:
>
>   strcmp(CASEFOLD(arg1), CASEFOLD(arg2))
>
> and the character semantics are the same as PG_UNICODE_FAST.

I think that this is interesting.

> ----------
> Motivation
> ----------
>
> Non-deterministic collations cannot be used by SIMILAR TO, and may
> cause problems for ILIKE and regexes. The reason is that pattern
> matching often depends on the character-by-character semantics, but ICU
> collations aren't constrained enough for these semantics to work. See:
>
> However, PG_UNICODE_CI collation does have character-by-character
> semantics which are well-defined for pattern matching.
>
> That takes us a step closer to allowing the database default collation
> to be case-insensitive.

What is still missing for that? Pattern matching?

> ----------
> Versioning
> ----------
>
> Unlike other built-in collations, the order does depend on the version
> of Unicode, so the collation is given a version equal to the version of
> Unicode. (Other builtin collations have a version of "1".)
>
> That means that indexes, including primary keys, can become
> inconsistent after a major version upgrade if the version of Unicode
> has changed. The conditions where this can happen are much narrower
> than with libc or ICU collations:
>
> (a) The database in the prior version must contain code points
> unassigned as of that version; and
> (b) Some of those previously-unassigned code points must be assigned
> to a Cased character in the newer version.

That's an improvement for people who are ready to perform a test upgrade
and check if any indexes are corrupted - they will likely see that none
are, so no index needs to be rebuilt.

I tried your patch. It works as advertised, and I didn't manage to
break it.

Yours,
Laurenz Albe