Re: Upgrading locale issues
От | rihad |
---|---|
Тема | Re: Upgrading locale issues |
Дата | |
Msg-id | 352ac37a-39e6-5afe-d822-b4e031b9b314@mail.ru обсуждение исходный текст |
Ответ на | Re: Upgrading locale issues ("Daniel Verite" <daniel@manitou-mail.org>) |
Ответы |
Re: Upgrading locale issues
|
Список | pgsql-general |
On 05/02/2019 05:36 PM, Daniel Verite wrote: > rihad wrote: > >> Thanks for the reply. Do you know what would a "decent" ICU collation be >> to bind to a field's schema definition so it would mimic a UTF-8 >> encoding for a multilingual column? Maybe und-x-icu? We aren't as much >> concerned about their sortability in most cases, we just want indexes to >> better handle future PG/ICU upgrades. But what does und(efined) even >> mean with respect to collations? > "undefined" in this context means unspecified language and > unspecified country or region. It implies that no language-specific > nor regional rule will be applied to compare strings. > > Using C.UTF-8 as the collation for text fields to index may be the > best trade-off in your case. It should be immune to libc and ICU > upgrades. > > With C.UTF-8, a string like 'BC' will sort before 'ab', and punctuation > and accents will also sort differently than with a linguistic-aware > collation. Thanks, I'm a bit confused here. AFAIK indexes are used for at least two things: for speed and for skipping the ORDER BY step (since btree indexes are already sorted). Will such an "upgrade-immune" C.UTF-8 index still work correctly for table lookups? And can the existing en_US.UTF-8 fields' definition be altered in place, without a dump+restore? en_US.UTF-8 is the default encoding+locale+collation, it isn't set explicitly for any of our string columns. I assume there's some "catch-all" ordering taking place even for the C locale, so there won't be any bizarre things like b coming before a, or generally for any language, the second letter of its alphabet coming before the first? > If your applications care about that, it can be fixed by simply > adding COLLATE "default" to the ORDER BY clause of the queries that > are meant to present data to users. > COLLATE "default" means the collation of the database, which > presumably would be something like "language_REGION.UTF-8" in your > case. If you never specified it explicitly, it came from initdb which > itself got it from the environment of the server. > > > Best regards,
В списке pgsql-general по дате отправления: