Re: Upgrading locale issues

Поиск
Список
Период
Сортировка
От rihad
Тема Re: Upgrading locale issues
Дата
Msg-id 352ac37a-39e6-5afe-d822-b4e031b9b314@mail.ru
обсуждение исходный текст
Ответ на Re: Upgrading locale issues  ("Daniel Verite" <daniel@manitou-mail.org>)
Ответы Re: Upgrading locale issues
Список pgsql-general
On 05/02/2019 05:36 PM, Daniel Verite wrote:
>     rihad wrote:
>
>> Thanks for the reply. Do you know what would a "decent" ICU collation be
>> to bind to a field's schema definition so it would mimic a UTF-8
>> encoding for a multilingual column? Maybe und-x-icu? We aren't as much
>> concerned about their sortability in most cases, we just want indexes to
>> better handle future PG/ICU upgrades. But what does und(efined) even
>> mean with respect to collations?
> "undefined" in this context means unspecified language and
> unspecified country or region. It implies that no language-specific
> nor regional rule will be applied to compare strings.
>
> Using C.UTF-8 as the collation for text fields to index may be the
> best trade-off in your case. It should be immune to libc and ICU
> upgrades.
>
> With C.UTF-8, a string like 'BC' will sort before 'ab', and punctuation
> and accents will also sort differently than with a linguistic-aware
> collation.
Thanks, I'm a bit confused here. AFAIK indexes are used for at least two 
things: for speed and for skipping the ORDER BY step (since btree 
indexes are already sorted). Will such an "upgrade-immune" C.UTF-8 index 
still work correctly for table lookups? And can the existing  
en_US.UTF-8 fields' definition be altered in place, without a 
dump+restore? en_US.UTF-8 is the default encoding+locale+collation, it 
isn't set explicitly for any of our string columns. I assume there's 
some "catch-all" ordering taking place even for the C locale, so there 
won't be any bizarre things like b coming before a, or generally for any 
language, the second letter of its alphabet coming before the first?


> If your applications care about that, it can be fixed by simply
> adding COLLATE "default" to the ORDER BY clause of the queries that
> are meant to present data to users.
> COLLATE "default" means the collation of the database, which
> presumably would be something like "language_REGION.UTF-8" in your
> case. If you never specified it explicitly, it came from initdb which
> itself got it from the environment of the server.
>
>
> Best regards,





В списке pgsql-general по дате отправления:

Предыдущее
От: Sabit Nepal
Дата:
Сообщение: Postgres using inefficient query plan when using OR filter, usescorrect indices when using IN clause
Следующее
От: Ray Cote
Дата:
Сообщение: Debugging Failed Startup