Re: Built-in CTYPE provider

From Daniel Verite
Subject Re: Built-in CTYPE provider
Date
Msg-id f7dd3ff4-5f1c-4a0f-8a3c-0a521d35b001@manitou-mail.org
In response to Re: Built-in CTYPE provider (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas wrote:

> For someone who is currently defaulting to es_ES.utf8 or fr_FR.utf8,
> a change to C.utf8 would be a much bigger problem, I would
> think. Their alphabet isn't in code point order, and so things would
> be alphabetized wrongly.

> That might be OK if they don't care about ordering for any purpose
> other than equality lookups, but otherwise it's going to force them
> to change the default, where today they don't have to do that.
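
To make the quoted concern concrete, here is a small Python sketch of what
code point ordering does to Spanish words. The word list is made up for
illustration, and Python's built-in string comparison is only a stand-in
for a bytewise (C / C.utf8 style) collation, not PostgreSQL's actual code
path.

```python
# Toy illustration only: the word list is invented for this example.
# "\u00f1and\u00fa" is "ñandú" written with precomposed (NFC) characters.
words = ["zorro", "nube", "\u00f1and\u00fa", "oso"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# Comparing the UTF-8 encodings byte by byte gives the same order,
# because UTF-8 preserves code point order.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point)                   # ['nube', 'oso', 'zorro', 'ñandú']
print(by_code_point == by_utf8_bytes)  # True
```

A Spanish reader would expect "ñandú" right after "nube", since ñ follows n
in the alphabet, but U+00F1 sorts after every ASCII letter, so it lands
last.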

Sure, in whatever collation setup we expose, we need to keep
it possible and even easy to sort properly with linguistic rules.

But some reasons to use $LANG as the default locale/collation
are no longer as good as they used to be, I think.

Starting with v10/ICU we have many pre-created ICU locales with
fixed names, and starting with v16, we can simply write "ORDER BY
textfield COLLATE unicode" which is good enough in most cases. So
the configuration "bytewise sort by default" / "linguistic sort on-demand"
has become more realistic.

By contrast in the pre-v10 days with only libc collations, an
application could have no idea which collations were going to be
available on the server, and how they were named precisely, as this
varies across OSes and across installs even with the same OS.
On Windows, I think that before v16 initdb did not create any libc
collation beyond C/POSIX and the default language/region of the OS.

In that libc context, if a db wants the C locale by default for
performance and truly immutable indexes, but the client app needs to
occasionally do in-db linguistic sorts, the app needs to figure out
which collation name will work for that. This is hard if you don't
target a specific installation that guarantees that a given
collation is going to be installed.
Whereas if the linguistic locale is the default, the app never needs
to know its name or anything about it. So that is how it usually gets
done: linguistic by default. But that leaves databases with many
indexes sorted linguistically instead of bytewise for fields
that semantically never need any linguistic sort.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite


