[17] collation provider "builtin"
From | Jeff Davis |
---|---|
Subject | [17] collation provider "builtin" |
Date | |
Msg-id | 9d63548c4d86b0f820e1ff15a83f93ed9ded4543.camel@j-davis.com |
Responses |
Re: [17] collation provider "builtin"
Re: [17] collation provider "builtin" Re: [17] collation provider "builtin" |
List | pgsql-hackers |
The locale "C" (and equivalently, "POSIX") is not really a libc locale; it's implemented internally with memcmp for collation and pg_ascii_tolower, etc., for ctype.

The attached patch implements a new collation provider, "builtin", which only supports "C" and "POSIX". It does not change the initdb default provider, so it must be requested explicitly. The user will be guaranteed that collations with provider "builtin" will never change semantics; therefore they need no version and indexes are not at risk of corruption. See previous discussion[1].

(Caveat: the "C" locale ordering may depend on the specific encoding. For UTF-8, memcmp is equivalent to code point order, but that may not be true of other encodings. Encodings can't change during pg_upgrade, so indexes are not at risk; but the encoding can change during dump/reload, so results may change.)

This built-in provider is just here to support "C" and "POSIX" using memcmp/pg_ascii_*, and no other locales. It is not intended as a general license to take on the problem of maintaining locales. We may support some other locale name to mean "code point order", but like UCS_BASIC, that would just be an alias for locale "C" that also checks that the encoding is UTF-8.

Motivation:

Why not just use the "C" locale with the libc provider?

1. It's clearer to the user what's going on: Postgres is managing the provider; we aren't passing it on to libc at all. With the libc provider, something like C.UTF-8 leaves room for confusion[2]; with the built-in provider, "C.UTF-8" is not a supported locale and the user will get an error if it's requested.

2. The libc provider conflates LC_COLLATE/LC_CTYPE with the default collation, whereas in the icu and built-in providers, they are separate concepts. With ICU and builtin, you can set LC_COLLATE and LC_CTYPE for a database to whatever you want at creation time.

3. If you use libc with locale "C", then future CREATE DATABASE commands will default to the libc provider (because that would be the provider for template0), which is not what the user wants if the purpose is to avoid problems with external collation providers. If you use the built-in provider instead, then future CREATE DATABASE commands will only succeed if the user either specifies locale C or explicitly chooses a new provider, which gives them a chance to prepare for any challenges.

4. It makes it easier to document the trade-offs between various providers without confusing special cases around the C locale.

[1] https://www.postgresql.org/message-id/87sfb4gwgv.fsf%40news-spur.riddles.org.uk
[2] https://www.postgresql.org/message-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b0165f@manitou-mail.org

--
Jeff Davis
PostgreSQL Contributor Team - AWS
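
Editorial illustration (not part of the original message or the patch): the encoding caveat and the ASCII-only ctype behavior described above can be checked with a small sketch. It assumes that Python's `bytes` comparison is an adequate stand-in for memcmp (both compare unsigned bytes lexicographically), and `ascii_tolower` is a hypothetical model of what a function like pg_ascii_tolower does, not the PostgreSQL source. KOI8-R is used as the example of a non-UTF-8 encoding.

```python
# Illustrative model only; none of these names come from the patch.

cyrillic = ["ю", "а", "б"]  # U+044E, U+0430, U+0431

# Python compares str values by code point.
by_code_point = sorted(cyrillic)

# Bytewise comparison of the encoded form, standing in for memcmp.
by_utf8_bytes = sorted(cyrillic, key=lambda s: s.encode("utf-8"))
by_koi8r_bytes = sorted(cyrillic, key=lambda s: s.encode("koi8_r"))


def ascii_tolower(text: str) -> str:
    """Lowercase only A-Z and leave every other character untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


if __name__ == "__main__":
    print("code point order:", by_code_point)
    print("UTF-8 byte order:", by_utf8_bytes)
    print("KOI8-R byte order:", by_koi8r_bytes)
    print(ascii_tolower("ÉCOLE"))
```

Under these assumptions the UTF-8 byte order matches code point order, while the KOI8-R byte order does not, which is the kind of difference that could surface if the encoding changes across a dump/reload.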