Re: Pg_upgrade and collation
From | Peter Geoghegan |
---|---|
Subject | Re: Pg_upgrade and collation |
Date | |
Msg-id | CAH2-WzmaWsucQTFtg7gKS95xu=eTiKtPCWxf9fjJtNK7=+MxkQ@mail.gmail.com |
In reply to | Re: Pg_upgrade and collation (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Pg_upgrade and collation |
List | pgsql-docs |
On Tue, Jun 28, 2016 at 3:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> I have long advocated adopting ICU as our de facto standard "collation
>> provider", primarily so that we can directly control collations and
>> collation versioning. I think that doing this would solve many
>> problems. Besides, even SQLite has optional ICU support. PostgreSQL is
>> the only major database system that I'm aware of that relies on
>> operating system collations exclusively.
>
> I am hopeful ICU has improved enough since we last researched that
> support for it will soon be added.

There is a patch available that is not ready to be submitted, and doesn't have a real advocate, but is at least enough to convince me that it's very doable. Performance is certainly no impediment to adopting ICU, even without considering that it effectively re-introduces abbreviated keys for text when the C collation is not used.

The best argument for ICU is the evidently lax attitude that the glibc people have towards the correctness and consistency of their collations:

https://bugzilla.redhat.com/show_bug.cgi?id=1320356#c3

Here, Carlos O'Donnell, a glibc committer, says "Regarding (b), the collations in glibc may change from build to build depending on changes in the algorithms or locales. You cannot rely on the collation stay the same once the process exits (nor can you rely upon it via a shared memory mapping to another process sorting strings in memory)".

Frankly, we have no excuse for not heeding his warning.

I'm not annoyed at the glibc people for taking this position. There is, quite simply, a misalignment of incentives. For the glibc people, the assumption is that any problem with collations leads only to slight annoyance from end users, as when the GUI produces subtly wrong ordering. For us, by contrast, any inconsistency is an extremely serious problem. Here we have the maintainers of glibc telling us that they consider it acceptable for the collation to change at any time. Surely that isn't good enough.

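To make concrete why an ordering change is so serious for anything that stores sorted data, here is a minimal sketch in Python. It is not PostgreSQL code: the two sort-key functions are hypothetical stand-ins for "the collation before a library update" and "the collation after it", and the sorted list stands in for an on-disk index. A binary search that assumes the new ordering over data sorted under the old ordering can fail to find keys that are present.

```python
# Hypothetical stand-ins for two versions of the "same" collation.
def collation_v1(s):
    # Old behavior (assumed for illustration): case-insensitive ordering.
    return s.lower()

def collation_v2(s):
    # New behavior (assumed for illustration): plain code point ordering.
    return s

def lookup(index, value, sort_key):
    """Binary search that trusts `index` to be sorted under `sort_key`."""
    lo, hi = 0, len(index)
    target = sort_key(value)
    while lo < hi:
        mid = (lo + hi) // 2
        if sort_key(index[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == value

# The "index" is built once, under the old collation, and then persisted.
index = sorted(["apple", "Banana", "cherry", "Date"], key=collation_v1)

# Searching with the collation the index was built under finds every key.
found_v1 = [lookup(index, k, collation_v1) for k in index]

# After the collation changes underneath us, a key that is present is missed.
found_banana_v2 = lookup(index, "Banana", collation_v2)
```

The data itself never changed here; only the comparison rule did, which is the situation the glibc comment says can arise between builds.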
ICU as a project has every incentive to see things the same way as we do. The library explicitly decouples collation rule versions from algorithm versions. All of this is carefully considered, for the benefit of the numerous major database systems that use ICU.

--
Peter Geoghegan