Re: pg_collation.collversion for C.UTF-8
From | Daniel Verite |
---|---|
Subject | Re: pg_collation.collversion for C.UTF-8 |
Date | |
Msg-id | 5ad8d2f8-c11f-46d6-aab5-ed529d8e958a@manitou-mail.org |
In reply to | Re: pg_collation.collversion for C.UTF-8 (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: pg_collation.collversion for C.UTF-8 |
List | pgsql-hackers |
Jeff Davis wrote:

> > For libc: this change may affect any user who happened to have
> > LANG=C.UTF-8 in their environment at initdb time, which is probably a
> > lot of users, and some buildfarm members. However, the average risk
> > seems to be much lower, because we've gone a long time with the
> > assumption that C.UTF-8 has the same behavior as C, and this only
> > recently came up.

Currently, neither lc_collate_is_c() nor lookup_collation_cache()
think that C.UTF-8 is a C collation, since they do that kind of test:

    if (strcmp(localeptr, "C") == 0)
        result = true;
    else if (strcmp(localeptr, "POSIX") == 0)
        result = true;
    else
        result = false;

What is relatively new (v15) is that we compute a version for libc
collations in get_collation_actual_version(), with code that assumes
that C.* does not need a version, implying that it's immune to Unicode
changes.

What came up in this thread is that this assumption is not true for at
least one major platform: Debian/Ubuntu for releases occurring before
2022 (glibc < 2.35).

> We can avoid this risk by converting C.anything or POSIX.anything to
> plain "C" or "POSIX", respectively, for new collations before storing
> the string in the catalog. For upgraded collations, we can preserve the
> existing locale name. When opening the locale, we would still only
> recognize plain "C" and "POSIX" as the C locale.

Then Postgres would not sort the same as the operating system with the
same locale, at least on some OS. Concerning glibc, after waiting a few
years, glibc<2.35 will be obsolete, and C.UTF-8 sorting like C will
happen by itself.

But in the meantime, personally I don't quite see why Postgres should
start forcing C.UTF-8 to sort differently in the database than in the
OS.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
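
For readers following the thread, the mismatch described in the message can be sketched in a few lines of C. This is only an illustration, not the PostgreSQL source: the function names `locale_name_is_c` and `locale_name_assumed_versionless` are hypothetical, and the prefix test is an assumption based on the message's description that `C.*` is treated as not needing a version.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the exact-match test quoted in the
 * message: only the names "C" and "POSIX" count as the C locale.
 */
bool
locale_name_is_c(const char *localeptr)
{
    assert(localeptr != NULL);
    return strcmp(localeptr, "C") == 0 ||
           strcmp(localeptr, "POSIX") == 0;
}

/*
 * Hypothetical helper for the looser assumption described in the
 * message: "C", "POSIX", and any name starting with "C." are assumed
 * not to need a collation version. The real check may differ.
 */
bool
locale_name_assumed_versionless(const char *localeptr)
{
    assert(localeptr != NULL);
    return locale_name_is_c(localeptr) ||
           strncmp(localeptr, "C.", 2) == 0;
}
```

Under these assumptions, a name such as "C.UTF-8" fails the first test, so it is not handled as a C collation, yet it passes the second, so no version is recorded for it. That gap is what matters on platforms where C.UTF-8 does not sort like C.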