Re: Initcap works differently with different locale providers
From | Oleg Tselebrovskiy |
---|---|
Subject | Re: Initcap works differently with different locale providers |
Date | |
Msg-id | 0a54a90a5154281486b1acb07e5650df@postgrespro.ru |
In reply to | Re: Initcap works differently with different locale providers (Alexander Korotkov <aekorotkov@gmail.com>) |
List | pgsql-docs |
Alexander Korotkov wrote at 2025-07-28 17:23:
> On Mon, Jul 28, 2025 at 1:20 PM Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
>>
>> On 25 Sep 2024, at 18:13, Oleg Tselebrovskiy
>> <o.tselebrovskiy@postgrespro.ru> wrote:
>>
>> Greetings, everyone!
>>
>> One of our clients has found a difference in behaviour of initcap
>> function when using different locale providers, shown below
>>
>> postgres=# create database test_db_1 locale_provider=icu
>> locale="ru_RU.UTF-8" template=template0;
>> NOTICE: using standard form "ru-RU" for ICU locale "ru_RU.UTF-8"
>> CREATE DATABASE
>> postgres=# \c test_db_1;
>> You are now connected to database "test_db_1" as user "postgres".
>> test_db_1=# select initcap('ЧиЮ А.Ю.');
>>  initcap
>> ----------
>>  Чию А.ю.
>> (1 row)
>> test_db_1=# select initcap('joHn d.e.');
>>  initcap
>> -----------
>>  John D.e.
>> (1 row)
>> postgres=# create database test_db_2 locale_provider=libc
>> locale="ru_RU.UTF-8" template=template0;
>> CREATE DATABASE
>> postgres=# \c test_db_2
>> You are now connected to database "test_db_2" as user "postgres".
>> test_db_2=# select initcap('ЧиЮ А.Ю.');
>>  initcap
>> ----------
>>  Чию А.Ю.
>> (1 row)
>> test_db_2=# select initcap('joHn d.e.');
>>  initcap
>> -----------
>>  John D.E.
>> (1 row)
>>
>> And an easier reproduction (should work for REL_12_STABLE and up)
>>
>> postgres=# SELECT initcap('first.second' COLLATE "en-x-icu");
>>  initcap
>> --------------
>>  First.second
>> (1 row)
>> postgres=# SELECT initcap('first.second' COLLATE "en_US");
>>  initcap
>> --------------
>>  First.Second
>> (1 row)
>>
>> This behaviour is reproducible on REL_12_STABLE and up to master
>>
>> I don't believe that this is an erroneous behaviour, just a differing
>> one, hence just a documentation change proposition
>>
>> I suggest adding a clarification that this function works differently
>> with libc and ICU providers because there is a difference in what a
>> "word" is between them
>>
>> In libc a word is a sequence of alphanumeric characters, separated by
>> non-alphanumeric characters (as it is written in documentation right
>> now)
>> In ICU words are divided according to Unicode® Standard Annex #29 [1]
>>
>> Similar issue was briefly discussed in [2]
>>
>> The suggested documentation patch is attached (versions for
>> REL_13_STABLE+ and for REL_12_STABLE only)
>>
>> [1]: https://www.unicode.org/reports/tr29/#Word_Boundaries
>> [2]:
>> https://www.postgresql.org/message-id/CAEwbS1R8pwhRkwRo3XsPt24ErBNtFWuReAZhVPJwA3oqo148tA%40mail.gmail.com
>>
>> Oleg Tselebrovskiy, Postgres Professional
>> <v1-0001-string-functions.patch><v1-0002-string-functions-REL_12.patch>
>>
>>
>> I can confirm initcap works with libc and libicu as you stated. The
>> documentation patch looks good to me. I’ve written a commit message.
>> The REL_12_STABLE branch is not relevant anymore as it’s out of
>> support. I’m going to push this if no objections.
>
> I'm sorry for these many messages. My email client just gone crazy.
> Must be fixed now.
>
> ------
> Regards,
> Alexander Korotkov
> Supabase

Commit message looks good to me, also no objections on ignoring
REL_12_STABLE :)

Thank you!

Regards,
Oleg Tselebrovskiy
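For readers who want to see the two word definitions side by side without a server at hand, here is a rough Python sketch of the difference described above. It is not PostgreSQL's or ICU's code. The function names are made up for illustration, and the "ICU-like" rule is only an approximation of UAX #29 that assumes a period or apostrophe between two letters does not end a word. That assumption is enough to reproduce the outputs quoted in this thread, but it does not cover the full set of word-boundary rules.

```python
import re

# libc-style rule (as the documentation describes it): a word is a run of
# alphanumeric characters; every non-alphanumeric character separates words.
LIBC_WORD = re.compile(r"[^\W_]+")

# Approximation of the UAX #29 behaviour seen in the thread (an assumption,
# not the real ICU break iterator): a '.' or "'" that sits between two
# letters stays inside the word instead of splitting it.
ICU_LIKE_WORD = re.compile(
    r"[^\W_]+(?:(?<=[^\W\d_])[.'](?=[^\W\d_])[^\W_]+)*"
)


def initcap_libc_style(text: str) -> str:
    """Uppercase the first character of each alphanumeric run, lowercase the rest."""
    return LIBC_WORD.sub(lambda m: m.group(0).capitalize(), text)


def initcap_icu_approx(text: str) -> str:
    """Same casing, but using the looser, UAX #29-like notion of a word."""
    return ICU_LIKE_WORD.sub(lambda m: m.group(0).capitalize(), text)


if __name__ == "__main__":
    for sample in ("first.second", "joHn d.e.", "ЧиЮ А.Ю."):
        print(f"{sample!r}: libc-style={initcap_libc_style(sample)!r}, "
              f"ICU-like={initcap_icu_approx(sample)!r}")
```

With the inputs from the thread, the libc-style function yields `First.Second`, `John D.E.` and `Чию А.Ю.`, while the approximation yields `First.second`, `John D.e.` and `Чию А.ю.`, matching the psql output quoted above.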