Re: Windows default locale vs initdb
From | Thomas Munro |
---|---|
Subject | Re: Windows default locale vs initdb |
Date | |
Msg-id | CA+hUKG+Pa28J-SCsn6d5x8KkqgqdAQ7q-pTgMYrht9B-c0dD5w@mail.gmail.com |
In response to | Re: Windows default locale vs initdb (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: Windows default locale vs initdb |
List | pgsql-hackers |
Another country has changed its name, and a Windows OS update has again broken every PostgreSQL cluster in that whole country[1] (or at least those that had accepted initdb's default choice of locale, probably most). Let's get to the bottom of this, because otherwise it is simply going to keep happening, causing administrative pain for a lot of people.

Here is a rebase of the basic patch I proposed last time, and a re-statement of what we know:

1. initdb chooses a default locale using a technique that gives you an unstable ("Czech Republic"->"Czechia", "Turkey"->"Türkiye"), non-ASCII ("Norwegian (Bokmål)") string that we are warned we should not store anywhere. We store it, and then later it is not recognised. Instead we should select an IETF BCP 47 locale name, based on stable ISO country and language codes, like "en-US", "tr-TR" etc. Here is the patch to teach initdb to use that, unchanged from v3 except that I tweaked the docs a bit.

2. In Windows 10+ it is now also possible to put ".UTF-8" on the end of locale names. I couldn't figure out whether we should do that, and what effect it has on ctypes -- apparently not the effect I expected (see upthread). Was our UTF-8 support on Windows already broken, and this new ".UTF-8" thing is just a new way to reach that brokenness? Is it OK to continue to choose the "legacy" single byte encodings by default on that OS, and consider that a separate topic for separate research?

3. It is not clear to me how we should deal with pg_upgrade. Eventually we want all of the old-school names to fade away, and pg_upgrade would need to be part of that. Perhaps there is some API that can be used to translate to the new canonical forms without us having to maintain translation tables and other messiness in our tree.

4. Eventually we should probably ban non-ASCII characters from entering the relevant catalogues (they are shared, so their encoding is undefined except that they must be a superset of ASCII), and delete all the old win32setlocale.c kludges, after we reach a point where everyone should be using exclusively BCP 47.

[1] https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org
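
To make points 1 and 4 concrete, here is a rough, portable sketch of the two properties being asked of a stored locale name: that it is plain ASCII, and that it has the short "language-REGION" shape of names like "en-US" or "tr-TR". This is not code from the patch. Both function names are hypothetical, and the shape check covers only a narrow subset of BCP 47 (no script or variant subtags, fixed letter case), chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): true if every byte of "name"
 * is printable 7-bit ASCII, i.e. safe to store in a shared catalogue
 * whose encoding is only known to be a superset of ASCII.
 */
static bool
locale_name_is_ascii(const char *name)
{
	assert(name != NULL);

	for (const unsigned char *p = (const unsigned char *) name; *p; p++)
	{
		if (*p < 0x20 || *p > 0x7e)
			return false;
	}
	return true;
}

/*
 * Hypothetical helper (not from the patch): true if "name" is either a
 * bare 2-3 letter lowercase language code ("en") or that code followed
 * by a hyphen and a 2 letter uppercase region code ("en-US").  Real
 * BCP 47 tags are case-insensitive and allow more subtags; this is a
 * deliberately narrow illustration.
 */
static bool
looks_like_simple_bcp47(const char *name)
{
	size_t		i = 0;

	assert(name != NULL);

	/* language subtag: 2 or 3 lowercase ASCII letters */
	while (name[i] >= 'a' && name[i] <= 'z')
		i++;
	if (i < 2 || i > 3)
		return false;

	/* a language-only tag is accepted */
	if (name[i] == '\0')
		return true;

	/* otherwise expect "-" followed by exactly two uppercase letters */
	if (name[i] != '-')
		return false;
	i++;
	if (!(name[i] >= 'A' && name[i] <= 'Z'))
		return false;
	if (!(name[i + 1] >= 'A' && name[i + 1] <= 'Z'))
		return false;
	return name[i + 2] == '\0';
}
```

Under these assumptions, "tr-TR" passes both checks, "Czech Republic" is ASCII but fails the shape check, and a UTF-8 encoded "Norwegian (Bokmål)" fails the ASCII check. A name with ".UTF-8" appended (point 2) is also rejected by the shape check, which matches the idea of treating that suffix as a separate question.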