Re: Order changes in PG16 since ICU introduction
From | Jeff Davis |
---|---|
Subject | Re: Order changes in PG16 since ICU introduction |
Date | |
Msg-id | 73f97db9a37fd281bd5077ce16383cd8006b1f59.camel@j-davis.com |
In reply to | Re: Order changes in PG16 since ICU introduction (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses |
Re: Order changes in PG16 since ICU introduction
Re: Order changes in PG16 since ICU introduction |
List | pgsql-hackers |
On Tue, 2023-06-06 at 21:37 +0100, Andrew Gierth wrote:
> I like either "internal" or "builtin" because they correctly identify
> that no external resources are used. I'm not keen on "postgresql".

"builtin" seems to be the winner. New patch series attached with doc and test updates.

This has been a long discussion (it's a messy problem), but I think I've addressed the most important concerns raised. If you disagree with something, please indicate whether it's an objection, or a more minor difference of opinion that I can weigh against other opinions. Also please indicate if you think something is out of scope for 16.

Patches 0001, 0002:

These patches implement the built-in provider and automatically change provider=icu to provider=builtin when the locale is C.

Other approaches were considered:

* Pretend that ICU can support the C locale, and use similar checks throughout the code like the libc provider does: This was somewhat of a hack, and had potential issues with upgraded clusters, and several people seemed to reject it.

* Switch to the libc provider for the C locale: would make the libc provider even more complicated and had some potential for confusion, and also has catalog representation problems when --locale is specified along with --lc-ctype.

Ultimately we need to choose one approach, and the built-in provider seems the nicest (though most invasive). It reflects the reality that we don't actually use libc or icu for the C locale, and it's nicer to document. The builtin provider seemed to get the most support.

Patch 0003:

Makes LOCALE apply to all providers. The overall feel after this patch is that "locale" now means the collation locale, and LC_COLLATE/LC_CTYPE are for the server environment. When using libc, LC_COLLATE and LC_CTYPE still work as they did before, but their relationship to database collation feels more like a special case of the libc provider. I believe most people favor this patch and I haven't seen recent objections.

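As a rough illustration of why the C locale needs neither libc nor ICU: ordering in the C locale comes down to byte-wise comparison of the encoded strings (memcmp() semantics). The Python sketch below mimics that ordering; the helper name `c_locale_sort` is made up for this example and is not part of PostgreSQL or of the patches.

```python
def c_locale_sort(strings, encoding="utf-8"):
    """Sort strings by their encoded bytes, mimicking memcmp()-style ordering.

    Hypothetical helper for illustration only; not PostgreSQL code.
    """
    return sorted(strings, key=lambda s: s.encode(encoding))


words = ["apple", "Banana", "cherry", "Apple", "éclair"]

# Uppercase ASCII sorts before lowercase ASCII, and multi-byte UTF-8
# sequences sort after all ASCII characters.
print(c_locale_sort(words))
# → ['Apple', 'Banana', 'apple', 'cherry', 'éclair']
```

No locale data is consulted at any point, which is why no external library is required for this ordering.
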
I didn't find any surprising behaviors, but there are a few that I'd like to draw attention to:

0. If you initdb with --locale-provider=libc, and don't specify ICU at any later point, then none of these changes should affect you and you'll remain on libc. If someone notices otherwise, please let me know.

1. If you specify --locale-provider=builtin at initdb time, you *must* specify --locale=C/POSIX, otherwise you get an error.

2. Patch 0004 is possibly out of scope for 16, but it felt consistent with the other UI changes and low risk. Please try with/without before objecting.

3. Daniel Verite felt that we should only change the provider from ICU to "builtin" for the C locale if the provider is defaulting to ICU; not if it's specified as ICU. I did not differentiate between specifying ICU and defaulting to ICU because:

   a. "libc" unconditionally uses the built-in memcmp() logic for C, it never actually uses libc

   b. If a user really wants the root locale or the en-US-u-va-posix locale, they can specify those directly

   c. I don't see any plausible case where it helps a user to keep provider=icu when locale=C.

4. Joe Conway and Peter Eisentraut both felt that C.UTF-8 with provider=icu should not be changed to use the builtin provider, and instead passed on to ICU. I implemented a compromise where initdb will change C.UTF-8 to the built-in provider; but CREATE DATABASE/COLLATION will pass it along to ICU (which may support it as en-US-u-va-posix in some versions, or may throw an error in other versions). My reasoning is that initdb is pulling from the environment, and we should try harder to succeed on any reasonable environmental settings (otherwise initdb with default settings could fail); whereas we can be more strict with CREATE DATABASE/COLLATION.

5. For the built-in provider, initdb defaults to UTF-8 rather than SQL_ASCII. Otherwise, you would be unable to use ICU at all later, because ICU doesn't support SQL_ASCII.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS
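
The provider-selection behavior described in points 0, 1, 3 and 4 can be summarized as a small decision function. This is a minimal Python sketch of the rules as stated in this message, not the actual patch code. The function name `effective_provider`, the `at_initdb` flag, and the exact-match comparison of locale names are assumptions made for illustration; applying the point 1 check outside initdb is also an assumption, since the message only describes it for initdb.

```python
C_LOCALES = {"C", "POSIX"}


def effective_provider(provider: str, locale: str, *, at_initdb: bool) -> str:
    """Return the provider that ends up in use, per the rules in this message.

    Hypothetical model for illustration; names and structure are assumptions.
    """
    if provider == "libc":
        # Point 0: a libc setup is never switched to another provider.
        return "libc"
    if provider == "builtin":
        # Point 1: builtin only accepts the C/POSIX locale.
        # (Stated for initdb; applied in both contexts here as an assumption.)
        if locale not in C_LOCALES:
            raise ValueError("builtin provider requires locale C or POSIX")
        return "builtin"
    if provider == "icu":
        # Point 3: C/POSIX goes to builtin whether ICU was specified
        # explicitly or chosen by default.
        if locale in C_LOCALES:
            return "builtin"
        # Point 4: only initdb maps C.UTF-8 to builtin; CREATE
        # DATABASE/COLLATION hands it to ICU, which may accept or reject it.
        if locale == "C.UTF-8" and at_initdb:
            return "builtin"
        return "icu"
    raise ValueError(f"unknown provider: {provider}")


print(effective_provider("icu", "C", at_initdb=False))        # → builtin
print(effective_provider("icu", "C.UTF-8", at_initdb=True))   # → builtin
print(effective_provider("icu", "C.UTF-8", at_initdb=False))  # → icu
```

The asymmetry in the C.UTF-8 case reflects the reasoning in point 4: initdb reads its locale from the environment and should succeed on reasonable settings, while CREATE DATABASE/COLLATION can afford to be stricter.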