Trying out native UTF-8 locales on Windows
| From | Thomas Munro |
|---|---|
| Subject | Trying out native UTF-8 locales on Windows |
| Date | |
| Msg-id | CA+hUKGKk-n2zZmW-vA2pW4GZ6C1UfQY0xtEDYmLdBjArdVDMZA@mail.gmail.com |
| List | pgsql-hackers |
Here's a very short patch to experiment with the idea of using Windows' native UTF-8 support when possible, i.e. when using "en-US.UTF-8" in a UTF-8 database. Otherwise it continues to use the special Windows-only wchar_t conversion that allows for locales with non-matching encodings, i.e. the reason you're allowed to use "English_United States.1252" in a UTF-8 database on that OS, something we wouldn't allow on Unix.

As I understand it, that mechanism dates from the pre-Windows 10 era, when Windows had no .UTF-8 locales but users wanted or needed to use UTF-8 databases. I think some locales used encodings that we don't even support as server encodings, e.g. SJIS in Japan, so that was a workaround. I assume you could use "ja-JP.UTF-8" these days.

CI tells me it compiles and passes, but I am not a Windows person; I'm primarily interested in code cleanup and removing weird platform differences. I wonder if someone directly interested in Windows would like to experiment with this and report whether (1) it works as expected and (2) "en-US.UTF-8" loses performance compared to "en-US" (which I guess uses WIN1252 encoding and triggers the conversion path?), and similarly for other locale pairs you might be interested in.

It's possible that strcoll_l() converts the whole string to wchar_t internally anyway, in which case it might turn out to be marginally slower. We often have to copy the char strings up front ourselves in the regular strcoll_l() path in order to null-terminate them, something that is skipped in the wchar_t conversion path, which combines widening with null-termination in one step.

Not sure if that'd kill the idea, but it'd at least be nice to know if we might eventually be able to drop the special code paths and strange configuration possibilities compared to Unix, and use it in less performance-critical paths. At the very least, the comments are wrong...
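To make the copy cost in the regular strcoll_l() path concrete, here is a minimal sketch of comparing two length-counted (not null-terminated) strings. It is not the patch and not PostgreSQL's actual code: compare_counted() is a hypothetical helper, and it assumes a POSIX-style locale_t and strcoll_l() (on Windows the CRT spells these _locale_t and _strcoll_l()).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for locale_t, newlocale(), strcoll_l() */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only.
 *
 * strcoll_l() needs null-terminated input, so each length-counted string
 * is first copied into a buffer with room for a terminator.  These two
 * copies are the up-front cost discussed above.
 */
static int
compare_counted(const char *a, size_t alen,
				const char *b, size_t blen,
				locale_t loc)
{
	char	   *abuf = malloc(alen + 1);
	char	   *bbuf = malloc(blen + 1);
	int			result;

	/* Keep the sketch short: treat allocation failure as fatal. */
	assert(abuf != NULL && bbuf != NULL);

	memcpy(abuf, a, alen);
	abuf[alen] = '\0';
	memcpy(bbuf, b, blen);
	bbuf[blen] = '\0';

	result = strcoll_l(abuf, bbuf, loc);

	free(abuf);
	free(bbuf);
	return result;
}
```

A caller would create a locale with newlocale(LC_ALL_MASK, "C", (locale_t) 0), or a UTF-8 locale name where one is installed, pass it in, and release it with freelocale(). A real implementation would likely use a stack buffer for short strings rather than calling malloc() for every comparison.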