[HACKERS] ICU locales and text/char(n) SortSupport on Windows
From | Peter Geoghegan |
---|---|
Subject | [HACKERS] ICU locales and text/char(n) SortSupport on Windows |
Date | |
Msg-id | CAH2-WznnOrK=u-Ui2+vVk+-exMvAk9=nLbyaYVSmWCpAJ5en+A@mail.gmail.com |
Responses |
[HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+
Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows |
List | pgsql-hackers |
varstr_sortsupport() does not allow Windows to use SortSupport with a non-C locale when the server encoding happens to be UTF-8 (which I assume is the common case). This is because we (quite reasonably) don't want to have to duplicate the ugly UTF-8 to UTF-16 conversion hack from varstr_cmp() for the SortSupport authoritative comparator (varstrfastcmp_locale() shouldn't get its own copy of this kludge, because it's supposed to be "fast"). This broad restriction made sense when Windows + UTF-8 + non-C-locale necessarily required the aforementioned UTF-16 conversion kludge.

However, if an ICU locale is in use on Windows (or any other platform), then we can always use SortSupport, regardless of anything else. We should not have the core code install a fmgr comparison shim that just calls varstr_cmp(), though we still do. With ICU we don't actually need the UTF-16 kludge at all, so we can use SortSupport without any special care.

The current state of affairs doesn't make any sense, AFAICT, and so the restriction should be removed on general principle: we *already* expect ICU to have no restrictions that are peculiar to Windows, as we see in varstr_cmp() and str_tolower(). It's just arbitrary to hold on to this restriction.

This restriction also seems worth fixing because Windows users are generally more likely to want to use ICU locales; most of them would otherwise end up actually paying the overhead for the UTF-16 kludge. (Presumably the UTF-16 conversion makes text sorting *even slower* than it would be if we merely didn't do SortSupport, which is to say: very slow indeed.)

In summary, we're currently attaching the use of SortSupport to the wrong thing. We're treating this UTF-16 business as something that implies a broad OS/platform restriction, when in fact it should be treated as implying a restriction for one particular collation provider only (a collation provider that happens to be built into Windows, but isn't really special to us).
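To make the distinction concrete, here is a minimal, self-contained sketch of the two rules side by side. It is not the PostgreSQL code: SortEnv, CollProvider, and both function names are hypothetical stand-ins invented for illustration, and the "current" rule is only my reading of the restriction described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef enum { PROVIDER_LIBC, PROVIDER_ICU } CollProvider;

typedef struct
{
    bool         is_windows;     /* is this a Windows build? */
    bool         encoding_utf8;  /* is the server encoding UTF-8? */
    bool         collate_c;      /* is the collation C/POSIX? */
    CollProvider provider;       /* which library does the comparisons? */
} SortEnv;

/* Restriction as described: keyed on the platform, whatever the provider. */
static bool
sortsupport_allowed_current(const SortEnv *env)
{
    assert(env != NULL);
    if (env->collate_c)
        return true;
    if (env->is_windows && env->encoding_utf8)
        return false;           /* blocks ICU collations too */
    return true;
}

/* Proposed rule: only the OS-provided collations need the UTF-16 kludge. */
static bool
sortsupport_allowed_proposed(const SortEnv *env)
{
    assert(env != NULL);
    if (env->collate_c)
        return true;
    if (env->is_windows && env->encoding_utf8 &&
        env->provider == PROVIDER_LIBC)
        return false;
    return true;
}
```

The two functions differ only for Windows + UTF-8 + ICU: the first refuses SortSupport there, the second allows it. Every other combination gets the same answer from both.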
Attached patch shows what I'm getting at. This is untested, since I don't use Windows. Proceed with caution.

On a related note, am I the only one that finds it questionable that str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself contains an "#ifdef USE_ICU" block? It seems like those two things might get conflated on some platforms. We don't want lower() to ever not use the ICU infrastructure when an ICU collation is used, and yet it's not obvious that that's impossible. I understand that the code in regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER facilities, and that that was always accepted as legacy that ICU had to live with. Maybe a static assertion is all that we need here (ICU builds must also be USE_WIDE_UPPER_LOWER builds).

--
Peter Geoghegan
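For reference, the static assertion floated above might look roughly like the following. This is only a sketch: it uses plain C11 _Static_assert rather than any PostgreSQL-specific assertion macro, and the BUILD_HAS_* names are hypothetical helpers that exist only to turn the two build macros into 0/1 values.

```c
/* Normalize the two build macros named in the message to 0/1 values. */
#ifdef USE_ICU
#define BUILD_HAS_ICU 1
#else
#define BUILD_HAS_ICU 0
#endif

#ifdef USE_WIDE_UPPER_LOWER
#define BUILD_HAS_WIDE_UPPER_LOWER 1
#else
#define BUILD_HAS_WIDE_UPPER_LOWER 0
#endif

/* Fail the compile if ICU is enabled without the wide-character facilities. */
_Static_assert(!BUILD_HAS_ICU || BUILD_HAS_WIDE_UPPER_LOWER,
               "ICU builds must also be USE_WIDE_UPPER_LOWER builds");
```

With neither macro defined, this compiles silently. Defining USE_ICU alone (for example with -DUSE_ICU) should stop the build with the message above.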