patch suggestion: Fix citext_utf8 test's "Turkish I" with ICU collation provider

From: Anton Voloshin
Subject: patch suggestion: Fix citext_utf8 test's "Turkish I" with ICU collation provider
Msg-id: 52104a17-7a23-c315-1a97-06c691af748c@postgrespro.ru
List: pgsql-hackers

Hello, hackers.

In current master, as well as in REL_15_STABLE, installcheck in 
contrib/citext fails in most locales, if we use ICU as a locale provider:

$ rm -fr data; initdb --locale-provider icu --icu-locale en-US -D data \
    && pg_ctl -D data -l logfile start \
    && make -C contrib/citext installcheck; \
    pg_ctl -D data stop; cat contrib/citext/regression.diffs
...
test citext                       ... ok          457 ms
test citext_utf8                  ... FAILED       21 ms
...
diff -u /home/ashutosh/pg/REL_15_STABLE/contrib/citext/expected/citext_utf8.out /home/ashutosh/pg/REL_15_STABLE/contrib/citext/results/citext_utf8.out
--- /home/ashutosh/pg/REL_15_STABLE/contrib/citext/expected/citext_utf8.out    2022-07-14 17:45:31.747259743 +0300
+++ /home/ashutosh/pg/REL_15_STABLE/contrib/citext/results/citext_utf8.out    2022-10-21 19:43:21.146044062 +0300
@@ -54,7 +54,7 @@
  SELECT 'i'::citext = 'İ'::citext AS t;
   t
  ---
- t
+ f
  (1 row)

The reason is that in ICU, lowercasing the Unicode character "İ" (U+0130
"LATIN CAPITAL LETTER I WITH DOT ABOVE") can give two valid results:
- "i", i.e. "U+0069 LATIN SMALL LETTER I", in the "tr" and "az" locales;
- "i̇", i.e. "U+0069 LATIN SMALL LETTER I" followed by "U+0307 COMBINING
   DOT ABOVE", in all other locales I've tried (including "en-US", "de",
   "ru", etc).
The test as currently written accepts only plain Latin "i", which
might be what glibc produces, but is not what ICU produces. I verified
this on ICU 70.1, but I've seen it on a few other ICU versions as well,
so I think this is probably an ICU feature, not a bug(?).
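
The two-code-point result can be seen outside PostgreSQL and ICU as well. The sketch below is only an illustration, assuming Python 3, whose built-in str.lower() applies the locale-independent Unicode full case mapping; it is not what citext or ICU run internally, and the variable names are made up for the example.

```python
import unicodedata

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, the character from the test.
capital_i_dot = "\u0130"

# Python applies the default (non-Turkic) Unicode full case mapping here,
# which matches the second bullet above, not the "tr"/"az" behaviour.
lowered = capital_i_dot.lower()

# List each resulting code point with its Unicode name.
code_points = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in lowered]
print(code_points)

# The lowercased string is two code points long, so it differs from plain "i".
print(len(lowered), lowered == "i")
```

Under this mapping a comparison against plain "i" cannot succeed, which is the same shape of failure as the "f" in the regression diff.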

Since we probably want installcheck in contrib/citext to pass on
databases with various locales, including reasonable ICU-based ones,
I suggest fixing this test by accepting either output as valid.

I can see two ways of doing that:
1. change SQL in the test to use "IN" instead of "=";
2. add an alternative output.

I think in this case "IN" is better, because it lets a single comment
address both possible outputs and avoids unnecessary duplication.
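
To make option 1 concrete, here is a toy model in Python rather than the SQL from the attached patch. It assumes citext equality can be approximated as "compare the lowercased forms", and it uses Python's own str.lower() as a stand-in for the non-Turkic mapping; citext_equal and ACCEPTED are hypothetical names introduced only for this sketch.

```python
def citext_equal(a: str, b: str) -> bool:
    """Toy stand-in for citext '=': compare lowercased forms (an assumption)."""
    return a.lower() == b.lower()


# The two lowercase forms the test would accept: plain "i" (as in "tr"/"az")
# and "i" followed by U+0307 COMBINING DOT ABOVE (as in other ICU locales).
ACCEPTED = ("i", "i\u0307")

capital_i_dot = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Current form of the test: a single "=" against plain "i".
strict = citext_equal("i", capital_i_dot)

# "IN"-style form: true if either accepted spelling matches.
relaxed = any(citext_equal(candidate, capital_i_dot) for candidate in ACCEPTED)

print(strict, relaxed)
```

With a mapping that yields plain "i" instead, the first candidate would match, so the relaxed check is meant to hold under either behaviour, which is the point of preferring "IN" over a second expected-output file.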

I've attached a patch authored mostly by my colleague, Roman Zharkov, as 
one possible fix.

Only versions 15+ are affected.

Any comments?

-- 
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru