Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query
От | James Inform |
---|---|
Тема | Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query |
Дата | |
Msg-id | 1820097204.279159.1562258411044@email.ionos.de обсуждение исходный текст |
Ответ на | Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query ("Daniel Verite" <daniel@manitou-mail.org>) |
Список | pgsql-bugs |
The weird thing is, that this behavior only occures when used on a field field a primary key: If you do: create database testdb; \c testdb; -- Just create a simple table with one column create table icutest(data text not null collate "de-x-icu" primary key, data2 text collate "de-x-icu"); -- Insert a record with uppercase string insert into icutest values ('MYTEST','MYTEST'); -- This is not giving a match select * from icutest where data ilike 'mytest'; -- BUT THIS GIVES A MATCH: select * from icutest where data2 ilike 'mytest'; -- So it seems to be especially related to the scenario where a primary key / index exists. > On July 4, 2019 at 12:36 PM Daniel Verite <daniel@manitou-mail.org> wrote: > > > PG Bug reporting form wrote: > > -- Just create a simple table with one column > > create table icutest(data text not null collate "de-x-icu" primary key); > > > > -- Insert a record with uppercase string > > insert into icutest values ('MYTEST'); > > > > -- This is not giving a match > > select * from icutest where data ilike 'mytest'; > > This also happens on v10 and on the master branch. > > The bug seems to come from a mistake in like_support.c: > > > /* * Check whether char is a letter (and, hence, subject to case-folding) > * * In multibyte character sets or with ICU, we can't use isalpha, and it does * not seem worth trying to convert to wchar_tto use iswalpha. Instead, > just * assume any multibyte char is potentially case-varying. > */ > static int > pattern_char_isalpha(char c, bool is_multibyte, > pg_locale_t locale, bool locale_is_c) > { > if (locale_is_c) > return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); > else if (is_multibyte && IS_HIGHBIT_SET(c)) > return true; > else if (locale && locale->provider == COLLPROVIDER_ICU) > return IS_HIGHBIT_SET(c) ? true : false; > > > With an ICU locale, this returns false for all characters in 'mytest'. > > I think this eventually leads the caller to incorrectly believe that it > can optimize the test into an exact match (data='mytest'), given > there are otherwise no wildcards in the pattern. > > On fixing the bug, if we make this function returns true for all > characters under an ICU locale, it appears to work, but we're loosing an > opportunity to optimize for some patterns. > If OTOH we wanted to use an ICU call like u_isalpha(), to be closer > to what's done with libc, we'd need to pass a UChar32 argument, > not a char, and since we're in a char-oriented context, I don't see how > to do that. > > > Best regards, > -- > Daniel Vérité > PostgreSQL-powered mailer: http://www.manitou-mail.org > Twitter: @DanielVerite
В списке pgsql-bugs по дате отправления: