Re: case insensitive collation of Greek's sigma
От | Peter Eisentraut |
---|---|
Тема | Re: case insensitive collation of Greek's sigma |
Дата | |
Msg-id | 9e3220da-d47e-add7-8b97-7c65b12ff6d7@enterprisedb.com обсуждение исходный текст |
Ответ на | case insensitive collation of Greek's sigma (Jakub Jedelsky <jakub.jedelsky@gooddata.com>) |
Ответы |
Re: case insensitive collation of Greek's sigma
|
Список | pgsql-general |
On 26.11.21 08:37, Jakub Jedelsky wrote: > postgres=# SELECT > postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US", > postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US" > postgres-# ; > ?column? | ?column? > ----------+---------- > t | f > (1 row) > > postgres=# SELECT > postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu", > postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu"; > ?column? | ?column? > ----------+---------- > f | t > (1 row) > > If I could start, I think both results are wrong as both should return > True. If I got it right, in the background there is a lower() function > running to compare strings, which is not enough for such cases (until > the left side isn't taken as a standalone word). The reason for these results is that for multibyte encodings, a ILIKE b basically does lower(a) LIKE lower(b), and select lower('ΣΣ' COLLATE "en_US"), lower('ΣΣ' COLLATE "en-US-x-icu"); lower | lower -------+------- σσ | σς Running lower() like this is really the wrong thing to do. We should be doing "case folding" instead, which normalizes these differences for the purpose of case-insensitive comparisons.
В списке pgsql-general по дате отправления: