Re: case insensitive collation of Greek's sigma
От | Frank Limpert |
---|---|
Тема | Re: case insensitive collation of Greek's sigma |
Дата | |
Msg-id | ca27a92d-9af1-7f39-c04d-44ffcd05de64@yahoo.com обсуждение исходный текст |
Ответ на | case insensitive collation of Greek's sigma (Jakub Jedelsky <jakub.jedelsky@gooddata.com>) |
Список | pgsql-general |
Am 26.11.21 um 08:37 schrieb Jakub Jedelsky: > Hello, > > during our tests of Postgres with ICU we found an issue with ILIKE of > upper and lowercase sigma (Σ). The letter has two lowercase variants σ > and ς (at the end of a word). I'm working with en_US and en-US-x-icu > collations and results are a bit unexpected - they are inverted: > > postgres=# SELECT > postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US", > postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US" > postgres-# ; > ?column? | ?column? > ----------+---------- > t | f > (1 row) > > postgres=# SELECT > postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu", > postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu"; > ?column? | ?column? > ----------+---------- > f | t > (1 row) > > I run those commands on the latest (14.1) official docker image. > > Is it possible to unify the behaviour?And which one is correct from > the community point of view? > > If I could start, I think both results are wrong as both should return > True. If I got it right, in the background there is a lower() function > running to compare strings, which is not enough for such cases (until > the left side isn't taken as a standalone word). > > Thanks, > > - jj Have you seen the subtle intricacies in this example? => SELECT 'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en_US" AS c0, 'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en_US" AS c1, 'ΣΣ Μ' ~* 'σσ Μ' COLLATE "el-GR-x-icu" AS c2, 'ΣΣ Μ' ~* 'σς Μ' COLLATE "el-GR-x-icu" AS c3, 'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "el-GR-x-icu" AS c4, 'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "el-GR-x-icu" AS c5, 'ΣΣ Μ' ~* 'σσ Μ' COLLATE "en-US-x-icu" AS c6, 'ΣΣ Μ' ~* 'σς Μ' COLLATE "en-US-x-icu" AS c7, 'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en-US-x-icu" AS c8, 'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en-US-x-icu" AS c9; c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 ----+----+----+----+----+----+----+----+----+---- t | f | t | t | f | t | t | t | f | t (1 row) Obviously, the ILIKE operator is really strict regarding to the correct letter at the end of the word. The regular expression operator works as you expected. Happy computing... Frank
В списке pgsql-general по дате отправления: