Обсуждение: BUG #2554: ILIKE operator works incorrectly
The following bug has been logged online: Bug reference: 2554 Logged by: JarosÅaw Bojar Email address: jarek.bojar@gmail.com PostgreSQL version: 8.1.4 Operating system: i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 4.1.0 Description: ILIKE operator works incorrectly Details: ILIKE operator works incorrectly with UTF8 encoding and Polish characters. Consider following SQL statements: CREATE DATABASE test ENCODING='UTF8'; \c test CREATE TABLE the_table (val VARCHAR(50)); INSERT INTO the_table (val) VALUES ('Åwiat'); INSERT INTO the_table (val) VALUES ('ÅÄ ka'); INSERT INTO the_table (val) VALUES ('Äma'); INSERT INTO the_table (val) VALUES ('abc'); INSERT INTO the_table (val) VALUES ('ABC'); Without Polish characters ILIKE works correctly: SELECT * FROM the_table WHERE val ilike 'abc'; val ----- abc ABC (2 rows) But with Polish characters it does not work correctly. Following queries should give single row results, but they do not return any rows: SELECT * FROM the_table WHERE val ilike 'Åwiat'; val ----- (0 rows) SELECT * FROM the_table WHERE val ilike 'ÅÄ ka'; val ----- (0 rows) SELECT * FROM the_table WHERE val ilike 'Äma'; val ----- (0 rows) On the contrary functions like UPPER work correctly with Polish characters and following queries produce correct results: SELECT * FROM the_table WHERE UPPER(val) like UPPER('Åwiat'); val ------- Åwiat (1 row) SELECT * FROM the_table WHERE UPPER(val) like UPPER('Äma'); val ----- Äma (1 row) SELECT * FROM the_table WHERE UPPER(val) like UPPER('ÅÄ ka'); val ------ ÅÄ ka (1 row) The bug is also present in PostgreSQL 8.1.0 on Windows XP.
JarosÅaw Bojar wrote: > ILIKE operator works incorrectly with UTF8 encoding and Polish > characters. What does SHOW lc_collate say? -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > JarosÅaw Bojar wrote: >> ILIKE operator works incorrectly with UTF8 encoding and Polish >> characters. > What does SHOW lc_collate say? Even if the locale is set right, iwchareq() is completely broken for multibyte charsets: it's trying to apply the system's tolower() to a pg_wchar, which is highly unlikely to work. This has been a known problem for awhile, but no one's stepped up to fix it. regards, tom lane