BUG #14072: char() and application of locale collation
От | emmanuel.reynard@eranea.com |
---|---|
Тема | BUG #14072: char() and application of locale collation |
Дата | |
Msg-id | 20160407102820.15196.86869@wrigleys.postgresql.org обсуждение исходный текст |
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 14072 Logged by: Emmanuel Reynard Email address: emmanuel.reynard@eranea.com PostgreSQL version: 9.4.7 Operating system: linux Description: Hello, We've been using a custom locale since we needed to use special ordering for certain values in a UTF-8 database. For example, E'\xEC\x82\x80' (E'\uC080') is defined as the first in the list, before all other possible codepoints/characters. While it works fine for varchars, properly placing this codepoint before a space, it doesnt work as we expected in the case of chars. This seems to be explained by the fact that bcharcmp() (in src/backend/utils/adt/varchar.c) uses the bcTrueLength of said char(), which does not count the number of trailing spaces. Hence a blank char(3) with value '\x202020' is of "true length" 0 and smaller than any other string, and any locale information will be disregarded. This behaviour already had surprised us while calling the length() function on char(x) with blank pads : test_utf8=# select length(' '::char(3)) as char_len, char_length(' '::char(3)) as char_charlen, octet_length(' '::char(3)) as char_octetlen, length(' '::varchar(3)) as varchar_len; char_len | char_charlen | char_octetlen | varchar_len ----------+--------------+---------------+------------- 0 | 0 | 3 | 3 (1 row) Is this considered normal? Especially the case with char(x) which can ignore partially ignore collation if they have trailing blanks. I can imagine that changing this length behaviour might break a few things though, is there a workaround besides creating a new custom type? Cheers. Emmanuel Reynard - test case : create table test_collation (c1 char(3), c2 varchar(3)); insert into test_collation values (' ', ' '); insert into test_collation values (E'\xEC\x82\x80', E'\xEC\x82\x80'); insert into test_collation values (' ' || E'\uC080', ' ' || E'\uC080'); select c2, c2::bytea from test_collation order by c2 collate "fr_FR.utf8"; c2 | c2 -----+------------ | \x20 ì | \x20ec8280 ì | \xec8280 select c2, c2::bytea from test_collation order by c2 collate "locale_eranea"; c2 | c2 -----+------------ ì | \xec8280 | \x20 ì | \x20ec8280 select c1, c1::bytea from test_collation order by c1 collate "fr_FR.utf8"; c1 | c1 ------+-------------- | \x202020 ì | \x20ec828020 ì | \xec82802020 select c1, c1::bytea from test_collation order by c1 collate "locale_eranea"; c1 | c1 ------+-------------- | \x202020 ì | \xec82802020 ì | \x20ec828020
В списке pgsql-bugs по дате отправления: