BUG #14072: char() and application of locale collation

Поиск
Список
Период
Сортировка
От emmanuel.reynard@eranea.com
Тема BUG #14072: char() and application of locale collation
Дата
Msg-id 20160407102820.15196.86869@wrigleys.postgresql.org
обсуждение исходный текст
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      14072
Logged by:          Emmanuel Reynard
Email address:      emmanuel.reynard@eranea.com
PostgreSQL version: 9.4.7
Operating system:   linux
Description:

Hello,

We've been using a custom locale since we needed to use special ordering for
certain values in a UTF-8 database.

For example, E'\xEC\x82\x80' (E'\uC080') is defined as the first in the
list, before all other possible codepoints/characters.

While it works fine for varchars, properly placing this codepoint before a
space, it doesnt work as we expected in the case of chars.

This seems to be explained by the fact that bcharcmp() (in
src/backend/utils/adt/varchar.c) uses the bcTrueLength of said char(), which
does not count the number of trailing spaces.
Hence a blank char(3) with value '\x202020' is of "true length" 0 and
smaller than any other string, and any locale information will be
disregarded.

This behaviour already had surprised us while calling the length() function
on char(x) with blank pads :
test_utf8=# select length('   '::char(3)) as char_len, char_length('
'::char(3)) as char_charlen, octet_length('   '::char(3)) as char_octetlen,
length('   '::varchar(3)) as varchar_len;
 char_len | char_charlen | char_octetlen | varchar_len
----------+--------------+---------------+-------------
        0 |            0 |             3 |           3
(1 row)

Is this considered normal? Especially the case with char(x) which can ignore
partially ignore collation if they have trailing blanks.

I can imagine that changing this length behaviour might break a few things
though, is there a workaround besides creating a new custom type?

Cheers.

Emmanuel Reynard

- test case :
create table test_collation (c1 char(3), c2 varchar(3));
insert into test_collation values (' ', ' ');
insert into test_collation values (E'\xEC\x82\x80', E'\xEC\x82\x80');
insert into test_collation values (' ' || E'\uC080', ' ' || E'\uC080');

select c2, c2::bytea from test_collation order by c2 collate "fr_FR.utf8";
 c2  |     c2
-----+------------
     | \x20
  삀 | \x20ec8280
 삀  | \xec8280

select c2, c2::bytea from test_collation order by c2 collate
"locale_eranea";
 c2  |     c2
-----+------------
 삀  | \xec8280
     | \x20
  삀 | \x20ec8280

select c1, c1::bytea from test_collation order by c1 collate "fr_FR.utf8";
  c1  |      c1
------+--------------
      | \x202020
  삀  | \x20ec828020
 삀   | \xec82802020

 select c1, c1::bytea from test_collation order by c1 collate
"locale_eranea";
  c1  |      c1
------+--------------
      | \x202020
 삀   | \xec82802020
  삀  | \x20ec828020

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Miklós Fazekas
Дата:
Сообщение: released savepoint blocking further statements
Следующее
От: Tim Singletary
Дата:
Сообщение: Re: BUG #14054: "create index using gist ..." on large table never completes.