Обсуждение: Weird sorting order
Hi... I have the following table: CREATE TABLE test ( id SERIAL PRIMARY KEY, val VARCHAR(32) NOT NULL ); INSERT INTO test VALUES (DEFAULT, '##34''), (DEFAULT, '##32'), (DEFAULT, '##31'), (DEFAULT, '2ff'), (DEFAULT, '##26'), (DEFAULT, '2##33'), (DEFAULT, '2##25'), (DEFAULT, '2##24'), (DEFAULT, '2##23'), (DEFAULT, '211'), (DEFAULT, '210'), (DEFAULT, '203'), (DEFAULT, '202'), (DEFAULT, '201'), (DEFAULT, '200'); Why is it that when running the query: SELECT * FROM test ORDER BY val; I get the following result? id | val ----+------- 1 | 200 2 | 201 3 | 202 4 | 203 5 | 210 6 | 211 7 | 2##23 8 | 2##24 9 | 2##25 10 | 2##33 11 | ##26 12 | 2ff 13 | ##31 14 | ##32 15 | ##34 (15 rows) Shouldn't value '2ff' be placed right after '211' but before '2##23'? Tested on PostgreSQL 8.2, 8.4, Linux SuSE & Kubuntu. Thank you!... -- Robert Voinea <robert (dot) voinea (at) topex (dot) ro> Software Developer Phone: +40 21 408 38 00 / ext. 343 Fax: +40 21 408 38 08 Local time: GMT+2 http://www.topex.ro
Robert Voinea <robert.voinea@topex.ro> wrote: > Why is it that when running the query: > > SELECT * FROM test ORDER BY val; > > I get the following result? > id | val > ----+------- > 1 | 200 > 2 | 201 > 3 | 202 > 4 | 203 > 5 | 210 > 6 | 211 > 7 | 2##23 > 8 | 2##24 > 9 | 2##25 > 10 | 2##33 > 11 | ##26 > 12 | 2ff > 13 | ##31 > 14 | ##32 > 15 | ##34 > (15 rows) > > Shouldn't value '2ff' be placed right after '211' but before > '2##23'? That depends on your collation configuration. What do you get from?: show lc_collate; In many collations, special characters such as '#' are ignored. -Kevin
It's ignoring the hash marks. It's like they are invisible characters. Tested on 8.1.16 -------- Original Message -------- Subject: [ADMIN] Weird sorting order Date: Fri, 16 Jul 2010 15:14:09 +0300 From: Robert Voinea <robert.voinea@topex.ro> Organization: Topex To: pgsql-admin@postgresql.org Hi... I have the following table: CREATE TABLE test ( id SERIAL PRIMARY KEY, val VARCHAR(32) NOT NULL ); INSERT INTO test VALUES (DEFAULT, '##34''), (DEFAULT, '##32'), (DEFAULT, '##31'), (DEFAULT, '2ff'), (DEFAULT, '##26'), (DEFAULT, '2##33'), (DEFAULT, '2##25'), (DEFAULT, '2##24'), (DEFAULT, '2##23'), (DEFAULT, '211'), (DEFAULT, '210'), (DEFAULT, '203'), (DEFAULT, '202'), (DEFAULT, '201'), (DEFAULT, '200'); Why is it that when running the query: SELECT * FROM test ORDER BY val; I get the following result? id | val ----+------- 1 | 200 2 | 201 3 | 202 4 | 203 5 | 210 6 | 211 7 | 2##23 8 | 2##24 9 | 2##25 10 | 2##33 11 | ##26 12 | 2ff 13 | ##31 14 | ##32 15 | ##34 (15 rows) Shouldn't value '2ff' be placed right after '211' but before '2##23'? Tested on PostgreSQL 8.2, 8.4, Linux SuSE & Kubuntu. Thank you!... -- Robert Voinea <robert (dot) voinea (at) topex (dot) ro> Software Developer Phone: +40 21 408 38 00 / ext. 343 Fax: +40 21 408 38 08 Local time: GMT+2 http://www.topex.ro -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
Вложения
On Friday 16 July 2010 18:34:06 Kevin Grittner wrote: > Robert Voinea <robert.voinea@topex.ro> wrote: > > Why is it that when running the query: > > > > SELECT * FROM test ORDER BY val; > > > > I get the following result? > > > > id | val > > > > ----+------- > > > > 1 | 200 > > 2 | 201 > > 3 | 202 > > 4 | 203 > > 5 | 210 > > 6 | 211 > > 7 | 2##23 > > 8 | 2##24 > > 9 | 2##25 > > > > 10 | 2##33 > > 11 | ##26 > > 12 | 2ff > > 13 | ##31 > > 14 | ##32 > > 15 | ##34 > > > > (15 rows) > > > > Shouldn't value '2ff' be placed right after '211' but before > > '2##23'? > > That depends on your collation configuration. What do you get > from?: > > show lc_collate; > > In many collations, special characters such as '#' are ignored. => show lc_collate; lc_collate ------------- en_US.UTF-8 -- Robert Voinea <robert (dot) voinea (at) topex (dot) ro> Software Developer Phone: +40 21 408 38 00 / ext. 343 Fax: +40 21 408 38 08 Local time: GMT+2 http://www.topex.ro
Robert Voinea <robert.voinea@topex.ro> wrote: > => show lc_collate; > lc_collate > ------------- > en_US.UTF-8 I'm afraid the order you're seeing is what you're supposed to get for that collation sequence. In that collation, special characters (including spaces) are only used as tie-breakers for values which are tied when the special characters are ignored. There may be a few consequences of that which you haven't yet found. As one example: test=# show lc_collate; lc_collate ------------- en_US.UTF-8 (1 row) test=# create table t1 (c1 text); CREATE TABLE test=# insert into t1 values ('one'),(' one'),('one '),('##one'),('one##'); INSERT 0 5 test=# select '"' || c1 || '"' from t1 order by c1; ?column? ---------- "one" " one" "##one" "one " "one##" (5 rows) test=# select '"' || c1 || '"' from t1 order by c1 desc; ?column? ---------- "one##" "one " "##one" " one" "one" (5 rows) For that reason, we have (so far) used the C locale, which provides the binary sort you probably expected, and we use special columns, maintained by triggers, to control selection and sequencing as needed -- for example we have a "searchName" column in any table where we have name columns, which is forced into a canonical format. FWIW, our algorithm for generating a canonical name also excludes spaces and the '#' character, although it still has significant differences from the en_US.UTF-8 collation. -Kevin
This is really interesting. Is there a way to set the locale on the fly for a sort? Having looked at the docs, it appears as if the initDB step is the only opportunity to do so. Regards, r.b.
Burgholzer, Robert (DEQ)" <Robert.Burgholzer@deq.virginia.gov> wrote: > This is really interesting. Is there a way to set the locale on > the fly for a sort? Having looked at the docs, it appears as if > the initDB step is the only opportunity to do so. Starting with 8.4 you can specify it at the database level: http://www.postgresql.org/docs/8.4/interactive/sql-createdatabase.html Peter Eisentraut is working on a way to set the collation for an individual column or (I think) for a sort; but that won't be in 9.0. Hopefully in 9.1.... http://archives.postgresql.org/message-id/1279045531.32647.14.camel@vanquo.pezone.net -Kevin
On Monday 19 July 2010 19:15:32 Kevin Grittner wrote: > Burgholzer, Robert (DEQ)" <Robert.Burgholzer@deq.virginia.gov> > > wrote: > > This is really interesting. Is there a way to set the locale on > > the fly for a sort? Having looked at the docs, it appears as if > > the initDB step is the only opportunity to do so. > > Starting with 8.4 you can specify it at the database level: > > http://www.postgresql.org/docs/8.4/interactive/sql-createdatabase.html > > Peter Eisentraut is working on a way to set the collation for an > individual column or (I think) for a sort; but that won't be in 9.0. > Hopefully in 9.1.... > > http://archives.postgresql.org/message-id/1279045531.32647.14.camel@vanquo. > pezone.net > > -Kevin Thank you for clarifying this. -- Robert Voinea <robert (dot) voinea (at) topex (dot) ro> Software Developer Phone: +40 21 408 38 00 / ext. 343 Fax: +40 21 408 38 08 Local time: GMT+2 http://www.topex.ro