Обсуждение: Issues with LC_COLLATE, across 8.3/8.4 on various platforms
Hi, i have the following problem, text ordering seems to behave incosistently across various lc_collate values, OS'es, PostgreSQLversions. Some behaviour might be expected, some not, thats why i am asking to see where i stand with this. Test Data postgres@dynacom=# SELECT * from test_sort_order; fooname ------------------------- Cylinder head cover No1 Cylinder Liner No1 Cylinder head No1 (3 rows) Now the query # SELECT * from test_sort_order order by fooname; in PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu (lc_collate=en_US.UTF-8) gives fooname ------------------------- Cylinder head cover No1 Cylinder head No1 Cylinder Liner No1 while in all of PostgreSQL 8.3.3 on i686-pc-linux-gnu (lc_collate=C) PostgreSQL 8.3.7 (lc_collate=el_GR.UTF-8) on i386-unknown-freebsd6.3 PostgreSQL 8.4.1 on x86_64-unknown-freebsd8.0 (lc_collate=en_US.UTF-8), gives fooname ------------------------- Cylinder Liner No1 Cylinder head No1 Cylinder head cover No1 Database encoding is SQL_ASCII in all four cases. -- Achilleas Mantzios
It seems that libc's behaviour regarding LC_COLLATE in Linux/FreeBSD are two different stories, hence the seen results. (Switching LC_COLLATE between en_US.UTF-8 and POSIX, in FreeBSD it does not have any impact on orderring while in Linux does) I think i have resolved the issue, so the solution boils down to setting LC_COLLATE=C in our linux central production postgresqlserver. What is the shortest way to change the default locale on a production installation running PostgreSQL-8.3.9? Is there anything less painful than dump, initdb,restore? I know 8.4 allows for per-db locale settings, i am just asking if there is a way to solve the issue now without waiting till the migration to 8.4 Thanx! Στις Tuesday 02 February 2010 14:26:44 ο/η Achilleas Mantzios έγραψε: > Hi, i have the following problem, text ordering seems to behave incosistently across various lc_collate values, OS'es,PostgreSQL versions. > Some behaviour might be expected, some not, thats why i am asking to see where i stand with this. > Test Data > postgres@dynacom=# SELECT * from test_sort_order; > fooname > ------------------------- > Cylinder head cover No1 > Cylinder Liner No1 > Cylinder head No1 > (3 rows) > > Now the query > # SELECT * from test_sort_order order by fooname; > in PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu (lc_collate=en_US.UTF-8) gives > fooname > ------------------------- > Cylinder head cover No1 > Cylinder head No1 > Cylinder Liner No1 > > while in all of > PostgreSQL 8.3.3 on i686-pc-linux-gnu (lc_collate=C) > PostgreSQL 8.3.7 (lc_collate=el_GR.UTF-8) on i386-unknown-freebsd6.3 > PostgreSQL 8.4.1 on x86_64-unknown-freebsd8.0 (lc_collate=en_US.UTF-8), gives > fooname > ------------------------- > Cylinder Liner No1 > Cylinder head No1 > Cylinder head cover No1 > > Database encoding is SQL_ASCII in all four cases. > > -- > Achilleas Mantzios > -- Achilleas Mantzios
Achilleas Mantzios <achill@matrix.gatewaynet.com> writes: > What is the shortest way to change the default locale on a production installation running PostgreSQL-8.3.9? > Is there anything less painful than dump, initdb,restore? No :-( regards, tom lane
Στις Tuesday 02 February 2010 17:37:05 ο/η Tom Lane έγραψε: > Achilleas Mantzios <achill@matrix.gatewaynet.com> writes: > > What is the shortest way to change the default locale on a production installation running PostgreSQL-8.3.9? > > Is there anything less painful than dump, initdb,restore? > > No :-( > Thanx Tom, taking into account that the large and most important DB in our installation (over 100Gb as SQL dump) lives in a tablespaceof its own, is there a way to "cheat" postgresql after initdb in order to see this tablespace and the database in it, avoiding the wholecycle? i think this is not a good idea, i just thought i could ask, just to make sure. > regards, tom lane > -- Achilleas Mantzios
On tis, 2010-02-02 at 17:28 +0200, Achilleas Mantzios wrote: > It seems that libc's behaviour regarding LC_COLLATE in Linux/FreeBSD > are two different stories, hence the seen results. The UTF-8 locales on FreeBSD and Mac OS X are pretty much broken. This is becoming a FAQ ...
Στις Wednesday 03 February 2010 16:53:20 ο/η Peter Eisentraut έγραψε: > On tis, 2010-02-02 at 17:28 +0200, Achilleas Mantzios wrote: > > It seems that libc's behaviour regarding LC_COLLATE in Linux/FreeBSD > > are two different stories, hence the seen results. > > The UTF-8 locales on FreeBSD and Mac OS X are pretty much broken. > > This is becoming a FAQ ... This is a libc/locale combination issue. Care to elaborate? AFAICT for my case in FreeBSD, the greek UTF locale el_GR.UTF-8 behaves as expected without problems. > > -- Achilleas Mantzios
On ons, 2010-02-03 at 17:05 +0200, Achilleas Mantzios wrote: > Στις Wednesday 03 February 2010 16:53:20 ο/η Peter Eisentraut έγραψε: > > On tis, 2010-02-02 at 17:28 +0200, Achilleas Mantzios wrote: > > > It seems that libc's behaviour regarding LC_COLLATE in Linux/FreeBSD > > > are two different stories, hence the seen results. > > > > The UTF-8 locales on FreeBSD and Mac OS X are pretty much broken. > > > > This is becoming a FAQ ... > > This is a libc/locale combination issue. Care to elaborate? > AFAICT for my case in FreeBSD, the greek UTF locale el_GR.UTF-8 behaves as expected without problems. Well, """ while in all of PostgreSQL 8.3.3 on i686-pc-linux-gnu (lc_collate=C) PostgreSQL 8.3.7 (lc_collate=el_GR.UTF-8) on i386-unknown-freebsd6.3 PostgreSQL 8.4.1 on x86_64-unknown-freebsd8.0 (lc_collate=en_US.UTF-8), gives fooname ------------------------- Cylinder Liner No1 Cylinder head No1 Cylinder head cover No1 """ is not a correct sorting result in my book, unless your measure of correctness is ASCII byte value order. And then it's left as an exercise to determine what the results would be for texts involving non-ASCII letters.
Στις Wednesday 03 February 2010 21:16:36 ο/η Peter Eisentraut έγραψε: > On ons, 2010-02-03 at 17:05 +0200, Achilleas Mantzios wrote: > > Στις Wednesday 03 February 2010 16:53:20 ο/η Peter Eisentraut έγραψε: > > > On tis, 2010-02-02 at 17:28 +0200, Achilleas Mantzios wrote: > > > > It seems that libc's behaviour regarding LC_COLLATE in Linux/FreeBSD > > > > are two different stories, hence the seen results. > > > > > > The UTF-8 locales on FreeBSD and Mac OS X are pretty much broken. > > > > > > This is becoming a FAQ ... ^^^^ I guess you are right. In my new FreeBSD system i get: postgres@smadevnew:~> uname -sr FreeBSD 8.0-RELEASE-p2 postgres@smadevnew:~> ls -l /usr/share/locale/en_US.UTF-8/ total 0 lrwxr-xr-x 1 root wheel 28 Nov 21 16:30 LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE lrwxr-xr-x 1 root wheel 17 Nov 21 16:30 LC_CTYPE -> ../UTF-8/LC_CTYPE lrwxr-xr-x 1 root wheel 30 Nov 21 16:30 LC_MESSAGES -> ../en_US.ISO8859-1/LC_MESSAGES lrwxr-xr-x 1 root wheel 30 Nov 21 16:30 LC_MONETARY -> ../en_US.ISO8859-1/LC_MONETARY lrwxr-xr-x 1 root wheel 29 Nov 21 16:30 LC_NUMERIC -> ../en_US.ISO8859-1/LC_NUMERIC lrwxr-xr-x 1 root wheel 26 Nov 21 16:30 LC_TIME -> ../en_US.ISO8859-1/LC_TIME > > -- Achilleas Mantzios