Re: Collate order on Mac OS X, text with diacritics in UTF-8
From | Craig Ringer |
---|---|
Subject | Re: Collate order on Mac OS X, text with diacritics in UTF-8 |
Date | |
Msg-id | 4B4E845F.80906@postnewspapers.com.au |
In reply to | Re: Collate order on Mac OS X, text with diacritics in UTF-8 (Martin Flahault <martin@billjobs.com>) |
List | pgsql-general |
On 13/01/2010 11:15 PM, Martin Flahault wrote:

> It seems there is a problem with the collating order on BSD systems with
> diacritics using UTF8.
> If you put this text :
> a
> A
> à
> é
> e
> E
>
> in a UTF8 text file and use the "sort" command on it, you will have the
> same wrong output as with PostgreSQL :
> A
> E
> a
> e
> à
> é

First: PostgreSQL expects the OS to behave correctly and sort according to the locale. It relies on the C library for this. If the C library doesn't do it right, PostgreSQL won't do it right either. So you need to get Mac OS X to do the right thing.

Your results match what I get on a Linux system without a properly generated fr_FR.UTF-8 locale. Libc falls back on the "C" locale, which sorts that way. If I generate the fr_FR.UTF-8 locale and run the sort (on the file "x"), I get the desired result:

    LANG=fr_FR.UTF-8 LC_ALL=fr_FR.UTF-8 sort x
    a
    A
    à
    e
    E
    é

I don't know Mac OS X well, but this is making me wonder if maybe you're just missing the required information for the locale, so libc is falling back on the "C" locale.

(Of course, being Mac OS X there are probably at least three out of date or simply false "man" pages describing the behaviour, none of which reflect the reality of a magic config key buried somewhere in NetInfo, for which the documentation is also completely out of date. Bitter? Me? Yeah, I admin a bunch of OS X machines on a business network.)

Hmm... a quick test suggests that Mac OS X (testing on 10.4) at least *thinks* it supports the fr_FR.UTF-8 locale:

    osx104$ LANG=xxx LC_ALL=xxx locale
    LANG="xxx"
    LC_COLLATE="C"
    LC_CTYPE="C"
    LC_MESSAGES="C"
    LC_MONETARY="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_ALL="C"

    osx104$ LANG=fr_FR.UTF-8 LC_ALL=fr_FR.UTF-8 locale
    LANG="fr_FR.UTF-8"
    LC_COLLATE="fr_FR.UTF-8"
    LC_CTYPE="fr_FR.UTF-8"
    LC_MESSAGES="fr_FR.UTF-8"
    LC_MONETARY="fr_FR.UTF-8"
    LC_NUMERIC="fr_FR.UTF-8"
    LC_TIME="fr_FR.UTF-8"
    LC_ALL="fr_FR.UTF-8"

    osx104$ locale -a | grep fr_FR
    fr_FR
    fr_FR.ISO8859-1
    fr_FR.ISO8859-15
    fr_FR.UTF-8

... yet it clearly doesn't:

    osx104$ LANG=C LC_ALL=C sort x
    A
    E
    a
    e
    à
    é

    osx104$ LANG=fr_FR.UTF-8 LC_ALL=fr_FR.UTF-8 sort x
    A
    E
    a
    e
    à
    é

    osx104$ LANG=fr_FR.ISO8859-1 LC_ALL=fr_FR.ISO8859-1 sort x
    A
    E
    a
    e
    à
    é

Mac OS X seems to keep its locale config in /usr/share/locale . Looking there, there are clearly LC_COLLATE files for fr_FR.UTF-8 . However, they're identical to those for en_US.UTF-8:

    osx104$ cd /usr/share/locale
    osx104$ diff fr_FR.UTF-8/LC_COLLATE en_US.UTF-8/LC_COLLATE

... so your OS's localized collation support is broken/missing, at least if the same is true for more modern versions of OS X.

--
Craig Ringer
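
For anyone who wants to repeat the check without going through "sort", the following is a rough sketch of the same test made directly against the C library's collation, using Python's standard `locale` module. It is not part of the original message. The helper names (`codepoint_order`, `sort_with_locale`, `collation_looks_like_c`) are made up for illustration, and it assumes that ordering which matches plain code point order means libc is effectively using "C" collation for that locale.

```python
import locale

# The sample lines from the file "x" discussed above.
WORDS = ["a", "A", "à", "é", "e", "E"]


def codepoint_order(words):
    """Sort by Unicode code point, which is what "C" collation gives for UTF-8."""
    return sorted(words)


def sort_with_locale(words, name):
    """Sort words with libc collation for locale `name` (hypothetical helper).

    Returns None when the C library rejects the locale name.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares like strcoll().
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


def collation_looks_like_c(words, name):
    """True if locale `name` orders words exactly like code point order.

    Assumption: identical ordering means no real collation rules are applied.
    Returns None when the locale is not available at all.
    """
    result = sort_with_locale(words, name)
    if result is None:
        return None
    return result == codepoint_order(words)


if __name__ == "__main__":
    for name in ("C", "fr_FR.UTF-8"):
        print(name, sort_with_locale(WORDS, name), collation_looks_like_c(WORDS, name))
```

If `collation_looks_like_c(WORDS, "fr_FR.UTF-8")` comes back `True` on a given machine, that is consistent with the behaviour shown in the transcripts above: the locale name is accepted, but the ordering is the same as in the "C" locale. A result of `None` only means the locale name was rejected.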