Re: [WIP] collation support revisited (phase 1)
From | Zdenek Kotala |
---|---|
Subject | Re: [WIP] collation support revisited (phase 1) |
Date | |
Msg-id | 488612DE.5060206@sun.com |
In response to | Re: [WIP] collation support revisited (phase 1) (Martijn van Oosterhout <kleptog@svana.org>) |
List | pgsql-hackers |
Martijn van Oosterhout wrote:
> On Mon, Jul 21, 2008 at 03:15:56AM +0200, Radek Strnad wrote:
>> I was trying to sort out the problem with not creating new catalog for
>> character sets and I came up with the following ideas. Correct me if my
>> ideas are wrong.
>>
>> Since collation has to have a defined character set.
>
> Not really. AIUI at least glibc and ICU define a collation over all
> possible characters (ie unicode). When you create a locale you take a
> subset and use that. Think about it: if you want to sort strings and
> one of them happens to contain a Chinese character, it can't *fail*.
> Note strcoll() has no error return for unknown characters.

It does. See
http://www.opengroup.org/onlinepubs/009695399/functions/strcoll.html

    The strcoll() function may fail if:

    [EINVAL]
        [CX] The s1 or s2 arguments contain characters outside the
        domain of the collating sequence.

>> I'm suggesting to use
>> already written infrastructure of encodings and to use list of encodings in
>> chklocale.c. Currently databases are not created with specified character
>> set but with specified encoding. I think instead of pointing a record in
>> collation catalog to another record in character set catalog we might use
>> only name (string) of the encoding.
>
> That's reasonable. From an abstract point of view collations and
> encodings are orthogonal, it's only when you're using POSIX locales
> that there are limitations on how you combine them. I think you can
> assume a collation can handle any characters that can be produced by
> encoding.

I think you are not correct. You cannot use a single collation over all of
Unicode. See http://www.unicode.org/reports/tr10/#Common_Misperceptions.
The same characters can be ordered differently in different languages.

		Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql
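
The strcoll() point can be illustrated in code. Because the function has no return value reserved for errors, the POSIX page linked above advises setting errno to 0 before the call and checking it afterwards. The following is only a minimal sketch: the helper name collate_checked is hypothetical (it is not part of PostgreSQL or libc), and whether a given C library actually sets EINVAL for characters outside the collating sequence is implementation-dependent.

```c
#include <errno.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration only):
 * compare two strings with strcoll() under the current LC_COLLATE
 * locale and report whether the call flagged an error.
 *
 * Returns 0 on success and stores the comparison result in *cmp.
 * Otherwise returns the errno value left by strcoll() (for example
 * EINVAL) and leaves *cmp untouched.
 *
 * The caller is expected to have selected a locale beforehand,
 * e.g. with setlocale(LC_COLLATE, "").
 */
static int
collate_checked(const char *s1, const char *s2, int *cmp)
{
    int result;

    errno = 0;                  /* no return value signals an error */
    result = strcoll(s1, s2);
    if (errno != 0)
        return errno;           /* comparison result is not meaningful */

    *cmp = result;
    return 0;
}
```

A caller would treat a nonzero return as "these strings cannot be ordered under this collation" rather than using the comparison result.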