Re: [BUGS] Bug #659: lower()/upper() bug on
From | Hannu Krosing |
---|---|
Subject | Re: [BUGS] Bug #659: lower()/upper() bug on |
Date | |
Msg-id | 1021365344.2382.13.camel@taru.tm.ee |
In response to | Re: [BUGS] Bug #659: lower()/upper() bug on (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses | Re: [BUGS] Bug #659: lower()/upper() bug on |
List | pgsql-hackers |
On Tue, 2002-05-14 at 03:29, Tatsuo Ishii wrote:
> > I think it is really not hard to do this for UTF-8. I don't have to know the
> > relation between the locale and the encoding. Look at this:
> > We can use the LC_CTYPE from pg_controldata or alternatively the LC_CTYPE
> > at server startup. For nearly every locale (de_DE, ja_JP, ...) there exists
> > also a locale *.utf8 (de_DE.utf8, ja_JP.utf8, ...) at least for the actual Linux glibc.
>
> My Linux box does not have *.utf8 locales at all. Probably not so many
> platforms have them up to now, I guess.

Which Linux do you use?

At least newer Red Hat releases have them, and I suspect that all newer glibc versions are capable of using them.

> > We don't need to know more than this. If we call
> > setlocale(LC_CTYPE, <value of LC_CTYPE extended with .utf8 if not already given>)
> > then glibc is aware of doing all the conversions. I attach a small demo program
> > which set the locale ja_JP.utf8 and is able to translate german umlaut A (upper) to
> > german umlaut a (lower).
>
> Interesting idea, but the problem is we have to decide to use exactly
> one locale before initdb. In my understanding, users willing to use
> Unicode (UTF-8) tend to use multiple languages. This is natural since
> Unicode claims it can handle several languages. For example, user
> might want to have a table like this in a UTF-8 database:
>
> create table t1(
>   english text,  -- English message
>   germany text,  -- Germany message
>   japanese text  -- Japanese message
> );
>
> If you have set the local to, say de_DE, then:
>
> select lower(japanese) from t1;
>
> would be executed in de_DE.utf8 locale, and I doubt it produces any
> meaningfull results for Japanese.

IIRC it may, as I think the locale will include the full UTF-8 upper/lower tables, at least on Linux. For example, en_US produces correct upper/lower results for Estonian, though the collation is off and some characters are missing when using ISO-8859-1.
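For reference, the setlocale() idea quoted above could be sketched roughly as follows. This is not the demo program that was attached to the original mail (that one is not reproduced here); the helper names utf8_locale_name() and lower_in_locale() are made up for illustration, and the name handling is deliberately simplified: a locale name that already contains a codeset is left alone, and modifiers such as "@euro" are not handled.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: append ".utf8" to a locale name that has no
 * codeset part yet. Names with modifiers ("de_DE@euro") are not handled. */
void utf8_locale_name(const char *name, char *buf, size_t buflen)
{
    if (strchr(name, '.') != NULL)
        snprintf(buf, buflen, "%s", name);       /* codeset already given */
    else
        snprintf(buf, buflen, "%s.utf8", name);  /* e.g. de_DE -> de_DE.utf8 */
}

/* Hypothetical helper: lower-case one wide character under the given
 * LC_CTYPE locale, then restore the previous LC_CTYPE setting.
 * Returns 0 on success, -1 if the locale is not available. */
int lower_in_locale(const char *locale_name, wchar_t in, wchar_t *out)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);  /* query current setting */

    /* Copy the name: the returned string may be overwritten by later calls. */
    snprintf(saved, sizeof saved, "%s", cur != NULL ? cur : "C");

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;                                /* locale not installed */

    *out = (wchar_t) towlower((wint_t) in);
    setlocale(LC_CTYPE, saved);
    return 0;
}
```

Calling lower_in_locale("ja_JP.utf8", 0x00C4, &out) asks for the lower-case form of A-umlaut (U+00C4) under a Japanese locale. On a glibc system with that locale installed, the result should be a-umlaut (U+00E4), which is the behaviour described in the quoted mail; on systems without *.utf8 locales the call simply returns -1.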
By the way, does the Japanese language have distinct upper- and lower-case letters?

--------------
Hannu