Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From | Hiroshi Inoue |
---|---|
Subject | Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use. |
Date | |
Msg-id | 493733EE.7000503@tpf.co.jp |
In response to | Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use. (Magnus Hagander <magnus@hagander.net>) |
List | pgsql-hackers |
Magnus Hagander wrote:
> Hiroshi Inoue wrote:
>>> I think the thing is that as long as the encodings are compatible
>>> (latin1 with different names for example) it worked fine.
>>>
>>>> In any case I think the problem is that gettext is
>>>> looking at a setting that is not what we are looking at. Particularly
>>>> with the 8.4 changes to allow per-database locale settings, this has
>>>> got to be fixed in a bulletproof way.
>> Attached is a new patch to apply bind_textdomain_codeset() to most
>> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
>> and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.
>>
>> Unfortunately it's hard for Saito-san and me to check encodings
>> other than EUC-JP.
>
> In principle this looks good, I think, but I'm a bit worried around the
> lack of testing.

Thanks, and I agree with you.

> I can do some testing under LATIN1 which is what we use
> in Sweden (just need to get gettext working *at all* in my dev
> environment again - I've somehow managed to break it), and perhaps we
> can find someone to do a test in an eastern-european locale to get some
> more datapoints?
>
> Can you outline the steps one needs to go through to show the problem,
> so we can confirm it's fixed in these locales?

Saito-san and I have been working on a related problem, changing the
LC_MESSAGES locale properly under Windows, and should be able to
provide a patch in a few days. It seems preferable to apply that patch
as well, so that the message catalog can be switched easily.

Attached is an example in which LC_MESSAGES is cht_twn (zh_TW) and
the server encoding is EUC-TW. You can read it as UTF-8 text
because client_encoding is set to UTF-8 first.

BTW you can see another problem at line 4 of the text.
At that point LC_MESSAGES is still Japanese, and postgres fails
to convert a Japanese error message to the EUC_TW encoding. That is
no surprise, but it does not seem desirable.
regards,
Hiroshi Inoue

set client_encoding to utf_8;
SET
1;
psql:cmd/euctw.sql:2: ERROR: character 0xb9e6 of encoding "EUC_TW" has no equivalent in "UTF8"
select current_database();
 current_database
------------------
 euctw
(1 行)

show server_encoding;
 server_encoding
-----------------
 EUC_TW
(1 行)

show lc_messages;
    lc_messages
--------------------
 Japanese_Japan.932
(1 行)

set lc_messages to cht;
SET
select a;
psql:cmd/euctw.sql:7: 錯誤: 欄位"a"不存在
LINE 1: select a;
               ^
1;
psql:cmd/euctw.sql:8: 錯誤: 在"語法錯誤"附近發生 1
LINE 1: 1;
        ^
select * from a;
psql:cmd/euctw.sql:9: 錯誤: relation "a"不存在
LINE 1: select * from a;
                      ^
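
The rule described in the message (bind a gettext codeset for most server encodings, but skip SQL_ASCII, MULE_INTERNAL and EUC_JIS_2004) can be sketched roughly as follows. This is only an illustration in Python, not the attached C patch: the function name `codeset_for` and the four table entries are assumptions chosen from encodings mentioned in this thread, and the real patch's table is not reproduced here.

```python
from typing import Optional

# Illustrative subset only: PostgreSQL server encoding name -> codeset name
# that could be passed to bind_textdomain_codeset(). These entries are
# assumptions; the table in the actual patch is not shown in the message.
CODESET_FOR_ENCODING = {
    "UTF8": "UTF-8",
    "LATIN1": "ISO-8859-1",
    "EUC_JP": "EUC-JP",
    "EUC_TW": "EUC-TW",
}

# Encodings the message lists as exceptions (no codeset is bound for them).
UNBOUND_ENCODINGS = {"SQL_ASCII", "MULE_INTERNAL", "EUC_JIS_2004"}


def codeset_for(server_encoding: str) -> Optional[str]:
    """Return the codeset to bind, or None to leave gettext's default alone."""
    if server_encoding in UNBOUND_ENCODINGS:
        return None
    # Unknown encodings also yield None rather than a guessed codeset.
    return CODESET_FOR_ENCODING.get(server_encoding)
```

In the example transcript the server encoding is EUC_TW, so under these assumptions `codeset_for("EUC_TW")` would select "EUC-TW", while `codeset_for("SQL_ASCII")` returns `None` and no binding would be made.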