Re: Windows and locales and UTF-8 (oh my)
From | Magnus Hagander |
---|---|
Subject | Re: Windows and locales and UTF-8 (oh my) |
Date | |
Msg-id | 20071015114010.GD5806@svr2.hagander.net |
In response to | Re: Windows and locales and UTF-8 (oh my) (Magnus Hagander <magnus@hagander.net>) |
Responses |
Re: Windows and locales and UTF-8 (oh my)
|
List | pgsql-hackers |
On Mon, Oct 15, 2007 at 01:26:00PM +0200, Magnus Hagander wrote:
> On Mon, Oct 15, 2007 at 11:09:54AM +0200, Magnus Hagander wrote:
> > On Sat, Oct 06, 2007 at 01:53:31PM -0400, Tom Lane wrote:
> > > I am thinking that Dave's discovery explains some previously unsolved
> > > bug reports, such as
> > > http://archives.postgresql.org/pgsql-bugs/2007-05/msg00260.php
> > > If Windows returns LC_CTYPE=C in a situation like this, then
> > > the various single-byte-charset optimization paths that are enabled by
> > > lc_ctype_is_c() would be mistakenly used, leading to misbehavior in
> > > upper()/lower() and other places. ISTM we had better hack
> > > lc_ctype_is_c() so that on Windows (only), if the database encoding
> > > is UTF-8 then it returns FALSE regardless of what setlocale says.
> >
> > Yes, I think we need a change to that routine.
> >
> > But. What about the case when we actually *have* locale=C and
> > encoding=UTF8. We need to care for that one somehow. Perhaps we should look
> > at LC_COLLATE instead (again, on Windows only. Possibly even only in the
> > windows+locale_returns_c+encoding=utf8 case, to distinguish these two)?
>
> Hmm. Looking more at that, may there be another problem? Looking at
> WriteControlFile(), it writes out what setlocale(LC_CTYPE) returns, which
> will then be "C" - even if the database isn't in C.
>
> But I don't really know when that code is called, or if I'm just looking at
> things wrong. Just starting up and shutting down the database leaves it at
> Swedish_Sweden.1252, not C.
> (1252 is still the wrong encoding specifier, but it'll work anyway since we
> convert to UTF16)

Gah, got that backwards. Of course it does, because it only returns "C" if
we set to Swedish_Sweden.65001, and we don't *do* that with the patch I
sent in earlier. We set it to Swedish_Sweden, which is a perfectly valid
LC_CTYPE.

And given that, do we even need to special-case lc_ctype_is_c() at all?
If we never pass in a .65001 locale (which we don't, because it fails)?

//Magnus
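
For reference, the special case Tom proposed in the quoted text could be sketched roughly as below. This is only an illustration, not the PostgreSQL source: the function name sketch_lc_ctype_is_c() and its two boolean parameters are hypothetical stand-ins for the platform check and the database-encoding check, and the "C"/"POSIX" string comparison is an assumption about what counts as the C locale.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch of the proposed rule; all names here are made up
 * for illustration.
 *
 *   ctype       the string reported for LC_CTYPE
 *   is_windows  stand-in for a compile-time platform check
 *   db_is_utf8  stand-in for a check of the database encoding
 */
bool
sketch_lc_ctype_is_c(const char *ctype, bool is_windows, bool db_is_utf8)
{
    /*
     * Proposed special case: on Windows with a UTF-8 database, never
     * take the single-byte "C locale" fast paths, whatever setlocale
     * reported.
     */
    if (is_windows && db_is_utf8)
        return false;

    /* Otherwise (assumption): treat "C" and "POSIX" as the C locale. */
    return strcmp(ctype, "C") == 0 || strcmp(ctype, "POSIX") == 0;
}
```

Note that with this rule a database that really is locale=C with encoding=UTF8 on Windows also gets false, which is the ambiguity raised in the quoted reply.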