Re: UTF-8 encoding problem w/ libpq
From | Martin Schäfer |
---|---|
Subject | Re: UTF-8 encoding problem w/ libpq |
Date | |
Msg-id | 11A8567A97B15648846060F5CD818EB8CAC2253F62@DEV001EX.Dev.cadcorp.net |
In response to | Re: UTF-8 encoding problem w/ libpq (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: UTF-8 encoding problem w/ libpq |
List | pgsql-hackers |
> Can't really blame Windows on that. On Windows, we don't require that the
> encoding and LC_CTYPE's charset match. The OP used UTF-8 encoding in the
> server, but LC_CTYPE="English_United Kingdom.1252", ie. LC_CTYPE implies
> WIN1252 encoding. We allow that and it generally works on Windows
> because in varstr_cmp, we use MultiByteToWideChar() followed by
> wcscoll_l(), which doesn't care about the charset implied by LC_CTYPE.
> But for isupper(), it matters.

Does this mean that the UTF-8 mangling would disappear if the database used a different locale for LC_CTYPE? If so, which locale should I use? This would be useful as a temporary workaround.

> > We talked about this before and went off into the weeds about whether
> > it was sensible to try to use towlower() and whether that wouldn't
> > create undesirably platform-sensitive results. I wonder though if we
> > couldn't just fix this code to not do anything to high-bit-set bytes
> > in multibyte encodings.
>
> Yeah, we should do that. It makes no sense to call isupper or tolower on
> bytes belonging to multi-byte characters.

Actually, I would expect 'create table HÄUSER (...)' to create a table named 'häuser', not a table named 'hÄuser', so towlower seems the right choice IMHO.

Martin
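
For illustration, the "do nothing to high-bit-set bytes" idea quoted above could look roughly like the sketch below. This is not PostgreSQL's actual code: `ascii_only_downcase` is a hypothetical helper, and it assumes the identifier is UTF-8 encoded. It folds only ASCII A-Z, so every byte of a multibyte sequence is copied unchanged, which is exactly why 'HÄUSER' would come out as 'hÄuser' rather than 'häuser'.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a PostgreSQL function: fold ASCII A-Z to lower
 * case and copy every other byte unchanged. Bytes with the high bit set
 * (all bytes of a multibyte UTF-8 sequence) are therefore never passed to
 * isupper()/tolower() and cannot be corrupted by a mismatched LC_CTYPE.
 * The result is truncated to fit dstlen and always NUL-terminated.
 */
static void
ascii_only_downcase(const char *src, char *dst, size_t dstlen)
{
    size_t      i;

    if (dstlen == 0)
        return;

    for (i = 0; src[i] != '\0' && i < dstlen - 1; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';    /* ASCII-only case folding */
        dst[i] = (char) ch;
    }
    dst[i] = '\0';
}
```

Called on the UTF-8 bytes of "HÄUSER" (where Ä is the two-byte sequence 0xC3 0x84), this leaves those two bytes alone and lowercases only the ASCII letters. Getting 'häuser' would instead require decoding to wide characters and using something like towlower(), with the platform-dependence concerns raised in the quoted discussion.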