Re: UTF-8 encoding problem w/ libpq
From | Andrew Dunstan |
---|---|
Subject | Re: UTF-8 encoding problem w/ libpq |
Date | |
Msg-id | 51ACD5F5.3030407@dunslane.net |
In reply to | Re: UTF-8 encoding problem w/ libpq (Heikki Linnakangas <hlinnakangas@vmware.com>) |
List | pgsql-hackers |

On 06/03/2013 12:22 PM, Heikki Linnakangas wrote:
> On 03.06.2013 18:27, ktm@rice.edu wrote:
>> On Mon, Jun 03, 2013 at 04:09:29PM +0100, Martin Schäfer wrote:
>>>
>>>>> If I change the strCreate query and add double quotes around the
>>>>> column name, then the problem disappears. But the original name is
>>>>> already in lowercase, so I think it should also work without
>>>>> quoting the column name.
>>>>> Am I missing some setup in either the database or in the use of
>>>>> libpq?
>>>>>
>>>>> I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
>>>>>
>>>>> The database uses:
>>>>> ENCODING = 'UTF8'
>>>>> LC_COLLATE = 'English_United Kingdom.1252'
>>>>> LC_CTYPE = 'English_United Kingdom.1252'
>>>>>
>>>>> Thanks for any help,
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>> Hi Martin,
>>>>
>>>> If you do not want the lowercase behavior, you must put double-quotes
>>>> around the column name per the documentation:
>>>>
>>>> http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>>>>
>>>> section 4.1.1.
>>>>
>>>> Regards,
>>>> Ken
>>>
>>> The original name 'id_äß' is already in lowercase. The backend
>>> should leave it unchanged IMO.
>>
>> Only in utf-8 which needs to be double-quoted for a column name as
>> you have seen, otherwise the value will be lowercased per byte.
>
> He *is* using UTF-8. Or trying to, anyway :-). The downcasing in the
> backend is supposed to leave bytes with the high-bit set alone, ie. in
> UTF-8 encoding, it's supposed to leave ä and ß alone.
>
> I suspect that the conversion to UTF-8, before the string is sent to
> the server, is not being done correctly. I'm not sure what's wrong
> there, but I'd suggest printing the actual byte sequence sent to the
> server, to check if it's in fact valid UTF-8. ie. replace the PQexec()
> line with something like:
>
> const char *s = ToUtf8(strCreate.c_str()).c_str();
> int i;
> for (i=0; s[i]; i++)
>     printf("%02x", (unsigned char) s[i]);
> printf("\n");
> pResult = PQexec(pConn, s);
>
> That should contain the UTF-8 byte sequence for äß, "c3a4c39f"
>

Umm, no, the backend code doesn't do it right. Some time ago I suggested
a fix for this - see
<http://www.postgresql.org/message-id/50ACF7FA.7070108@dunslane.net>.
Tom thought there might be other places that need fixing, and I haven't
had time to look for them. But maybe we should just fix this one for now
at least.

cheers

andrew