BUG #8105: names are transformed to lowercase incorrectly
From | pg@kolesar.hu |
---|---|
Subject | BUG #8105: names are transformed to lowercase incorrectly |
Date | |
Msg-id | E1UUHU1-0000iG-BT@wrigleys.postgresql.org |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 8105
Logged by: András Kolesár
Email address: pg@kolesar.hu
PostgreSQL version: 9.1.5
Operating system: Windows
Description:

If I specify a Unicode field name without quotes, the field name gets lowercased incorrectly.

pgAdmin 1.14.2 on Linux, PostgreSQL server 9.1.5 on Windows:

    SELECT érték FROM (SELECT 1 AS "érték") AS x;

    ********** Error **********
    SQL state: 42703
    Character: 8

In the example above I specify a Unicode column name ("érték" means "value" in Hungarian), then I try to read it. If I use double quotes in the outer query, it works.

However, the above example works fine if the server runs on Linux:

    "PostgreSQL 9.1.9 on i686-pc-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 32-bit"

I see the same problem from a PHP client. There the error message is more verbose:

    ERROR: column "�rt�k" does not exist
    LINE 1: SELECT érték FROM (SELECT 1 AS "érték") AS x
                   ^

The "é" character is represented incorrectly in the error message, which shows where the problem is. This character (U+00E9) is represented in UTF-8 as C3 A9. In the error message it is an invalid UTF-8 sequence: E3 A9. I think Windows uses the Windows-1250 or Windows-1252 character set, where C3 lowercases to E3. A9 survives tolower() because it means © (copyright sign) in these charsets, which has no lowercase pair.

I have localized the problem in the PostgreSQL source, src/backend/parser/scansup.c:128:

    char *
    downcase_truncate_identifier(const char *ident, int len, bool warn)
    {
        // ...
        for (i = 0; i < len; i++)
            // ...
            if (IS_HIGHBIT_SET(ch) && isupper(ch))
                ch = tolower(ch);

This function walks through identifiers byte by byte and lowercases each byte as if it were an individual character. This is incorrect in multibyte character sets. It works on Linux with UTF-8 system encoding because isupper() returns false for both C3 and A9.
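The byte arithmetic described above can be reproduced without a Windows machine. The following is only a sketch: `cp1252_tolower_sketch` is a hypothetical stand-in for what tolower() is assumed to do under a Windows-1252 locale (only the 0xC0-0xDE block is modeled), and `bytewise_downcase` imitates the per-byte loop quoted from scansup.c rather than reproducing the actual PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for tolower() under a Windows-1252 locale.
 * Assumption: only ASCII A-Z and the 0xC0-0xDE block are modeled;
 * 0xD7 is the multiplication sign and has no lowercase form.
 */
static unsigned char
cp1252_tolower_sketch(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch + ('a' - 'A'));
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)
        return (unsigned char) (ch + 0x20);
    return ch;                  /* 0xA9 (copyright sign) lands here */
}

/*
 * Downcase a buffer one byte at a time, treating every byte as a
 * complete character. This is the behavior the report describes.
 */
static void
bytewise_downcase(unsigned char *buf, size_t len)
{
    size_t      i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        buf[i] = cp1252_tolower_sketch(buf[i]);
}
```

Feeding the UTF-8 bytes of "érték" (C3 A9 72 74 C3 A9 6B) through `bytewise_downcase` yields E3 A9 72 74 E3 A9 6B: each lead byte C3 is rewritten to E3 while A9 is left alone, which matches the invalid sequence seen in the error message.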
The same issue is reported below:

Database object names and libpq in UTF-8 locale on Windows
http://permalink.gmane.org/gmane.comp.db.postgresql.sql/29464

Solution 1: apply tolower() only to A-Z.

Solution 2: use a lowercase function that uses client_encoding.
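As an illustration of Solution 1, here is a minimal sketch of ASCII-only folding. The helper name `ascii_downcase` is hypothetical and is not a PostgreSQL function; it only shows the idea of leaving every byte outside A-Z untouched so that multibyte UTF-8 sequences pass through unchanged.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fold only ASCII 'A'..'Z' to lowercase.
 * Bytes with the high bit set are never modified, so a UTF-8
 * sequence such as C3 A9 cannot be corrupted by the locale.
 */
static void
ascii_downcase(char *buf, size_t len)
{
    size_t      i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) buf[i];

        if (ch >= 'A' && ch <= 'Z')
            buf[i] = (char) (ch + ('a' - 'A'));
    }
}
```

The trade-off of this approach is that non-ASCII uppercase letters are not folded at all, so unquoted ÉRTÉK and érték would name different columns. Solution 2 would avoid that, at the cost of needing an encoding-aware lowercase routine.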