PostgreSQL fails to convert decomposed utf-8 to other encodings
From | Craig Ringer |
---|---|
Subject | PostgreSQL fails to convert decomposed utf-8 to other encodings |
Date | |
Msg-id | 53E179E1.3060404@2ndquadrant.com |
Replies | Re: PostgreSQL fails to convert decomposed utf-8 to other encodings |
List | pgsql-bugs |
There's a bug in encoding conversions from UTF-8 to other encodings that makes them fail (or produce incorrect output) when decomposed UTF-8 is used. PostgreSQL doesn't normalize UTF-8 to precomposed form before converting, so decomposed sequences are not handled correctly.

Take á:

    regress=> -- Decomposed - 'a' then 'acute'
    regress=> SELECT E'\u0061\u0301';
     ?column?
    ----------
     á
    (1 row)

    regress=> -- Precomposed - 'a-acute'
    regress=> SELECT E'\u00E1';
     ?column?
    ----------
     á
    (1 row)

    regress=> SELECT convert_to(E'\u0061\u0301', 'iso-8859-1');
    ERROR: character with byte sequence 0xcc 0x81 in encoding "UTF8" has no equivalent in encoding "LATIN1"
    regress=> SELECT convert_to(E'\u00E1', 'iso-8859-1');
     convert_to
    ------------
     \xe1
    (1 row)

The byte sequence 0xcc 0x81 in the error is the UTF-8 encoding of U+0301 COMBINING ACUTE ACCENT. The conversion evidently handles that code point on its own, and a lone combining accent has no LATIN1 equivalent.

This affects input from the client too:

    regress=> SELECT convert_to('á', 'iso-8859-1');
    ERROR: character with byte sequence 0xcc 0x81 in encoding "UTF8" has no equivalent in encoding "LATIN1"
    regress=> SELECT convert_to('á', 'iso-8859-1');
     convert_to
    ------------
     \xe1
    (1 row)

... yes, that's the same function producing different results for visually identical input.

You might not be able to reproduce this by copying and pasting from this mail if your client normalizes UTF-8, but you can by printing the decomposed character to your terminal as an escape string, then copying and pasting from there.

We probably should have been normalizing decomposed sequences to precomposed form as part of UTF-8 validation wherever 'text' input occurs, but it's too late for that now, as databases in the wild will already contain decomposed characters. Instead, the conversion functions need to normalize decomposed characters to precomposed form before converting from UTF-8 to another encoding.

Comments?

-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
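The normalize-before-converting proposal can be illustrated outside PostgreSQL with a minimal Python sketch. It assumes Python's `unicodedata.normalize("NFC", ...)` as a stand-in for whatever normalization routine the server would actually use, and Python's strict `latin-1` codec as a stand-in for the server's UTF8 to LATIN1 conversion. The function names `to_latin1_as_is`, `to_latin1_nfc` and `bytea_hex` are hypothetical, not PostgreSQL APIs.

```python
import unicodedata

# The same visible character in two forms, as in the report.
DECOMPOSED = "a\u0301"   # 'a' + U+0301 COMBINING ACUTE ACCENT
PRECOMPOSED = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE


def to_latin1_as_is(s: str) -> bytes:
    """Hypothetical: convert each code point independently, with no normalization.

    Raises UnicodeEncodeError for a code point with no LATIN1 equivalent,
    such as the lone combining accent U+0301.
    """
    return s.encode("latin-1")


def to_latin1_nfc(s: str) -> bytes:
    """Hypothetical sketch of the proposal: compose to NFC, then convert.

    Sequences with no precomposed form still fail to convert.
    """
    return unicodedata.normalize("NFC", s).encode("latin-1")


def bytea_hex(b: bytes) -> str:
    """Render bytes like psql's hex display of a bytea value."""
    return "\\x" + b.hex()


if __name__ == "__main__":
    # U+0301 is the two UTF-8 bytes 0xcc 0x81 named in the error message.
    print(DECOMPOSED.encode("utf-8"))  # b'a\xcc\x81'

    try:
        to_latin1_as_is(DECOMPOSED)
    except UnicodeEncodeError as err:
        print(f"as-is: U+{ord(err.object[err.start]):04X} has no LATIN1 equivalent")

    # After NFC both inputs convert to the same single byte.
    print(bytea_hex(to_latin1_nfc(DECOMPOSED)))   # \xe1
    print(bytea_hex(to_latin1_nfc(PRECOMPOSED)))  # \xe1
```

Note that NFC only helps where a precomposed character exists: 'q' followed by U+0301 has no precomposed form, so it remains two code points after normalization and a LATIN1 conversion would still fail.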