Re: Trouble with UTF-8 data
From | Albe Laurenz |
---|---|
Subject | Re: Trouble with UTF-8 data |
Date | |
Msg-id | D960CB61B694CF459DCFB4B0128514C2CC26AD@exadv11.host.magwien.gv.at |
In reply to | Re: Trouble with UTF-8 data (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Trouble with UTF-8 data |
List | pgsql-general |
Tom Lane wrote:
>> But I'm still getting this error when loading the data into the new
>> database:
>
>> ERROR:  invalid byte sequence for encoding "UTF8": 0xeda7a1
>
> The reason PG doesn't like this sequence is that it corresponds to
> a Unicode "surrogate pair" code point, which is not supposed to
> ever appear in UTF-8 representation --- surrogate pairs are a kluge for
> UTF-16 to deal with Unicode code points of more than 16 bits.

0xEDA7A1 (UTF-8) corresponds to Unicode code point 0xD9E1, which, when interpreted as a high surrogate and followed by a low surrogate, would correspond to the UTF-16 encoding of a code point between 0x88400 and 0x887FF (depending on the value of the low surrogate).

These code points do not correspond to any assigned character.

So, unless there is a flaw in my reasoning, there's something fishy with this data anyway.

Janine, could you give us a hex dump of that line from the COPY statement?

Yours,
Laurenz Albe
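The arithmetic in the reply above can be checked with a short script. This is a minimal sketch, not anything from the thread: the helper names `decode_3byte_utf8` and `surrogate_pair_range` are hypothetical, and the last step uses Python's strict UTF-8 decoder only as a stand-in for PostgreSQL's own validation.

```python
def decode_3byte_utf8(b: bytes) -> int:
    """Decode a 3-byte UTF-8 sequence by hand, without rejecting surrogates."""
    assert len(b) == 3
    # Lead byte 1110xxxx, continuation bytes 10xxxxxx
    assert b[0] & 0xF0 == 0xE0 and b[1] & 0xC0 == 0x80 and b[2] & 0xC0 == 0x80
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


def surrogate_pair_range(high: int) -> tuple:
    """Code points a high surrogate could encode, over all low surrogates."""
    assert 0xD800 <= high <= 0xDBFF
    base = 0x10000 + ((high - 0xD800) << 10)
    return base, base + 0x3FF  # low surrogate 0xDC00 .. 0xDFFF


seq = bytes([0xED, 0xA7, 0xA1])
cp = decode_3byte_utf8(seq)
print(hex(cp))  # 0xd9e1
first, last = surrogate_pair_range(cp)
print(hex(first), hex(last))  # 0x88400 0x887ff

# A strict decoder refuses the sequence, much as PostgreSQL does
try:
    seq.decode("utf-8")
except UnicodeDecodeError:
    print("rejected by strict UTF-8 decoder")
```

The hand decoder deliberately skips the surrogate check so the intermediate value 0xD9E1 can be inspected; a real validator must reject it.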