Re: Trouble with UTF-8 data
From | Tom Lane |
---|---|
Subject | Re: Trouble with UTF-8 data |
Date | |
Msg-id | 16915.1200613130@sss.pgh.pa.us |
In reply to | Trouble with UTF-8 data (Janine Sisk <janine@furfly.net>) |
Responses | Re: Trouble with UTF-8 data |
List | pgsql-general |
Janine Sisk <janine@furfly.net> writes:
> But I'm still getting this error when loading the data into the new
> database:
> ERROR: invalid byte sequence for encoding "UTF8": 0xeda7a1

The reason PG doesn't like this sequence is that it decodes to a Unicode surrogate code point (U+D9E1, the high half of a UTF-16 surrogate pair), which is never supposed to appear in UTF-8. Surrogate pairs are a kluge that lets UTF-16 represent Unicode code points that need more than 16 bits. See http://en.wikipedia.org/wiki/UTF-16

I think you need a version of iconv that knows how to fold surrogate pairs into proper UTF-8 form. It might also be that the data is outright broken: if this sequence isn't followed by a matching low-surrogate sequence, then it isn't valid Unicode by anybody's interpretation.

7.2.x unfortunately didn't check Unicode data carefully, and would have let this data pass without comment ...

regards, tom lane
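For illustration, here is a minimal Python sketch of the two cases described above. It is not how iconv or PostgreSQL work internally; the helper names `three_byte_code_point`, `is_surrogate` and `fold_surrogates` are hypothetical. It decodes 0xeda7a1 by hand to show that it is U+D9E1, then "folds" a surrogate pair whose halves were each encoded as separate 3-byte UTF-8 sequences into one proper 4-byte UTF-8 character. It relies on Python's standard `surrogatepass` error handler, and it raises an error when a surrogate has no partner, which is the "outright broken" case.

```python
# Sketch only: inspect the byte sequence PostgreSQL rejected and try to fold
# individually UTF-8-encoded surrogate halves into proper UTF-8.


def three_byte_code_point(seq: bytes) -> int:
    """Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) by hand."""
    if len(seq) != 3:
        raise ValueError("expected exactly 3 bytes")
    return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)


def is_surrogate(cp: int) -> bool:
    # U+D800..U+DFFF are reserved for UTF-16 surrogates and are not valid
    # code points to encode in UTF-8.
    return 0xD800 <= cp <= 0xDFFF


def fold_surrogates(data: bytes) -> bytes:
    """Re-encode data whose surrogate pairs were UTF-8 encoded half by half.

    Raises UnicodeDecodeError if a surrogate is unpaired, i.e. the data is
    not valid Unicode under any interpretation.
    """
    # 'surrogatepass' lets the UTF-8 decoder accept encoded surrogates.
    text = data.decode("utf-8", "surrogatepass")
    # Round-tripping through UTF-16 joins each high/low pair into a single
    # code point; a lone surrogate makes the strict UTF-16 decode fail.
    text = text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return text.encode("utf-8")


if __name__ == "__main__":
    cp = three_byte_code_point(bytes.fromhex("eda7a1"))
    print(f"U+{cp:04X}", is_surrogate(cp))  # U+D9E1 True

    # U+1F600 written as the surrogate pair D83D DE00, each half UTF-8 encoded.
    print(fold_surrogates(bytes.fromhex("eda0bdedb880")).hex())  # f09f9880

    try:
        fold_surrogates(bytes.fromhex("eda7a1"))  # high surrogate, no partner
    except UnicodeDecodeError as exc:
        print("unpaired surrogate:", exc.reason)
```

Run against a dump, a function like `fold_surrogates` would repair correctly paired data and fail loudly on data that is genuinely broken, rather than guessing a replacement character.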