Re: Trouble with UTF-8 data
From | Janine Sisk |
---|---|
Subject | Re: Trouble with UTF-8 data |
Date | |
Msg-id | 305D0D29-63FE-4EA3-8524-039B9E69B884@furfly.net |
In response to | Re: Trouble with UTF-8 data ("Albe Laurenz" <laurenz.albe@wien.gv.at>) |
Responses | Re: Trouble with UTF-8 data |
List | pgsql-general |
On Jan 18, 2008, at 12:00 AM, Albe Laurenz wrote:

> 0xEDA7A1 (UTF-8) corresponds to UNICODE code point 0xD9E1, which,
> when interpreted as a high surrogate and followed by a low surrogate,
> would correspond to the UTF-16 encoding of a code point
> between 0x88400 and 0x887FF (depending on the value of the low
> surrogate).
>
> These code points do not correspond to any valid character.
> So - unless there is a flaw in my reasoning - there's something
> fishy with these data anyway.
>
> Janine, could you give us a hex dump of that line from the copy
> statement?

Certainly. Do you want to see it as it came from the old database, or after I ran it through iconv?

Although iconv wasn't able to solve this problem, it did fix others in other tables; unfortunately I have no way of knowing whether it also mangled some data at the same time.

The version of iconv I have does know about UTF16, so I tried using that as the "from" encoding instead of UTF8, but the result had new errors in places where the original data was good, so that was obviously a step backwards.

BTW, in case it matters, I found out I misidentified the version of PG this data came from - it's actually 7.3.6.

thanks,

janine
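
For anyone who wants to check the arithmetic in the quoted text, here is a minimal Python sketch. The helper names (decode_3byte_utf8, surrogate_pair_range) are made up for illustration and are not part of PostgreSQL or iconv; the code decodes the three bytes by hand because a strict UTF-8 decoder would reject them.

```python
def decode_3byte_utf8(raw):
    """Decode a 3-byte UTF-8 sequence by hand, without rejecting surrogates."""
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes")
    if raw[0] & 0xF0 != 0xE0 or any(b & 0xC0 != 0x80 for b in raw[1:]):
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Lead byte carries 4 payload bits, each continuation byte carries 6.
    return ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)


def surrogate_pair_range(high):
    """Return the (lowest, highest) code point a high surrogate can encode
    in UTF-16, depending on which low surrogate follows it."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    base = 0x10000 + ((high - 0xD800) << 10)
    return base, base + 0x3FF


raw = bytes.fromhex("EDA7A1")
code_point = decode_3byte_utf8(raw)
lowest, highest = surrogate_pair_range(code_point)
print(hex(code_point), hex(lowest), hex(highest))  # 0xd9e1 0x88400 0x887ff

# Python's strict decoder refuses the same bytes, since a surrogate
# is not valid in UTF-8.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

This only confirms the numbers in the quote; it says nothing about how the bytes got into the dump in the first place.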