Re: [HACKERS] UTF-8 data migration problem in Postgresql 7.2
From | Patrice Hédé |
---|---|
Subject | Re: [HACKERS] UTF-8 data migration problem in Postgresql 7.2 |
Date | |
Msg-id | 20020221181911.GB19184@idf.net |
In reply to | Re: [HACKERS] UTF-8 data migration problem in Postgresql 7.2 (Jean-Michel POURE <jm.poure@freesurf.fr>) |
List | pgsql-odbc |
Hi Jean-Michel,

I just started browsing this list again after a long absence...

* Jean-Michel POURE <jm.poure@freesurf.fr> [020221 18:39]:
> 5) Surrogate pairs
> I heard PostgreSQL did not support surrogate pairs. Is this a problem of
> surrogate pairs? Just my 0.02 cents, I know very little about UTF-8.

Surrogate pairs only exist in UTF-16. They are used to represent characters that are not on the BMP (Basic Multilingual Plane, U+0000 to U+FFFF). UTF-8 encodes these characters differently, as a single 4-byte sequence. Encoding surrogates in UTF-8 is invalid, and any application receiving a UTF-8 stream should reject them (strictly speaking, they used to be merely "irregular", but starting with Unicode 3.2 they will be illegal).

Regarding your sequence E3/82/27: it cannot be valid under any scheme. UTF-8 is designed so that every byte following the lead byte of a multi-byte sequence is in the range 80 to BF, while 27 is a plain ASCII byte (the apostrophe). For a sequence starting with E3 in particular, the 2nd and 3rd bytes must both be between 80 and BF, so 27 as the 3rd byte is invalid.

In any case, "UTF-8 encoded surrogates" can only start with ED, so surrogates are not your problem here.

Hope this helps.

Patrice
-- 
Patrice Hédé
email: patrice hede à islande org
www  : http://www.islande.org/
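The byte rules described in this reply can be checked mechanically. Below is a minimal Python sketch, not PostgreSQL's own validator: the function names `utf8_error` and `utf16_surrogates` are hypothetical, and the accepted byte ranges are assumed from the Unicode standard's table of well-formed UTF-8 byte sequences (which also excludes overlong forms and code points above U+10FFFF). It shows that E3 82 27 fails at the third byte, that an ED sequence encoding a surrogate is rejected, and how the same non-BMP character is written as a surrogate pair in UTF-16 but as four bytes in UTF-8.

```python
def utf8_error(data: bytes):
    """Return None if data is well-formed UTF-8, else (offset, reason)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # ASCII: a single byte
            i += 1
            continue
        # From the lead byte, pick how many continuation bytes follow and
        # the allowed range for the FIRST continuation byte.
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:                   # rejects overlong 3-byte forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif b == 0xED:                   # ED A0..BF would encode a surrogate
            need, lo, hi = 2, 0x80, 0x9F
        elif 0xE1 <= b <= 0xEF:           # includes E3
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xF0:                   # rejects overlong 4-byte forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:                   # caps code points at U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                             # 80..C1 and F5..FF never start a char
            return (i, "invalid lead byte 0x%02X" % b)
        for k in range(1, need + 1):
            if i + k >= n:
                return (i, "truncated sequence")
            c = data[i + k]
            # Later continuation bytes are always in 80..BF.
            low, high = (lo, hi) if k == 1 else (0x80, 0xBF)
            if not low <= c <= high:
                return (i + k, "bad continuation byte 0x%02X" % c)
        i += need + 1
    return None


def utf16_surrogates(cp: int):
    """Split a non-BMP code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points outside the BMP use surrogates")
    v = cp - 0x10000
    return (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF))


def hexstr(data: bytes) -> str:
    return " ".join("%02X" % x for x in data)


if __name__ == "__main__":
    for label, hexbytes in [("katakana A", "E3 82 A2"),
                            ("your sequence", "E3 82 27"),
                            ("encoded surrogate", "ED A0 80")]:
        data = bytes.fromhex(hexbytes)
        print(label, hexbytes, "->", utf8_error(data) or "valid")
    # The same non-BMP character: surrogate pair in UTF-16, 4 bytes in UTF-8.
    cp = 0x1F600
    print("U+%X UTF-16:" % cp, ["%04X" % u for u in utf16_surrogates(cp)])
    print("U+%X UTF-8: " % cp, hexstr(chr(cp).encode("utf-8")))
```

Run as a script, it reports E3 82 27 as failing at offset 2 (byte 0x27) and ED A0 80 as failing at offset 1 (byte 0xA0), while E3 82 A2 is accepted. Python's built-in `bytes.decode("utf-8")` rejects both bad sequences as well, raising `UnicodeDecodeError`.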