Re: character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"
From | Andreas Kalsch |
---|---|
Subject | Re: character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2" |
Date | |
Msg-id | 4A783164.9040804@gmx.de |
In reply to | Re: character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2" (Alban Hertroys <dalroi@solfertje.student.utwente.nl>) |
Responses | Re: character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2" |
List | pgsql-general |
Alban, what I do to simplify the data chain: HTTP encoding > PHP string encoding > client connection > server - all is UTF8. Plus an invalid byte check in PHP (or on the server).

What I have tested inside Postgres is passing a 3-byte UTF8 character to this function, and I got an error. This is a character I will not filter out if some Unicode artist enters it. It is an international website and the simplification is just for indexing.

But I think that this will not solve the problem and I will have to use Python or Perl to get it done.

Alban Hertroys wrote:
> On 4 Aug 2009, at 24:57, Andreas Kalsch wrote:
>
>>> I think the real problem is: Where do you lose the original encoding
>>> the users input their data with? If you specify that encoding on the
>>> connection and send it to a database that can handle UTF-8 then you
>>> shouldn't be getting any conversion problems in the first place.
>> Nowhere - I will validate input data on the client side (PHP or
>> Python) and send it to the server. Of course the only encoding I will
>> use on any side is UTF8. I just wanted to use this Latin thing for
>> simplification of characters.
>
> Yes you are. How could your users input invalid characters in the
> first place if that were not the case? You're not suggesting they
> managed to enter characters in an encoding for which they weren't
> valid on their own systems, are you?[1]
>
> You say your client is using PHP or Python, which suggests it's a
> website. That means the input goes like this: web browser -> website
> -> database. All three of those steps use some encoding and you can
> take them into account. That should prevent this problem altogether.
>
> You have control over which encoding your client and the database use,
> and the web browser tells what encoding it used in the POST request so
> you can pass that along to the database when storing data or convert
> it in your client.
>
> [1] There exists of course a small group of people who enjoy posting
> raw byte data to a web-form, but would it matter whether they'd get an
> error about their encoding or not? They do not intend to enter valid
> data after all ;)
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
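
For illustration, here is a rough sketch of how the client-side route mentioned above (validate the bytes, then simplify characters for indexing only) might look in Python. It is not code from this thread: the function names `is_valid_utf8` and `fold_for_index` are made up for the example, and folding to lowercase ASCII via NFKD decomposition is just one possible reading of "simplification". The byte sequence 0xe29986 from the subject line decodes to U+2646, which has no LATIN2 equivalent; the sketch drops such characters from the index key instead of raising an error, while the stored text itself would stay UTF8.

```python
import unicodedata


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fold_for_index(text: str) -> str:
    """Build a simplified lowercase ASCII key for indexing (hypothetical helper).

    Accented letters are decomposed (NFKD) and their combining marks removed.
    Characters with no ASCII decomposition, such as U+2646, are dropped
    from the key rather than causing a conversion error.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.encode("ascii", "ignore").decode("ascii").lower()


if __name__ == "__main__":
    raw = b"\xe2\x99\x86"            # the bytes from the error message
    print(is_valid_utf8(raw))        # valid UTF-8, so the input check passes
    symbol = raw.decode("utf-8")
    print(repr(fold_for_index("Žluťoučký " + symbol)))
```

Letters that have no Unicode decomposition (for example the Polish "ł") are dropped by this approach as well, so a real index would probably need an explicit mapping table for them.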