Re: BUG #13785: Postgresql encoding screw-up
From | Feike Steenbergen
Subject | Re: BUG #13785: Postgresql encoding screw-up
Date |
Msg-id | CAK_s-G21eiMoKdKeBM42Zgr5-LC7mZ14FCPJ9gPxqe0d8kw+hw@mail.gmail.com
In reply to | BUG #13785: Postgresql encoding screw-up (ntpt@seznam.cz)
List | pgsql-bugs
Hi,

> there is a major design flaw or bug

I feel your pain, but how is this a bug? Once a character that cannot be mapped to LATIN2 is stored, no information about the source encoding (WIN1250) of that character is available anymore. Any connecting client (whether your application or pg_dump) will get that character "as is". I don't see a general way around this, other than rejecting characters that do not fit in the target character set.

> where client use multiple encodings that have more characters then database
> encoding, the database is screwed forever

The allowed conversions from LATIN2 to other encodings are quite limited (MULE_INTERNAL, UTF8, WIN1250), see:
http://www.postgresql.org/docs/9.4/static/multibyte.html#AEN35768

If the clients using different encodings all touch the same data, the data is already dirty; the migration only brings that to light. If the clients all touch different parts of the data, the data can be safely migrated by exporting each distinct part in its correct encoding and then importing it with that encoding into the target database with UTF8 encoding.

> I thik that safe practice would be: Pg_dum with -E as used by client
> applicaton and then restore to newly created utf8 database. It should be
> mentioned as safe way in the doc, at least

This looks safe to me: you export the unknown-character data in its original encoding, thereby making the characters known again. Importing this into UTF8 will then encode them correctly, because both the source (WIN1250) and the target (UTF8) can encode these characters.

regards,

Feike Steenbergen
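The ambiguity described above can be sketched outside PostgreSQL. This is a hedged illustration, not the reporter's actual data: the byte 0x96 is an en dash in WIN1250 but a C1 control character in LATIN2 (ISO 8859-2), so once stored "as is" its WIN1250 meaning is unrecoverable from the database encoding alone, while a dump taken with the original client encoding (the `pg_dump -E` approach) round-trips cleanly into UTF-8.

```python
# Illustrative sketch of the encoding ambiguity (assumed example byte,
# not taken from the actual bug report).

raw = b"\x96"  # byte a WIN1250 client wrote; 0x96 is an en dash in WIN1250

# Reading the byte under the database encoding (LATIN2 / ISO 8859-2)
# yields a C1 control character -- the WIN1250 meaning is lost:
as_latin2 = raw.decode("iso8859_2")
print(repr(as_latin2))        # '\x96', a control character

# Exporting with the original client encoding recovers the character...
char = raw.decode("cp1250")
print(char == "\u2013")       # True: EN DASH

# ...and UTF-8 can represent it, so importing into a UTF8 database is safe:
print(char.encode("utf-8"))   # b'\xe2\x80\x93'
```

The same logic underlies the advice in the last paragraph: the dump re-attaches the source encoding to the bytes, and UTF-8 is a superset of WIN1250's repertoire, so the restore cannot lose information.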