Re: Dumping in LATIN1 and restoring in UTF-8
От | Marco Bizzarri |
---|---|
Тема | Re: Dumping in LATIN1 and restoring in UTF-8 |
Дата | |
Msg-id | 3f0d61c40607060051y583529cekda79e0eba249ab6d@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Dumping in LATIN1 and restoring in UTF-8 (Tino Wildenhain <tino@wildenhain.de>) |
Ответы |
Re: Dumping in LATIN1 and restoring in UTF-8
|
Список | pgsql-general |
On 7/6/06, Tino Wildenhain <tino@wildenhain.de> wrote: > Marco Bizzarri schrieb: > > Hi all. > > > > Here is my use case: I've an application which uses PostgreSQL as > > backend. Up to now, the database was encoded in SQL_ASCII or LATIN1. > > Now, we need to migrate to UTF-8. > > > > What we tried, was to: > > > > 1) dump the database using pg_dump, in tar format (we had blob); > > 2) modifying the result, using some conversion tool (like recode) > > > > > > 3) destroying the old database > > 4) recreating the database with UNICODE setting > > 5) restoring the database using pg_restore > > > > The result was not what I expected. The pg_restore was using the > > LATIN1 encoding to encode the strings, resulting in a LATIN1 encoded > > in UTF-8... > > > > The problem lied in the toc.dat file, which stated that the client > > encoding was LATIN1, instead of UTF-8. > > > > The solution in the end has been to manually modifying the toc.dat > > file, substituting the LATIN1 string with UTF-8 (plus a space, since > > the toc.dat is a binary file). > > > > Even though it worked for us, I wonder if there is any other way to > > accomplish the same result, at least to specify the encoding for the > > restore. > > Yes, its actually quite esay: you dump as you feel apropriate, > then create the database with the encoding you want, > restore w/o creating database and you are done. > Restore sets the client encoding to what it actually was > in the dump data (in your case latin-1) and the database > would be utf-8 - postgres automatically recodes. No need > for iconv and friends. > > Regards > Tino > First of all, thank you for your answer. However, I suspect I did not understand your answer, since the commands I used were: 1) pg_dump -Ft -b -f dump.sql.tar database 2) dropdb database 3) createdb -E UNICODE database 4) pg_restore -d database dump.sql.tar According to my experience, this produces a "double encoding". As you can see, I hand-created the database, with the proper encoding. However, when I reimported the database, the result was a latin1 encoded in utf-8, rather than a pure utf-8. How my procedure was different with respect to yours? I will make some test with a sample database, and enabling the logging, so that I can understand the commands which are issued. Regards Marco -- Marco Bizzarri http://notenotturne.blogspot.com/
В списке pgsql-general по дате отправления: