Hi all,
I'd like to move my database encoding from SQL_ASCII to UTF8, mostly
because "No encoding conversion will be done when the setting is
SQL_ASCII. Thus, this setting is not so much a declaration that a
specific encoding is in use, as a declaration of ignorance about the
encoding." (from
http://www.postgresql.org/docs/current/static/multibyte.html). I saw a
few threads on the list about this before, for instance this one
(http://archives.postgresql.org/pgsql-admin/2004-01/msg00225.php), but
I'm having a specific issue that wasn't addressed there.
I have some UTF-8 data in my databases, and it's causing dump/restore
to fail. Specifically, I'm seeing messages like:
pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence
for encoding "UTF8": 0xe14c65
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
CONTEXT: COPY applicants, line 282
This happens even if I specify "-E UTF8" in the pg_dump command.
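For reference, here is a minimal Python sketch of why that byte sequence is rejected. The helper name is_valid_utf8 is made up for illustration, and reading the bytes as LATIN1 is only a guess about what the data might really be, not something I have confirmed.

```python
# Bytes reported by pg_restore in the error message: 0xe14c65
bad = bytes.fromhex("e14c65")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xe1 opens a three-byte UTF-8 sequence, so the next two bytes would have
# to be continuation bytes (0x80-0xbf). 0x4c ('L') and 0x65 ('e') are plain
# ASCII, so the sequence is invalid.
print(is_valid_utf8(bad))  # False

# Assumption: if the column actually holds LATIN1 text, the same bytes
# are an accented letter followed by "Le".
print(bad.decode("latin-1"))
```

If the second line prints something that looks like real text from the applicants table, that would suggest the data is in a single-byte encoding and not UTF-8.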
Here's the weirder part. If I just update the encoding by hand in
pg_database (as cautiously suggested by Tom Lane in the aforementioned
thread), it works. I doubt this will work in the general case, and I'd
like to at least offer this option for other people's databases.
I also tried using GNU recode (version 3.6) as suggested in similar
threads, but I got errors in both the plain and custom pg_dump
formats.
$ recode ascii..utf8 man.sql
recode: man.sql failed: Invalid input in step `ANSI_X3.4-1968..UTF-8'
$ recode ..utf8 man.sql
recode: man.sql failed: Invalid input in step `CHAR..UTF-8'
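One way to narrow this down might be to find out which lines of the dump are not valid UTF-8 before converting anything. A minimal sketch in Python, assuming a plain-format dump (not the custom format); find_invalid_utf8_lines is a hypothetical helper, not part of PostgreSQL or recode.

```python
def find_invalid_utf8_lines(path):
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8.

    Hypothetical helper: reads the file in binary mode so that Python does
    no decoding of its own, then tries to decode each line as UTF-8.
    """
    bad_lines = []
    with open(path, "rb") as dump:
        for number, raw in enumerate(dump, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad_lines.append((number, raw))
    return bad_lines
```

Calling find_invalid_utf8_lines("man.sql") would give line numbers within the dump file itself, which are not the same as the per-table line numbers in the COPY error context.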
Any ideas?
Peter