Обсуждение: Can't restore a pg_dump due to encoding errors

Поиск
Список
Период
Сортировка

Can't restore a pg_dump due to encoding errors

От
Dan Harris
Дата:
Hello all:

I'm in a bit of a bind here.  Today is my maintenance window for upgrading from
8.0.3 to 8.2.3.  I did a pg_dumpall overnight ( resulting in a 72GB file ) and
then a pg_dump on one of my small databases this morning to test pg_restore.
When trying to reload, I'm getting the following error:

psql:sac.bak:2793776: ERROR:  invalid byte sequence for encoding "UTF8": 0x92

A google search came up with this link:
http://www.hostingforum.ca/280472-invalid-byte-sequence-encoding-utf8.html

Which sounds very simliar to what's going on here.

Again the dump was done on 8.0.3, which sounds like it was before the anti-sql
injection patch that would refuse to import "invalid" characters.  The database
( for now ) *should* only be storing English/Latin characters.  Although, the
old server had LOCALE='C' previously ( this was unnecessary so I'm trying to
drop that today ).  I don't know if this would matter, but that's the only thing
that was different from a bog standard Postgres setup besides the obvious tuning
changes.  I have never changed any of the encoding settings on client or server
from the defaults.

So far, I tried editing the dump file and changing the client_encoding to 'UTF8'
at the top, hoping this might make a difference but did not.

The advice in that thread seems to be "fix the bad data".  But I am not seeing
how I can tell exactly what or where this bad data is.  Luckily, this small dump
file is manageable by opening in vim and pointing to the line 2793776 the error
references, however the line referenced is just the "\." at the end of a huge
COPY section.  I don't know which line inside that secion is the problem and
there's enough of them that it'd be pretty impractical for me to visually scan
the file looking for something ( I'm not even quite sure what to look for at
this point ).  And, if that is the only solution and I run into a problem
restoring the 72GB dump file, I'm going to be in some serious trouble,
especially if I am not alerted to the problem until *after* it's finished
looking at all the COPY statements!

What say the sages?

Thanks

-Dan

Re: Can't restore a pg_dump due to encoding errors

От
adey
Дата:
We had a similar conversion conflict some time back whe upgrading and found using a search for anything not A-Z, 0-9, etc, was useful to find offensive data and fix it.

On 4/8/07, Dan Harris <fbsd@drivefaster.net> wrote:
Hello all:

I'm in a bit of a bind here.  Today is my maintenance window for upgrading from
8.0.3 to 8.2.3.  I did a pg_dumpall overnight ( resulting in a 72GB file ) and
then a pg_dump on one of my small databases this morning to test pg_restore.
When trying to reload, I'm getting the following error:

psql:sac.bak:2793776: ERROR:  invalid byte sequence for encoding "UTF8": 0x92

A google search came up with this link:
http://www.hostingforum.ca/280472-invalid-byte-sequence-encoding-utf8.html

Which sounds very simliar to what's going on here.

Again the dump was done on 8.0.3, which sounds like it was before the anti-sql
injection patch that would refuse to import "invalid" characters.  The database
( for now ) *should* only be storing English/Latin characters.  Although, the
old server had LOCALE='C' previously ( this was unnecessary so I'm trying to
drop that today ).  I don't know if this would matter, but that's the only thing
that was different from a bog standard Postgres setup besides the obvious tuning
changes.  I have never changed any of the encoding settings on client or server
from the defaults.

So far, I tried editing the dump file and changing the client_encoding to 'UTF8'
at the top, hoping this might make a difference but did not.

The advice in that thread seems to be "fix the bad data".  But I am not seeing
how I can tell exactly what or where this bad data is.  Luckily, this small dump
file is manageable by opening in vim and pointing to the line 2793776 the error
references, however the line referenced is just the "\." at the end of a huge
COPY section.  I don't know which line inside that secion is the problem and
there's enough of them that it'd be pretty impractical for me to visually scan
the file looking for something ( I'm not even quite sure what to look for at
this point ).  And, if that is the only solution and I run into a problem
restoring the 72GB dump file, I'm going to be in some serious trouble,
especially if I am not alerted to the problem until *after* it's finished
looking at all the COPY statements!

What say the sages?

Thanks

-Dan

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly

Re: Can't restore a pg_dump due to encoding errors

От
Dan Harris
Дата:
Dan Harris wrote:
> Hello all:
>
> I'm in a bit of a bind here.  Today is my maintenance window for
> upgrading from 8.0.3 to 8.2.3.  I did a pg_dumpall overnight ( resulting
> in a 72GB file ) and then a pg_dump on one of my small databases this
> morning to test pg_restore. When trying to reload, I'm getting the
> following error:
>
> psql:sac.bak:2793776: ERROR:  invalid byte sequence for encoding "UTF8":
> 0x92

I found the utility 'iconv', which sounds like it addresses this problem.  I ran
the utility against the dump file and then it loaded without error.. Just
validating that the data is indeed there.