Re: error while trying to change the database encoding on a database
From | Geoffrey Myers |
---|---|
Subject | Re: error while trying to change the database encoding on a database |
Date | |
Msg-id | 4D3DCB8B.4060400@serioustechnology.com |
In reply to | Re: error while trying to change the database encoding on a database (Adrian Klaver <adrian.klaver@gmail.com>) |
Responses | Re: error while trying to change the database encoding on a database |
List | pgsql-general |
Adrian Klaver wrote:
> On 01/24/2011 09:16 AM, Geoffrey Myers wrote:
>
>> We hope to identify the characters and fix them in the existing
>> database, then convert. It appears to be very limited, but it would help
>> if there was some way to identify these characters outside of simply
>> doing the reload of the data and finding the errors.
>>
>> Hence the reason I asked about a resource that might identify the
>> characters.
>
> The problem is that from the standpoint of the SQL_ASCII database there
> is nothing wrong with the characters per se. AFAIK there is no built-in
> function to validate characters. The reason is that valid is determined
> by the encoding and if you know the encoding then you really don't need
> to determine validity. If you want to see one way others have tackled
> this, search on iconv in the mailing list archive. This requires working
> on an external copy of the data and knowing something about the
> encodings involved. The nearest I could ever find to an encoding
> detector is:
>
> http://chardet.feedparser.org/
>
> It is a Python program and the encodings it detects are limited but it
> might work for you.
>
> Given all the above, when I was faced with the problem you are facing I
> found it easiest to make an educated guess as to the original encoding
> and then do test restores with client_encoding set to my guess.

Understood. We had figured the problem to be small, and it appears it is,
and thus felt we could address it a character at a time. Then we get this
error:

pg_restore: [archiver (db)] Error from TOC entry 5258; 0 17549 TABLE DATA fax postgres
pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xe28053

That hex value doesn't translate to a single character. I've dumped the
data to a file as you suggested, but reviewing the identified line brings
no joy.
--
Until later, Geoffrey

"I predict future happiness for America if they can prevent the government
from wasting the labors of the people under the pretense of taking care of
them."
- Thomas Jefferson
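
Editor's note: the reported value 0xe28053 is a truncated sequence, not a character. In UTF-8, 0xe2 opens a three-byte sequence and 0x80 is a valid continuation byte, but 0x53 is plain ASCII "S", so the third byte is missing. The sketch below is one possible way to list every such spot in a plain-text copy of the data without doing a reload. It assumes the data has already been written to a file, as described in the message; the function names `find_invalid_utf8` and `scan_file` and the 20-byte context window are illustrative choices, not part of PostgreSQL or of the original thread.

```python
import sys


def find_invalid_utf8(line: bytes):
    """Yield (offset, bad_bytes) for each undecodable run in one line."""
    pos = 0
    while pos < len(line):
        try:
            line[pos:].decode("utf-8")
            return  # the rest of the line is valid UTF-8
        except UnicodeDecodeError as err:
            start, end = pos + err.start, pos + err.end
            yield start, line[start:end]
            pos = end  # resume scanning after the offending bytes


def scan_file(path: str):
    """Yield (line_number, offset, bad_bytes, context) for a dump file."""
    # Binary mode: the point is to inspect raw bytes, not decoded text.
    with open(path, "rb") as dump:
        for lineno, line in enumerate(dump, start=1):
            for offset, bad in find_invalid_utf8(line):
                # Keep a few surrounding bytes so the row can be recognised.
                context = line[max(0, offset - 20):offset + 20]
                yield lineno, offset, bad, context


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_utf8.py dumped_data.sql
    for lineno, offset, bad, context in scan_file(sys.argv[1]):
        print(f"line {lineno}, byte {offset}: 0x{bad.hex()} in {context!r}")
```

The surrounding bytes printed for each hit may make it easier to guess the original encoding before trying a test restore with client_encoding set to that guess.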