Re: finding rows with invalid characters
From | Jasen Betts
Subject | Re: finding rows with invalid characters
Date |
Msg-id | id3i3v$j4l$1@reversiblemaps.ath.cx
In reply to | finding rows with invalid characters (Sim Zacks <sim@compulab.co.il>)
List | pgsql-general
On 2010-11-21, Sim Zacks <sim@compulab.co.il> wrote:
> I am using PG 8.2.17 with UTF8 encoding.
> "PostgreSQL 8.2.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.1
> (Gentoo 4.1.1)"
>
> One of my tables somehow has invalid characters in it:
>> ERROR: invalid byte sequence for encoding "UTF8": 0xa9
>> HINT: This error can also happen if the byte sequence does not match
>> the encoding expected by the server, which is controlled by
>> "client_encoding".
> I have already manually found a number of the bad rows by running
> queries with text functions (upper) between groups of IDs until I found
> the specific bad row.
>
> 1) Is there a quicker way to get a list of all rows with invalid characters

Dump the table, run it through iconv, and diff against the original.

> 2) Shouldn't the database prevent these rows from being entered in the
> first place?

It should have, but that bug has since been fixed.

> 3) I have backups of this database (using -Fc) and I noticed that on
> restore, this table is not restored because of this error. Is there a
> way to fix the existing backups, or tell the restore to ignore bad rows
> instead of erroring out the whole table?

Translate them to SQL (use pg_restore with no database name), then you can
again use iconv to clean them.

Use iconv something like this:

  iconv --from-code UTF8 --to-code UTF8 -c < input_file > output_file

This will translate surrogates and drop other invalid characters. If you
have any constraints that place lower bounds on string length, this has
the potential to break them.

--
⚂⚃ 100% natural
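The dump-then-diff workflow above can be sketched end to end. This is a minimal illustration, assuming the table has already been exported to a plain-text file (for a custom-format backup, `pg_restore backup.dump > dump.sql` with no database name emits plain SQL); here a small fabricated file with one invalid 0xa9 byte stands in for the real dump, and all file names are illustrative:

```shell
# Fabricate a stand-in for the dumped table: one row contains the
# invalid UTF-8 byte 0xa9 from the original error message.
printf 'good row one\n'        >  dump.sql
printf 'bad row \xa9 here\n'   >> dump.sql
printf 'good row two\n'        >> dump.sql

# -c tells iconv to silently drop bytes it cannot convert,
# so the output is the dump with invalid sequences removed.
iconv --from-code UTF8 --to-code UTF8 -c < dump.sql > clean.sql

# Any line reported by diff contained invalid characters; diff exits
# nonzero when files differ, so || true keeps the script going.
diff dump.sql clean.sql || true
```

The diff output points straight at the damaged rows, and `clean.sql` is the cleaned dump suitable for reloading (subject to the string-length caveat above).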