Re: COPY fails on 8.1 with invalid byte sequences in text
From | Birju Prajapati |
---|---|
Subject | Re: COPY fails on 8.1 with invalid byte sequences in text |
Date | |
Msg-id | e02b8c250610290427y34d8c4d7i49314ba597f12c0@mail.gmail.com |
In reply to | Re: COPY fails on 8.1 with invalid byte sequences in text ("Thomas H." <me@alternize.com>) |
List | pgsql-bugs |
On 27/10/06, Thomas H. <me@alternize.com> wrote:
> FYI, prior to 8.2, there is another source of bad UTF8 byte sequences:
>
> when using tsearch2 on utf8 content in <8.2, tsearch2 was generating bad
> utf8 sequences. as tsearch2 does lowercase each char in the text it's
> indexing, it did also do so with multibyte-characters... unfortunately
> taking each byte separately, so it seems. the unicode-representation of
> german umlauts (äöü) are some examples of charcodes, that were turned into
> invalid sequences.
>
> this data could be successfully pg_dump'ed, but not pg_restore'd. in 8.2,
> this looks fixed. to upgrade from 8.1.5 to 8.2b1 we had to remove all
> tsearch2 index data, dump the db, restore the db in 8.2 and recreate the
> indices.

You need to initdb with utf8 and then install tsearch2 with utf8. Both
need utf8. I had a similar problem. Perhaps your 8.1 postgres cluster
wasn't utf8?

>
> - thomas
>
>
> ----- Original Message -----
> From: "Jeff Davis" <pgsql@j-davis.com>
> To: <pgsql-bugs@postgresql.org>
> Sent: Saturday, October 28, 2006 12:38 AM
> Subject: Re: [BUGS] COPY fails on 8.1 with invalid byte sequences in text
>
>
> > On Fri, 2006-10-27 at 14:42 -0700, Jeff Davis wrote:
> >> It seems to be essentially a data corruption issue if applications
> >> insert binary data in text fields using escape sequences. Shouldn't
> >> PostgreSQL reject an invalid UTF8 sequence in any text type?
> >>
> >
> > Another note: PostgreSQL rejects invalid UTF8 sequences in other
> > contexts. For instance, if you use PQexecParams() and insert using type
> > text and any format (text or binary), it will reject invalid sequences.
> > It will of course allow anything to be sent when the type is bytea.
> >
> > Also, I thought I'd publish the workaround that I'm using.
> >
> > I created a function that seems to work for validating text data as
> > being valid UTF8.
> >
> > CREATE OR REPLACE FUNCTION valid_utf8(TEXT) returns BOOLEAN
> > LANGUAGE plperlu AS
> > $valid_utf8$
> > use utf8;
> > return utf8::decode($_[0]) ? 1 : 0;
> > $valid_utf8$;
> >
> > I just add a check constraint on all of my text attributes in all of my
> > tables. Not fun, but it works.
> >
> > Regards,
> > Jeff Davis
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>
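
The check-constraint workaround Jeff Davis describes could look roughly like the sketch below. The table `messages` and its columns `subject` and `body` are hypothetical names chosen for illustration; only `valid_utf8()` comes from the thread, and it is assumed to have been created already.

```sql
-- Hypothetical table and column names; valid_utf8() is the plperlu
-- function quoted in the message above.
ALTER TABLE messages
    ADD CONSTRAINT messages_subject_valid_utf8 CHECK (valid_utf8(subject));

ALTER TABLE messages
    ADD CONSTRAINT messages_body_valid_utf8 CHECK (valid_utf8(body));
```

Adding a check constraint also tests the rows already in the table, so these statements should fail if the table currently holds text that the function rejects. One constraint per text column is needed, which matches the "not fun, but it works" description.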