Discussion: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
Hi,

I posted a question about this same issue on the JDBC list, thinking it was a driver issue. I was told this error is generated by the back-end itself rather than by the driver, so I am posting the question in the admin forum. See the discussion here: http://www.postgresql.org/list/pgsql-jdbc/since/201508080000/

I am currently running 9.1.9 and trying to upgrade to 9.4. I have done a dump and restore, and when I start my Java application I get the error below. The server uses SQL_ASCII encoding and the client encoding is UTF8. There are some invalid characters in the database, but this has not caused a problem in the current version or in 9.3 (I tried a restore in 9.3 and the application works fine).

ERROR: invalid byte sequence for encoding "UTF8": 0x92
STATEMENT: SELECT * FROM client_data WHERE status_code = 0 ORDER BY client_name, description

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x92
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:570)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:420)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:305)
    at com.sun.rowset.JdbcRowSetImpl.execute(JdbcRowSetImpl.java:567)

I get the same error with postgresql-9.4-1201.jdbc4.jar and postgresql-9.1-902.jdbc4.jar.

Appreciate your help.

Thanks,
Prasanth
On Aug 11, 2015, at 8:59 AM, Prasanth Reddy <dbadmin@nqadmin.com> wrote:
>
> The server uses SQL_ASCII encoding and the client encoding is UTF8. There are some invalid characters in the database but this has not caused a problem in the current version or 9.3 (tried a restore in 9.3 and the application works fine).

Later versions of PostgreSQL do better checking of UTF-8, and disallow invalid sequences. You're going to have to straighten out your encoding conflicts.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
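For readers wondering why a single byte 0x92 counts as an invalid sequence, here is a minimal Python sketch. The helper name `is_valid_utf8` and the sample strings are made up for illustration and do not come from the thread. The point it shows: 0x92 is binary 10010010, which in UTF-8 is a continuation byte, and a continuation byte is only legal after a lead byte.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample values, not taken from the poster's database.
plain = b"O'Brien"                        # ASCII apostrophe: valid UTF-8
stray = b"O\x92Brien"                     # lone 0x92, as in the error message
proper = "O\u2019Brien".encode("utf-8")   # curly apostrophe, correctly encoded

print(is_valid_utf8(plain))
print(is_valid_utf8(stray))
print(is_valid_utf8(proper))
print(proper)
```

The correctly encoded curly apostrophe takes three bytes (0xE2 0x80 0x99), so a bare 0x92 in a text column usually means the bytes were written in some single-byte encoding rather than UTF-8.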
Prasanth Reddy <dbadmin@nqadmin.com> writes:
> I am currently running 9.1.9 and trying to upgrade to 9.4. I have done a dump and restore, when I start my java application I am getting the below error. The server uses SQL_ASCII encoding and the client encoding is UTF8. There are some invalid characters in the database but this has not caused a problem in the current version or 9.3 (tried a restore in 9.3 and the application works fine).
> ERROR: invalid byte sequence for encoding "UTF8": 0x92
> STATEMENT: SELECT * FROM client_data WHERE status_code = 0 ORDER BY client_name, description

You need to fix the encoding errors in your data. 9.4 is intentionally less lax about that than prior versions. Or, if you really want the database to be totally encoding-ignorant, use SQL_ASCII as both client and server encoding. But if you have the client declared to use UTF8, the server will try not to send anything that isn't valid UTF8.

I believe the specific change that's biting you is

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_4_BR [49c817eab] 2014-02-23 15:22:50 -0500

    Plug some more holes in encoding conversion.

    Various places assume that pg_do_encoding_conversion() and pg_server_to_any() will ensure encoding validity of their results; but they failed to do so in the case that the source encoding is SQL_ASCII while the destination is not. We cannot perform any actual "conversion" in that scenario, but we should still validate the string according to the destination encoding.

    Per bug #9210 from Digoal Zhou.

but there were some others of the same ilk in 9.4.

regards, tom lane
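One possible shape for "fixing the encoding errors in your data" is sketched below in Python. It rests on an assumption the thread never confirms: that the stray bytes were originally written as Windows-1252 (cp1252), where 0x92 is the right single quotation mark. The function name `repair_to_utf8` and the sample values are hypothetical.

```python
def repair_to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw unchanged if it is valid UTF-8; otherwise reinterpret it
    in the fallback encoding (an assumption, cp1252 by default) and
    re-encode the result as UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Bytes undefined in the fallback encoding become U+FFFD
        # instead of raising a second error.
        return raw.decode(fallback, errors="replace").encode("utf-8")


# Hypothetical sample values for illustration only.
print(repair_to_utf8(b"don\x92t"))                  # lone 0x92 gets re-encoded
print(repair_to_utf8("caf\u00e9".encode("utf-8")))  # valid UTF-8 is left alone
```

A value that mixes valid UTF-8 sequences with stray single-byte characters would be reinterpreted wholesale as cp1252 by this sketch, which would mangle the valid parts, so any real cleanup should review the affected rows rather than convert blindly.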
Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
From
Prasanth Reddy
Date:
Thanks for the prompt response.

I was playing with it a bit more, and it seems that any character with a value less than 65533 works fine; I am guessing that covers all Unicode characters.

Does the server also reject an insert/update when there are invalid characters? I took a character that is supposed to be invalid (displayed as a small box in the application using the 9.1 version) and pasted it into the application using the 9.4 version of PostgreSQL, and I was able to save it to the database. Should this have failed?

If I find and fix all these characters (which would be a huge task), I want to make sure that the database is not going to take any new invalid characters. Please let me know if there is some setting I can change in the configuration to do this.

Another option I was thinking of is to change the encoding of the database itself to UTF8. Previously pg_restore failed when I tried a database encoding of UTF8; maybe if I fix the invalid characters and then do a dump, it would work.

Thanks,
Prasanth
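Two of the points above lend themselves to a small Python sketch. First, 65533 is U+FFFD, the Unicode replacement character, which decoders commonly substitute for bytes they cannot decode; that may be why the bad characters appear as a small box, though the thread does not confirm it. Second, finding the offending values can be automated once the raw bytes are available. The function `find_invalid_rows` and the sample rows are hypothetical, and how the raw bytes are fetched from the database is deliberately left out.

```python
from typing import Iterable, Iterator, Tuple


def find_invalid_rows(
    rows: Iterable[Tuple[int, bytes]],
) -> Iterator[Tuple[int, int, int]]:
    """Yield (row_id, byte_offset, byte_value) for the first invalid
    UTF-8 byte in each row whose value does not decode cleanly."""
    for row_id, raw in rows:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield row_id, exc.start, raw[exc.start]


# Hypothetical rows: (primary key, raw bytes of a text column).
sample = [
    (1, b"Acme Ltd"),
    (2, b"Smith\x92s Garage"),
    (3, "Caf\u00e9 Bleu".encode("utf-8")),
]

for row_id, offset, value in find_invalid_rows(sample):
    print(f"row {row_id}: invalid byte 0x{value:02x} at offset {offset}")

# A lenient decoder turns the undecodable byte into U+FFFD, code point 65533.
print(ord(b"\x92".decode("utf-8", errors="replace")))
```

A report like this gives the list of rows to review before attempting a dump and restore into a UTF8-encoded database.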