Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()
From | Joseph Shraibman |
---|---|
Subject | Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8() |
Date | |
Msg-id | 3E1C814A.1020307@selectacast.net |
In reply to | Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8() (Barry Lind <blind@xythos.com>) |
List | pgsql-jdbc |
Well, this data was inserted into Postgres through the JDBC driver in the
first place. So how come Postgres itself didn't complain about non-ASCII
data? How do I change the encoding? And what will the side effects be?

Barry Lind wrote:
> Joseph,
>
> The problem is that your database claims to be ASCII, but you are
> storing non-ASCII data in it.
>
> As of 7.3 the JDBC driver has the server convert from the database
> character set to UTF8. The server then sends the data to the driver in
> UTF8, and the driver decodes the UTF8 into Java Unicode.
>
> The conversion from ASCII to UTF8 is a no-op, since the 128 ASCII
> characters (values 0-127) have exactly the same byte values in UTF8.
> However, since you are storing non-ASCII data, bytes with values from
> 128 to 255 just get passed from the server to the client without any
> additional processing (there aren't supposed to be any values in this
> range). When the driver then tries to convert them to Java Unicode, it
> can't, because it has received an invalid UTF8 character.
>
> It seems that you are actually storing Latin1 data in this database, and
> thus the database character set should probably be Latin1.
>
> In 7.2 it was possible to override the character set used by the driver;
> however, I don't think this works anymore when connecting to a 7.3
> server. .... looks at code .... Yes, the override is ignored if the
> server is a 7.3 server. You could hack at AbstractJdbc1Connection to
> work around the issue, or just correctly set the database character set
> to match the data that the database contains.
>
> thanks,
> --Barry
>
>
> Joseph Shraibman wrote:
>
>> BTW the string that caused this is 'Oné'
>>
>> Joseph Shraibman wrote:
>>
>>> java.lang.ArrayIndexOutOfBoundsException: 3
>>>     at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:253)
>>>     at org.postgresql.core.Encoding.decode(Encoding.java:165)
>>>     at org.postgresql.core.Encoding.decode(Encoding.java:181)
>>>     at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
>>>
>>> The relevant code is:
>>>
>>>     while (i < k) {
>>>         z = data[i] & 0xFF;
>>>         if (z < 0x80) {
>>>             l_cdata[j++] = (char)data[i];
>>>             i++;
>>>         } else if (z >= 0xE0) {     // length == 3
>>>             y = data[i+1] & 0xFF;   //<<== THIS IS LINE 253
>>>             x = data[i+2] & 0xFF;
>>>             val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
>>>             l_cdata[j++] = (char) val;
>>>             i+= 3;
>>>         } else { // length == 2 (maybe add checking for length > 3, throw exception if it is
>>>
>>> And in the method that calls that:
>>>
>>>     if (encoding.equals("UTF-8")) {
>>>         return decodeUTF8(encodedString, offset, length);
>>>     }
>>>
>>> The thing is, my database encoding is SQL_ASCII:
>>>
>>> => SELECT version(), getdatabaseencoding() ;
>>>
>>>  version | getdatabaseencoding
>>> ---------------------------------------------------------------------------------------------------------+---------------------
>>>  PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7) | SQL_ASCII
>>> (1 row)
>>>
>>> ... so why is it trying to decode the string as UTF-8? I just
>>> upgraded this database from 7.2.3 yesterday.
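Barry's explanation also accounts for the exact index in the stack trace. Assuming the value was stored as Latin1 bytes, 'é' is the single byte 0xE9. Because 0xE9 >= 0xE0, the quoted loop treats it as the lead byte of a three-byte UTF-8 sequence and reads `data[i+1]`, which is index 3 of a three-byte array. The minimal sketch below illustrates this using only the JDK's `java.nio.charset` classes. The class name `Latin1VsUtf8` and the helper `isValidUtf8` are hypothetical and are not part of the driver. The sketch shows that a strict UTF-8 decoder rejects those bytes, while decoding them as Latin1 recovers the text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: contrasts UTF-8 and Latin1 decoding of the same bytes.
public class Latin1VsUtf8 {

    /** Returns true only if the bytes are well-formed UTF-8 (strict decoder). */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the string from the report was stored as Latin1
        // (ISO-8859-1) bytes, which an SQL_ASCII database accepts unchecked.
        byte[] stored = "On\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Prints 0x4F 0x6E 0xE9: the accented letter is one byte in Latin1.
        StringBuilder hex = new StringBuilder();
        for (byte b : stored) {
            hex.append(String.format("0x%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());

        // 0xE9 >= 0xE0, so a UTF-8 decoder expects two continuation bytes
        // after it (indexes 3 and 4), which this three-byte array lacks.
        System.out.println("0xE9 looks like a 3-byte lead: " + ((stored[2] & 0xFF) >= 0xE0));

        // A strict UTF-8 decoder rejects the bytes; a Latin1 decoder accepts them.
        System.out.println("valid UTF-8: " + isValidUtf8(stored));
        System.out.println("as Latin1: " + new String(stored, StandardCharsets.ISO_8859_1));
    }
}
```

This is also why a database whose encoding is Latin1 avoids the problem. The server would then know the bytes are Latin1 and could convert them to real UTF-8 before sending them to the driver.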
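The quoted loop reads `data[i+1]` and `data[i+2]` without checking that those bytes exist, so malformed input surfaces as an `ArrayIndexOutOfBoundsException` rather than as a meaningful error. The sketch below shows one way such a loop could fail more informatively. It is hypothetical: `SafeUtf8Decode` and `decodeUtf8` are illustrative names, not the driver's API, and this is not claimed to be the fix the driver adopted. Like the quoted code, it handles only 1-, 2- and 3-byte sequences. Before using each continuation byte, it checks that the byte exists and matches `10xxxxxx`. It does not reject overlong encodings or surrogates, so it is not a complete UTF-8 validator.

```java
// Hypothetical bounds-checked variant of the quoted decodeUTF8 loop.
public class SafeUtf8Decode {

    /**
     * Decodes data[offset, offset + length) as UTF-8. Same 1/2/3-byte
     * structure as the quoted loop, but each continuation byte is
     * bounds-checked and must match 10xxxxxx before it is used.
     * Lead bytes of 4-byte sequences (0xF0 and up) are rejected.
     */
    static String decodeUtf8(byte[] data, int offset, int length) {
        int end = offset + length;
        char[] out = new char[length]; // never more chars than input bytes
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            int need; // number of continuation bytes expected
            int val;  // payload bits taken from the lead byte
            if (z < 0x80) {
                need = 0;
                val = z;
            } else if (z >= 0xC0 && z < 0xE0) {
                need = 1;
                val = z & 0x1F;
            } else if (z >= 0xE0 && z < 0xF0) {
                need = 2;
                val = z & 0x0F;
            } else {
                throw new IllegalArgumentException(String.format(
                        "invalid UTF-8 lead byte 0x%02X at offset %d", z, i));
            }
            // The continuation bytes occupy i+1 .. i+need and must lie before end.
            if (i + need >= end) {
                throw new IllegalArgumentException(String.format(
                        "truncated UTF-8 sequence: lead byte 0x%02X at offset %d needs %d more byte(s)",
                        z, i, need));
            }
            for (int k = 1; k <= need; k++) {
                int c = data[i + k] & 0xFF;
                if ((c & 0xC0) != 0x80) {
                    throw new IllegalArgumentException(String.format(
                            "bad UTF-8 continuation byte 0x%02X at offset %d", c, i + k));
                }
                val = (val << 6) | (c & 0x3F); // append 6 payload bits
            }
            out[j++] = (char) val;
            i += need + 1;
        }
        return new String(out, 0, j);
    }

    public static void main(String[] args) {
        byte[] latin1 = { 0x4F, 0x6E, (byte) 0xE9 }; // the reported string as Latin1
        try {
            decodeUtf8(latin1, 0, latin1.length);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the check in place, the Latin1 bytes for 'Oné' are reported as a truncated sequence at offset 2 instead of an out-of-bounds index. The underlying problem remains, though: as Barry notes, the database encoding needs to match the data it actually stores.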