Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()

From Joseph Shraibman
Subject Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()
Date
Msg-id 3E1C814A.1020307@selectacast.net
In response to Re: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()  (Barry Lind <blind@xythos.com>)
List pgsql-jdbc
Well, this data was inserted into postgres through the jdbc driver in the first place.

So how come postgres itself didn't complain about non-ascii data?  How do I change the
encoding?  And what will the side effects be?
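
To make the mismatch Barry describes below concrete, here is a minimal Java sketch. It assumes the value 'Oné' is stored as the three Latin1 bytes 4F 6E E9 (the thread does not show the raw bytes, so that is an assumption). The class and method names are made up for illustration; only the java.nio.charset calls are standard JDK API. It is not driver code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CharsetMismatch {

    // Decode strictly: report malformed input instead of substituting U+FFFD.
    static String strictDecode(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes are a well-formed sequence in the given charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            strictDecode(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed stored bytes for 'Oné' in Latin1: 'O', 'n', 0xE9.
        byte[] stored = {0x4F, 0x6E, (byte) 0xE9};

        System.out.println("valid as Latin1: "
                + isValid(stored, StandardCharsets.ISO_8859_1));
        System.out.println("valid as UTF-8:  "
                + isValid(stored, StandardCharsets.UTF_8));
        System.out.println("decoded as Latin1: "
                + strictDecode(stored, StandardCharsets.ISO_8859_1));
    }
}
```

The same three bytes are a complete string in Latin1 but a truncated multi-byte sequence in UTF-8, which is why passing them through unchanged and then decoding them as UTF-8 on the client cannot work.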

Barry Lind wrote:
> Joseph,
>
> The problem is that your database claims to be ASCII, but you are
> storing non-ascii data in it.
>
> As of 7.3 the jdbc driver has the server convert from the database
> character set to UTF8.  The server then sends the data to the driver in
> UTF8, and the driver decodes the UTF8 to java unicode.
>
> The conversion from ASCII to UTF8 is a noop since the 128 characters of
> ASCII (values 0 - 127) map directly to the same values in UTF8.  However,
> since you are storing non-ASCII data, the bytes with values from 128 - 255
> just get passed from the server to the client without any additional
> processing (since there aren't supposed to be any values in this range).
> Then when the driver tries to convert to java unicode, it can't,
> because it has received an invalid UTF8 sequence.
>
> It seems that you are actually storing Latin1 data in this database and
> thus the database character set should probably be Latin1.
>
> In 7.2 it was possible to override the character set used by the driver,
> however I don't think this works anymore when connecting to a 7.3
> server.  .... looks at code .... Yes the override is ignored if the
> server is a 7.3 server.  You could hack at AbstractJdbc1Connection to
> work around the issue or just correctly set the database character set
> to match the data that the database contains.
>
> thanks,
> --Barry
>
>
> Joseph Shraibman wrote:
>
>> BTW the string that caused this is 'Oné'
>>
>> Joseph Shraibman wrote:
>>
>>> java.lang.ArrayIndexOutOfBoundsException: 3
>>>         at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:253)
>>>         at org.postgresql.core.Encoding.decode(Encoding.java:165)
>>>         at org.postgresql.core.Encoding.decode(Encoding.java:181)
>>>         at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
>>>
>>>
>>> The relevant code is:
>>>
>>>         while (i < k) {
>>>             z = data[i] & 0xFF;
>>>             if (z < 0x80) {
>>>                 l_cdata[j++] = (char)data[i];
>>>                 i++;
>>>             } else if (z >= 0xE0) {        // length == 3
>>>                 y = data[i+1] & 0xFF; //<<== THIS IS LINE 253
>>>                 x = data[i+2] & 0xFF;
>>>                 val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
>>>                 l_cdata[j++] = (char) val;
>>>                 i+= 3;
>>>             } else {        // length == 2 (maybe add checking for length > 3, throw exception if it is
>>>
>>>
>>> And in the method that calls that:
>>>
>>>     if (encoding.equals("UTF-8")) {
>>>         return decodeUTF8(encodedString, offset, length);
>>>     }
>>>
>>> The thing is my database encoding is SQL_ASCII
>>>
>>> => SELECT version(),  getdatabaseencoding() ;
>>>
>>>                                                  version                                                 | getdatabaseencoding
>>> ---------------------------------------------------------------------------------------------------------+---------------------
>>>  PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7) | SQL_ASCII
>>> (1 row)
>>>
>>> ... so why is it trying to decode the string as UTF-8?  I just
>>> upgraded this database from 7.2.3 yesterday.
>>>
>>
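
For reference, the index in the exception is consistent with the loop quoted above. In Latin1, 'é' is the single byte 0xE9, which is >= 0xE0, so the loop treats it as the first byte of a 3-byte sequence and reads data[i+1], which is index 3 of a 3-byte array. The sketch below reproduces that under stated assumptions: the stored bytes are 4F 6E E9, the 2-byte branch is a guess (the quoted code is cut off there), and the names DecodeRepro, naiveDecode and checkedDecode are hypothetical. It is a simplified stand-in, not the driver's source.

```java
// Hypothetical reproduction class, for illustration only.
class DecodeRepro {

    // Sequence length implied by the lead byte, using the same thresholds
    // as the quoted loop (anything from 0x80 to 0xDF is taken as 2 bytes).
    static int sequenceLength(int lead) {
        if (lead < 0x80) return 1;
        return lead >= 0xE0 ? 3 : 2;
    }

    // Simplified copy of the quoted loop: no bounds checks.
    static String naiveDecode(byte[] data) {
        char[] out = new char[data.length];
        int i = 0, j = 0;
        while (i < data.length) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {                     // plain ASCII byte
                out[j++] = (char) z;
                i++;
            } else if (z >= 0xE0) {             // treated as a 3-byte lead
                int y = data[i + 1] & 0xFF;     // may run past the array
                int x = data[i + 2] & 0xFF;
                out[j++] = (char) (((z - 0xE0) << 12) + ((y - 0x80) << 6) + (x - 0x80));
                i += 3;
            } else {                            // assumed 2-byte branch
                int y = data[i + 1] & 0xFF;
                out[j++] = (char) (((z - 0xC0) << 6) + (y - 0x80));
                i += 2;
            }
        }
        return new String(out, 0, j);
    }

    // Same decoding, but truncated sequences are rejected up front
    // with a descriptive exception instead of an array index error.
    static String checkedDecode(byte[] data) {
        for (int i = 0; i < data.length; ) {
            int len = sequenceLength(data[i] & 0xFF);
            if (i + len > data.length) {
                throw new IllegalArgumentException(
                        "Truncated UTF-8 sequence at byte " + i);
            }
            i += len;
        }
        return naiveDecode(data);
    }

    public static void main(String[] args) {
        byte[] latin1 = {0x4F, 0x6E, (byte) 0xE9};              // 'Oné' in Latin1
        byte[] utf8 = {0x4F, 0x6E, (byte) 0xC3, (byte) 0xA9};   // 'Oné' in UTF-8

        System.out.println("UTF-8 input decodes: " + naiveDecode(utf8));
        try {
            naiveDecode(latin1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("naive loop failed: " + e);
        }
        try {
            checkedDecode(latin1);
        } catch (IllegalArgumentException e) {
            System.out.println("checked loop failed: " + e.getMessage());
        }
    }
}
```

A length check like the one in checkedDecode only turns the failure into a clearer error. The data still cannot be decoded as UTF-8, so the underlying fix remains making the database encoding match the stored data, as Barry suggests.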

