Thread: Re: [JDBC] ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()


Re: [JDBC] ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()

From
Joseph Shraibman
Date:
Barry Lind wrote:
> Joseph,
>
> In postgres UNICODE means utf8.

Which differs from java unicode?

I notice there is no way to change a database's encoding.  If I just change the encoding
type in the pg_database to latin1 will there be data loss?

>
> --Barry
>
> Joseph Shraibman wrote:
>
>> Barry Lind wrote:
>>
>>> Joseph,
>>>
>>> The problem is that your database claims to be ASCII, but you are
>>> storing non-ascii data in it.
>>>
>>> As of 7.3 the jdbc driver has the server convert from the database
>>> character set to UTF8.  Then send the data to the driver in UTF8 and
>>> the driver then decodes the UTF8 to java unicode.
>>
>>
>>
>> I see this in my postgres log when I connect via jdbc:
>>
>> LOG:  query: set datestyle to 'ISO'; select version(), case when
>> pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN' else
>> getdatabaseencoding() end;
>> LOG:  query: set client_encoding = 'UNICODE'; show autocommit
>>
>> So if client_encoding is unicode why is the driver trying to convert
>> from UTF8?
>>
>


--
Joseph Shraibman
joseph@xtenit.com
Increase signal to noise ratio.  http://xis.xtenit.com
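
As a rough illustration of the mismatch Barry describes: if a database is declared SQL_ASCII, the server has no character set to convert from, so any Latin-1 bytes stored in it would presumably reach the driver unchanged, and the driver would then try to read them as UTF-8. The sketch below is not the driver's code. It uses the JDK's own strict decoder, and the class name `StrictUtf8Check` is made up for this example. It shows that the Latin-1 byte for "é" (0xE9) looks like the first byte of a three-byte UTF-8 sequence with nothing after it, which is the kind of input that could push a hand-written decoder past the end of its buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the PostgreSQL JDBC driver.
final class StrictUtf8Check {

    /** Returns true only if the bytes are well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "On\u00e9"; // "One" with an acute accent on the e
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1); // 4F 6E E9
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);        // 4F 6E C3 A9

        System.out.println(isValidUtf8(latin1)); // false: E9 announces 3 bytes, input ends
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```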


Re: [JDBC] ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()

From
Barry Lind
Date:

Joseph Shraibman wrote:
>>
>> In postgres UNICODE means utf8.
>
>
> Which differs from java unicode?
>

Yes.  Unicode in java is 16 bit characters (I think the term for this is
UCS2), two bytes for each character, whereas utf8 is a variable length
encoding with characters represented by 1, 2 or 3 bytes (4 bytes for
characters outside the 16 bit range).

> I notice there is no way to change a database's encoding.  If I just
> change the encoding type in the pg_database to latin1 will there be data
> loss?

The recommended way to do this would be to dump the contents of the
database, create a new database with the desired character set and then
import the data into that new database.  I don't know if changing
pg_database directly would work or not.

--Barry
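
To make the difference between Java's 16 bit `char` units and UTF-8's variable width concrete, here is a small sketch using only the standard library. The class and method names (`EncodingWidths`, `utf8Bytes`) are invented for the example, and the sample characters are arbitrary picks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class EncodingWidths {

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // ASCII letter
            "\u00e9",       // e with acute accent (Latin-1 range)
            "\u20ac",       // euro sign
            "\uD83D\uDE00"  // a character beyond 16 bits, stored as a surrogate pair
        };
        for (String s : samples) {
            System.out.println(s.length() + " char(s), " + utf8Bytes(s) + " UTF-8 byte(s)");
        }
        // prints:
        // 1 char(s), 1 UTF-8 byte(s)
        // 1 char(s), 2 UTF-8 byte(s)
        // 1 char(s), 3 UTF-8 byte(s)
        // 2 char(s), 4 UTF-8 byte(s)
    }
}
```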



Character Encoding WAS: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()

From
Joseph Shraibman
Date:
Barry Lind wrote:
>
>
> Joseph Shraibman wrote:

>> I notice there is no way to change a database's encoding.  If I just
>> change the encoding type in the pg_database to latin1 will there be
>> data loss?
>
>
> The recommended way to do this would be to dump the contents of the
> database, create a new database with the desired character set and then
> import the data into that new database.  I don't know if changing
> pg_database directly would work or not.
>
>
That didn't work. When I tried that Oné turned into Oné, which confuses me because I
thought my problem was that I was storing latin1 chars in a text field that was supposed
to only have the lower ascii bits.  Oh well, I guess it is dump/reload time.
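
One common way an accented character ends up garbled is that its UTF-8 bytes get read back as Latin-1, so a single character becomes two. Whether that is what happened in this test is an assumption. The sketch below only reproduces the general effect, and `MojibakeDemo` and `misreadUtf8AsLatin1` are names invented for it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo; it reproduces a typical garbling, not necessarily the one in this thread.
final class MojibakeDemo {

    /** Encodes the text as UTF-8, then (wrongly) decodes those bytes as Latin-1. */
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "On\u00e9";                  // 3 characters
        String garbled = misreadUtf8AsLatin1(original); // 4 characters: O, n, U+00C3, U+00A9
        System.out.println(original.length() + " -> " + garbled.length()); // prints "3 -> 4"
    }
}
```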


Re: Character Encoding WAS: ArrayIndexOutOfBoundsException in Encoding.decodeUTF8()

From
Joseph Shraibman
Date:
Joseph Shraibman wrote:
> Barry Lind wrote:
>> Joseph Shraibman wrote:
>>> I notice there is no way to change a database's encoding.  If I just
>>> change the encoding type in the pg_database to latin1 will there be
>>> data loss?
>>
>>
>>
>> The recommended way to do this would be to dump the contents of the
>> database, create a new database with the desired character set and
>> then import the data into that new database.  I don't know if changing
>> pg_database directly would work or not.
>>
>>
> That didn't work.

Actually it did. My test data was flawed.  What didn't work is editing the dump to change
the type to unicode.
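
A possible reason relabeling the dump is not enough, offered as an assumption rather than a diagnosis: the data bytes inside the dump would still be Latin-1, and changing the declared type does not rewrite them. The bytes themselves would need transcoding, roughly as in this sketch. `DumpTranscode` and `latin1ToUtf8` are hypothetical names, and a real dump would be streamed from a file rather than held in a byte array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; shows the byte-level conversion only.
final class DumpTranscode {

    /** Reinterprets Latin-1 bytes as text and re-encodes that text as UTF-8. */
    static byte[] latin1ToUtf8(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = { 0x4F, 0x6E, (byte) 0xE9 }; // "On" + e-acute in Latin-1
        byte[] utf8 = latin1ToUtf8(latin1);          // 4F 6E C3 A9
        System.out.println(latin1.length + " bytes -> " + utf8.length + " bytes"); // prints "3 bytes -> 4 bytes"
    }
}
```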