Discussion: JDBC driver, client_encoding and a SQL_ASCII database in production

JDBC driver, client_encoding and a SQL_ASCII database in production

From: Emmanuel Guiton
Date:
Hello All,

I am using Pentaho Data Integration (PDI) to transform a production
database into another database that is structured for my business
intelligence analysis.
PDI uses the JDBC driver to perform the transformation. The issue is
that the JDBC driver only works with UTF-8 as client_encoding, while the
encoding of the production database is SQL_ASCII and it was filled with
ISO-8859-1 characters (it is full of characters such as é è ô ...). The
only way I can get the correct strings back is to use
client_encoding='LATIN1'.
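
The mismatch described above can be reproduced without a database. The
following is a minimal sketch, not the driver's actual code: the class
name, method names and the sample word are hypothetical, and it assumes
only that the stored bytes are ISO-8859-1 and that the client decodes
them as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Hypothetical sample: the bytes an ISO-8859-1 client would have
    // stored for "café" (0xE9 is 'é' in ISO-8859-1).
    static final byte[] STORED = {0x63, 0x61, 0x66, (byte) 0xE9};

    // What a UTF-8-only client effectively does with the raw bytes.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // What the data actually needs: every byte maps to one Latin-1 character.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Latin-1 decoding recovers the original word.
        System.out.println(decodeAsLatin1(STORED).equals("caf\u00e9"));
        // The UTF-8 decoding contains U+FFFD because a lone 0xE9 is not valid UTF-8.
        System.out.println(decodeAsUtf8(STORED).indexOf('\uFFFD') >= 0);
    }
}
```

Once a byte has been replaced by U+FFFD the original character cannot
be recovered from the String, which is why the decoding has to be right
at the point where the bytes are first turned into text.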

Changing the character set of the original database is not an option as
it is in production.

Does anyone have a good idea of how I could retrieve the content correctly?

Regards,
 - Emmanuel

--
Ingénieur étude et développement
Intrinsec
215, avenue Georges Clemenceau
92000 Nanterre
http://www.intrinsec.com

Re: JDBC driver, client_encoding and a SQL_ASCII database in production

From: Oliver Jowett
Date:
Emmanuel Guiton wrote:
> Hello All,
>
> I am using Pentaho Data Integration (PDI) to transform a production
> database into another database that is structured for my business
> intelligence analysis.
> PDI uses the JDBC driver to perform the transformation. The issue is
> that the JDBC driver only works with UTF-8 as client_encoding, while the
> encoding of the production database is SQL_ASCII and it was filled with
> ISO-8859-1 characters (it is full of characters such as é è ô ...). The
> only way I can get the correct strings back is to use
> client_encoding='LATIN1'.
>
> Changing the character set of the original database is not an option as
> it is in production.
>
> Does anyone have a good idea of how I could retrieve the content
> correctly?

Take a copy of the production database, change the database encoding to
be LATIN1, and do your conversion from that copy?

-O

Re: JDBC driver, client_encoding and a SQL_ASCII database in production

From: Emmanuel Guiton
Date:

Oliver Jowett wrote:
> Take a copy of the production database, change the database encoding to
> be LATIN1, and do your conversion from that copy?
>
> -O
>
Thanks for the idea, but the data volume is too large, and the original
database already has performance problems too serious to burden it
with an additional daily dump.
This is not a one-off issue: the analysis tool I am setting up is meant
to continually analyze the activity of my company.

Would there be a way to get the binary content of a text field?
Maybe that could be the solution: performing the encoding conversion at
the application level.
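
An application-level conversion of that kind could look like the sketch
below. It assumes the raw column bytes are somehow available as a
byte[]; how to obtain them through the driver is left open here. The
class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Transcoder {
    // Interpret raw bytes from the SQL_ASCII database as ISO-8859-1 text.
    // Returns null for a null input, mirroring a SQL NULL.
    static String toJavaString(byte[] raw) {
        return raw == null ? null : new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Re-encode the same text as UTF-8, e.g. for a UTF-8 target database.
    static byte[] toUtf8(byte[] raw) {
        String text = toJavaString(raw);
        return text == null ? null : text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xE9, (byte) 0xE8, (byte) 0xF4}; // "éèô" in ISO-8859-1
        System.out.println(toJavaString(raw).length()); // one char per input byte
        System.out.println(toUtf8(raw).length);         // two UTF-8 bytes per accented letter
    }
}
```

ISO-8859-1 assigns a character to every byte value, so this decoding
never fails. It also means the helper cannot detect rows that were in
fact stored in some other encoding.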

 - Emmanuel

--
Ingénieur étude et développement
Intrinsec
215, avenue Georges Clemenceau
92000 Nanterre
http://www.intrinsec.com

Re: JDBC driver, client_encoding and a SQL_ASCII database in production

From: Oliver Jowett
Date:
Emmanuel Guiton wrote:

> Would there be a way to get the binary content of a text field?
> Maybe that could be the solution: performing the encoding conversion at
> the application level.

That's not simple: the conversion from network data to String is done
very early, well before the data gets anywhere near the application level.

You may have to run with a modified driver that you have patched to
understand encodings other than UTF-8.

-O

Re: JDBC driver, client_encoding and a SQL_ASCII database in production

From: Kris Jurka
Date:

On Thu, 11 Mar 2010, Oliver Jowett wrote:

> You may have to run with a modified driver that you have patched to
> understand encodings other than UTF-8.
>

The JDBC driver does support running with a non-UTF-8 encoding, but only
for server versions prior to 7.3.  There's no reason it couldn't work for
later versions, so the easiest thing to do is to tweak the v2 protocol
setup code to work for your server version and then use that.

Kris Jurka