Thread: Encoding Problem

Encoding Problem

From
Jerome Colombie
Date:
Hi,

I have been using PostgreSQL since the first 8.0 version (Windows 2000), and it
gets better with every new version. But one problem remains with the
JDBC driver: the encoding of umlauts is still wrong. This problem also
occurs in all pgAdmin III versions.
The psql client, on the other hand, works fine.
I tried different server encodings such as SQL_ASCII and LATIN1, but I always
get these strange characters in pgAdmin III and with JDBC. I also tried
to specify the charSet property in the JDBC connection string, but
that didn't help.
Does anyone know how I can get the correct strings with umlauts (e.g.
ä, ö, ü) in Java?
I thought this problem would be solved in the newer versions, but it
still exists even in rc2.

Greetings, Jérôme

Re: Encoding Problem

From
Kris Jurka
Date:

On Wed, 12 Jan 2005, Jerome Colombie wrote:

> I have been using PostgreSQL since the first 8.0 version (Windows 2000), and it
> gets better with every new version. But one problem remains with the
> JDBC driver: the encoding of umlauts is still wrong. This problem also
> occurs in all pgAdmin III versions. The psql client, on the other hand,
> works fine. I tried different server encodings such as SQL_ASCII and LATIN1,
> but I always get these strange characters in pgAdmin III and with JDBC.

Note that pgAdmin and JDBC are completely separate, so I can only speak
for the Java side of things.  With the PostgreSQL JDBC driver you cannot
use a database with SQL_ASCII encoding.  You must use a real encoding;
LATIN1 should be fine.

> I also tried to specify the charSet property in the JDBC connection
> string, but that didn't help.

If you found the charSet property in the documentation, you should also
have seen the note saying that it is completely ignored for server
versions >= 7.3.

> Does anyone know how I can get the correct strings with umlauts (e.g.
> ä, ö, ü) in Java? I thought this problem would be solved in the newer
> versions, but it still exists even in rc2.
>

I have no reason to believe there is a problem with the server/JDBC
encoding handling.  I suspect the problem is related to how you are
trying to enter or display these values.  If you can reproduce this problem
with a standalone Java program that runs against a non-SQL_ASCII database
and does direct string comparisons (that is, doesn't rely on things like
System.out.println), then please post such a program here.

Kris Jurka
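
A minimal sketch of the kind of check requested above, not code from the thread. The class name `UmlautCheck` and the method `isExpected` are hypothetical; in a real test the argument would be the value returned by `rs.getString("t")`. The expected text is written with Unicode escapes so that the comparison depends neither on the source file encoding nor on how the console renders characters.

```java
// Hypothetical helper: compares a value against the expected text
// without printing the text itself, so console encoding cannot distort the result.
class UmlautCheck {
    // "a" followed by a-umlaut, o-umlaut and u-umlaut, written as escapes.
    static final String EXPECTED = "a\u00e4\u00f6\u00fc";

    static boolean isExpected(String value) {
        return EXPECTED.equals(value);
    }

    public static void main(String[] args) {
        // In a real test, pass rs.getString("t") instead of these literals.
        System.out.println(isExpected("a\u00e4\u00f6\u00fc"));
        System.out.println(isExpected("a???"));
    }
}
```

Only the boolean results are printed, so a misconfigured console can no longer hide or fake an encoding error.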

Re: Encoding Problem

From
Oliver Jowett
Date:
Jerome Colombie wrote:

> I tried different server encodings such as SQL_ASCII and LATIN1, but I always
> get these strange characters in pgAdmin III and with JDBC. I also tried
> to specify the charSet property in the JDBC connection string, but
> that didn't help.
> Does anyone know how I can get the correct strings with umlauts (e.g.
> ä, ö, ü) in Java?
> I thought this problem would be solved in the newer versions, but it
> still exists even in rc2.

It should "just work" when using a server encoding of LATIN1 (I assume
you can represent umlaut-ed characters in LATIN1?) or UNICODE.

Can you provide some example code that fails? Also, which driver version
are you using?

-O

Re: Encoding Problem

From
Kris Jurka
Date:

On Thu, 13 Jan 2005, Oliver Jowett wrote:

> It should "just work" when using a server encoding of LATIN1 (I assume
> you can represent umlaut-ed characters in LATIN1?) or UNICODE.
>

Just an FYI, the Windows version does not support Unicode.  See:

http://pginstaller.projects.postgresql.org/FAQ_windows.html#2.6

Kris Jurka

Re: Encoding Problem

From
Kris Jurka
Date:

On Fri, 14 Jan 2005, Jerome Colombie wrote:

> I made a low-level program which prints to the console. It seems that
> the JDBC driver is correct, although the strings need some
> postprocessing in Java. I want to create HTML output, so I don't know
> whether I have to change my Windows settings or Java settings (Locale),
> or just need to reformat the strings in Java code.
> My test program looks like this:
>

>             String t1 = new String("aäöü".getBytes(), "ISO-8859-1");
>             String t1 = new String("aäöü".getBytes(), "UTF-8");

At least one of these is clearly bogus.  I don't know what your default
encoding is, but getBytes() will return data in that encoding.  From there
you are telling Java to interpret that one piece of data in two
different ways, so one of them must be wrong.
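
To illustrate the point, here is a minimal sketch, assuming Java 7 or later for `StandardCharsets`. The class name `DecodeTwice` and the helper `latin1Bytes` are hypothetical. The bytes are produced with an explicit charset rather than the JVM default, and the same byte array is then decoded two ways; only the decoding that matches the encoding step gives back the original string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: one byte array, two interpretations.
class DecodeTwice {
    // Encode explicitly as ISO-8859-1 so the bytes do not depend on the JVM default.
    static byte[] latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "a\u00e4\u00f6\u00fc";
        byte[] bytes = latin1Bytes(original); // 0x61 0xE4 0xF6 0xFC

        // Same bytes, decoded with the matching and with a mismatched charset.
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 round trip ok: " + original.equals(asLatin1));
        System.out.println("UTF-8 round trip ok: " + original.equals(asUtf8));
    }
}
```

The UTF-8 decoding fails because the three umlaut bytes do not form valid UTF-8 sequences, so the decoder substitutes replacement characters.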

> How can I get the correct output in HTML with Java code? I know that
> technically this has nothing to do with JDBC, but I still hope someone can
> give me a solution so that I don't need to change the Java code. I hope I can
> solve this problem by changing either the database configuration or the
> Java or Windows locale.
>

To produce correctly encoded HTML, you need to ensure that your Java
environment's default encoding matches the encoding you have declared
for the page.

Kris Jurka
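
One way to make the two encodings match is sketched below. Note that this is a variation on the advice above, not the same approach: instead of relying on the JVM default, it passes the charset explicitly to the writer and declares the same charset in the page. It assumes Java 8 or later; the class name `HtmlPage` and the method `render` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical page renderer: the charset used for writing is the one declared in the markup.
class HtmlPage {
    static byte[] render(String body, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The writer encodes characters with the given charset, not the JVM default.
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write("<html><head><meta http-equiv=\"Content-Type\" "
                    + "content=\"text/html; charset=" + charset.name() + "\"></head>");
            w.write("<body>" + body + "</body></html>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] page = render("a\u00e4\u00f6\u00fc", StandardCharsets.ISO_8859_1);
        System.out.println(page.length + " bytes written as ISO-8859-1");
    }
}
```

Because the charset is a parameter, the declared and the actual encoding cannot drift apart when the JVM or Windows locale changes.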

Re: Encoding Problem

From
Oliver Jowett
Date:
Jerome Colombie wrote:

>        byte[] temp = test.getBytes();

getBytes() uses the JVM's default encoding to convert from the internal
string char[] representation, and that default might not be the encoding
you expect.

More interesting is test.getBytes("UTF-8") or test.toCharArray() or
test.charAt(x).

-O
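
A minimal sketch of such an inspection, assuming Java 7 or later (it uses `StandardCharsets.UTF_8` in place of the `"UTF-8"` string argument). The class name `StringInspector` and its methods are hypothetical. It reports the UTF-16 code units obtained through `charAt`, which do not depend on any default encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for looking at what a String really contains.
class StringInspector {
    // Lists each UTF-16 code unit of the string as U+XXXX, using charAt.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Number of bytes the string occupies when encoded explicitly as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String test = "a\u00e4\u00f6\u00fc";
        System.out.println(codeUnits(test));  // prints U+0061 U+00E4 U+00F6 U+00FC
        System.out.println(utf8Length(test)); // prints 7
    }
}
```

If a string read from the database yields U+003F where U+00E4 was expected, the character was already lost before the string reached this code.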

Re: Encoding Problem

From
Jerome Colombie
Date:
Hi,

I made a low-level program which prints to the console. It seems that
the JDBC driver is correct, although the strings need some
postprocessing in Java. I want to create HTML output, so I don't know
whether I have to change my Windows settings or Java settings (Locale),
or just need to reformat the strings in Java code.
My test program looks like this:

*********
bh=# create table test (
  t          VARCHAR(5)
);

bh=# insert into test values ('aäöü');

bh=# select * from test;
  t
------
 aäöü
(1 row)

bh=# select getdatabaseencoding();
 getdatabaseencoding
---------------------
 LATIN1
(1 row)

bh=# SHOW client_encoding;
 client_encoding
-----------------
 LATIN1
(1 row)

bh=# select version();
                                                version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 8.0.0rc2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.3.1 (mingw special 20030804-1)
(1 row)
*********
import java.sql.*;
import java.util.Properties;

public class BaseDAO {
    protected static String dbHost = "localhost";
    protected static String dbUrl = "jdbc:postgresql://" + dbHost + ":5432/bh";
    protected static String dbUser = "postgres";
    protected static String dbPassword = "***";
    protected static String dbDriver = "org.postgresql.Driver";

    static {
        //Register driver
        try {

            DriverManager.registerDriver((Driver) Class.forName(dbDriver).newInstance());
        } catch (Exception e){
            System.out.println("Error: " + e);
        }
    }

    public BaseDAO() {
        super();
        try {
            Properties props = new Properties();
            props.put("user", dbUser);
            props.put("password", dbPassword);
            Connection con = DriverManager.getConnection(dbUrl, props);
            ResultSet rs = con.createStatement().executeQuery("select * from test");
            if (rs.next()) {
                byte[] temp = rs.getString("t").getBytes();
                System.out.println("Database String");
                System.out.println(rs.getString("t"));
                System.out.println("Database Bytes");
                for (int i = 0; i < temp.length; i++) {
                    System.out.println(temp[i]);
                }
            }
        } catch (SQLException se) {
            System.out.println("Error: " + se);
        }
    }

    public static void main(String[] args) {
        BaseDAO dao = new BaseDAO();
        String test = "aäöü";
        System.out.println("Java String");
        System.out.println(test);
        System.out.println("Java Bytes");
        byte[] temp = test.getBytes();
        for (int i = 0; i < temp.length; i++) {
            System.out.println(temp[i]);
        }
        try {
            System.out.println("ISO-8859-1");
            String t1 = new String("aäöü".getBytes(), "ISO-8859-1");
            temp = t1.getBytes();
            for (int i = 0; i < temp.length; i++) {
                System.out.println(temp[i]);
            }
        } catch (java.io.UnsupportedEncodingException e) {
            System.out.println("Encoding exception: " + e);
        }
        try {
            System.out.println("UTF-8");
            String t1 = new String("aäöü".getBytes(), "UTF-8");
            temp = t1.getBytes();
            for (int i = 0; i < temp.length; i++) {
                System.out.println(temp[i]);
            }
        } catch (java.io.UnsupportedEncodingException e) {
            System.out.println("Encoding exception: " + e);
        }

    }
}
*********
D:\temp\test>javac BaseDAO.java

D:\temp\test>java -classpath d:\projects\jar\pg74.214.jdbc3.jar;. BaseDAO
Database String
a???
Database Bytes
97
63
63
63
Java String
aõ÷³
Java Bytes
97
-28
-10
-4
ISO-8859-1
97
-28
-10
-4
UTF-8
97
63
63
*********
How can I get the correct output in HTML with Java code? I know that
technically this has nothing to do with JDBC, but I still hope someone can
give me a solution so that I don't need to change the Java code. I hope I can
solve this problem by changing either the database configuration or the
Java or Windows locale.

Regards, Jerome
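
As a reading aid for the byte values printed in the run above, here is a minimal sketch assuming Java 7 or later. The class name `ByteOutputReader` and the method `toHex` are hypothetical. Java bytes are signed, so -28, -10 and -4 are the unsigned values 0xE4, 0xF6 and 0xFC, which are the ISO-8859-1 codes for ä, ö and ü. The value 63 is the ASCII question mark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading the signed byte values printed by the test program.
class ByteOutputReader {
    // Formats signed Java bytes as unsigned hex, for example -28 becomes E4.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] javaBytes = {97, -28, -10, -4}; // "Java Bytes" from the run above
        byte[] dbBytes = {97, 63, 63, 63};     // "Database Bytes" from the run above

        System.out.println(toHex(javaBytes));  // prints 61 E4 F6 FC
        // Decoded as ISO-8859-1, the Java bytes are the intended text.
        System.out.println(new String(javaBytes, StandardCharsets.ISO_8859_1)
                .equals("a\u00e4\u00f6\u00fc"));
        // The database bytes are plain ASCII: the letter a and three question marks.
        System.out.println(new String(dbBytes, StandardCharsets.US_ASCII));
    }
}
```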