Обсуждение: Strings with null characters produce exceptions when selected or inserted. Attempts to select messages with null bytes produces "ERROR: insufficient data left in message". And inserting produces "ERROR: invalid byte sequence for encoding \"UTF8\": 0x00". Since a null character is a valid UTF code point why is it rejected by the JDBC driver? The attached test can work with Mysql and their JDBC driver.

Поиск
Список
Период
Сортировка
Strings with null characters produce exceptions when selected or inserted. Attempts to select messages with null bytes
produces"ERROR: insufficient data left in message". And inserting produces "ERROR: invalid byte sequence for encoding
\"UTF8\":0x00". 

Since a null character is a valid UTF code point why is it rejected by the JDBC driver?

It can work with Mysql and their JDBC driver.

import java.sql.*;

public class TestStringsWithNullBytes {
  public static void main(String[] args) {
    String noNulls = "No null bytes";
    String nulls = "Null bytes \000\000";
    try {
      Class.forName("org.postgresql.Driver");
      Connection db = DriverManager.getConnection(
        "jdbc:postgresql://localhost/test?user=test&password=secret");
      /*
      Class.forName("com.mysql.jdbc.Driver");
      Connection db = DriverManager.getConnection(
        "jdbc:mysql://localhost/test?user=test&password=secret");
        */
      db.setAutoCommit(true);
      tryQuery(db, "SELECT * FROM test");
      tryQuery(db, "DROP TABLE IF EXISTS test");
      tryQuery(db, "CREATE TABLE test (name TEXT)");
      tryQuery(db, "SELECT '" + noNulls + "'");
      tryQuery(db, "SELECT '" + nulls + "'");
      tryQuery(db, "INSERT INTO test (name) VALUES('" + noNulls + "')");
      tryQuery(db, "INSERT INTO test (name) VALUES('" + nulls + "')");
      tryPreparedQuery(db, "SELECT ?", noNulls);
      tryPreparedQuery(db, "SELECT ?", nulls);
      tryPreparedQuery(db, "INSERT INTO test (name) VALUES(?)", noNulls);
      tryPreparedQuery(db, "INSERT INTO test (name) VALUES(?)", nulls);
      tryQuery(db, "SELECT * FROM test");
    }
    catch(Exception e) {
      e.printStackTrace();
    }
  }

  private static void tryQuery(Connection db, String query) {
    try {
      Statement s = db.createStatement();
      if (s.execute(query)) {
        ResultSet rs = s.getResultSet();
        while(rs.next()) {
          System.out.println(escape(query) + ": " + escape(rs.getString(1)));
        }
      }
    }
    catch(Exception e) {
      System.err.println(escape(query) + ": " + e.getMessage());
    }
  }

  private static void tryPreparedQuery(Connection db, String query, String s) {
    try {
      PreparedStatement ps = db.prepareStatement(query);
      ps.setString(1, s);
      if (ps.execute()) {
        ResultSet rs = ps.getResultSet();
        while(rs.next()) {
          System.out.println(escape(query) + ": " + escape(rs.getString(1)));
        }
      }
    }
    catch(Exception e) {
      System.err.println(escape(query) + ": " + e.getMessage());
    }
  }

  private static StringBuilder escape(String s) {
    StringBuilder sb = new StringBuilder();
    for(int i = 0; i < s.length(); i++) {
      if ((32 > (int)s.charAt(i)) || (126 < (int)s.charAt(i))) {
        sb.append("<" + (int)s.charAt(i) + ">");
      }
      else {
        sb.append(s.charAt(i));
      }
    }
    return(sb);
  }
}







On Wed, 3 Jun 2009, user2037@ymail.com wrote:

> Strings with null characters produce exceptions when selected or
> inserted. Attempts to select messages with null bytes produces "ERROR:
> insufficient data left in message". And inserting produces "ERROR:
> invalid byte sequence for encoding \"UTF8\": 0x00".
>
> Since a null character is a valid UTF code point why is it rejected by
> the JDBC driver?

Because the server can't handle it.  The server is written in C and tracks
all textual data as C strings which are null terminated.  It cannot handle
intermediate null bytes, so the driver is just providing that message as
early as possible to you.

Kris Jurka

Kris Jurka wrote:
>
>
> On Wed, 3 Jun 2009, user2037@ymail.com wrote:
>
>> Strings with null characters produce exceptions when selected or
>> inserted. Attempts to select messages with null bytes produces "ERROR:
>> insufficient data left in message". And inserting produces "ERROR:
>> invalid byte sequence for encoding \"UTF8\": 0x00".
>>
>> Since a null character is a valid UTF code point why is it rejected by
>> the JDBC driver?
>
> Because the server can't handle it.  The server is written in C and
> tracks all textual data as C strings which are null terminated.  It
> cannot handle intermediate null bytes, so the driver is just providing
> that message as early as possible to you.

Note that the `bytea' type _does_ store null bytes fine.

It's interesting that \0x00 is in fact valid utf-8, since it raises the
question of whether Pg should in fact support null bytes in `text' and
`varchar' strings.

--
Craig Ringer