Обсуждение: BUG #1268: Two different Unicode chars are treated as equal in a query
BUG #1268: Two different Unicode chars are treated as equal in a query
От
"PostgreSQL Bugs List"
Дата:
The following bug has been logged online:
Bug reference: 1268
Logged by: Kent Tong
Email address: kent@cpttm.org.mo
PostgreSQL version: 7.4.5
Operating system: RedHat 9
Description: Two different Unicode chars are treated as equal in a
query
Details:
Steps:
1. Create a test database: "createdb -E Unicode -U postgres testdb".
2. Create a test table: "create table testtable (id varchar(100) primary
key);".
3. With JDBC, insert a record whose id contains unicode: "insert into
testtable values(<a unicode char whose code is 0x4e8c>);".
4. With JDBC, try to retrieve a record whose id contains a different unicde:
"select from testtable where id=<a unicode char whose code is 0x4e94>;". It
should not find any record but it finds the record created in step 3.
Here is the JUnit test case:
public class PgSQLTest extends TestCase {
private Connection conn;
protected void setUp() throws Exception {
conn = makeConnection();
}
protected void tearDown() throws Exception {
conn.close();
}
public void testChinese() throws Exception {
deleteAll();
insertRow();
PreparedStatement st =
conn.prepareStatement("select * from testtable where id=?");
try {
st.setString(1, "\u4e94");
ResultSet rs = st.executeQuery();
assertFalse(rs.next());
} finally {
st.close();
}
}
private void insertRow() throws SQLException {
PreparedStatement st =
conn.prepareStatement("insert into testtable values(?)");
st.setString(1, "\u4e8c");
st.executeUpdate();
st.close();
}
private void deleteAll() throws SQLException {
PreparedStatement st = conn.prepareStatement("delete from testtable");
st.executeUpdate();
st.close();
}
private Connection makeConnection()
throws ClassNotFoundException, SQLException {
Class.forName("org.postgresql.Driver");
Properties properties = new Properties();
properties.put("user", "postgres");
properties.put("password", "");
return DriverManager.getConnection(
"jdbc:postgresql://localhost/testdb",
properties);
}
}
"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
> Description: Two different Unicode chars are treated as equal in a
> query
This would be a matter to take up with the maintainer of your locale
(which you didn't mention, but in any case it's a locale bug). We
just do what strcoll() tells us.
Note that it's possible this is a configuration error and not an
outright bug. Check to make sure that the locale you initdb'd
under is actually designed to work with UTF-8 data.
regards, tom lane
Tom Lane wrote:
> "PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
>
>>Description: Two different Unicode chars are treated as equal in a
>>query
>
>
> This would be a matter to take up with the maintainer of your locale
> (which you didn't mention, but in any case it's a locale bug). We
> just do what strcoll() tells us.
Thanks for the quick reply. The system locale is zh_TW.Big5. However,
I've tried setting it to "C" but the test case still fails.
In order to check if it's a locale bug, I've written a C program:
#include <locale.h>
#include <stdio.h>
#include <string.h>
int main() {
char *s1 = "\xe4\xba\x8c";
char *s2 = "\xe4\xba\x94";
setlocale(LC_ALL, "en.UTF-8");
//setlocale(LC_ALL, "zh.Big5"); //doesn't make any difference
printf("%d\n", strcoll(s1, s2));
return 0;
}
and compiled it and run it on that computer. It prints -1.
It means that strcoll is working.
> Note that it's possible this is a configuration error and not an
> outright bug. Check to make sure that the locale you initdb'd
> under is actually designed to work with UTF-8 data.
Does it matter? The encoding provided to initdb is just
a default for the databases to be created in the future.
When I used createdb, I did specify "-E unicode".
--
Kent Tong, Msc, MCSE, SCJP, CCSA, Delphi Certified
Manager of IT Dept, CPTTM
Authorized training for Borland, Cisco, Microsoft, Oracle, RedFlag & RedHat
Kent Tong <kent@cpttm.org.mo> writes:
> Does it matter? The encoding provided to initdb is just
> a default for the databases to be created in the future.
Yes it does, and you missed the point. I said *locale*, not *encoding*.
The LC_COLLATE and LC_CTYPE settings that prevail during initdb are
fixed and not alterable without re-initdb. (I agree that this sucks,
but that's how it is for now...)
Your test program doesn't prove a lot unless you are sure it's executing
under the same locale settings as the postmaster is running in.
regards, tom lane
Tom Lane wrote: > Yes it does, and you missed the point. I said *locale*, not *encoding*. > The LC_COLLATE and LC_CTYPE settings that prevail during initdb are > fixed and not alterable without re-initdb. (I agree that this sucks, > but that's how it is for now...) You're right. After using: initdb --locale zh_TW.utf8 /var/lib/pgsql/data then it works fine! Thanks again and sorry about any inconvenience. -- Kent Tong, Msc, MCSE, SCJP, CCSA, Delphi Certified Manager of IT Dept, CPTTM Authorized training for Borland, Cisco, Microsoft, Oracle, RedFlag & RedHat