Re: [PATCHES] UNICODE characters above 0x10000
From:         Oliver Jowett
Subject:      Re: [PATCHES] UNICODE characters above 0x10000
Date:         
Msg-id:       4115918A.1020405@opencloud.com
In reply to:  Re: [PATCHES] UNICODE characters above 0x10000  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List:         pgsql-hackers
Tatsuo Ishii wrote:
>>Tom Lane wrote:
>>
>>
>>>If I understood what I was reading, this would take several things:
>>>* Remove the "special UTF-8 check" in pg_verifymbstr;
>>>* Extend pg_utf2wchar_with_len and pg_utf_mblen to handle the 4-byte case;
>>>* Set maxmblen to 4 in the pg_wchar_table[] entry for UTF-8.
>>>
>>>Are there any other places that would have to change? Would this break
>>>anything? The testing aspect is what's bothering me at the moment.
>>
>>Does this change what client_encoding = UNICODE might produce? The JDBC
>>driver will need some tweaking to handle this -- Java uses UTF-16
>>internally and I think some supplementary character (?) scheme for
>>values above 0xffff as of JDK 1.5.
>
>
> Java doesn't handle UCS above 0xffff? I didn't know that. As long as
> you put in/out JDBC, it shouldn't be a problem. However if other APIs
> put in such a data, you will get into trouble...

Internally, Java strings are arrays of UTF-16 values. Before JDK 1.5, all
the string-manipulation library routines assumed that one code point ==
one UTF-16 value, so you couldn't represent values above 0xffff. The 1.5
libraries understand supplementary characters, which use multiple UTF-16
values (a surrogate pair) per code point.

See http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

However, the JDBC driver needs to be taught how to translate between the
UTF-8 representations of code points above 0xffff and pairs of UTF-16
values. Previously it didn't need to do anything, since the server didn't
use those high values. It's a minor thing.

-O
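For what it's worth, a minimal sketch of the translation involved, using
hypothetical helper names rather than the driver's actual code: decode one
4-byte UTF-8 sequence into a code point above 0xffff, then split it into
the surrogate pair that JDK 1.5's supplementary-character support expects
(Character.toChars() in 1.5 does the same split).

// Minimal sketch, not the actual JDBC driver code: decode a 4-byte UTF-8
// sequence into a code point above 0xffff and split it into the UTF-16
// surrogate pair used by JDK 1.5's supplementary-character support.
public class Utf8Supplementary {

    // Combine the payload bits of a 4-byte UTF-8 sequence starting at 'off'.
    static int decode4ByteUtf8(byte[] b, int off) {
        return ((b[off]     & 0x07) << 18)
             | ((b[off + 1] & 0x3f) << 12)
             | ((b[off + 2] & 0x3f) << 6)
             |  (b[off + 3] & 0x3f);
    }

    // Split a code point above 0xffff into a high/low surrogate pair;
    // Character.toChars(codePoint) does the equivalent in JDK 1.5.
    static char[] toSurrogatePair(int codePoint) {
        int v = codePoint - 0x10000;
        char high = (char) (0xD800 | (v >> 10));    // top 10 bits
        char low  = (char) (0xDC00 | (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) as 4-byte UTF-8: F0 9D 84 9E
        byte[] utf8 = { (byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E };
        int cp = decode4ByteUtf8(utf8, 0);
        char[] pair = toSurrogatePair(cp);
        // Prints: U+1D11E -> D834 DD1E
        System.out.printf("U+%X -> %04X %04X%n", cp, (int) pair[0], (int) pair[1]);
    }
}

Encoding goes the other way: recombine the surrogate pair into a code point
and emit the four UTF-8 bytes.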