Re: [PATCHES] UNICODE characters above 0x10000
From:        Tom Lane
Subject:     Re: [PATCHES] UNICODE characters above 0x10000
Date:
Msg-id:      350.1091897000@sss.pgh.pa.us
In reply to: Re: [PATCHES] UNICODE characters above 0x10000  (Dennis Bjorklund <db@zigo.dhs.org>)
Responses:   Re: [PATCHES] UNICODE characters above 0x10000
List:        pgsql-hackers
Dennis Bjorklund <db@zigo.dhs.org> writes:
> On Sat, 7 Aug 2004, Tatsuo Ishii wrote:
>> Anyway my point is if current specification of Unicode only allows
>> 24-bit range, why we need to allow usage against the specification?

> Is there a specific reason you want to restrict it to 24 bits?

I see several places that have to allocate space on the basis of the
maximum encoded character length possible in the current encoding
(look for uses of pg_database_encoding_max_length).  Probably the only
one that's really significant for performance is text_substr(), but
that's enough to be an argument against setting maxmblen higher than
we have to.

It looks to me like supporting 4-byte UTF-8 characters would be enough
to handle the existing range of Unicode codepoints, and that is
probably as much as we want to do.  If I understood what I was
reading, this would take several things:

* Remove the "special UTF-8 check" in pg_verifymbstr;
* Extend pg_utf2wchar_with_len and pg_utf_mblen to handle the 4-byte case;
* Set maxmblen to 4 in the pg_wchar_table[] entry for UTF-8.

Are there any other places that would have to change?  Would this
break anything?  The testing aspect is what's bothering me at the
moment.

            regards, tom lane
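
To make the 4-byte case concrete, here is a minimal standalone sketch
(an illustration, not the actual patch: the names pg_utf_mblen_sketch
and utf8_decode_sketch are invented stand-ins for pg_utf_mblen and
pg_utf2wchar_with_len, whose real bodies differ in detail).  The point
is that a lead byte of the form 11110xxx opens a 4-byte sequence, which
is enough to reach U+10FFFF:

/*
 * Sketch of 4-byte UTF-8 handling, under the assumptions stated above.
 */
#include <stdio.h>

typedef unsigned int pg_wchar;

/* Length in bytes of the UTF-8 character starting at s */
static int
pg_utf_mblen_sketch(const unsigned char *s)
{
    if ((*s & 0x80) == 0)
        return 1;               /* 0xxxxxxx: ASCII */
    else if ((*s & 0xe0) == 0xc0)
        return 2;               /* 110xxxxx */
    else if ((*s & 0xf0) == 0xe0)
        return 3;               /* 1110xxxx */
    else if ((*s & 0xf8) == 0xf0)
        return 4;               /* 11110xxx: the new 4-byte case */
    return 1;                   /* invalid lead byte: treat as length 1 */
}

/* Decode one UTF-8 sequence into a wide character */
static pg_wchar
utf8_decode_sketch(const unsigned char *s)
{
    if ((*s & 0x80) == 0)
        return s[0];
    else if ((*s & 0xe0) == 0xc0)
        return ((s[0] & 0x1f) << 6) | (s[1] & 0x3f);
    else if ((*s & 0xf0) == 0xe0)
        return ((s[0] & 0x0f) << 12) | ((s[1] & 0x3f) << 6) |
               (s[2] & 0x3f);
    else if ((*s & 0xf8) == 0xf0)
        return ((s[0] & 0x07) << 18) | ((s[1] & 0x3f) << 12) |
               ((s[2] & 0x3f) << 6) | (s[3] & 0x3f);
    return 0;
}

int
main(void)
{
    /* U+10400 (DESERET CAPITAL LETTER LONG I) encodes as F0 90 90 80 */
    const unsigned char s[] = {0xf0, 0x90, 0x90, 0x80, 0};

    printf("len = %d, codepoint = U+%04X\n",
           pg_utf_mblen_sketch(s), utf8_decode_sketch(s));
    return 0;
}

Given decoding along those lines, the remaining change is mostly
mechanical: the pg_wchar_table[] entry for UTF-8 would advertise
maxmblen = 4 instead of 3, and that value is what flows through
pg_database_encoding_max_length into the allocations in text_substr()
and friends.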