Re: UNICODE characters above 0x10000
From        | Tom Lane
Subject     | Re: UNICODE characters above 0x10000
Date        |
Msg-id      | 26451.1091855190@sss.pgh.pa.us
In reply to | Re: UNICODE characters above 0x10000 ("John Hansen" <john@geeknet.com.au>)
Responses   | Re: UNICODE characters above 0x10000
List        | pgsql-hackers
"John Hansen" <john@geeknet.com.au> writes: > My apologies for not reading the code properly. > Attached patch using pg_utf_mblen() instead of an indexed table. > It now also do bounds checks. I think you missed my point. If we don't need this limitation, the correct patch is simply to delete the whole check (ie, delete lines 827-836 of wchar.c, and for that matter we'd then not need the encoding local variable). What's really at stake here is whether anything else breaks if we do that. What else, if anything, assumes that UTF characters are not more than 2 bytes? Now it's entirely possible that the underlying support is a few bricks shy of a load --- for instance I see that pg_utf_mblen thinks there are no UTF8 codes longer than 3 bytes whereas your code goes to 4. I'm not an expert on this stuff, so I don't know what the UTF8 spec actually says. But I do think you are fixing the code at the wrong level. regards, tom lane