Re: [HACKERS] UNICODE characters above 0x10000
From | John Hansen |
---|---|
Subject | Re: [HACKERS] UNICODE characters above 0x10000 |
Date | |
Msg-id | 5066E5A966339E42AA04BA10BA706AE5608A@rodrick.geeknet.com.au |
Responses |
Re: [HACKERS] UNICODE characters above 0x10000
Re: [HACKERS] UNICODE characters above 0x10000 |
List | pgsql-patches |
Yes, but the specification allows for 6-byte sequences, or 32-bit characters.

As Dennis pointed out, just because they're not used doesn't mean we should not allow them to be stored, since there might be someone using the high ranges for a private character set, which could very well be included in the specification some day.

Regards,
John Hansen

-----Original Message-----
From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
Sent: Saturday, August 07, 2004 8:09 PM
To: tgl@sss.pgh.pa.us
Cc: db@zigo.dhs.org; John Hansen; pgsql-hackers@postgresql.org; pgsql-patches@postgresql.org
Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x10000

> Dennis Bjorklund <db@zigo.dhs.org> writes:
> > ... This also means that the start byte can never start with 7 or 8
> > ones, that is illegal and should be tested for and rejected. So the
> > longest utf-8 sequence is 6 bytes (and the longest character needs 4
> > bytes (or 31 bits)).
>
> Tatsuo would know more about this than me, but it looks from here like
> our coding was originally designed to support only 16-bit-wide
> internal characters (ie, 16-bit pg_wchar datatype width). I believe
> that the regex library limitation here is gone, and that as far as
> that library is concerned we could assume a 32-bit internal character
> width. The question at hand is whether we can support 32-bit
> characters or not --- and if not, what's the next bug to fix?

pg_wchar is already a 32-bit datatype. However, I doubt there's
actually a need for 32-bit-wide character sets. Even Unicode only
uses up to 0x0010FFFF, so 24 bits should be enough...
--
Tatsuo Ishii
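For readers following the byte-count argument, here is a minimal sketch in C of the rules quoted above, assuming the original UTF-8 definition (RFC 2279) with sequences of up to 6 bytes. The names `utf8_seq_len` and `utf8_decode` are hypothetical and are not PostgreSQL functions. The sketch rejects lead bytes that begin with 7 or 8 one bits (0xFE and 0xFF) and decodes into a 32-bit value, so the largest result is the 31-bit value 0x7FFFFFFF. It does not check for overlong encodings, which a real validator would also need to reject.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sequence length implied by a lead byte under the original 6-byte
 * UTF-8 scheme. Returns 0 for bytes that cannot start a sequence:
 * continuation bytes (10xxxxxx) and 0xFE / 0xFF (7 or 8 leading ones).
 * Hypothetical helper, for illustration only.
 */
static int
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;  /* 0xxxxxxx */
    if (lead < 0xC0) return 0;  /* 10xxxxxx: continuation byte */
    if (lead < 0xE0) return 2;  /* 110xxxxx */
    if (lead < 0xF0) return 3;  /* 1110xxxx */
    if (lead < 0xF8) return 4;  /* 11110xxx */
    if (lead < 0xFC) return 5;  /* 111110xx */
    if (lead < 0xFE) return 6;  /* 1111110x */
    return 0;                   /* 0xFE, 0xFF: never legal */
}

/*
 * Decode one sequence from s (avail bytes readable) into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or malformed. Overlong forms are NOT rejected here.
 */
static int
utf8_decode(const unsigned char *s, size_t avail, uint32_t *out)
{
    /* payload bits carried by the lead byte, indexed by sequence length */
    static const unsigned char lead_mask[7] =
        {0x00, 0x7F, 0x1F, 0x0F, 0x07, 0x03, 0x01};
    uint32_t cp;
    int      len;
    int      i;

    if (avail == 0)
        return 0;
    len = utf8_seq_len(s[0]);
    if (len == 0 || (size_t) len > avail)
        return 0;

    cp = s[0] & lead_mask[len];
    for (i = 1; i < len; i++)
    {
        if ((s[i] & 0xC0) != 0x80)  /* must be 10xxxxxx */
            return 0;
        cp = (cp << 6) | (uint32_t) (s[i] & 0x3F);
    }
    *out = cp;
    return len;
}
```

Under these assumptions a 4-byte sequence carries 21 payload bits, which already covers the Unicode maximum of 0x10FFFF that Tatsuo mentions, while the 5- and 6-byte forms reach 26 and 31 bits respectively.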