Re: [PATCHES] UNICODE characters above 0x10000
From | Tatsuo Ishii |
---|---|
Subject | Re: [PATCHES] UNICODE characters above 0x10000 |
Date | |
Msg-id | 20040807.190913.26271342.t-ishii@sra.co.jp |
In reply to | Re: UNICODE characters above 0x10000 (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
> Dennis Bjorklund <db@zigo.dhs.org> writes:
> > ... This also means that the start byte can never start with 7 or 8
> > ones, that is illegal and should be tested for and rejected. So the
> > longest utf-8 sequence is 6 bytes (and the longest character needs 4
> > bytes (or 31 bits)).
>
> Tatsuo would know more about this than me, but it looks from here like
> our coding was originally designed to support only 16-bit-wide internal
> characters (ie, 16-bit pg_wchar datatype width). I believe that the
> regex library limitation here is gone, and that as far as that library
> is concerned we could assume a 32-bit internal character width. The
> question at hand is whether we can support 32-bit characters or not ---
> and if not, what's the next bug to fix?

pg_wchar is already a 32-bit datatype. However, I doubt there's actually
a need for 32-bit-wide character sets. Even Unicode only uses code points
up to 0x0010FFFF, so 24 bits should be enough...
--
Tatsuo Ishii
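For illustration only, the two checks discussed in this thread could be sketched in C roughly as follows. This is not PostgreSQL's code: the function names `utf8_lead_len` and `is_unicode_code_point` are hypothetical, and the sketch assumes the original 1-to-6-byte UTF-8 scheme mentioned in the quoted text. It rejects start bytes that begin with seven or eight one bits (0xFE and 0xFF) and tests a code point against the 0x0010FFFF upper bound, which fits in 21 bits and therefore within 24.

```c
#include <stdint.h>

/*
 * Hypothetical helper: sequence length in bytes implied by a UTF-8
 * start byte under the original 1..6-byte scheme, or 0 if the byte
 * cannot start a sequence (a continuation byte 10xxxxxx, or 0xFE/0xFF,
 * which begin with seven or eight one bits).
 */
int
utf8_lead_len(unsigned char b)
{
    if (b < 0x80) return 1;     /* 0xxxxxxx */
    if (b < 0xC0) return 0;     /* 10xxxxxx: continuation byte */
    if (b < 0xE0) return 2;     /* 110xxxxx */
    if (b < 0xF0) return 3;     /* 1110xxxx */
    if (b < 0xF8) return 4;     /* 11110xxx */
    if (b < 0xFC) return 5;     /* 111110xx */
    if (b < 0xFE) return 6;     /* 1111110x */
    return 0;                   /* 0xFE and 0xFF are never valid */
}

/*
 * Hypothetical helper: nonzero if cp is within the range Unicode
 * actually uses, whose upper bound is 0x0010FFFF.
 */
int
is_unicode_code_point(uint32_t cp)
{
    return cp <= 0x0010FFFF;
}
```

A decoder built on this would treat a return value of 0 from `utf8_lead_len` as an invalid start byte and reject the input.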