Re: [PATCHES] UNICODE characters above 0x10000

Поиск

Список

Период

Сортировка

От	Tatsuo Ishii
Тема	Re: [PATCHES] UNICODE characters above 0x10000
Дата	7 августа 2004 г. 07:07:08
Msg-id	20040807.190913.26271342.t-ishii@sra.co.jp обсуждение исходный текст
Ответ на	Re: UNICODE characters above 0x10000 (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-hackers

Дерево обсуждения

> Dennis Bjorklund <db@zigo.dhs.org> writes:
> > ... This also means that the start byte can never start with 7 or 8
> > ones, that is illegal and should be tested for and rejected. So the
> > longest utf-8 sequence is 6 bytes (and the longest character needs 4
> > bytes (or 31 bits)).
>
> Tatsuo would know more about this than me, but it looks from here like
> our coding was originally designed to support only 16-bit-wide internal
> characters (ie, 16-bit pg_wchar datatype width).  I believe that the
> regex library limitation here is gone, and that as far as that library
> is concerned we could assume a 32-bit internal character width.  The
> question at hand is whether we can support 32-bit characters or not ---
> and if not, what's the next bug to fix?

pg_wchar has been already 32-bit datatype.  However I doubt there's
actually a need for 32-but width character sets. Even Unicode only
uese up 0x0010FFFF, so 24-bit should be enough...
--
Tatsuo Ishii

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCHES] UNICODE characters above 0x10000