Re: C11: should we use char32_t for unicode code points?
| From | Tatsuo Ishii |
|---|---|
| Subject | Re: C11: should we use char32_t for unicode code points? |
| Date | |
| Msg-id | 20251028.173613.18179479132562731.ishii@postgresql.org |
| In reply to | Re: C11: should we use char32_t for unicode code points? (Thomas Munro <thomas.munro@gmail.com>) |
| List | pgsql-hackers |

> The EUC family has direct encoding of 7-bit ASCII and then 3
> selectable character sets represented by sequences with the high bit
> set, with details varying between the Chinese (simplified Chinese),
> Taiwanese (traditional Chinese), Japanese (2 kinds) and Korean
> variants. I don't know if the pg_wchar encoding we're producing in
> pg_euc*2wchar_with_len() has a name, but it doesn't appear to match
> the description of the standard "fixed" representation on the
> Wikipedia page for Extended Unix Code (it's too wide for starters,
> looking at the shift distances).

Yes. pg_euc*2wchar_with_len() creates a "variable length" representation
of EUC, ranging from 1 to 4 bytes per character, then expands each
character into a pg_wchar. It can also be converted back to the
multibyte representation easily.

Note that the standard "fixed" representation of EUC includes ASCII
range bytes in *non* ASCII characters, so I think it is not easy to use
as a backend safe encoding.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
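
To make the "variable length" packing described above concrete, here is a minimal sketch. It is an illustration only and not the PostgreSQL source: it assumes EUC-JP (1 to 3 bytes per character; other variants such as EUC-TW can reach 4), uses `uint32_t` as a stand-in for pg_wchar, and the function names `sketch_euc_jp_to_wide` and `sketch_wide_to_euc_jp` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SS2 0x8E    /* single shift 2: 2-byte sequence in EUC-JP */
#define SS3 0x8F    /* single shift 3: 3-byte sequence in EUC-JP */

/*
 * Hypothetical helper: pack each EUC-JP character into one 32-bit value by
 * shifting its bytes in, most significant first. "to" must have room for at
 * least "len" elements. Returns the number of characters written.
 */
size_t
sketch_euc_jp_to_wide(const unsigned char *from, size_t len, uint32_t *to)
{
    size_t      cnt = 0;

    assert(from != NULL && to != NULL);
    while (len > 0)
    {
        size_t      need;
        uint32_t    w = 0;

        if (*from == SS2)
            need = 2;           /* SS2 + 1 byte */
        else if (*from == SS3)
            need = 3;           /* SS3 + 2 bytes */
        else if (*from & 0x80)
            need = 2;           /* 2-byte character, both bytes high-bit set */
        else
            need = 1;           /* 7-bit ASCII */

        if (need > len)
            break;              /* truncated sequence: stop (sketch choice) */

        for (size_t i = 0; i < need; i++)
            w = (w << 8) | from[i];

        to[cnt++] = w;
        from += need;
        len -= need;
    }
    return cnt;
}

/*
 * Hypothetical inverse: the byte length is recovered from the highest
 * non-zero byte of the packed value. "to" must have room for 4 * n bytes.
 * Returns the number of bytes written.
 */
size_t
sketch_wide_to_euc_jp(const uint32_t *from, size_t n, unsigned char *to)
{
    size_t      out = 0;

    assert(from != NULL && to != NULL);
    for (size_t i = 0; i < n; i++)
    {
        uint32_t    w = from[i];

        if (w >> 24)
            to[out++] = (unsigned char) (w >> 24);
        if (w >> 16)
            to[out++] = (unsigned char) (w >> 16);
        if (w >> 8)
            to[out++] = (unsigned char) (w >> 8);
        to[out++] = (unsigned char) w;
    }
    return out;
}
```

In this sketch the packed value keeps the original bytes intact, so every byte of a non-ASCII character still has its high bit set, and converting back only requires shifting the bytes out again.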