Re: [PATCHES] UNICODE characters above 0x10000
From | John Hansen |
---|---|
Subject | Re: [PATCHES] UNICODE characters above 0x10000 |
Date | |
Msg-id | 5066E5A966339E42AA04BA10BA706AE5608D@rodrick.geeknet.com.au |
List | pgsql-hackers |
Well, maybe we'd be better off compiling a list of (in?)valid ranges from the full Unicode database (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt and http://www.unicode.org/Public/UNIDATA/Unihan.txt) and, with every release of pg, updating the detection logic so only valid characters are allowed?

Regards,

John Hansen

-----Original Message-----
From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
Sent: Saturday, August 07, 2004 8:46 PM
To: John Hansen
Cc: tgl@sss.pgh.pa.us; db@zigo.dhs.org; pgsql-hackers@postgresql.org; pgsql-patches@postgresql.org
Subject: Re: [PATCHES] [HACKERS] UNICODE characters above 0x10000

> Yes, but the specification allows for 6byte sequences, or 32bit
> characters.

UTF-8 is just an encoding specification, not a character set specification. Unicode only has 17 256x256 planes in its specification.

> As dennis pointed out, just because they're not used, doesn't mean we
> should not allow them to be stored, since there might be someone using
> the high ranges for a private character set, which could very well be
> included in the specification some day.

We should expand it to 64-bit since some day the specification might be changed then :-)

More seriously, Unicode is filled with tons of confusion and inconsistency IMO. Remember that Unicode advocates once said that the merit of Unicode was that it only requires 16-bit width. Now they say they need surrogate pairs and 32-bit width chars...

Anyway, my point is: if the current specification of Unicode only allows a 24-bit range, why do we need to allow usage against the specification?
--
Tatsuo Ishii
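
For reference, the range check under discussion can be sketched in a few lines of C. This is only an illustration, not PostgreSQL's actual validation code: the function names `is_valid_unicode_scalar` and `utf8_encoded_length` are hypothetical. It assumes the limit implied by the 17 planes mentioned above (17 x 65536 = 0x110000 code points, so the highest is 0x10FFFF), and it additionally rejects the surrogate range 0xD800-0xDFFF, which the Unicode standard reserves for UTF-16 and which is not raised in the thread itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if cp lies inside the 17 Unicode planes
 * (0x0000..0x10FFFF) and is not a UTF-16 surrogate.
 * The surrogate exclusion is an assumption added for this sketch.
 */
static bool
is_valid_unicode_scalar(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;           /* beyond plane 16 */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;           /* surrogate range */
    return true;
}

/*
 * Hypothetical helper: number of bytes UTF-8 needs for cp,
 * or 0 if cp is rejected by the check above. With the 0x10FFFF
 * limit, no code point needs more than 4 bytes.
 */
static int
utf8_encoded_length(uint32_t cp)
{
    if (!is_valid_unicode_scalar(cp))
        return 0;
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}
```

Under these assumptions the 5- and 6-byte forms from the original UTF-8 definition are never produced, and a per-release table of assigned ranges, as proposed above, would be a further restriction layered on top of this check.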