Re: [HACKERS] Unicode combining characters
From | Tatsuo Ishii |
---|---|
Subject | Re: [HACKERS] Unicode combining characters |
Date | |
Msg-id | 20011010101201N.t-ishii@sra.co.jp |
In reply to | Re: [HACKERS] Unicode combining characters (Patrice Hédé <phede-ml@islande.org>) |
Responses |
Re: [HACKERS] Unicode combining characters
|
List | pgsql-patches |
> > After applying your patches, do the 4-bytes UTF-8 convert to UCS-2 (2
> > bytes) or UCS-4 (4 bytes) in pg_utf2wchar_with_len()? If it were 4
> > bytes, we are in trouble. Current regex implementation does not handle
> > 4 byte width charsets.
>
> *sigh* yes, it does encode to four bytes :(
>
> Three solutions then :
>
> 1) we support these supplementary characters, knowing that they won't
>    work with regexes,
>
> 2) I back out the change, but then anyone using these characters will
>    get something weird, since the decoding would be faulty (they would
>    be handled as 3 bytes UTF-8 chars, and then the fourth byte would
>    become a "faulty char"... not very good, as the 3-byte version is
>    still not a valid UTF-8 code !),
>
> 3) we fix the regex engine within the next 24 hours, before the beta
>    deadline is activated :/
>
> I must say that I doubt that anyone will use these characters in the
> next few months : these are mostly Chinese extended characters, with
> Old Italic, Deseret, and Gothic scripts, and Byzantine and Western
> musical symbols, as well as the mathematical alphanumerical symbols.
>
> I would prefer solution 1), as I think it is better to allow these
> characters, even with a temporary restriction on the regex, than to
> fail completely on them. As for solution 3), we may still work at it
> in the next few months :) [I haven't even looked at the regex engine
> yet, so I don't know the implications of what I have just said !]
>
> What do you think ?

I think 2) is not very good, and we should reject these 4-bytes UTF-8
strings. After all, we are not ready for them.

BTW, other part of your patches looks good. Peter, what do you think?
--
Tatsuo Ishii
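
To make the width problem concrete, here is a minimal sketch in C of how a 4-byte UTF-8 sequence decodes to a code point above 0xFFFF, which is why it cannot be stored in a 2-byte (UCS-2) cell. This is not the code of pg_utf2wchar_with_len() or of the patch under discussion; the names utf8_seq_len, utf8_decode and fits_in_16_bits are hypothetical and chosen only for illustration. The decoder is deliberately simple and does not check for overlong forms or surrogate values.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): length of a UTF-8 sequence as
 * implied by its lead byte, or 0 if the byte cannot start a sequence.
 */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;               /* 11110xxx: supplementary characters */
    return 0;                   /* continuation byte or invalid lead byte */
}

/*
 * Decode one sequence from s (at most len bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is truncated or
 * a continuation byte is malformed. Overlong forms and surrogates are NOT
 * rejected here; a real decoder would have to check them as well.
 */
int utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Payload bits carried by the lead byte, indexed by sequence length. */
    static const unsigned char lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    uint32_t v;
    int n;
    int i;

    n = (len > 0) ? utf8_seq_len(s[0]) : 0;
    if (n == 0 || (size_t) n > len)
        return 0;

    v = (uint32_t) (s[0] & lead_mask[n]);
    for (i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a 10xxxxxx continuation byte */
        v = (v << 6) | (uint32_t) (s[i] & 0x3F);
    }
    *cp = v;
    return n;
}

/* Returns 1 if the code point fits in a single 16-bit cell, 0 otherwise. */
int fits_in_16_bits(uint32_t cp)
{
    return cp <= 0xFFFF;
}
```

For example, the Gothic letter U+10330 is encoded as the four bytes F0 90 8C B0; utf8_decode consumes all four and yields 0x10330, for which fits_in_16_bits returns 0. A 3-byte sequence such as E2 82 AC (U+20AC) still fits. Rejecting 4-byte input, as proposed above, would amount to refusing any sequence for which utf8_seq_len returns 4.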