Re: Bug in UTF8-Validation Code?
From | Alvaro Herrera |
---|---|
Subject | Re: Bug in UTF8-Validation Code? |
Date | |
Msg-id | 20070404155032.GH8549@alvh.no-ip.org |
In response to | Re: Bug in UTF8-Validation Code? (Tatsuo Ishii <ishii@postgresql.org>) |
Responses | Re: Bug in UTF8-Validation Code? |
List | pgsql-hackers |
Tatsuo Ishii wrote:

> BTW, every encoding has its own charset. However, the relationship
> between encoding and charset is not as simple as in Unicode. For
> example, encoding EUC_JP corresponds to multiple charsets, namely
> ASCII, JIS X 0201, JIS X 0208 and JIS X 0212. So a function which
> returns a "code point" is not quite useful since it lacks the charset
> info. I think we need to continue the design discussion, probably
> targeting 8.4, not 8.3.

Is Unicode complete as far as Japanese chars go? I mean, is there a character in EUC_JP that is not representable in Unicode?

If Unicode is complete, ISTM it makes perfect sense to have a unicode_char() (or whatever we end up calling it) that takes a Unicode code point and returns a character in whatever JIS set you want (specified by setting client_encoding to that). That would solve the problem nicely.

One thing that I find confusing in your text above is whether EUC_JP is an encoding or a charset. I would think that the various JIS X are encodings, and EUC_JP is the charset; or is it the other way around?

-- 
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
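
To make the unicode_char() idea concrete, here is a minimal Python sketch; it is not PostgreSQL code. The function name comes from the message above, but its signature, the `client_encoding` parameter, and the use of Python's `euc_jp` codec as a stand-in for a server-side conversion are all assumptions made for illustration. It also shows that the EUC_JP byte sequence depends on which underlying charset (ASCII, JIS X 0201, JIS X 0208) the character belongs to.

```python
from typing import Optional


def unicode_char(code_point: int, client_encoding: str = "euc_jp") -> Optional[bytes]:
    """Hypothetical helper: map a Unicode code point to its byte sequence
    in the given client encoding, or None if it cannot be represented.

    The name follows the proposal in the thread; the signature and the
    None-on-failure behaviour are assumptions, not an existing API.
    """
    try:
        return chr(code_point).encode(client_encoding)
    except UnicodeEncodeError:
        # The target encoding has no representation for this code point.
        return None


if __name__ == "__main__":
    samples = [
        (0x41, "LATIN CAPITAL LETTER A (ASCII)"),
        (0xFF71, "HALFWIDTH KATAKANA LETTER A (JIS X 0201)"),
        (0x3042, "HIRAGANA LETTER A (JIS X 0208)"),
        (0x1F600, "GRINNING FACE (not in any JIS set covered by EUC_JP)"),
    ]
    for cp, label in samples:
        encoded = unicode_char(cp)
        shown = encoded.hex() if encoded is not None else "not representable"
        print(f"U+{cp:04X} {label}: {shown}")
```

The caller only ever supplies a Unicode code point; which JIS charset ends up being used is decided by the target encoding, which is the point of the proposal. Whether every EUC_JP character round-trips through Unicode is exactly the open question raised above, and this sketch does not answer it.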