Re: postgresql euc/sjis utf8 mappings
From | Joel Rees |
---|---|
Subject | Re: postgresql euc/sjis utf8 mappings |
Date | |
Msg-id | 20020819184655.3A7F.JOEL@alpsgiken.gr.jp |
In reply to | postgresql euc/sjis utf8 mappings (Thomas O'Dowd <tom@nooper.com>) |
List | pgsql-general |
Hmm.

> I've noted that in PostgreSQL 7.2.1 some of the utf8 mappings
> of sjis and euc characters were different. One example that caught me out
> was the double width ~.
>
> '〜' (double byte/double width ~)

That's not really a tilde. It's referred to as a "wave dash", and is usually used as such in most of what I've seen of word-processing/e-mail type data. (Tilde is a combining character, is it not?)

> euc: 0xa1c1 -> 0xe3809c utf8

That's the Unicode wave dash.

> sjis: 0x8160 -> 0xefbd9e utf8

That's the Unicode full-width tilde.

Now, if I were going by the names, I would choose the Unicode wave dash for that mapping, both of them to 0xe3809c. But if I were to go by the intent of the full-width block, I'd go with the latter, 0xefbd9e, but I'd still be wondering why the Unicode people called it full-width tilde.

Hmm. At any rate, mapping euc and s-jis the same should be correct, since euc and s-jis are both just a numerical transform of JIS with ASCII squeezed in.

> This caused me problems when a '〜' was loaded using euc and retrieved
> using sjis as there was no sjis mapping for 0xe3809c.

Another hmm. That's probably going to create surprises sometimes. Good reason to have the source code open. (Just thinking out loud.)

Anyway, thanks for the heads-up, Tom.

--
Joel Rees <joel@alpsgiken.gr.jp>
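
For anyone who wants to check the byte values quoted above, here is a minimal Python sketch. It is not taken from the PostgreSQL sources. The helper names `jis_to_euc` and `jis_to_sjis` are made up for illustration, and the starting point, JIS X 0208 code 0x2141 for this character, is an assumption that does not appear in the message itself. The sketch shows that the EUC and Shift_JIS byte pairs are arithmetic transforms of that one JIS code, and that the two UTF-8 sequences decode to two different Unicode characters.

```python
import unicodedata


def jis_to_euc(j1, j2):
    """EUC-JP: set the high bit on each byte of the JIS X 0208 code."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_sjis(j1, j2):
    """Shift_JIS: fold two JIS rows into one lead byte, shift the trail byte."""
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:          # skip the single-byte katakana range
        s1 += 0x40
    if j1 & 1:             # odd JIS row: lower half of the trail-byte range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:     # 0x7F is never used as a trail byte
            s2 += 1
    else:                  # even JIS row: upper half of the trail-byte range
        s2 = j2 + 0x7E
    return bytes([s1, s2])


# Assumed JIS X 0208 code for the character under discussion.
euc = jis_to_euc(0x21, 0x41)
sjis = jis_to_sjis(0x21, 0x41)
print(euc.hex(), sjis.hex())  # a1c1 8160

# The two UTF-8 byte sequences quoted in the message.
for utf8_hex in ("e3809c", "efbd9e"):
    ch = bytes.fromhex(utf8_hex).decode("utf-8")
    print(utf8_hex, f"U+{ord(ch):04X}", unicodedata.name(ch))
# e3809c U+301C WAVE DASH
# efbd9e U+FF5E FULLWIDTH TILDE
```

Since both legacy byte pairs come from the same JIS code, a conversion table that sends them to different Unicode code points cannot round-trip between the two encodings, which is the failure Tom describes.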