Re: Java's Unicode Notation
From | Tatsuo Ishii |
---|---|
Subject | Re: Java's Unicode Notation |
Date | |
Msg-id | 20011111190422Y.t-ishii@sra.co.jp |
In reply to | Re: Beta going well ("Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>) |
List | pgsql-hackers |
From: Jean-Michel POURE <jm.poure@freesurf.fr>
Subject: Java's Unicode Notation
Date: Thu, 08 Nov 2001 14:12:04 +0100
Message-ID: <4.2.0.58.20011108141018.00a59dc0@pop.freesurf.fr>

> Dear Tatsuo,
>
> Could it be possible to use the Java Unicode Notation to define UTF-8
> strings in PostgreSQL 7.2?

No. It's too late. We are in the beta freeze stage.

> Information can be found on http://czyborra.com/utf/
>
> Do you think it is hard to implement?
>
> Best regards,
> Jean-Michel POURE
>
> ************************************************
> Java's Unicode Notation
> There are some less compact but more readable ASCII transformations the
> most important of which is the Java Unicode Notation as allowed in Java
> source code and processed by Java's native2ascii converter:
> putwchar(c)
> {
>   if (c >= 0x10000) {
>     printf ("\\u%04x\\u%04x" , 0xD7C0 + (c >> 10), 0xDC00 | c & 0x3FF);
>   }
>   else if (c >= 0x100) printf ("\\u%04x", c);
>   else putchar (c);
> }
> The advantage of the \u20ac notation is that it is very easy to type it in
> on any old ASCII keyboard and easy to look up the intended character if you
> happen to have a copy of the Unicode book or the
> {unidata2,names2,unihan}.txt files from the Unicode FTP site or CD-ROM or
> know what U+20AC is the €.
> What's not so nice about the \u20ac notation is that the small letters are
> quite unusual for Unicode characters, the backslashes have to be quoted for
> many Unix tools, the four hexdigits without a terminator may appear merged
> with the following word as in \u00a333 for £33, it is unclear when and how
> you have to escape the backslash character itself, 6 bytes for one
> character may be considered wasteful, and there is no way to clearly
> present the characters beyond \uffff without \ud800\udc00 surrogates, and
> last but not least the plain hexnumbers may not be very helpful.
> JAVA is one of the target and source encodings of yudit and its uniconv
> converter.
>
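The quoted putwchar() shows only the encoding direction, and it passes code points 0x80 to 0xFF through unchanged, so its output is not pure ASCII. The following is a minimal C sketch of both directions, assuming code points are passed as unsigned long values and that callers supply output buffers. The names java_escape, java_unescape, hex4 and put_utf8 are hypothetical; they are not part of PostgreSQL, the JDK or native2ascii. java_escape escapes everything at or above 0x80 and uses the same surrogate arithmetic as the quoted routine. java_unescape turns \uXXXX escapes, including surrogate pairs, back into UTF-8, which is roughly what accepting this notation in string literals would involve.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the Java-notation form of code point c at out and NUL-terminate it.
   out needs room for 13 bytes (a surrogate pair is 12 chars plus NUL).
   Returns a pointer to the terminating NUL so calls can be chained.
   Unlike the quoted putwchar(), everything >= 0x80 is escaped, so the
   result is pure ASCII. */
char *java_escape(char *out, unsigned long c)
{
    assert(c <= 0x10FFFF);
    if (c >= 0x10000) {
        /* Supplementary character: emit a UTF-16 surrogate pair. */
        out += sprintf(out, "\\u%04lx\\u%04lx",
                       0xD7C0 + (c >> 10), 0xDC00 | (c & 0x3FF));
    } else if (c >= 0x80) {
        out += sprintf(out, "\\u%04lx", c);
    } else {
        *out++ = (char)c;
        *out = '\0';
    }
    return out;
}

/* Parse exactly four hex digits at s; return the value, or -1 if any of
   them is missing or not a hex digit (stops before reading past a NUL). */
static long hex4(const char *s)
{
    long v = 0;
    for (int i = 0; i < 4; i++) {
        char ch = s[i];
        int d;
        if (ch >= '0' && ch <= '9') d = ch - '0';
        else if (ch >= 'a' && ch <= 'f') d = ch - 'a' + 10;
        else if (ch >= 'A' && ch <= 'F') d = ch - 'A' + 10;
        else return -1;
        v = v * 16 + d;
    }
    return v;
}

/* Append the UTF-8 encoding of code point c (at most 0x10FFFF). */
static char *put_utf8(char *out, unsigned long c)
{
    if (c < 0x80) {
        *out++ = (char)c;
    } else if (c < 0x800) {
        *out++ = (char)(0xC0 | (c >> 6));
        *out++ = (char)(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        *out++ = (char)(0xE0 | (c >> 12));
        *out++ = (char)(0x80 | ((c >> 6) & 0x3F));
        *out++ = (char)(0x80 | (c & 0x3F));
    } else {
        *out++ = (char)(0xF0 | (c >> 18));
        *out++ = (char)(0x80 | ((c >> 12) & 0x3F));
        *out++ = (char)(0x80 | ((c >> 6) & 0x3F));
        *out++ = (char)(0x80 | (c & 0x3F));
    }
    return out;
}

/* Decode \uXXXX escapes in `in` into UTF-8 at `out`, combining surrogate
   pairs. Other bytes, including a lone backslash, are copied as-is.
   out needs at most strlen(in) + 1 bytes, since no escape grows.
   Returns 0 on success, -1 on a malformed escape or unpaired surrogate. */
int java_unescape(const char *in, char *out)
{
    while (*in) {
        if (strncmp(in, "\\u", 2) == 0) {
            long c = hex4(in + 2);
            if (c < 0) return -1;
            in += 6;
            if (c >= 0xD800 && c <= 0xDBFF) {
                /* High surrogate: a low surrogate escape must follow. */
                long lo = strncmp(in, "\\u", 2) == 0 ? hex4(in + 2) : -1;
                if (lo < 0xDC00 || lo > 0xDFFF) return -1;
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                in += 6;
            } else if (c >= 0xDC00 && c <= 0xDFFF) {
                return -1; /* low surrogate without a preceding high one */
            }
            out = put_utf8(out, (unsigned long)c);
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return 0;
}
```

For example, java_escape(buf, 0x1D11E) writes \ud834\udd1e, and java_unescape("\\u00a333", buf) yields the UTF-8 bytes for £33. The decoder always consumes exactly four hex digits, which is how it sidesteps the terminator ambiguity mentioned above. Like the notation itself, this sketch defines no escape for the backslash character. A literal text "\u" therefore cannot be expressed, and a real implementation would have to choose a rule for that case.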