Re: ENCODING (Unicode)
From | Reshat Sabiq |
---|---|
Subject | Re: ENCODING (Unicode) |
Date | |
Msg-id | 3ECBB509.3060800@purdue.edu |
In reply to | Re: ENCODING (Unicode) ("Dave Page" <dpage@vale-housing.co.uk>) |
Responses | Re: ENCODING (Unicode) |
List | pgadmin-support |
Jean-Michel POURE wrote:

> In Unicode (UTF-8), characters are coded on 1 byte (US-English letters), 2
> bytes (Western and Eastern European languages) and 3 bytes (all other languages,
> including Asian and Indian languages). Technically, you can store UTF-8
> values in an ASCII-based database.
>
> But storing UTF-8 in an ASCII database is not recommended, for several
> reasons:
>
> - the query parser might not work well with text values (because it will not
> know whether 1 UTF-8 letter is made of 1, 2 or 3 bytes).
>
> - server-side languages are multi-byte safe. If you calculate the length of a
> UTF-8 string in PLpgSQL stored in an ASCII database, it will probably fail
> for special characters.

Thanks for your feedback, Jean-Michel.

You made a good point; I forgot about the queries. I guess each character is converted into 4 bytes while parsing, so it makes a big difference whether the parser sees one 2-byte character (4 bytes) or two 1-byte characters (8 bytes).

However, I haven't heard of UTF-8 supporting 3-byte values. From what I know, special characters are 2 bytes in UTF-8. The 2-byte Unicode set is enough to cover all characters, including Asian ones (with Chinese taking up a couple of dozen thousand characters). I read something recently about 3-byte character support in one of the standards (UTF-16?), but the RFC said there are no 3-byte assignments yet, because the 2-byte range is currently enough...

But you are right, I should use the UNICODE encoding when I use characters beyond extended ASCII.

As for applications, I usually use Java, which supports Unicode. I'm glad that PHP does so as well. And I sure look forward to pgAdmin3.

Good luck,
Reshat.
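The byte counts discussed in this message are easy to check directly. The following is a minimal sketch in Python, not something from the original thread: the helper name `utf8_len` and the sample characters are illustrative assumptions. It prints how many bytes each character occupies in UTF-8, then compares the character count of a string with its byte count, which is the mismatch a byte-oriented length function would run into.

```python
# Illustrative sketch; the samples and helper name are assumptions,
# not taken from the thread.

def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# Escapes are used so the code points are unambiguous.
samples = {
    "A": "US-English letter",
    "\u00e9": "Western European letter (e with acute accent)",
    "\u044f": "Cyrillic letter (ya)",
    "\u4e2d": "Chinese character (zhong)",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_len(ch)} byte(s)")

# Character count versus byte count for the same string. A length function
# that treats the data as single-byte text would see the byte count.
word = "caf\u00e9"
print("characters:", len(word), "bytes:", utf8_len(word))
```

Running it on one's own data is a quick way to see whether a given text stays within 1 or 2 bytes per character or needs more.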