Re: Vague idea for allowing per-column locale
From | Tatsuo Ishii |
---|---|
Subject | Re: Vague idea for allowing per-column locale |
Date | |
Msg-id | 20010814140130S.t-ishii@sra.co.jp |
In response to | Re: Vague idea for allowing per-column locale (Tim Allen <tim@proximity.com.au>) |
List | pgsql-hackers |
> > Storing everything as Unicode is not a good idea, actually. First,
> > Unicode tends to consume more storage space than other character
> > sets. For example, UTF-8, one of the most commonly used encodings for
> > Unicode, consumes 3 bytes for Japanese characters, while SJIS only
> > consumes 2 bytes. Second, a round trip conversion between Unicode and
> > other character sets is not always possible. Third, sorting
> > issue. There is no convenient way to sort Unicode correctly.
>
> UTF-16 can handle most Japanese characters in two bytes, afaict. Generally
> it seems that utf8 encodes European text more efficiently on average,
> whereas utf16 is better for most Asian languages.

The same can be said of UCS-2. Most multibyte characters fit in two
bytes within UCS-2. The problem with both UTF-16 and UCS-4 is that data
may contain NULL bytes.

> I may be mistaken, but I
> was under the impression that sorting of unicode characters was a solved
> problem. The IBM ICU class library (which does have a C interface), for
> example, claims to provide everything you need to sort unicode text in
> various locales, and uses utf16 internally:

Interesting. Thanks for the info. I will look into this.

BTW, the "round trip conversion problem" still needs to be addressed.
--
Tatsuo Ishii
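
The storage-size and NULL-byte points in the message above can be checked with a minimal sketch like the following. It uses Python's standard codecs, which is an assumption of this illustration and not something from the thread. The helper names `encoded_sizes` and `has_nul_byte` and the sample string are hypothetical. Python's `shift_jis` codec stands in for SJIS, and `utf-16-be` / `utf-32-be` stand in for UTF-16 and UCS-4 without a byte-order mark.

```python
# Sketch: compare encoded sizes of Japanese text and look for zero bytes.
# Helper names and the sample string are illustrative assumptions.

def encoded_sizes(text, encodings=("utf-8", "shift_jis", "utf-16-be")):
    """Return a dict mapping each encoding name to the encoded byte length."""
    return {enc: len(text.encode(enc)) for enc in encodings}


def has_nul_byte(text, encoding):
    """Return True if the encoded form of text contains a zero byte."""
    return b"\x00" in text.encode(encoding)


if __name__ == "__main__":
    sample = "日本語"  # three kanji, each outside the ASCII range
    for enc, size in encoded_sizes(sample).items():
        print(f"{enc}: {size} bytes")

    # Plain ASCII text is where the fixed-width encodings produce zero bytes.
    for enc in ("utf-8", "utf-16-be", "utf-32-be"):
        print(f"{enc}: NUL byte in 'abc'? {has_nul_byte('abc', enc)}")
```

For this sample, each kanji takes three bytes in UTF-8 and two bytes in both Shift_JIS and UTF-16, and ASCII text encoded as UTF-16 or UTF-32 contains zero bytes, which is the issue for code that treats strings as NULL-terminated. The sketch does not address the round trip conversion problem.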