Re: Concerning about Unicode-aware string handling
From | Craig Ringer |
---|---|
Subject | Re: Concerning about Unicode-aware string handling |
Date | |
Msg-id | 4FBB16B6.5020103@ringerc.id.au |
In reply to | Re: Concerning about Unicode-aware string handling (Andrew Sullivan <ajs@crankycanuck.ca>) |
List | pgsql-general |
On 05/21/2012 06:59 PM, Andrew Sullivan wrote:
> On Mon, May 21, 2012 at 02:44:45AM -0700, John R Pierce wrote:
>> support the bastardized UTF-16 'unicode' implemented by Windows NT
>
> To be fair to Microsoft, while the BOM might be an irritant, they do
> use a perfectly legitimate encoding of Unicode. There is no Unicode
> requirement that code points be stored as UTF-8, and there is a strong
> argument to be made that, for some languages, UTF-8 is extremely
> inefficient and therefore the least preferred encoding. (Microsoft's
> dependence on the BOM with UTF-16 -- really UCS2 -- is problematic, of
> course, and appears to be adjusted in funny ways in Win 7.)

In fact, until it became clear that UCS-2 (now UTF-16) wasn't enough and
we'd need 4 bytes to represent characters, Microsoft's choice of UCS-2
with BOM looked really good. They just didn't realise that UCS-2 would
turn into UTF-16 when UCS-4 came on the scene, so they'd be left holding
a bastardised half-way mess that's usually-but-not-always 2 bytes per
character.

MS's choice allowed programs to work with the safe (at the time)
assumption that each char was 2 bytes, which made a lot of things way
simpler than they are in UTF-8 and was well and truly worth the storage
bloat IMO. Pity Unicode had to grow again and break the assumption.

--
Craig Ringer
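
To make the "usually-but-not-always 2 bytes per character" point concrete, here is a minimal Python sketch. It is not from the original message. The helper names `utf16_code_units` and `utf8_bytes` are made up for illustration, and the sample characters are arbitrary. It counts how many 16-bit code units UTF-16 needs for a character, and how many bytes UTF-8 needs for the same character.

```python
# Illustrative only: helper names and sample characters are assumptions,
# not anything taken from the thread.

def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" is used so that no BOM is prepended to the output.
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    samples = [
        "A",           # U+0041, ASCII
        "\u00e9",      # U+00E9, Latin small letter e with acute
        "\u4e2d",      # U+4E2D, a CJK ideograph inside the BMP
        "\U0001F600",  # U+1F600, outside the BMP (needs a surrogate pair)
    ]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}: "
            f"UTF-16 code units = {utf16_code_units(ch)}, "
            f"UTF-8 bytes = {utf8_bytes(ch)}"
        )
```

Characters inside the Basic Multilingual Plane take one 16-bit unit each, which is the assumption UCS-2 programs relied on. A code point above U+FFFF takes two units (a surrogate pair), so code that treats every 2 bytes as one character miscounts it.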