Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence
From | Steven Schlansker |
---|---|
Subject | Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence |
Date | |
Msg-id | 34C92DEC-CD89-403C-BB6D-B21012233F0F@trumpet.io |
In response to | Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence
Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence |
List | pgsql-hackers |
On Aug 19, 2010, at 3:24 PM, Tom Lane wrote:

> Steven Schlansker <steven@trumpet.io> writes:
>>
>> I'm not at all experienced with character encodings so I could
>> be totally off base, but isn't it wrong to ever call isspace(0x85),
>> whatever the result may be, given that the actual character is 0xCF85?
>> (U+03C5, GREEK SMALL LETTER UPSILON)
>
> We generally assume that in server-safe encodings, the ctype.h functions
> will behave sanely on any single-byte value. You can argue the wisdom
> of that, but deciding to change that policy would be a rather massive
> code change; I'm not excited about going that direction.

Fair enough. I presume there are no "server-safe encodings" for which
a multibyte sequence 0x XX20 would be valid - which would break anyway
(as the second byte looks like a real space)

> You need a setlocale() call, else the program acts as though it's in C
> locale regardless of environment.

Sigh. I hate C sometimes. :-p

Anyway, it looks like this is actually a BSD bug which got copy + pasted
into Apple's Darwin source -
http://lists.freebsd.org/pipermail/freebsd-i18n/2007-September/000157.html

I have a couple of contacts at Apple so I'll see if there's any interest in
backporting a fix, but I wouldn't hope for it to happen quickly if at all...

Thanks for taking a look into fixing this, I hope you guys can reach
consensus on how to get it fixed :)

Best,
Steven Schlansker
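
For anyone who wants to reproduce the isspace() check discussed above, here is a minimal sketch in C. It is not code from PostgreSQL or from this thread: the helper names (is_utf8_continuation, count_space_bytes, count_space_bytes_in_locale) are hypothetical and exist only for illustration. It selects a locale with setlocale(), as Tom points out is required, and counts how many bytes of a buffer isspace() classifies as whitespace. It also shows why a 0x20 byte cannot occur inside a UTF-8 multibyte sequence: every trailing byte has the bit pattern 10xxxxxx (0x80 to 0xBF).

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if b is a UTF-8 continuation byte,
 * i.e. it matches the bit pattern 10xxxxxx (0x80..0xBF). */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: count the bytes in s[0..n) that isspace()
 * reports as whitespace under the current LC_CTYPE locale.
 * Each byte is passed as an unsigned char value, as ctype.h requires. */
size_t count_space_bytes(const unsigned char *s, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (isspace(s[i]))
            hits++;
    }
    return hits;
}

/* Hypothetical helper: switch LC_CTYPE to the named locale, then scan.
 * Pass "" to pick up the locale from the environment.
 * Returns (size_t) -1 if the locale cannot be selected. */
size_t count_space_bytes_in_locale(const char *locale,
                                   const unsigned char *s, size_t n)
{
    if (setlocale(LC_CTYPE, locale) == NULL)
        return (size_t) -1;
    return count_space_bytes(s, n);
}
```

Calling count_space_bytes_in_locale("C", ...) on the two bytes 0xCF 0x85 should report zero whitespace bytes, since the C locale only treats the standard ASCII whitespace characters as spaces. Passing "" instead uses the environment's locale; on a platform affected by the BSD bug linked above, the same two bytes may then produce a nonzero count, which would be consistent with the byte loss reported in this thread. I have not verified that on every affected release, so treat it as a diagnostic aid only.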