Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence
From | Tom Lane |
---|---|
Subject | Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence |
Date | |
Msg-id | 25852.1282333813@sss.pgh.pa.us |
In reply to | Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence (Steven Schlansker <steven@trumpet.io>) |
List | pgsql-hackers |
Steven Schlansker <steven@trumpet.io> writes:
> On Aug 19, 2010, at 3:24 PM, Tom Lane wrote:
>> We generally assume that in server-safe encodings, the ctype.h functions
>> will behave sanely on any single-byte value. You can argue the wisdom
>> of that, but deciding to change that policy would be a rather massive
>> code change; I'm not excited about going that direction.

> Fair enough. I presume there are no "server-safe encodings" for which
> a multibyte sequence 0xXX20 would be valid - which would break anyway
> (as the second byte looks like a real space)

Right: our definition of a "server-safe encoding" is precisely that no
byte of a multibyte character looks like ASCII, ie all bytes have their
high bit set. We're essentially assuming that the <ctype.h> functions
will all return false for any byte with the high bit set, if the
selected encoding is multibyte.

> Anyway, it looks like this is actually a BSD bug which got copy +
> pasted into Apple's Darwin source -
> http://lists.freebsd.org/pipermail/freebsd-i18n/2007-September/000157.html

Interesting. So the BSD people did fix it upstream?

regards, tom lane