Re: Corruption of multibyte identifiers on UTF-8 locale
From | Victor Snezhko |
---|---|
Subject | Re: Corruption of multibyte identifiers on UTF-8 locale |
Date | |
Msg-id | uu02ylc78.fsf@indorsoft.ru |
In reply to | Re: Corruption of multibyte identifiers on UTF-8 locale (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Corruption of multibyte identifiers on UTF-8 locale
|
List | pgsql-bugs |
Tom Lane <tgl@sss.pgh.pa.us> writes:

>> correct utf-8 byte sequence is 0xd18231, so it looks like we call
>> tolower() somewhere on parts of multibyte characters, and it does the
>> same as isspace() - it interprets its argument as a wide character, and
>> converts it.
>
> Indeed, and I am certainly wondering why we should not just say that
> you've got a broken locale definition there. There is absolutely no
> doubt that the ctype.h functions are defined to work on char, not
> wchar.

Agreed, but such corruption indicates that there is a non-multibyte-safe
(octet-wise) case conversion somewhere. At best (with a fully working
locale), it will cause case conversion to do nothing instead of
performing the actual conversion.

> They have no business mangling high-bit-set bytes in a multibyte
> encoding.

--
WBR, Victor V. Snezhko
E-mail: snezhko@indorsoft.ru
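
To illustrate the point about octet-wise case conversion, here is a minimal sketch in C of a fold that cannot be affected by a locale's handling of high-bit-set bytes. The function name `ascii_only_lower` is hypothetical and is not taken from the PostgreSQL source; the sketch assumes a NUL-terminated UTF-8 string and simply never passes a non-ASCII byte to a ctype.h function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the PostgreSQL tree): fold only 7-bit
 * ASCII letters to lower case, in place.
 *
 * Bytes with the high bit set, i.e. the lead and continuation bytes of
 * multibyte UTF-8 sequences, are never handed to tolower(), so a locale
 * whose tolower() treats such bytes as wide characters cannot alter them.
 */
void
ascii_only_lower(char *s)
{
    assert(s != NULL);

    for (; *s != '\0'; s++)
    {
        unsigned char c = (unsigned char) *s;

        /* Only 'A'..'Z' are changed; every other byte is left as-is. */
        if (c >= 'A' && c <= 'Z')
            *s = (char) (c - 'A' + 'a');
    }
}
```

With this approach a multibyte letter such as the Cyrillic capital encoded as 0xD0 0xA2 is left unconverted rather than corrupted, which is the "does nothing instead of actual conversion" outcome described above. Lowercasing such letters properly would need a multibyte-aware conversion, which this sketch deliberately does not attempt.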