Re: BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters
From | Tom Lane |
---|---|
Subject | Re: BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters |
Date | |
Msg-id | 3873.1541264600@sss.pgh.pa.us |
In response to | BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters (PG Bug reporting form <noreply@postgresql.org>) |
List | pgsql-bugs |
kenji uno <h8mastre@gmail.com> writes:
>> I failed to reproduce this on a Linux machine. It looks to me like the
>> problem is that Windows' MultiByteToWideChar doesn't think that UTF8
>> character is valid.

> I'm just wondering why my issue occurs only on Windows.
> But I knew why: char2wchar's tolen requires +1 output buffer size, due to
> null-termination.

Oooh ... the problem, effectively, is that the ts_locale.c functions
are expecting to get back UTF32 but what they'll actually get on Windows
is UTF16. So if the given character is outside the BMP range, char2wchar
needs to produce a surrogate pair, which there's not room for given that
the output buffer can only hold 1 wchar_t plus trailing null.

Then the other problem is that the Windows-Unicode code path in char2wchar
just fails for an undersized output buffer, which you would not expect
from its documentation. And it fails with a misleading error message,
too.

I'll see what I can do about this --- thanks for the report!

regards, tom lane
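The buffer arithmetic described above can be illustrated with a small sketch. This is not PostgreSQL code: it is written in Python for brevity, the helper names (`utf16_units`, `fits_in_buffer`, `surrogate_pair`) are hypothetical, and the sample characters are assumptions chosen for illustration, not the ones from the original bug report. It only shows why a character outside the BMP needs two 16-bit units and therefore does not fit in a buffer sized for one unit plus a trailing null.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def fits_in_buffer(ch: str, buf_len: int) -> bool:
    """True if ch plus a trailing null fits in buf_len 16-bit units."""
    return utf16_units(ch) + 1 <= buf_len


def surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# Sample characters (assumed for illustration only).
bmp_char = "\u3042"          # inside the BMP, 3 bytes in UTF-8
astral_char = "\U0001F600"   # outside the BMP, 4 bytes in UTF-8

# A buffer of 2 units models "1 wchar_t plus trailing null" on a platform
# where wchar_t is 16 bits wide, as on Windows.
for ch in (bmp_char, astral_char):
    print(
        f"U+{ord(ch):04X}: "
        f"{len(ch.encode('utf-8'))} UTF-8 bytes, "
        f"{utf16_units(ch)} UTF-16 unit(s), "
        f"fits in 2-unit buffer: {fits_in_buffer(ch, 2)}"
    )
```

Where wchar_t is 32 bits wide, every code point occupies a single unit, so the same two-unit buffer is always large enough. That is consistent with the problem not reproducing on Linux.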