Re: BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters

From	Tom Lane
Subject	Re: BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters
Date	3 November 2018, 17:03:20
Msg-id	3873.1541264600@sss.pgh.pa.us
In reply to	BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters (PG Bug reporting form <noreply@postgresql.org>)
List	pgsql-bugs

kenji uno <h8mastre@gmail.com> writes:
>> I failed to reproduce this on a Linux machine.  It looks to me like the
>> problem is that Windows' MultiByteToWideChar doesn't think that UTF8
>> character is valid.

> I'm just wondering why my issue occurs only on Windows.
> But I knew why: char2wchar's tolen requires +1 output buffer size, due to
> null-termination.

Oooh ... the problem, effectively, is that the ts_locale.c functions are
expecting to get back UTF32 but what they'll actually get on Windows is
UTF16.  So if the given character is outside the BMP range, char2wchar
needs to produce a surrogate pair, which there's not room for given that
the output buffer can only hold 1 wchar_t plus trailing null.
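
A rough illustration of the size mismatch described above, not taken from the PostgreSQL source. The helper names and the sample characters are assumptions chosen for the example; the sketch only counts how many code units a character needs when wchar_t is 2 bytes (UTF16, as on Windows) versus 4 bytes (UTF32), and checks whether that fits a buffer holding 1 wchar_t plus the trailing null.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of PostgreSQL. They count code units per character.

def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def utf32_units(ch: str) -> int:
    """Number of 32-bit code units needed to encode ch in UTF-32."""
    return len(ch.encode("utf-32-le")) // 4


def fits_one_wchar_buffer(ch: str, wchar_bytes: int) -> bool:
    """True if ch fits in a buffer of 1 wchar_t plus trailing null.

    wchar_bytes == 2 models a UTF-16 wchar_t, anything else UTF-32.
    """
    units = utf16_units(ch) if wchar_bytes == 2 else utf32_units(ch)
    return units <= 1


bmp_char = "\u3042"         # 3-byte UTF-8 character inside the BMP
astral_char = "\U00020BB7"  # 4-byte UTF-8 character outside the BMP (arbitrary pick)

for ch in (bmp_char, astral_char):
    print(
        f"U+{ord(ch):04X}: utf8_bytes={len(ch.encode('utf-8'))} "
        f"utf16_units={utf16_units(ch)} utf32_units={utf32_units(ch)} "
        f"fits_with_2_byte_wchar={fits_one_wchar_buffer(ch, 2)}"
    )
```

Only the character outside the BMP needs two UTF16 units, so it is the only one that overflows the single-wchar_t buffer when wchar_t is 2 bytes wide.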

Then the other problem is that the Windows-Unicode code path in char2wchar
just fails for an undersized output buffer, which you would not expect
from its documentation.  And it fails with a misleading error message,
too.

I'll see what I can do about this --- thanks for the report!

            regards, tom lane
