pgsql: Make ts_locale.c's character-type functions cope with UTF-16.
From | Tom Lane |
---|---|
Subject | pgsql: Make ts_locale.c's character-type functions cope with UTF-16. |
Date | |
Msg-id | E1gJ09w-00057w-6P@gemulon.postgresql.org |
List | pgsql-committers |
Make ts_locale.c's character-type functions cope with UTF-16.

On Windows, in UTF8 database encoding, what char2wchar() produces is UTF16 not UTF32, ie, characters above U+FFFF will be represented by surrogate pairs. t_isdigit() and siblings did not account for this and failed to provide a large enough result buffer. That in turn led to bogus "invalid multibyte character for locale" errors, because contrary to what you might think from char2wchar()'s documentation, its Windows code path doesn't cope sanely with buffer overflow.

The solution for t_isdigit() and siblings is pretty clear: provide a 3-wchar_t result buffer not 2.

char2wchar() also needs some work to provide more consistent, and more accurately documented, buffer overrun behavior. But that's a bigger job and it doesn't actually have any immediate payoff, so leave it for later.

Per bug #15476 from Kenji Uno, who deserves credit for identifying the cause of the problem. Back-patch to all active branches.

Discussion: https://postgr.es/m/15476-4314f480acf0f114@postgresql.org

Branch
------
REL9_4_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/0ae902e39ed8e20abce8b6db2daec7f2abbadb5b

Modified Files
--------------
src/backend/tsearch/ts_locale.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
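To illustrate why a 2-element buffer is too small for UTF-16, here is a minimal standalone sketch. It is not the PostgreSQL code: `encode_utf16` is a hypothetical helper written for this illustration, and it uses `uint16_t` in place of Windows' 16-bit `wchar_t` so it behaves the same on any platform. It encodes one code point plus a terminating zero and refuses to write if the buffer cannot hold the result.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption for illustration, not a PostgreSQL
 * function): encode a single Unicode code point as UTF-16 into "out",
 * followed by a terminating zero.
 *
 * Returns the number of code units written, not counting the terminator,
 * or 0 if the code point is invalid or the buffer is too small.
 */
size_t
encode_utf16(uint32_t cp, uint16_t *out, size_t outlen)
{
    size_t      need;

    /* Reject values outside Unicode and lone surrogate code points. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    /* Code points above U+FFFF need a surrogate pair: two code units. */
    need = (cp > 0xFFFF) ? 2 : 1;

    /* Room is required for the code units plus the terminating zero. */
    if (outlen < need + 1)
        return 0;

    if (need == 1)
        out[0] = (uint16_t) cp;
    else
    {
        cp -= 0x10000;
        out[0] = (uint16_t) (0xD800 + (cp >> 10));      /* high surrogate */
        out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));    /* low surrogate */
    }
    out[need] = 0;
    return need;
}
```

With a 2-element buffer, a code point such as U+1D7CE (MATHEMATICAL BOLD DIGIT ZERO) cannot be stored, since it needs two surrogates plus the terminator; with a 3-element buffer the call succeeds and yields 0xD835 0xDFCE. A character in the Basic Multilingual Plane such as U+0039 fits in either size.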