BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters
From | PG Bug reporting form |
---|---|
Subject | BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters |
Date | |
Msg-id | 15476-4314f480acf0f114@postgresql.org |
Responses | Re: BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 15476
Logged by: Kenji Uno
Email address: h8mastre@gmail.com
PostgreSQL version: 9.6.2
Operating system: Windows Server 2012 Japanese
Description:

# Problem on show_trgm with 4 byte UTF-8 characters

On a database with Encoding=UTF-8, try:

    SELECT show_trgm('123'); → OK
    SELECT show_trgm('日本語'); → probably OK.
    SELECT show_trgm('🔍'); → ERROR!

    ERROR: invalid multibyte character for locale
    HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.
    SQL state: 22021

I have reviewed some of your source code and found a suspect point. Please check t_isdigit, t_isspace, t_isalpha, and t_isprint:

https://github.com/postgres/postgres/blob/322548a8abe225f2cfd6a48e07b99e2711d28ef7/src/backend/tsearch/ts_locale.c#L35

The 4th parameter of char2wchar should take the number of input bytes. However, they pass a character count:

    int clen = pg_mblen(ptr);
    ...
    char2wchar(character, 2, ptr, clen, mylocale);

Could you please look into this?
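
The report turns on the difference between counting characters and counting bytes in UTF-8. As a minimal standalone sketch (this is not PostgreSQL code, and `utf8_seq_len`, `utf8_char_count`, and `utf16_units_for` are hypothetical helper names introduced only for illustration), the following C shows how the three test strings above differ: '123' uses 1 byte per character, '日本語' uses 3, and '🔍' (U+1F50D) uses 4. It also shows that a 4-byte UTF-8 character needs two UTF-16 code units, which might matter on platforms such as Windows where `wchar_t` is 16 bits wide; whether that is related to the error reported here is an assumption, not something the report establishes.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the UTF-8 sequence that starts with lead byte b.
 * Returns 0 if b cannot start a sequence (continuation or invalid byte).
 * Hypothetical helper for illustration; not PostgreSQL's pg_mblen(). */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* Number of characters (code points) in a NUL-terminated UTF-8 string.
 * Invalid lead bytes are counted as one character each, and a sequence
 * cut short by the terminator is counted once without reading past it. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s != '\0')
    {
        int len = utf8_seq_len((unsigned char) *s);
        int i;

        if (len == 0)
            len = 1;
        for (i = 1; i < len; i++)
        {
            if (s[i] == '\0')
            {
                len = i;    /* truncated sequence: stop at the terminator */
                break;
            }
        }
        s += len;
        n++;
    }
    return n;
}

/* UTF-16 code units needed for a character whose UTF-8 form is
 * utf8_len bytes long: 4-byte sequences encode code points at or
 * above U+10000, which UTF-16 represents as a surrogate pair. */
int utf16_units_for(int utf8_len)
{
    return utf8_len == 4 ? 2 : 1;
}
```

For example, `utf8_char_count("\xF0\x9F\x94\x8D")` counts one character for '🔍' even though the string occupies four bytes, so a caller that confuses the two quantities would be off by a factor of four for this input.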