Re: unaccent fails when datlocprovider=i and datctype=C
| From | Peter Eisentraut |
|---|---|
| Subject | Re: unaccent fails when datlocprovider=i and datctype=C |
| Date | |
| Msg-id | 314313bd-9cec-75d2-97fe-9172ef0a593b@enterprisedb.com |
| In response to | unaccent fails when datlocprovider=i and datctype=C (Jeff Davis <pgsql@j-davis.com>) |
| List | pgsql-bugs |
On 08.03.23 06:49, Jeff Davis wrote:
> $ initdb -D data -N --locale-provider=icu --icu-locale=en --locale=C

Is it even worth supporting that? What is the point of this kind of setup?

> =# create extension unaccent;
> ERROR: invalid multibyte character for locale
> HINT: The server's LC_CTYPE locale is probably incompatible with the
> database encoding.
> CONTEXT: line 1 of configuration file
> ".../share/tsearch_data/unaccent.rules": "¡ !
> "
>
> Cause: t_isspace() implementation is incomplete (notice "TODO"
> comments):
>
>     Oid collation = DEFAULT_COLLATION_OID; /* TODO */
>     pg_locale_t mylocale = 0; /* TODO */
>
>     if (clen == 1 || lc_ctype_is_c(collation))
>         return isspace(TOUCHAR(ptr));
>
>     char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
>
>     return iswspace((wint_t) character[0]);
>
> If using datlocprovider=c, then the earlier branch goes straight to
> isspace(). But if datlocprovider=i, then
> lc_ctype_is_c(DEFAULT_COLLATION_OID) returns false, and it goes into
> char2wchar(). char2wchar() is essentially a wrapper around mbstowcs(),
> which does not work on multibyte input when LC_CTYPE=C.
>
> Quick fix (attached): check whether datctype is C rather than the
> default collation.

This seems right. It's unfortunate that we would now have the possibility that

    lc_ctype_is_c(DEFAULT_COLLATION_OID) != database_ctype_is_c

but that seems to be the nature of things. Maybe a comment somewhere?
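
For readers who want to see the underlying libc behaviour outside the server, here is a minimal standalone sketch. It is not PostgreSQL code: the helper convert_in_c_locale() and the constant inverted_exclamation are hypothetical names made up for this illustration, and only standard C library calls are used. It feeds the UTF-8 bytes of "¡" (the first character of the unaccent.rules line in the error context) to mbstowcs() with LC_CTYPE=C.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * UTF-8 encoding of U+00A1 (inverted exclamation mark), two bytes long.
 */
static const char inverted_exclamation[] = "\xC2\xA1";

/*
 * Hypothetical helper (not part of PostgreSQL): switch LC_CTYPE to "C" and
 * return whatever mbstowcs() reports for the given multibyte string, i.e.
 * the number of wide characters written or (size_t) -1 if the C library
 * rejects the input as an invalid sequence.
 */
static size_t
convert_in_c_locale(const char *mbstr, wchar_t *out, size_t outlen)
{
    setlocale(LC_CTYPE, "C");
    return mbstowcs(out, mbstr, outlen);
}
```

Calling convert_in_c_locale(inverted_exclamation, buf, 8) with a wchar_t buf[8] should never return 1. As far as I know, the C library either rejects the high bytes and returns (size_t) -1, or converts byte by byte and returns 2, so the two-byte sequence is not decoded as the single character U+00A1. Which of the two happens depends on the platform, so treat this as an illustration of why the isspace() branch is the one to take when datctype is C, not as a statement about any particular libc.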