Re: unaccent fails when datlocprovider=i and datctype=C


From	Peter Eisentraut
Subject	Re: unaccent fails when datlocprovider=i and datctype=C
Date	10 March 2023, 08:40:19
Msg-id	314313bd-9cec-75d2-97fe-9172ef0a593b@enterprisedb.com
In reply to	unaccent fails when datlocprovider=i and datctype=C (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-bugs

On 08.03.23 06:49, Jeff Davis wrote:
> $ initdb -D data -N --locale-provider=icu --icu-locale=en --locale=C

Is it even worth supporting that?  What is the point of this kind of setup?

> =# create extension unaccent;
> ERROR:  invalid multibyte character for locale
> HINT:  The server's LC_CTYPE locale is probably incompatible with the
> database encoding.
> CONTEXT:  line 1 of configuration file
> ".../share/tsearch_data/unaccent.rules": "¡  !
> "
> 
> Cause: t_isspace() implementation is incomplete (notice "TODO"
> comments):
> 
>      Oid         collation = DEFAULT_COLLATION_OID;  /* TODO */
>      pg_locale_t mylocale = 0;   /* TODO */
> 
>      if (clen == 1 || lc_ctype_is_c(collation))
>          return isspace(TOUCHAR(ptr));
> 
>      char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
> 
>      return iswspace((wint_t) character[0]);
> 
> If using datlocprovider=c, then the earlier branch goes straight to
> isspace(). But if datlocprovider=i, then
> lc_ctype_is_c(DEFAULT_COLLATION_OID) returns false, and it goes into
> char2wchar(). char2wchar() is essentially a wrapper around mbstowcs(),
> which does not work on multibyte input when LC_CTYPE=C.
> 
> Quick fix (attached): check whether datctype is C rather than the
> default collation.

This seems right.  It's unfortunate that we would now have the 
possibility that

lc_ctype_is_c(DEFAULT_COLLATION_OID) != database_ctype_is_c

but that seems to be the nature of things.  Maybe a comment somewhere?
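
To make the quoted control flow concrete, here is a minimal standalone C
sketch of the two branches.  It is not the PostgreSQL code: sketch_isspace(),
SKETCH_MAX_CLEN and the ctype_is_c flag are hypothetical names, with the flag
standing in for the proposed "is datctype C?" check, and plain mbstowcs()
standing in for char2wchar().  Whether mbstowcs() rejects multibyte input
under LC_CTYPE=C depends on the platform's libc, so the sketch reports a
conversion failure as -1 rather than assuming one outcome.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Assumption of this sketch: characters are at most 4 bytes (UTF-8). */
#define SKETCH_MAX_CLEN 4

/*
 * Hypothetical stand-in for the t_isspace() logic quoted above.
 *
 * ptr        points at one character of clen bytes
 * ctype_is_c models the proposed check on datctype (not a PostgreSQL API)
 *
 * Returns 1 for whitespace, 0 for non-whitespace, and -1 if the
 * multibyte-to-wide conversion fails.
 */
int
sketch_isspace(const char *ptr, int clen, bool ctype_is_c)
{
    char        buf[SKETCH_MAX_CLEN + 1];
    wchar_t     wc[SKETCH_MAX_CLEN + 1];

    assert(clen >= 1 && clen <= SKETCH_MAX_CLEN);

    /* Single-byte character or C ctype: never touch the wide-char path. */
    if (clen == 1 || ctype_is_c)
        return isspace((unsigned char) *ptr) != 0;

    /* mbstowcs() needs a NUL-terminated string, so copy the character. */
    memcpy(buf, ptr, (size_t) clen);
    buf[clen] = '\0';

    /* This is the call that can fail on multibyte input when LC_CTYPE=C. */
    if (mbstowcs(wc, buf, SKETCH_MAX_CLEN) == (size_t) -1)
        return -1;

    return iswspace((wint_t) wc[0]) != 0;
}
```

With ctype_is_c set, the two-byte UTF-8 sequence for "¡" from the first line
of unaccent.rules is classified by isspace() on its first byte and never
reaches mbstowcs(), which is the effect the quick fix is after.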
