Re: BUG #13440: unaccent does not remove all diacritics

From	Léonard Benedetti
Subject	Re: BUG #13440: unaccent does not remove all diacritics
Date	17 February 2016, 21:07:47
Msg-id	56C4E11D.5050809@mlpo.fr
In response to	Re: BUG #13440: unaccent does not remove all diacritics (Teodor Sigaev <teodor@sigaev.ru>)
Responses	Re: BUG #13440: unaccent does not remove all diacritics
List	pgsql-bugs


On 12/02/2016 17:44, Teodor Sigaev wrote:
> I'm inclined to commit this patch because it suggests a more regular
> way to update unaccent rules. That is nice.
>
> But I have some remarks:
> 1 Is it possible not to restrict the generator script to Python V2?
> Python V2, it seems, will go away in the near future, and it will not
> be convenient to install V2 for a single task.

Yes, I agree, that makes sense; the script was originally written for
Python 2, but Python 2 is legacy. Moreover, adapting the script to
Python 3 seems trivial.
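
As an illustration of the kind of change such a port usually involves, here is a minimal sketch. It is not the code of the patch: the function name write_rules, the file name and the tab-separated output layout are assumptions made for this example. The idea is to open the output file with an explicit UTF-8 encoding, which behaves the same on Python 2 and Python 3.

```python
from __future__ import print_function, unicode_literals

import io
import os
import tempfile


def write_rules(path, rules):
    """Write (source, target) pairs as tab-separated lines, encoded as UTF-8.

    Hypothetical helper for illustration only.
    """
    # io.open accepts an explicit encoding on both Python 2 and Python 3,
    # so the bytes written do not depend on the locale.
    with io.open(path, "w", encoding="utf-8") as out:
        for src, dst in rules:
            out.write("{}\t{}\n".format(src, dst))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.rules")
    write_rules(path, [("\u0110", "D"), ("\u00e9", "e")])
    with io.open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```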

> 2 As is easy to see, UTF-8 encoding is used nowhere in the pgsql
> sources, just ASCII. I don't see a reason to make an exception for
> this script.

First of all, the majority of pgsql code is C, a language where the
default source encoding is not the same everywhere (it may depend on the
locale settings or the compiler), so it is logical to use ASCII.

On the other hand, UTF-8 encoding for source code is *a feature of
Python 3* (to quote the documentation: “The default encoding for Python
source code is UTF-8”), so there is no possible ambiguity, and it will
not be a problem. That said, some non-ASCII characters could be removed
from the source code of the script without harm (I am thinking in
particular of "“" and "”"). For some comments, however, removing them
would be unfortunate (e.g. “# RegEx to parse rules (e.g. “Đ → D ; […]”)”
or “# ℃ °C”).
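
To make this concrete, here is a minimal sketch (again, not the code of the patch) of a Python 3 source file that has no encoding declaration and still contains non-ASCII characters in a comment and in a regular expression. The names RULE_RE and parse_rule, and the exact rule syntax the pattern accepts, are assumptions made for this example.

```python
import re

# Hypothetical RegEx to parse rules of the form “Đ → D ;”.
# No "# -*- coding: utf-8 -*-" line is needed here: Python 3 reads
# source files as UTF-8 by default.
RULE_RE = re.compile(r"^\s*(?P<src>\S+)\s*→\s*(?P<dst>[^;]*?)\s*;")


def parse_rule(line):
    """Return (source, target) for a rule line, or None if it does not match."""
    match = RULE_RE.match(line)
    if match is None:
        return None
    return match.group("src"), match.group("dst")


if __name__ == "__main__":
    print(parse_rule("Đ → D ;"))
    print(parse_rule("℃ → °C ;"))
```

Under Python 2 the same file would be rejected unless it carried an explicit encoding declaration, which is the ambiguity that Python 3 removes.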

>
> Thank you.
>

Thus, I propose to adapt the code to Python 3 (the encoding of the
script does not seem to be a problem, for the reasons above). I will try
to do it shortly.

Thank you for your feedback.

Léonard Benedetti
