Re: BUG #13440: unaccent does not remove all diacritics
From | Léonard Benedetti |
---|---|
Subject | Re: BUG #13440: unaccent does not remove all diacritics |
Date | |
Msg-id | 56C4E11D.5050809@mlpo.fr |
In response to | Re: BUG #13440: unaccent does not remove all diacritics (Teodor Sigaev <teodor@sigaev.ru>) |
Responses | Re: BUG #13440: unaccent does not remove all diacritics |
List | pgsql-bugs |
On 12/02/2016 17:44, Teodor Sigaev wrote:
> I'm inclining to commit this patch because it suggests more regular
> way to update unaccent rules. That is nice.
>
> But I have some notices:
> 1 Is it possible to do not restrict generator script to Python V2?
> Python V2, seems, will go away in near future, and it will not be
> comfortable to install V2 for a single task.

Yes, I agree, that makes sense: the script was originally written for Python 2, but Python 2 is now legacy. Moreover, adapting the script to Python 3 seems trivial.

> 2 As it's easy to see, nowhere in sources of pgsql there is no a UTF-8
> encoding, just ASCII. I don't see reason to make an exception for this
> script.

First of all, most of the pgsql code is C, a language in which the default source encoding is not the same everywhere (it may depend on the locale settings or the compiler), so it is logical to use ASCII there. On the other hand, UTF-8 encoding for source code is *a feature of Python 3* (to quote the documentation: “The default encoding for Python source code is UTF-8”), so there is no possible ambiguity and it will not be a problem.

That said, some non-ASCII characters could be removed from the script's source code without any loss (I am thinking in particular of "“" and "”"). For some comments, however, removing them would be unfortunate (e.g. “# RegEx to parse rules (e.g. “Đ → D ; […]”)” or “# ℃ °C”).

>
> Thank you.
>

Thus, I propose to adapt the code to Python 3 (the encoding of the script does not seem to be a problem, for the reasons above). I will try to do it shortly.

Thank you for your feedback.

Léonard Benedetti
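
To illustrate the encoding point above, here is a minimal sketch of a Python 3 source file that uses non-ASCII characters in comments and in a regular expression, with no encoding declaration. The rule format, `RULE_RE` and `parse_rule` are assumptions made up for this example, loosely modelled on the “Đ → D ; […]” comment quoted above; this is not the code of the actual generator script.

```python
import re

# Python 3 reads this file as UTF-8 by default, so characters such as
# "Đ", "→" or "℃" can appear in comments and string literals without
# any "# -*- coding: utf-8 -*-" declaration.

# Hypothetical rule format (an assumption): "<source> → <target> ; <rest>"
RULE_RE = re.compile(r"^(?P<src>.+?) → (?P<dst>.+?) ;")


def parse_rule(line):
    """Return (source, target) for a rule line, or None if it does not match."""
    match = RULE_RE.match(line)
    if match is None:
        return None
    return match.group("src"), match.group("dst")
```

For example, `parse_rule("Đ → D ; some comment")` returns the pair `("Đ", "D")`, while a line that does not follow the assumed format yields `None`.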