Re: BUG #15548: Unaccent does not remove combining diacritical characters
From | Hugh Ranalli |
---|---|
Subject | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date | |
Msg-id | CAAhbUMOX4QLj6c0O3GnjZYtR2dpAowss832Bq1n7oJyByeR7kQ@mail.gmail.com |
In reply to | Re: BUG #15548: Unaccent does not remove combining diacritical characters (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
List | pgsql-bugs |
On Sat, 15 Dec 2018 at 21:26, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> +1 for updating to the latest file from time to time. After
> http://unicode.org/cldr/trac/ticket/11383 makes it into a new release,
> our special_cases() function will have just the two Cyrillic
> characters, which should almost certainly be handled by adding
> Cyrillic to the ranges we handle via the usual code path, and DEGREE
> CELSIUS and DEGREE FAHRENHEIT. Those degree signs could possibly be
> extracted from Unicode.txt (or we could just forget about them), and
> then we could drop special_cases().
Well, when I modified the code to handle the new version of the transliteration file, I discovered that was sufficient to handle the old version as well. That's not the way things usually go, but I'll take it. ;-)
I've attached two patches, one to update generate_unaccent_rules.py, and another that updates unaccent.rules from the v34 transliteration file. I'll be happy to add these to the CF. Does anyone need to review them and give me approval before I do so?
Best wishes,
Hugh
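
For readers following the thread, here is a minimal sketch of the idea under discussion: deriving unaccent-style rules from Unicode decompositions, so that a script range such as Cyrillic goes through the same code path as Latin and bare combining diacritical marks map to nothing. It is not the attached patch. It assumes Python's standard `unicodedata` module in place of parsing UnicodeData.txt and the CLDR transliteration file, and the helper names `is_combining_mark`, `strip_marks`, and `rules_for_range` are hypothetical.

```python
import unicodedata

# Illustrative only: these helper names are hypothetical and do not come from
# generate_unaccent_rules.py, which reads UnicodeData.txt and the CLDR
# transliteration file rather than Python's built-in Unicode tables.


def is_combining_mark(ch):
    """Return True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def strip_marks(text):
    """Canonically decompose the text (NFD), then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not is_combining_mark(ch))


def rules_for_range(first, last):
    """Yield (source, target) pairs for code points that change when stripped.

    A bare combining mark yields an empty target, meaning "remove it".
    """
    for cp in range(first, last + 1):
        src = chr(cp)
        dst = strip_marks(src)
        if dst != src:
            yield src, dst


def format_rule(src, dst):
    """Render a rule as ASCII code point notation for easy inspection."""
    targets = " ".join(f"U+{ord(c):04X}" for c in dst) or "(removed)"
    return f"U+{ord(src):04X} -> {targets}"


if __name__ == "__main__":
    # Cyrillic block, followed by the Combining Diacritical Marks block.
    for first, last in ((0x0400, 0x04FF), (0x0300, 0x036F)):
        for src, dst in rules_for_range(first, last):
            print(format_rule(src, dst))
```

With this approach a Cyrillic letter that has a canonical decomposition, such as U+0401 (base letter U+0415 plus a combining diaeresis), needs no special case. Characters that only have compatibility decompositions, such as the degree signs mentioned above, are not covered by NFD and would still need separate handling.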