Re: BUG #15347: Unaccent for greek characters does not work
From | Thomas Munro |
---|---|
Subject | Re: BUG #15347: Unaccent for greek characters does not work |
Date | |
Msg-id | CAEepm=3a_5y+COG6AM0UFZXb4MQmxSdpQK3oGnri1kaP+Uqx5A@mail.gmail.com |
In reply to | BUG #15347: Unaccent for greek characters does not work (PG Bug reporting form <noreply@postgresql.org>) |
Responses |
Re: BUG #15347: Unaccent for greek characters does not work
Re: BUG #15347: Unaccent for greek characters does not work |
List | pgsql-bugs |
On Thu, Aug 23, 2018 at 3:08 AM, PG Bug reporting form <noreply@postgresql.org> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15347
> Logged by: Tasos Maschalidis
> Email address: tas.o.s@hotmail.com
> PostgreSQL version: 9.3.18
> Operating system: Ubuntu 4.8.4
> Description:
>
> Call to unaccent function with greek characters does not return the greek
> characters without the accents as expected (not even just the few diacritics
> used in modern Greek).

Hello Tasos,

Right. We generate the unaccent.rules file from the Unicode data file using the Python script contrib/unaccent/generate_unaccent_rules.py in the PostgreSQL source tree. The script currently limits itself to Latin characters here:

    def is_plain_letter(codepoint):
        """Return true if codepoint represents a plain ASCII letter."""
        return (codepoint.id >= ord('a') and codepoint.id <= ord('z')) or \
               (codepoint.id >= ord('A') and codepoint.id <= ord('Z'))

I was not brave enough to support other kinds of characters, because I can't read 'em and check if the results are garbage (if you remove the diacritics from Klingon, it might change the meaning of any word into a declaration of war for all I know).

If you know Python and would like to have a go at modifying that script to support Greek, please do! Otherwise perhaps I could try to do it and you could review the results. There is a precedent already that it knows how to remove a diacritic from at least one Cyrillic character. I think there is no reason at all we shouldn't take a patch to support Greek or any other alphabet that a native speaker can advise us on.

I think the chances of squeaking a change into PostgreSQL 11 are slim, since it would require a special exception from the Release Management Team at this point. Failing that, it'd be for PostgreSQL 12. We don't usually back-patch unaccent.rules changes because they can affect indexed data, and we don't want minor version upgrades to break stuff.

[1] https://www.postgresql.org/message-id/CAEepm%3D1KRVinFtuDao4L%2BqSBh4T4k3z996EwD5-zgytu4Qa5Fw%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
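
To make the suggestion above concrete, here is a minimal sketch of how the "plain letter" test might be widened to cover the basic Greek alphabet. It is not the code in generate_unaccent_rules.py and not a proposed patch: it works on plain Python strings with the standard unicodedata module instead of the script's codepoint objects, and the helper names strip_marks and unaccent_sketch are hypothetical. Whether dropping these marks is linguistically acceptable is exactly the question a native speaker would need to review.

```python
import unicodedata


def is_plain_letter(ch):
    """Return True for a plain ASCII letter or an unaccented basic Greek letter.

    Assumption: "plain Greek" means U+0391..U+03A9 (capitals, U+03A2 is
    unassigned) and U+03B1..U+03C9 (small letters, including final sigma).
    """
    cp = ord(ch)
    is_ascii = ('a' <= ch <= 'z') or ('A' <= ch <= 'Z')
    is_greek = (0x0391 <= cp <= 0x03A9 and cp != 0x03A2) or \
               (0x03B1 <= cp <= 0x03C9)
    return is_ascii or is_greek


def strip_marks(ch):
    """Map one precomposed character to its base letter, if it has one.

    The character is decomposed with NFD; it is replaced only when the
    result is a plain letter followed by nonspacing marks (category Mn).
    Anything else is returned unchanged.
    """
    decomposed = unicodedata.normalize('NFD', ch)
    base, marks = decomposed[0], decomposed[1:]
    if marks and is_plain_letter(base) and \
            all(unicodedata.category(m) == 'Mn' for m in marks):
        return base
    return ch


def unaccent_sketch(text):
    """Apply strip_marks to every character of text (hypothetical helper)."""
    return ''.join(strip_marks(c) for c in text)
```

With these definitions, unaccent_sketch('\u03ac') (alpha with tonos) yields plain alpha, while a Cyrillic letter such as '\u0439' is left alone because its base letter is outside both ranges. The sketch only handles precomposed characters one at a time, which is roughly the shape of a rules file that maps single source characters to replacements; text that is already decomposed would pass through unchanged.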