Re: BUG #18362: unaccent rules and Old Greek text

From Peter Eisentraut
Subject Re: BUG #18362: unaccent rules and Old Greek text
Date
Msg-id 1bcd13b7-6e00-4de1-961e-b7669f05a2da@eisentraut.org
In response to Re: BUG #18362: unaccent rules and Old Greek text  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BUG #18362: unaccent rules and Old Greek text  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
On 14.05.24 16:51, Robert Haas wrote:
> 2. The question of which mappings we actually ought to be adding seems
> a lot harder, because it's not altogether clear what it means to
> "remove an accent". The proposed patch adds a whole lot of rules that
> turn tiny little characters into full-sized characters, boldfaced
> and/or italicized and/or otherwise-fancily-printed characters into
> full-sized characters. Only a handful of the changes are actually
> adding rules that specifically *remove an accent*, but there are
> similar rules that already exist, like turning ⅐ into the
> four-character sequence " 1/7" and blocky-looking versions of each
> letter into standard versions and ㍱ into the three-character sequence
> "hPa". So my naive guess would be that we want all of these rules,
> even though you would not guess from the unaccent documentation that
> it's supposed to do stuff like this.

unaccent actually does both accent removal and ligature expansion. 
(This is documented.)  The cases you show above are ligature expansions.

You can also run generate_unaccent_rules.py with --no-ligatures and then 
you get a smaller list that indeed looks more like just accent removal.

It does look like whatever the script considers a ligature produces
some unintuitive results.
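
To make the distinction between accent removal and expansion concrete, here is a minimal Python sketch. It is not the logic of generate_unaccent_rules.py; it only assumes that Unicode canonical decomposition (NFD) is a reasonable stand-in for "accent removal" and that compatibility decomposition (NFKD) is a reasonable stand-in for the broader expansions. The function names strip_accents and expand_compat are hypothetical and exist only for this illustration.

```python
import unicodedata


def _drop_combining(text: str) -> str:
    # Keep only characters whose canonical combining class is 0,
    # i.e. discard combining accents and breathing marks.
    return "".join(c for c in text if not unicodedata.combining(c))


def strip_accents(ch: str) -> str:
    """Accent removal only: canonical decomposition, then drop marks."""
    return _drop_combining(unicodedata.normalize("NFD", ch))


def expand_compat(ch: str) -> str:
    """Broader expansion: compatibility decomposition, then drop marks."""
    return _drop_combining(unicodedata.normalize("NFKD", ch))


if __name__ == "__main__":
    samples = [
        "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
        "\u1f04",      # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
        "\ufb01",      # LATIN SMALL LIGATURE FI
        "\u3371",      # SQUARE HPA
        "\U0001d400",  # MATHEMATICAL BOLD CAPITAL A
    ]
    for ch in samples:
        print(unicodedata.name(ch), repr(strip_accents(ch)), repr(expand_compat(ch)))
```

Under these assumptions, characters such as the fi ligature, the square hPa sign, and the mathematical bold letters are left alone by strip_accents and are rewritten only by expand_compat, which roughly mirrors the difference between running the script with and without --no-ligatures. The rules that unaccent actually ships may differ in detail from what NFKD produces.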


