Re: BUG #13440: unaccent does not remove all diacritics
From | Michael Gradek |
---|---|
Subject | Re: BUG #13440: unaccent does not remove all diacritics |
Date | |
Msg-id | CAEP8ZNVKxwBNyQx-CxcTL0hiNax3AScy208fs=8_Qp2cHt8y1A@mail.gmail.com |
In reply to | Re: BUG #13440: unaccent does not remove all diacritics (Thomas Munro <thomas.munro@enterprisedb.com>) |
List | pgsql-bugs |
Thanks everyone,

I've been comparing the behavior to that of
https://github.com/andrewrk/node-diacritics/blob/master/index.js if that
can be of any help.

On Monday, June 15, 2015, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> On Tue, Jun 16, 2015 at 12:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> My terminal shows these characters to be different. One is
> >> http://graphemica.com/%C8%9B
> >> latin small letter t with comma below (U+021B)
> >
> >> The other is
> >> http://graphemica.com/%C5%A3
> >> latin small letter t with cedilla (U+0163)
> >
> > Ah-hah -- I did not look closely enough. So the immediate answer for
> > Michael is to add another entry to his unaccent.rules file.
> >
> > Should we add the missing character to the standard unaccent.rules file?
>
> It looks like Romanian also has s with comma. Perhaps we should have
> all these characters:
>
> $ curl -s http://unicode.org/Public/7.0.0/ucd/UnicodeData.txt | egrep ';LATIN (SMALL|CAPITAL) LETTER [A-Z] WITH ' | wc -l
> 702
>
> That's quite a lot more than the 187 we currently have. Of those, I
> think only the following ligature characters don't fit the above
> pattern: Æ, æ, Ĳ, ĳ, Œ, œ, ß. Incidentally, I don't believe that the
> way we "unaccent" ligatures is correct anyway. Maybe they should be
> expanded to AE, ae, IJ, ij, OE, oe, ss, respectively, not A, a, I, i,
> O, o, S as we have it, but I guess it depends what the purpose of
> unaccent is...
>
> --
> Thomas Munro
> http://www.enterprisedb.com

--
Cheers,
Mike

--
Mike Gradek
Co-founder and CTO, Busbud
Busbud.com <http://busbud.com/> | mike@busbud.com
*We're hiring!: Jobs at Busbud <http://www.busbud.com/en/about/jobs>*
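
For readers comparing behaviors, here is a minimal Python sketch of the two ideas discussed in the quoted message: treating t with comma below (U+021B) and t with cedilla (U+0163) alike, and expanding ligatures to two letters rather than one. It is only an illustration, not how the unaccent module works (unaccent is driven by its rules file). The `LIGATURES` table and the `strip_diacritics` name are assumptions made for this example.

```python
import unicodedata

# Hypothetical expansion table, following the suggestion in the quoted
# message (AE, ae, IJ, ij, OE, oe, ss). Not taken from unaccent.rules.
LIGATURES = {
    "Æ": "AE", "æ": "ae",
    "Ĳ": "IJ", "ĳ": "ij",
    "Œ": "OE", "œ": "oe",
    "ß": "ss",
}


def strip_diacritics(text: str) -> str:
    """Expand ligatures, then drop combining marks after NFD decomposition."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", expanded)
    # combining() is non-zero for marks such as U+0326 (comma below)
    # and U+0327 (cedilla), so both t variants reduce to plain "t".
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for sample in ("\u021b", "\u0163", "\u0219", "Œuvre", "Straße"):
        print(repr(sample), "->", repr(strip_diacritics(sample)))
```

One caveat with this approach: letters that have no canonical decomposition, such as ø, pass through unchanged, so an explicit mapping in the style of a rules file would still be needed for them.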