Re: BUG #13440: unaccent does not remove all diacritics
From | Thomas Munro |
---|---|
Subject | Re: BUG #13440: unaccent does not remove all diacritics |
Date | |
Msg-id | CAEepm=2b1df83h68tTiuk_xGC-PVmru02+rE2xp6_Hs5q_zHSg@mail.gmail.com |
In response to | Re: BUG #13440: unaccent does not remove all diacritics (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
On Mon, Jun 15, 2015 at 5:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> mike@busbud.com writes:
>> Sorry, I couldn't install the most recent minor release, but I did try this
>> on several different versions. I used Heroku to try a 9.4.3 build, and got
>> the same results
>
>> select 'ț' as input, unaccent('ț') as observed, 't' as expected;
>>  input | observed | expected
>> -------+----------+----------
>>  ț     | ț        | t
>> (1 row)
>
> Hm, I do see
>
>     ţ    t
>
> in unaccent.rules, so the transformation ought to happen.  I suspect
> an encoding issue, eg your terminal window is not transmitting characters
> in the encoding Postgres thinks you're using.  You did not provide any
> info about server encoding, client encoding, or client LC_xxx environment,
> so it's hard to debug from here.

The one that is in unaccent.rules is apparently t-cedilla, from Gagauz
and Romanian:

https://en.wiktionary.org/wiki/%C5%A3

The one that is referred to above is apparently t-comma, from Livonian
and Romanian, but "[o]ften replaced by Ţ / ţ (t with cedilla),
especially in computing":

https://en.wiktionary.org/wiki/%C8%9B

-- 
Thomas Munro
http://www.enterprisedb.com
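
For readers who want to check the distinction themselves, here is a minimal sketch in Python (not part of the original message, and it does not use PostgreSQL or the unaccent module). It shows that the two letters are separate code points whose UTF-8 bytes match the two Wiktionary URLs above, and that they decompose to different combining marks. The helper names `describe` and `strip_marks` are hypothetical, and `strip_marks` is only a generic decomposition-based approach, not how unaccent.rules works.

```python
import unicodedata

T_CEDILLA = "\u0163"  # the letter the thread says is listed in unaccent.rules
T_COMMA = "\u021b"    # the letter used in the bug report


def describe(ch):
    """Return code point, Unicode name, UTF-8 bytes and NFD form of one character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "utf8": ch.encode("utf-8").hex().upper(),
        "nfd": [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)],
    }


def strip_marks(text):
    """Drop combining marks after canonical decomposition (illustration only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


for ch in (T_CEDILLA, T_COMMA):
    print(describe(ch), "->", strip_marks(ch))
```

Because the two letters are different code points, a rule keyed on one would presumably not match the other, which is consistent with the behaviour in the report.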