Re: BUG #15548: Unaccent does not remove combining diacritical characters
| From | Thomas Munro |
|---|---|
| Subject | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
| Date | |
| Msg-id | CA+hUKG+OG4bkwe6hn0yEBq2eY=HKuy9D_z2UgXeKjbrav7db5g@mail.gmail.com |
| In response to | Re: BUG #15548: Unaccent does not remove combining diacritical characters (Thomas Munro <thomas.munro@enterprisedb.com>) |
| List | pgsql-bugs |
On Tue, Dec 3, 2019 at 9:57 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Sun, Dec 16, 2018 at 8:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Hugh Ranalli <hugh@whtc.ca> writes:
> > > The problem is that I downloaded the latest version of the Latin-ASCII
> > > transliteration file (r34 rather than the r28 specified in the URL). Over 3
> > > years ago (in r29, of course) they changed the file format (
> > > https://unicode.org/cldr/trac/ticket/5873) so that
> > > parse_cldr_latin_ascii_transliterator loads an empty rules set.
> >
> > Ah-hah.
> >
> > > I'd be
> > > happy to either a) support both formats, or b), support just the newest and
> > > update the URL. Option b) is cleaner, and I can't imagine why anyone would
> > > want to use an older rule set (then again, struggling with Unicode always
> > > makes my head hurt; I am not an expert on it). Thoughts?
> >
> > (b) seems sufficient to me, but perhaps someone else has a different
> > opinion.
> >
> > Whichever we do, I think it should be a separate patch from the feature
> > addition for combining diacriticals, just to keep the commit history
> > clear.
>
> +1 for updating to the latest file from time to time. After
> http://unicode.org/cldr/trac/ticket/11383 makes it into a new release,
> our special_cases() function will have just the two Cyrillic
> characters, which should almost certainly be handled by adding
> Cyrillic to the ranges we handle via the usual code path, and DEGREE
> CELSIUS and DEGREE FAHRENHEIT. Those degree signs could possibly be
> extracted from Unicode.txt (or we could just forget about them), and
> then we could drop special_cases().

Aha, CLDR 36 included that change, so when we update we can drop a special case.
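For illustration, the idea of deriving the degree-sign mappings from Unicode data rather than hard-coding them might look roughly like the sketch below. It is only a sketch under stated assumptions: it uses Python's standard unicodedata module as a stand-in for parsing the Unicode data file directly, and compat_expansion is a hypothetical helper name, not a function in the existing rule-generation script.

```python
import unicodedata


def compat_expansion(char):
    """Return the compatibility decomposition of char as a string.

    Hypothetical helper for illustration only. Returns None when the
    character has no decomposition tagged <compat>.
    """
    fields = unicodedata.decomposition(char).split()
    if not fields or fields[0] != "<compat>":
        return None
    # The remaining fields are hexadecimal code points.
    return "".join(chr(int(cp, 16)) for cp in fields[1:])


# DEGREE CELSIUS (U+2103) and DEGREE FAHRENHEIT (U+2109)
for char in ("\u2103", "\u2109"):
    print(unicodedata.name(char), "->", compat_expansion(char))
```

Both characters carry a compatibility decomposition into DEGREE SIGN plus a Latin capital letter, so a generic rule of this kind could in principle cover them without a dedicated special case.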