Re: BUG #15548: Unaccent does not remove combining diacritical characters
From | Hugh Ranalli |
---|---|
Subject | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date | |
Msg-id | CAAhbUMMzPERSe3KfKKQfR4COJCZSrss1G7KRyUraYJyvrVyOUg@mail.gmail.com |
In response to | Re: BUG #15548: Unaccent does not remove combining diacritical characters (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
List | pgsql-bugs |
On Mon, 17 Dec 2018 at 23:05, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
+ʹ '
+ʺ "
+ʻ '
+ʼ '
+ʽ '
+˂ <
+˃ >
+˄ ^
+ˆ ^
+ˈ '
+ˋ `
+ː :
+˖ +
+˗ -
+˜ ~
These aren't the combining codepoints. They're new substitutions defined in r34 of the Latin-ASCII transliteration file. I had wondered about those, too, and did some testing.
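For anyone who wants to check this distinction locally, here is a minimal sketch using Python's standard unicodedata module. It assumes the quoted characters are the fifteen code points listed in the diff (written as escapes below to avoid copy/paste damage); the helper name block_of is only illustrative and is not part of generate_unaccent_rules.py.

```python
import unicodedata

# Left-hand sides of the quoted rules, written as escapes.
# Assumption: these are the fifteen characters shown in the diff above.
QUOTED = (
    "\u02b9\u02ba\u02bb\u02bc\u02bd\u02c2\u02c3\u02c4"
    "\u02c6\u02c8\u02cb\u02d0\u02d6\u02d7\u02dc"
)


def block_of(ch):
    """Name the Unicode block for the two ranges relevant to this thread."""
    cp = ord(ch)
    if 0x02B0 <= cp <= 0x02FF:
        return "Spacing Modifier Letters"
    if 0x0300 <= cp <= 0x036F:
        return "Combining Diacritical Marks"
    return "other"


for ch in QUOTED:
    # combining() returns the canonical combining class; 0 means the
    # character is not a combining mark.
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"class={unicodedata.combining(ch)}, block={block_of(ch)}"
    )
```

Every character in the list should report the Spacing Modifier Letters block with combining class 0, whereas something like U+0301 (COMBINING ACUTE ACCENT) has a non-zero class and falls in the Combining Diacritical Marks block.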
I don't think this is quite right.
However, you are correct that something isn't right. While testing why I was getting different output, I had reverted to the generate_unaccent_rules.py from BEFORE my changes, and then applied my update for the transliteration file format to that reverted version. The patch for generate_unaccent_rules.py should still be good, but the generated rules file didn't include the combining diacriticals. While regenerating it, I want to double-check some of the additions before re-submitting.
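One rough way to catch this kind of omission is to scan a generated rules file for the combining range. The sketch below assumes the unaccent rules format of one rule per line, with the source character first and an optional replacement after whitespace (a line holding only the source character means "delete it"). The function name missing_combining_marks and the sample rules are hypothetical, and this is not how the patch itself does its checking.

```python
def missing_combining_marks(rules_text):
    """Return the combining marks in U+0300..U+036F that have no rule.

    Assumes each non-empty line starts with the source character,
    optionally followed by whitespace and a replacement string.
    """
    sources = set()
    for line in rules_text.splitlines():
        fields = line.split()
        if fields:
            sources.add(fields[0])
    return [chr(cp) for cp in range(0x0300, 0x0370) if chr(cp) not in sources]


# Made-up sample: one accented letter, one combining mark that is
# removed outright, and one of the modifier-letter substitutions.
sample_rules = "\u00c0\tA\n\u0301\n\u02b9\t'\n"

missing = missing_combining_marks(sample_rules)
print(f"{len(missing)} combining marks have no rule")
```

Run against a real rules file (read with UTF-8 encoding), an empty result would indicate that every code point in the Combining Diacritical Marks block has its own entry.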
On Mon, 17 Dec 2018 at 23:57, Michael Paquier <michael@paquier.xyz> wrote:
Could you also add some tests in contrib/unaccent/sql/unaccent.sql at
the same time? That would be nice to check easily the extent of the
patches proposed on this thread.
That makes sense. I'm happy to do that. Let me look at that file and see how extensive the other changes (encoding and removal of special characters) would be.
Hugh