Re: BUG #15548: Unaccent does not remove combining diacritical characters
From | Tom Lane |
---|---|
Subject | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date | |
Msg-id | 8506.1545111362@sss.pgh.pa.us |
In response to | Re: BUG #15548: Unaccent does not remove combining diacritical characters (Michael Paquier <michael@paquier.xyz>) |
Responses | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
List | pgsql-bugs |
Michael Paquier <michael@paquier.xyz> writes:
> Could you also add some tests in contrib/unaccent/sql/unaccent.sql at
> the same time? That would be nice to check easily the extent of the
> patches proposed on this thread.

I wonder why unaccent.sql is set up to run its tests in KOI8 client encoding rather than UTF8. It doesn't seem like it's the business of this test script to be verifying transcoding from KOI8 to UTF8 (and if it were meant to do that, it's a pretty incomplete test...).

But having it set up like that means that we can't directly add such tests to unaccent.sql, because there are no combining diacritics in the KOI8 character set. We have two unattractive options:

* Change client encodings partway through unaccent.sql. I think this would be disastrous for editability of that file; no common tools will understand the encoding change.

* Put the new test cases into a separate file with a different client encoding. This is workable, I suppose, but it seems pretty silly when the tests are only a few queries apiece.

Another problem I've got with the current setup is that it seems unlikely that many people's editors default to an assumption of KOI8 encoding. Mine guesses that these files are UTF8, and so the test cases look perfectly insane. They do make sense if I transcode the files to UTF8, but I wonder why we're not shipping them as UTF8 in the first place.

tl;dr: I think we should convert unaccent.sql and unaccent.out to UTF8 encoding. Then, adding more test cases for this patch will be easy.

regards, tom lane
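
The point about KOI8 lacking combining diacritics, and the proposed transcoding to UTF8, can be illustrated with a minimal Python sketch. It assumes that the KOI8 client encoding corresponds to Python's `koi8_r` codec; the helper names `encodable` and `transcode_koi8_to_utf8` are hypothetical and are not part of PostgreSQL or of the patch under discussion.

```python
import unicodedata


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def transcode_koi8_to_utf8(data: bytes) -> bytes:
    """Re-encode KOI8-R bytes as UTF-8 (assumption: the input really is KOI8-R)."""
    return data.decode("koi8_r").encode("utf-8")


# Precomposed Cyrillic letter io (U+0451): has a code point in KOI8-R.
precomposed = "\u0451"

# The same letter written as base letter + combining diaeresis (U+0308).
decomposed = unicodedata.normalize("NFD", precomposed)

print(encodable(precomposed, "koi8_r"))  # precomposed form fits in KOI8-R
print(encodable(decomposed, "koi8_r"))   # combining mark does not
print(encodable(decomposed, "utf-8"))    # UTF-8 can carry both forms
```

Under that assumption, a test string containing a combining mark cannot even be written in a KOI8-encoded script, whereas a UTF8-encoded script can hold both the precomposed and the decomposed form.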