Re: daitch_mokotoff module

From	Tom Lane
Subject	Re: daitch_mokotoff module
Date	January 3, 2022, 16:34:36
Msg-id	3563190.1641227676@sss.pgh.pa.us
In response to	Re: daitch_mokotoff module (Dag Lem <dag@nimrod.no>)
Responses	[PATCH] Run UTF8-dependent tests for citext [Re: daitch_mokotoff module]
List	pgsql-hackers

Dag Lem <dag@nimrod.no> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> (We do have methods for dealing with non-ASCII test cases, but
>> I can't see that this patch is using any of them.)

> I naively assumed that tests would be run in a UTF8 environment.

Nope, not necessarily.

Our current best practice for this is to separate out encoding-dependent
test cases into their own test script, and guard the script with an
initial test on database encoding.  You can see an example in
src/test/modules/test_regex/sql/test_regex_utf8.sql
and the two associated expected-files.  It's a good idea to also cover
as much as you can with pure-ASCII test cases that will run regardless
of the prevailing encoding.
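
A guard of that sort can be sketched roughly as follows. This is a minimal illustration, assuming psql's \gset, \if and \quit meta-commands; the variable name skip_test and the placeholder comment at the end are illustrative, not taken from the patch under discussion.

```sql
/*
 * Sketch only: skip this script unless the database encoding is UTF8,
 * because other encodings cannot represent all the characters used.
 * The variable name skip_test is illustrative.
 */
SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset
\if :skip_test
\quit
\endif

-- Encoding-dependent test cases would go below this point (placeholder).
```

With a guard like this, one expected-file would presumably hold the full output for a UTF8 database, and the alternate expected-file only the output up to the \quit, so that the test passes either way.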

> Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that
> two other modules are using UTF8 characters in tests - citext and
> unaccent.

Yeah, neither of those has been upgraded to said best practice.
(If you feel like doing the legwork to improve that situation,
that'd be great.)

> Looking into the unaccent module, I don't quite understand how it will
> work with various encodings, since it doesn't seem to decode its input -
> will it fail if run under anything but ASCII or UTF8?

Its Makefile seems to be forcing the test database to use UTF8.
I think this is a less-than-best-practice choice, because then
we have zero test coverage for other encodings; but it does
prevent test failures.

            regards, tom lane
