Re: Character expansion with ICU collations

From	Finnerty, Jim
Subject	Re: Character expansion with ICU collations
Date	June 21, 2021, 13:23:38
Msg-id	10F78B0E-3C4B-4BF8-9EF0-BEE684F4C8CC@amazon.com
In reply to	Re: Character expansion with ICU collations ("Finnerty, Jim" <jfinnert@amazon.com>)
List	pgsql-hackers

I have a proposal for how to support tailoring rules in ICU collations. The ucol_openRules() function is an alternative
to the ucol_open() function that PostgreSQL calls today, but it takes the collation strength as one of its parameters, so
the locale string would need to be parsed before creating the collator.  After the collator is created using either
ucol_openRules() or ucol_open(), the ucol_setAttribute() function may be used to set individual attributes from
keyword=value pairs in the locale string as it does now, except that the strength probably can't be changed after
opening the collator with ucol_openRules().  So the logic in pg_locale.c would need to be reorganized a little bit, but
that sounds straightforward.
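
To make the "parse before opening" step concrete, here is a rough sketch of pulling one keyword's value out of the
keyword=value part of a locale string.  It does not call ICU at all.  The helper name find_locale_keyword and the
"locale@key=value;key=value" layout it assumes are only for illustration; this is not what pg_locale.c does today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing PostgreSQL or ICU function):
 * copy the value of `keyword` from the "@key=value;key=value" part of a
 * locale string into `out`.  Returns true if the keyword was found and
 * its value fit into the caller's buffer.
 */
static bool
find_locale_keyword(const char *locale, const char *keyword,
                    char *out, size_t outlen)
{
    const char *p = strchr(locale, '@');
    size_t      klen = strlen(keyword);

    if (p == NULL)
        return false;           /* no keyword section at all */
    p++;                        /* skip the '@' */

    while (*p != '\0')
    {
        /* Each pair runs up to the next ';' or the end of the string. */
        const char *end = strchr(p, ';');
        size_t      pairlen = end ? (size_t) (end - p) : strlen(p);

        if (pairlen > klen &&
            strncmp(p, keyword, klen) == 0 &&
            p[klen] == '=')
        {
            size_t      vlen = pairlen - klen - 1;

            if (vlen >= outlen)
                return false;   /* value too long for the buffer */
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return true;
        }

        p += pairlen;
        if (*p == ';')
            p++;                /* step over the separator */
    }
    return false;
}
```

With something like this, colStrength could be looked up first and its value handed to whichever open call is chosen.
Note that splitting on ';' is a simplification: a rule string that itself contains ';' would need a quoting or
escaping convention that this sketch does not attempt.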
 

One simple solution would be to have the tailoring rules be specified as a new keyword=value pair, such as
colTailoringRules=<rulestring>. Since the <rulestring> may contain single quote characters or PostgreSQL escape
characters, any single quote characters or escapes would need to be escaped using PostgreSQL escape rules.  If
colTailoringRules is present, colStrength would also be known prior to opening the collator, or would default to
tertiary, and we would keep a local flag indicating that we should not process the colStrength keyword again, if
specified.
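
The decision described above could look roughly like the following.  This is only a sketch: SketchStrength is a local
stand-in for ICU's UCollationStrength so that it compiles without ICU, and OpenPlan, parse_strength and
plan_collator_open are hypothetical names, not existing PostgreSQL code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for ICU's UCollationStrength, so the sketch builds without ICU. */
typedef enum
{
    STRENGTH_PRIMARY,
    STRENGTH_SECONDARY,
    STRENGTH_TERTIARY,
    STRENGTH_QUATERNARY,
    STRENGTH_IDENTICAL
} SketchStrength;

/* Hypothetical description of how the collator would be opened. */
typedef struct
{
    bool            use_open_rules;     /* open with ucol_openRules()? */
    SketchStrength  strength;           /* strength resolved before opening */
    bool            skip_col_strength;  /* ignore colStrength in the later
                                         * attribute pass? */
} OpenPlan;

/*
 * Map a colStrength value to a strength.  A missing value defaults to
 * tertiary, as proposed.  Unknown values also fall back to tertiary here,
 * which is a simplification; real code would presumably report an error.
 */
static SketchStrength
parse_strength(const char *value)
{
    if (value != NULL)
    {
        if (strcmp(value, "primary") == 0)
            return STRENGTH_PRIMARY;
        if (strcmp(value, "secondary") == 0)
            return STRENGTH_SECONDARY;
        if (strcmp(value, "quaternary") == 0)
            return STRENGTH_QUATERNARY;
        if (strcmp(value, "identical") == 0)
            return STRENGTH_IDENTICAL;
    }
    return STRENGTH_TERTIARY;
}

/*
 * rules:        value of colTailoringRules, or NULL if absent
 * col_strength: value of colStrength, or NULL if absent
 */
static OpenPlan
plan_collator_open(const char *rules, const char *col_strength)
{
    OpenPlan    plan;

    plan.use_open_rules = (rules != NULL);
    plan.strength = parse_strength(col_strength);
    /* With rules, the strength is fixed at open time, so skip it later. */
    plan.skip_col_strength = plan.use_open_rules;
    return plan;
}
```

When no rules are given, the plan leaves skip_col_strength false, so colStrength would still be applied through
ucol_setAttribute() as it is now.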
 

Representing the TailoringRules as just another keyword=value in the locale string means that we don't need any change
to the catalog to store it.  It's just part of the locale specification.  I think we wouldn't even need to bump the
catversion.

Are there any tailoring rules, such as expansions and contractions, that we should disallow?  I realize that we don't
handle nondeterministic collations in LIKE or regular expression operations as of PG14, but given expr LIKE 'a%' on a
database with a UTF-8 encoding and arbitrary tailoring rules that include expansions and contractions, is it still
guaranteed that expr must sort BETWEEN 'a' AND ('a' || E'\uFFFF')?
