Thread: [PATCH] Fix ICU strength not being honored in collation rules

[PATCH] Fix ICU strength not being honored in collation rules

From
Luis Felippe
Date:
Hello,

I have run into an issue where specifying the rules argument for "CREATE COLLATION" forces the collation strength to
tertiary, even if a different strength is explicitly set in the rules string. I discovered that this is because
ucol_openRules is called with strength UCOL_DEFAULT_STRENGTH, which overwrites whatever is in the rules string with
UCOL_TERTIARY.

This fix changes this call to pass UCOL_DEFAULT instead. This way, UCOL_TERTIARY is still applied by default, but a
strength explicitly set in the rules string is not overwritten. This is important because there is currently no way to
create a collation with custom tailoring rules and a strength other than tertiary.
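
The override described above can be sketched as a small Python model. This is only an illustration of the behavior reported in this thread, not ICU or PostgreSQL source code: the names Strength, strength_from_rules and effective_strength are hypothetical, the enum values mirror ICU's UCOL_DEFAULT, UCOL_PRIMARY, UCOL_SECONDARY and UCOL_TERTIARY, and the rules parser only recognizes "[strength 1]" through "[strength 3]".

```python
import re
from enum import IntEnum


class Strength(IntEnum):
    """Stand-ins for the ICU constants named in the thread (assumed mapping)."""
    DEFAULT = -1    # mirrors UCOL_DEFAULT: the caller requests no override
    PRIMARY = 0     # mirrors UCOL_PRIMARY
    SECONDARY = 1   # mirrors UCOL_SECONDARY
    TERTIARY = 2    # mirrors UCOL_TERTIARY


# Mirrors UCOL_DEFAULT_STRENGTH, which is an alias for tertiary.
DEFAULT_STRENGTH = Strength.TERTIARY


def strength_from_rules(rules):
    """Return the strength set by a '[strength N]' option, or None if absent."""
    match = re.search(r"\[strength\s+([1-3])\]", rules)
    if match is None:
        return None
    return Strength(int(match.group(1)) - 1)


def effective_strength(rules, caller_strength):
    """Model of how the strength argument interacts with the rules string."""
    if caller_strength != Strength.DEFAULT:
        # An explicit argument overwrites whatever the rules string says.
        return Strength(caller_strength)
    from_rules = strength_from_rules(rules)
    # With no override and no option in the rules, tertiary still applies.
    return from_rules if from_rules is not None else Strength.TERTIARY


# Current call: the argument is tertiary, so '[strength 2]' is ignored.
before = effective_strength("[strength 2]", DEFAULT_STRENGTH)
# Patched call: no override, so the rules string decides.
after = effective_strength("[strength 2]", Strength.DEFAULT)
```

In this model, before is tertiary and after is secondary, while an empty rules string yields tertiary in both cases.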

What happens currently:

CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '', deterministic = false); -- strength: tertiary
CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '[strength 2]', deterministic = false); -- strength: tertiary
CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '[strength 1]', deterministic = false); -- strength: tertiary

What happens after the patch:

CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '', deterministic = false); -- strength: tertiary
CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '[strength 2]', deterministic = false); -- strength: secondary
CREATE COLLATION my_col (provider = icu, locale = 'und', rules = '[strength 1]', deterministic = false); -- strength: primary
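
To make the strength labels in the comments above concrete: as a rough rule for Latin text, primary strength ignores both accents and case, secondary ignores only case, and tertiary distinguishes both. The following Python sketch approximates that with the standard unicodedata module. It is a toy approximation for illustration, not ICU's comparison algorithm, and toy_key and toy_equal are hypothetical names.

```python
import unicodedata


def toy_key(text, level):
    """Approximate comparison key; level 1 = primary, 2 = secondary, 3 = tertiary."""
    if level >= 3:
        # Tertiary: keep case and accents, only normalize the encoding.
        return unicodedata.normalize("NFC", text)
    folded = text.casefold()  # drop case differences
    if level == 2:
        # Secondary: accents still matter.
        return unicodedata.normalize("NFC", folded)
    # Primary: decompose, then drop combining marks (accents).
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_equal(a, b, level):
    """True if the two strings compare equal at the given toy strength level."""
    return toy_key(a, level) == toy_key(b, level)
```

Under this approximation, toy_equal("a", "A", 2) holds but toy_equal("a", "A", 3) does not, which is the kind of difference a nondeterministic collation created with '[strength 2]' should show once the strength is honored.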

As this only affects cases where the strength is explicitly set but was previously ignored, I do not think it is a
breaking change.

I have successfully compiled and tested PostgreSQL after this change, and it behaves as documented above.

Thank you in advance,

Luis

Re: [PATCH] Fix ICU strength not being honored in collation rules

From
"Daniel Verite"
Date:
Luis Felippe wrote:

> This fix changes this call to pass UCOL_DEFAULT instead. This way,
> UCOL_TERTIARY is still specified by default, but the strength explicitly set
> on the rules string is not overwritten. This is important because there is
> currently no way to create a collation with custom tailoring rules with
> strength other than tertiary.

Yes. There was a previous report recently [1], with a proposed fix [2]
identical to yours.


> As this only affects cases where the strength is explicitly set but was
> previously ignored, I do not think it is a breaking change.

The fix may change sort results for collations affected by the problem
(that's the point of the fix!), so even if it's for the better, it's
theoretically a breaking change for databases that may have collations
like that.


[1] https://www.postgresql.org/message-id/flat/YT2PPF959236618377A072745A280E278F4BE1DA@YT2PPF959236618.CANPRD01.PROD.OUTLOOK.COM

[2] https://commitfest.postgresql.org/patch/6084/


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/



Re: [PATCH] Fix ICU strength not being honored in collation rules

From
Luis Felippe
Date:
Daniel Verite wrote:

> Yes. There was a previous report recently [1], with a proposed fix [2]
> identical to yours.

It is great to know this is already being addressed.

> The fix may change sort results for collations affected by the problem
> (that's the point of the fix!), so even if it's for the better, it's
> theoretically a breaking change for databases that may have collations
> like that.

While this is technically a breaking change, it only affects cases where the strength attribute is explicitly set.
Cases where the strength is set indirectly, for example by specifying a locale with a different default strength (e.g.
und-u-ks-level-2), continue to behave as before: providing any tailoring rules resets the strength to tertiary.

Explicitly setting the strength attribute is, by definition, an intentional change to the collation strength.
PostgreSQL currently accepts this attribute but silently ignores it, which is a clear correctness issue rather than an
intentional behavioral characteristic. The fix therefore aligns the implementation with user expectations and with the
documented meaning of the attribute.

Given that the change only impacts cases that currently misbehave and brings behavior in line with both specification
and intent, I think it would be reasonable, and beneficial, to include it in the next minor release.


Best regards,

Luis