Re: Built-in CTYPE provider

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Built-in CTYPE provider
Дата
Msg-id a21b515000f121244db2a5570d4292d19df23379.camel@j-davis.com
обсуждение исходный текст
Ответ на Re: Built-in CTYPE provider  (Peter Eisentraut <peter@eisentraut.org>)
Ответы Re: Built-in CTYPE provider  (Peter Eisentraut <peter@eisentraut.org>)
Re: Built-in CTYPE provider  (Alexander Lakhin <exclusion@gmail.com>)
Re: Built-in CTYPE provider  (Peter Eisentraut <peter@eisentraut.org>)
Re: Built-in CTYPE provider  ("Daniel Verite" <daniel@manitou-mail.org>)
Список pgsql-hackers
On Tue, 2024-03-19 at 13:41 +0100, Peter Eisentraut wrote:
> * v25-0002-Support-C.UTF-8-locale-in-the-new-builtin-collat.patch
>
> Looks ok.

Committed.

> * v25-0003-Inline-basic-UTF-8-functions.patch

Committed.

> * v25-0004-Use-version-for-builtin-collations.patch
>
> Not sure about the version format "1.0", which implies some sort of
> major/minor or component-based system.  I would just use "1".

The v26 patch was not quite complete, so I didn't commit it yet.
Attached v27-0001 and 0002.

0002 is necessary because otherwise lc_collate_is_c() short-circuits
the version check in pg_newlocale_from_collation(). With 0002, the code
is simpler and all paths go through pg_newlocale_from_collation(), and
the version check happens even when lc_collate_is_c().

But perhaps there was a reason the code was the way it was, so
submitting for review in case I missed something.

> 0005 and 0006 don't contain any test cases.  So I guess they are
> really
> only usable via 0007.  Is that understanding correct?

0005 is not a functional change, it's just a refactoring to use a
callback, which is preparation for 0007.

> Are there any test cases that illustrate the word boundary changes in
> patch 0005?  It might be useful to test those against Oracle as well.

The tests include initcap('123abc') which is '123abc' in the PG_C_UTF8
collation vs '123Abc' in PG_UNICODE_FAST.

The reason for the latter behavior is that the Unicode Default Case
Conversion algorithm for toTitlecase() advances to the next Cased
character before mapping to titlecase, and digits are not Cased. ICU
has a configurable adjustment, and defaults in a way that produces
'123abc'.

New rebased series attached.

Regards,
    Jeff Davis


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation
Следующее
От: Melih Mutlu
Дата:
Сообщение: Re: Flushing large data immediately in pqcomm