Thread: pgsql: Support C.UTF-8 locale in the new builtin collation provider.

pgsql: Support C.UTF-8 locale in the new builtin collation provider.

From: Jeff Davis
Date:
Support C.UTF-8 locale in the new builtin collation provider.

The builtin C.UTF-8 locale has similar semantics to the libc locale of
the same name. That is, code point sort order (fast, memcmp-based)
combined with Unicode semantics for character operations such as
pattern matching, regular expressions, and
LOWER()/INITCAP()/UPPER(). The character semantics are based on
Unicode simple case mappings.

The builtin provider's C.UTF-8 offers several important advantages
over libc:

 * faster sorting -- benefits from additional optimizations such as
   abbreviated keys and varstrfastcmp_c
 * faster case conversion, e.g. LOWER(), at least compared with some
   libc implementations
 * available on all platforms with identical semantics, and the
   semantics are stable, testable, and documentable within a given
   Postgres major version

Being based on memcmp, the builtin C.UTF-8 locale does not offer
natural language sort order. But it is an improvement for most use
cases that might otherwise use libc's "C.UTF-8" locale, as well as
many use cases that use libc's "C" locale.
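
To illustrate what "code point sort order (fast, memcmp-based)" means in practice, here is a minimal sketch in C. It is not the PostgreSQL implementation; utf8_codepoint_cmp is a hypothetical helper introduced only for this example. It relies on the property that comparing valid UTF-8 strings byte by byte gives the same result as comparing them code point by code point.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): compare two UTF-8
 * byte strings with memcmp, breaking ties on length so that a proper
 * prefix sorts first. Returns <0, 0, or >0.
 */
static int
utf8_codepoint_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      minlen = (alen < blen) ? alen : blen;
    int         result;

    assert(a != NULL && b != NULL);

    /* memcmp compares bytes as unsigned char, which matches UTF-8 order */
    result = memcmp(a, b, minlen);
    if (result != 0)
        return result;

    /* common prefix is identical: the shorter string sorts first */
    if (alen == blen)
        return 0;
    return (alen < blen) ? -1 : 1;
}
```

With this comparison, "Z" (U+005A) sorts before "a" (U+0061), and "z" (U+007A) sorts before "é" (U+00E9, bytes C3 A9). That is the sense in which the order is not a natural language order, even though it is deterministic and cheap to compute.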

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/f69319f2f1fb16eda4b535bcccec90dff3a6795e

Modified Files
--------------
doc/src/sgml/charset.sgml                    |  27 +++++-
doc/src/sgml/ref/create_collation.sgml       |   2 +-
doc/src/sgml/ref/create_database.sgml        |  13 ++-
doc/src/sgml/ref/initdb.sgml                 |   5 +-
src/backend/regex/regc_pg_locale.c           |  36 ++++++-
src/backend/utils/adt/formatting.c           | 112 ++++++++++++++++++++++
src/backend/utils/adt/pg_locale.c            |  52 +++++++---
src/bin/initdb/initdb.c                      |  16 +++-
src/bin/initdb/t/001_initdb.pl               |  17 ++++
src/bin/pg_upgrade/t/002_pg_upgrade.pl       |   2 +-
src/bin/scripts/t/020_createdb.pl            |  18 ++++
src/include/catalog/catversion.h             |   2 +-
src/include/catalog/pg_collation.dat         |   3 +
src/test/regress/expected/collate.utf8.out   | 136 +++++++++++++++++++++++++++
src/test/regress/expected/collate.utf8_1.out |   8 ++
src/test/regress/parallel_schedule           |   4 +-
src/test/regress/sql/collate.utf8.sql        |  67 +++++++++++++
17 files changed, 494 insertions(+), 26 deletions(-)


Re: pgsql: Support C.UTF-8 locale in the new builtin collation provider.

From: Peter Eisentraut
Date:
On 19.03.24 23:29, Jeff Davis wrote:
> Support C.UTF-8 locale in the new builtin collation provider.

This commit changed the functions builtin_locale_encoding() and 
builtin_validate_locale() in pg_locale.c, but did not update the 
comments that claim that "C" is the only supported locale name.

(I think it would be best to remove the comment altogether and let the 
code speak for itself.)



Re: pgsql: Support C.UTF-8 locale in the new builtin collation provider.

From: Jeff Davis
Date:
On Thu, 2024-05-02 at 09:34 +0200, Peter Eisentraut wrote:
> This commit changed the functions builtin_locale_encoding() and
> builtin_validate_locale() in pg_locale.c, but did not update the
> comments that claim that "C" is the only supported locale name.
>
> (I think it would be best to remove the comment altogether and let the
> code speak for itself.)

Thank you, committed.

Regards,
    Jeff Davis