Thread: pgsql: Unicode case mapping tables and functions.

pgsql: Unicode case mapping tables and functions.

From: Jeff Davis
Date:
Unicode case mapping tables and functions.

Implements Unicode simple case mapping, in which all code points map
to exactly one other code point unconditionally.

These tables are generated from UnicodeData.txt, which is already
being used by other infrastructure in src/common/unicode. The tables
are checked into the source tree, so they only need to be regenerated
when we update the Unicode version.
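
As a rough illustration of what a simple case mapping lookup can look like, here is a minimal sketch. The struct, the table contents, and the function names are hypothetical and are not the layout of unicode_case_table.h; the sketch only assumes a table sorted by code point that is searched with a binary search, and that code points absent from the table map to themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table row: one code point and its simple mappings. */
typedef struct
{
	uint32_t	codepoint;
	uint32_t	lower;
	uint32_t	upper;
} case_map_entry;

/*
 * Tiny hand-written excerpt, sorted by codepoint. A generated table would
 * cover every code point that has a simple mapping in UnicodeData.txt.
 */
static const case_map_entry case_map[] = {
	{0x0041, 0x0061, 0x0041},	/* LATIN CAPITAL LETTER A */
	{0x0061, 0x0061, 0x0041},	/* LATIN SMALL LETTER A */
	{0x0391, 0x03B1, 0x0391},	/* GREEK CAPITAL LETTER ALPHA */
	{0x03B1, 0x03B1, 0x0391},	/* GREEK SMALL LETTER ALPHA */
};

/* Binary search over the sorted table; returns NULL if cp is not listed. */
const case_map_entry *
find_case_entry(uint32_t cp)
{
	size_t		lo = 0;
	size_t		hi = sizeof(case_map) / sizeof(case_map[0]);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (case_map[mid].codepoint == cp)
			return &case_map[mid];
		if (case_map[mid].codepoint < cp)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/* Code points without an entry map to themselves. */
uint32_t
simple_lower(uint32_t cp)
{
	const case_map_entry *e = find_case_entry(cp);

	return e ? e->lower : cp;
}

uint32_t
simple_upper(uint32_t cp)
{
	const case_map_entry *e = find_case_entry(cp);

	return e ? e->upper : cp;
}
```

Because the mapping is one code point to one code point, U+00DF (LATIN SMALL LETTER SHARP S), which has no simple uppercase mapping in UnicodeData.txt, is returned unchanged by simple_upper() in this sketch.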

In preparation for the builtin collation provider, and possibly useful
for other callers.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Peter Eisentraut, Daniel Verite, Jeremy Schneider

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/5c40364dd6d9c6a260c8965dffe2e066642d6f79

Modified Files
--------------
src/common/Makefile                               |    1 +
src/common/meson.build                            |    1 +
src/common/unicode/Makefile                       |   15 +-
src/common/unicode/case_test.c                    |  100 +
src/common/unicode/generate-unicode_case_table.pl |  134 +
src/common/unicode/meson.build                    |   31 +
src/common/unicode_case.c                         |  174 ++
src/common/wchar.c                                |    4 +-
src/include/common/unicode_case.h                 |   27 +
src/include/common/unicode_case_table.h           | 3001 +++++++++++++++++++++
src/include/mb/pg_wchar.h                         |   15 +
11 files changed, 3498 insertions(+), 5 deletions(-)


Re: pgsql: Unicode case mapping tables and functions.

From: Heikki Linnakangas
Date:
On 07/03/2024 21:18, Jeff Davis wrote:
> Unicode case mapping tables and functions.

With -Wtype-limits, I'm seeing this warning:

unicode_case.c: In function ‘convert_case’:
unicode_case.c:107:47: warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits]
   107 |         while (src[srcoff] != '\0' && (srclen < 0 || srcoff < srclen))
       |                                               ^

That seems like a legit issue. The comment in unicode_strlower/upper() says:

>  * String src must be encoded in UTF-8. If srclen < 0, src must be
>  * NUL-terminated.

But srclen is of type size_t, which is unsigned.
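
To make the mismatch concrete, here is a minimal sketch of the same loop shape with a signed length, so that the "srclen < 0 means NUL-terminated" convention can actually be tested. The function scan_length() is hypothetical and is not the fix that was committed; ptrdiff_t is just one possible signed type, chosen here because it is available in standard C.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL code: returns how many bytes a
 * convert_case()-style loop would visit.
 *
 * If srclen < 0, src must be NUL-terminated and is scanned to the NUL.
 * Otherwise at most srclen bytes are scanned, stopping early at a NUL.
 */
size_t
scan_length(const char *src, ptrdiff_t srclen)
{
	size_t		srcoff = 0;

	/*
	 * srclen is signed, so "srclen < 0" is a real test rather than a
	 * comparison that is always false. The bound is checked before
	 * src[srcoff] is read, so a buffer with an explicit length does not
	 * need a terminating NUL.
	 */
	while ((srclen < 0 || srcoff < (size_t) srclen) && src[srcoff] != '\0')
		srcoff++;

	return srcoff;
}
```

With this signature, scan_length("abc", -1) scans up to the terminator, while scan_length("abcdef", 2) stops after two bytes. With a size_t parameter, passing -1 would instead be converted to SIZE_MAX, and the "< 0" branch could never be taken.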

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: pgsql: Unicode case mapping tables and functions.

From: Jeff Davis
Date:
On Fri, 2024-03-08 at 10:24 +0200, Heikki Linnakangas wrote:
> On 07/03/2024 21:18, Jeff Davis wrote:
> > Unicode case mapping tables and functions.
>
> With -Wtype-limits, I'm seeing this warning:

Thank you, fixed. Somehow I lost that flag from my script.

Can you please add some recommended compiler warning flags here:

https://wiki.postgresql.org/wiki/Committing_checklist

?

Regards,
    Jeff Davis