Re: Built-in CTYPE provider

From: Jeff Davis
Subject: Re: Built-in CTYPE provider
Msg-id: 7774b3a64f51b3375060c29871cf2b02b3e85dab.camel@j-davis.com
In reply to: Re: Built-in CTYPE provider (Jeremy Schneider <schneider@ardentperf.com>)
List: pgsql-hackers
On Wed, 2023-12-20 at 16:29 -0800, Jeremy Schneider wrote:
> found some more. here's my running list of everything user-facing I
> see in core PG code so far that might involve case:
>
> * upper/lower/initcap
> * regexp_*() and *_REGEXP()
> * ILIKE, operators ~* !~* ~~ !~~ ~~* !~~*
> * citext + replace(), split_part(), strpos() and translate()
> * full text search - everything is case folded
> * unaccent? not clear to me whether CTYPE includes accent folding

No, ctype has nothing to do with accents as far as I can tell. I don't
know if I'm using the right terminology, but I think "case" is a
variant of a character whereas "accent" is a modifier/mark, and the
mark is a separate concept from the character itself.

> * ltree
> * pg_trgm
> * core PG parser, case folding of relation names

Let's separate it into groups.

(1) Callers that use a collation OID or pg_locale_t:

  * collation & hashing
  * upper/lower/initcap
  * regex, LIKE, formatting
  * pg_trgm (which uses regexes)
  * maybe postgres_fdw, but might just be a passthrough
  * catalog cache (always uses DEFAULT_COLLATION_OID)
  * citext (always uses DEFAULT_COLLATION_OID, but probably shouldn't)

(2) A long tail of callers that depend on what LC_CTYPE/LC_COLLATE are
set to, or use ad-hoc ASCII-only semantics:

  * core SQL parser downcase_identifier()
  * callers of pg_strcasecmp() (DDL, etc.)
  * GUC name case folding
  * full text search ("mylocale = 0 /* TODO */")
  * a ton of stuff uses isspace(), isdigit(), etc.
  * various callers of tolower()/toupper()
  * some selfuncs.c stuff
  * ...

Might have missed some places.

A new built-in provider would affect the callers in (1), but only for
users who actually choose that provider. So there's no compatibility
risk there, but it's good to understand what it will affect.

We can, on a case-by-case basis, also consider using the new APIs I'm
proposing for instances of (2). There would be some compatibility risk
there for existing callers, and we'd have to consider whether it's
worth it or not. Ideally, new callers would either use the new APIs or
use the pg_ascii_* APIs.

Regards,
    Jeff Davis