Re: Pre-proposal: unicode normalized text

From: Jeff Davis
Date: October 11, 2023, 01:08:41
Msg-id: 2bab90239c5264fa9a87372c16bbf8759c8f9e64.camel@j-davis.com
In reply to: Re: Pre-proposal: unicode normalized text (Robert Haas)
List: pgsql-hackers


On Tue, 2023-10-10 at 10:02 -0400, Robert Haas wrote:
> On Tue, Oct 10, 2023 at 2:44 AM Peter Eisentraut wrote:
> > Can you restate what this is supposed to be for?  This thread
> > appears to have morphed from "let's normalize everything" to
> > "let's check for unassigned code points", but I'm not sure what
> > we are aiming for now.

It was a "pre-proposal", so yes, the goalposts have moved a bit. Right
now I'm aiming to get some primitives in place that will be useful by
themselves, but also that we can potentially build on.

Attached is a new version of the patch which introduces some SQL
functions as well:

  * unicode_is_valid(text): returns true if all code points are
    assigned, false otherwise
  * unicode_version(): version of Unicode that Postgres is built with
  * icu_unicode_version(): version of Unicode that ICU is built with

I'm not 100% clear on the consequences of differences between the PG
Unicode version and the ICU Unicode version, but because normalization
uses the Postgres version of Unicode, I believe the Postgres version of
Unicode should also be available to determine whether a code point is
assigned or not.
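
As an illustration only (this is not the patch, and not how Postgres
implements it), the "all code points assigned" check can be sketched
in Python with the standard unicodedata module. The helper names
unassigned_code_points and all_assigned are hypothetical. The sketch
assumes "unassigned" means general category Cn, which also covers
noncharacters; whether the patch draws the line in the same place is
not stated here. The result depends on the Unicode version the
interpreter was built with, which is the same kind of version
dependence discussed above.

```python
import unicodedata


def unassigned_code_points(text: str) -> list[str]:
    """List code points whose general category is Cn (unassigned).

    Assumption: Cn is used as the definition of "unassigned". The
    answer depends on the Unicode tables bundled with this Python.
    """
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cn"]


def all_assigned(text: str) -> bool:
    """Return True if every code point in text is assigned."""
    return not unassigned_code_points(text)


if __name__ == "__main__":
    # Unicode version of the tables this interpreter was built with.
    print(unicodedata.unidata_version)
    # U+0378 is a reserved, unassigned code point in the Greek block.
    print(all_assigned("caf\u00e9"))
    print(unassigned_code_points("a\u0378b"))
```

Listing the offending code points, rather than returning only a
boolean, is meant to mirror the "find data already in your database"
use case mentioned below.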

We may also find it interesting to use the PG Unicode tables for regex
character classification. This is just an idea and we can discuss
whether that makes sense or not, but having the primitives in place
seems like a good idea regardless.

> Jeff can say what he wants it for, but one obvious application would
> be to have the ability to add a CHECK constraint that forbids
> inserting unassigned code points into your database, which would be
> useful if you're worried about forward-compatibility with collation
> definitions that might be extended to cover those code points in the
> future. Another application would be to find data already in your
> database that has this potential problem.

Exactly. Avoiding unassigned code points also allows you to be forward-
compatible with normalization.

Regards,
	Jeff Davis
