Re: pg_collation.collversion for C.UTF-8
From | Thomas Munro |
---|---|
Subject | Re: pg_collation.collversion for C.UTF-8 |
Date | |
Msg-id | CA+hUKGKTAEOvh72BoUKX6iwRJ0p3OGXFp1Az96NZ7fXemt33rw@mail.gmail.com |
In response to | Re: pg_collation.collversion for C.UTF-8 (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: pg_collation.collversion for C.UTF-8 (two replies) |
List | pgsql-hackers |

On Wed, Apr 19, 2023 at 1:30 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2023-04-19 at 07:48 +1200, Thomas Munro wrote:
> > Many OSes have a locale with this name. I don't know this history,
> > who did it first etc, but now I am wondering if they all took the
> > "obvious" interpretation, that it should be code-point based,
> > extrapolating from "C" (really memcmp order):
>
> memcmp() is not the same as code-point order in all encodings, right?

Right. I wasn't trying to suggest that *we* should assume that, I was
just thinking out loud about how a libc implementor would surely think
that a "C.encoding" should work, in the spirit of "C", given that the
standard doesn't tell us IIUC. It looks like for technical reasons
inside glibc, that couldn't be done before 2.35:

https://sourceware.org/bugzilla/show_bug.cgi?id=17318

That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied by
the glibc project) isn't supposed to be versioned, but it's extremely
unfortunate that a bunch of OSes (Debian and maybe more) have been
sorting text in some other order under that name for years.

> I've been thinking that we should have a "provider=none" for the
> special cases that use memcmp(). It's not using libc as a collation
> provider; it's really postgres in control of the semantics.

Yeah, interesting idea.
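
For readers who want to see the memcmp() point concretely, here is a minimal Python sketch (not from the thread) that checks whether byte-wise comparison of the encoded strings agrees with code-point comparison. The helper name `byte_order_matches_code_point_order` and the two sample characters are illustrative assumptions. Python compares `bytes` lexicographically as unsigned values, which is roughly what memcmp() plus a length tie-break does, and compares `str` by code point. UTF-16 is used only because it makes the difference easy to see; it is not a PostgreSQL server encoding.

```python
def byte_order_matches_code_point_order(a: str, b: str, encoding: str) -> bool:
    """Return True if byte-wise order of the encoded strings agrees
    with code-point order for this particular pair (hypothetical helper)."""
    by_code_point = a < b                                # str compares by code point
    by_bytes = a.encode(encoding) < b.encode(encoding)   # memcmp-like byte order
    return by_code_point == by_bytes


# Illustrative pair: a BMP character above the surrogate range,
# and the first supplementary-plane character.
LOW = "\uff5e"       # U+FF5E FULLWIDTH TILDE
HIGH = "\U00010000"  # U+10000 LINEAR B SYLLABLE B008 A

if __name__ == "__main__":
    for enc in ("utf-8", "utf-16-be"):
        agrees = byte_order_matches_code_point_order(LOW, HIGH, enc)
        print(f"{enc}: byte order agrees with code-point order: {agrees}")
```

In UTF-8 the two orders agree for this pair (and in general, by design of the encoding). In UTF-16BE they disagree, because U+10000 is stored as a surrogate pair starting with byte 0xD8, which sorts below the 0xFF lead byte of U+FF5E.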