Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From	Peter Geoghegan
Subject	Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date	23 March 2016 02:33:52
Msg-id	CAM3SWZSzE13i=9pDseTn9XzE21kQ_qHnb7JOkDNUs3akH=jswQ@mail.gmail.com
In reply to	Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) (Robert Haas <robertmhaas@gmail.com>)
Replies	Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
List	pgsql-bugs

On Tue, Mar 22, 2016 at 3:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, if we implement a compatibility GUC that shuts off our
> dependency on strxfrm(), people can go back to having 9.5 be no more
> broken than 9.4 was.  I vote we do that and go home.

I don't have a problem with that idea, but I fear "no more broken than
9.4 was" might be a very low bar for certain systems and collations.
Abbreviated keys may simply have unmasked the problem in some cases.
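For context on how abbreviated keys could unmask such a problem: in 9.5, text sorts in non-C collations can compare a small integer built from the leading bytes of each string's strxfrm() blob, and fall back to the full comparison (which uses strcoll()) only when those integers tie. The C sketch below is a simplified model under that assumption, not PostgreSQL's varlena.c code; abbrev_key and abbrev_then_full_cmp are hypothetical names, and the 8-byte key width assumes a 64-bit Datum. It shows how one sort ends up mixing two orders: pairs whose blobs differ early are decided by strxfrm() order, and the rest by strcoll().

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), to pick the collation being modelled */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified abbreviated key: the first 8 bytes of the
 * strxfrm() blob, packed big-endian so that unsigned integer comparison
 * matches byte-wise comparison of those bytes. Shorter blobs are
 * zero-padded, which sorts them before any longer blob sharing the prefix. */
static uint64_t abbrev_key(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);   /* length of the full blob */
    unsigned char *blob = malloc(need + 1);
    uint64_t key = 0;

    if (blob == NULL)
        abort();                         /* keep the sketch simple */
    size_t got = strxfrm((char *) blob, s, need + 1);
    assert(got == need);
    for (size_t i = 0; i < 8; i++)
        key = (key << 8) | (i < need ? blob[i] : 0u);
    free(blob);
    return key;
}

/* Compare the way an abbreviated-key sort does: cheap integer comparison
 * first, authoritative strcoll() only when the abbreviated keys tie. If
 * strxfrm() and strcoll() disagree, the answer for a given pair depends on
 * whether the blobs differ within their first 8 bytes. */
static int abbrev_then_full_cmp(const char *a, const char *b)
{
    uint64_t ka = abbrev_key(a);
    uint64_t kb = abbrev_key(b);

    if (ka != kb)
        return ka < kb ? -1 : 1;
    return strcoll(a, b);
}
```

In this model, the de_DE.UTF-8 blobs in the transcript that follows share their first 8 bytes (23 23 23 01 08 08 08 01), so "xxx" and "x xx" would tie on the abbreviated key and be decided by strcoll().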

Consider:

[vagrant@localhost ~]$ LC_COLLATE=en_us sort strings.txt <-- correct
x xx
x xx"
xxx
xxx"
[vagrant@localhost ~]$ LC_COLLATE=de_DE sort strings.txt <-- wrong
xxx
xxx"
x xx
x xx"
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6
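The strxfrm-binary program is not included in the post. The C sketch below is a hypothetical reconstruction of what such a tool might do; check_pair, xfrm_blob and print_blob are illustrative names, not Peter's code. The transcript's "strcmp(arg1, arg2)" line evidently compares the strxfrm() blobs rather than the raw strings: a byte comparison of "xxx" and "x xx" would be positive, because ' ' is below 'x', whereas the blob for "xxx" is a strict prefix of the blob for "x xx" and so compares lower. The C standard requires strcmp() on two strxfrm() results to agree in sign with strcoll() on the original strings, so the -1 versus 6 above is a disagreement the library is not supposed to allow.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Collapse a comparison result to -1, 0 or 1 (strcoll() may return e.g. 6). */
static int sign_of(int v) { return (v > 0) - (v < 0); }

/* Return a malloc'd strxfrm() blob for s under the current LC_COLLATE and
 * store its length (excluding the terminating NUL) in *len; NULL on OOM. */
static char *xfrm_blob(const char *s, size_t *len)
{
    size_t need = strxfrm(NULL, s, 0);   /* size query: NULL allowed when n == 0 */
    char *buf = malloc(need + 1);
    if (buf == NULL)
        return NULL;
    size_t got = strxfrm(buf, s, need + 1);
    assert(got == need);                 /* same string, same locale, same length */
    *len = got;
    return buf;
}

/* Print a string and its blob in hex; in the C locale, "xxx" -> 787878 (3 bytes). */
static void print_blob(const char *s, const char *blob, size_t len)
{
    printf("\"%s\" -> ", s);
    for (size_t i = 0; i < len; i++)
        printf("%02x", (unsigned char) blob[i]);
    printf(" (%zu bytes)\n", len);
}

/* Switch LC_COLLATE to `locale` (a process-wide side effect), then report
 * whether strcmp() on the strxfrm() blobs agrees in sign with strcoll() on
 * the original strings. Returns 1 if they agree, 0 if they disagree, and
 * -1 if the locale is unavailable or memory runs out. */
static int check_pair(const char *locale, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;

    size_t la = 0, lb = 0;
    char *xa = xfrm_blob(a, &la);
    char *xb = xfrm_blob(b, &lb);
    if (xa == NULL || xb == NULL) {
        free(xa);
        free(xb);
        return -1;
    }

    print_blob(a, xa, la);
    print_blob(b, xb, lb);
    int by_blob = sign_of(strcmp(xa, xb));
    int by_coll = sign_of(strcoll(a, b));
    printf("strcmp(blob1, blob2) sign: %d\n", by_blob);
    printf("strcoll(arg1, arg2) sign: %d\n", by_coll);

    free(xa);
    free(xb);
    return by_blob == by_coll;
}
```

Calling check_pair("de_DE.UTF-8", "xxx", "x xx") from a main() on a system with that locale installed would, if its glibc behaves like the one in the transcript, print a blob sign of -1 against a strcoll() sign of 1 and return 0.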

My concern was not merely "academic" (i.e. it was not limited in scope
to things that don't make B-Tree indexes corrupt). I'm pretty sure we
need to start thinking of this as a problem with strcoll() that
strxfrm() does not have, for more fundamental reasons: strcoll() says
that the first string in the de_DE sorted list ("xxx") is *greater*
than the third string ("x xx"). That's wrong, and not just because
strxfrm() gives an intuitively correct answer. It's wrong specifically
because the transitive law has been broken.
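The transitivity argument can be checked mechanically. The following is a minimal C sketch, not anything from the thread: cmp_is_consistent (a hypothetical name) brute-forces every ordered pair and triple from a small set of strings and verifies that a comparator is antisymmetric and transitive, which are the properties a sort or a B-Tree silently relies on. cyclic_len_cmp is a deliberately broken, hypothetical comparator included only to show what a violation looks like. If strcoll() really is non-transitive on the four strings above, as argued, then setting LC_COLLATE to the affected locale and passing strcoll should make the check return 0.

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), for choosing the collation under test */
#include <stddef.h>
#include <string.h>

typedef int (*str_cmp_fn)(const char *, const char *);

static int sgn(int v) { return (v > 0) - (v < 0); }

/* Return 1 if cmp is a consistent total preorder over items[0..n-1], else 0.
 * Antisymmetry: sgn(cmp(a,b)) == -sgn(cmp(b,a)); checking i == j as well
 * forces cmp(a,a) == 0. Transitivity: a <= b and b <= c imply a <= c.
 * O(n^3) comparisons, so intended for small test sets only. */
static int cmp_is_consistent(const char *const *items, size_t n, str_cmp_fn cmp)
{
    assert(cmp != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (sgn(cmp(items[i], items[j])) != -sgn(cmp(items[j], items[i])))
                return 0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                if (cmp(items[i], items[j]) <= 0 &&
                    cmp(items[j], items[k]) <= 0 &&
                    cmp(items[i], items[k]) > 0)
                    return 0;
    return 1;
}

/* Hypothetical, deliberately broken comparator for demonstration: it orders
 * strings by length mod 3 in a cycle (0 < 1 < 2 < 0), like
 * rock-paper-scissors, so it is antisymmetric but not transitive. */
static int cyclic_len_cmp(const char *a, const char *b)
{
    size_t la = strlen(a) % 3;
    size_t lb = strlen(b) % 3;

    if (la == lb)
        return 0;
    return (la + 1) % 3 == lb ? -1 : 1;
}
```

Because strcoll() already has the str_cmp_fn signature, it can be passed directly; the cubic cost means this suits a handful of suspicious strings rather than a whole table.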

--
Peter Geoghegan
