Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
From | Peter Geoghegan |
---|---|
Subject | Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) |
Date | |
Msg-id | CAM3SWZSzE13i=9pDseTn9XzE21kQ_qHnb7JOkDNUs3akH=jswQ@mail.gmail.com |
In reply to | Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) |
List | pgsql-bugs |
On Tue, Mar 22, 2016 at 3:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, if we implement a compatibility GUC that shuts off our
> dependency on strxfrm(), people can go back to having 9.5 be no more
> broken than 9.4 was. I vote we do that and go home.

I don't have a problem with that idea, but I fear "no more broken than 9.4 was" might be a very low bar for certain systems and collations. Abbreviated key may have simply unmasked the problem in some cases. Consider:

[vagrant@localhost ~]$ LC_COLLATE=en_us sort strings.txt <-- correct
x xx
x xx"
xxx
xxx"
[vagrant@localhost ~]$ LC_COLLATE=de_DE sort strings.txt <-- wrong
xxx
xxx"
x xx
x xx"
[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6

My concern was not merely "academic" (i.e. it was not limited in scope to things that don't make B-Tree indexes corrupt).

Pretty sure that we need to start thinking of this as a problem with strcoll() that strxfrm() does not have for more fundamental reasons, because strcoll() says that the first string in the de_DE sorted list is *greater* than the third string. That's wrong, and not just because strxfrm() gives an intuitively correct answer -- it's wrong specifically because the transitive law has been broken.

--
Peter Geoghegan