Обсуждение: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose

Поиск
Список
Период
Сортировка

BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      15952
Logged by:          Neszt Tibor
Email address:      tibor@neszt.hu
PostgreSQL version: 9.6.14
Operating system:   Debian 8.11
Description:

Schema:

=> \d cimek
                                    Table "public.cimek"
     Column     |         Type          |                      Modifiers
----------------+-----------------------+-----------------------------------------------------
 kod            | bigint                | not null default
nextval('cimek_kod_seq'::regclass)
 irsz           | character varying(16) | not null default ''::character
varying
 telepules      | character varying(64) | not null default ''::character
varying
 telepulesresz  | character varying(64) | not null default ''::character
varying
 kozternev      | character varying(64) | not null default ''::character
varying
 kozterjelleg   | character varying(32) | not null default ''::character
varying
 hazszam        | character varying(64) | not null default ''::character
varying
 hazszambetujel | character varying(64) | not null default ''::character
varying
 hazszamvege    | character varying(64) | not null default ''::character
varying
 lepcsohaz      | character varying(32) | not null default ''::character
varying
 em             | character varying(32) | not null default ''::character
varying
 ajto           | character varying(64) | not null default ''::character
varying
 ajtobetujel    | character varying(64) | not null default ''::character
varying
 longitude      | character varying(32) | not null default ''::character
varying
 latitude       | character varying(32) | not null default ''::character
varying
Indexes:
    "cimek_pkey" PRIMARY KEY, btree (kod)
    "cimek_irsz_telepules_telepulesresz_kozternev_kozterjelleg_h_key" UNIQUE
CONSTRAINT, btree (irsz, telepules, telepulesresz, kozternev, kozterjelleg,
hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel)

Actual behavior:

=> select count(*), min(kod), max(kod), irsz, telepules, telepulesresz,
kozternev, kozterjelleg, hazszam, hazszambetujel, hazszamvege, lepcsohaz,
em, ajto, ajtobetujel from cimek group by irsz, telepules, telepulesresz,
kozternev, kozterjelleg, hazszam, hazszambetujel, hazszamvege, lepcsohaz,
em, ajto, ajtobetujel having count(*) > 1;
 count |  min  |  max  | irsz | telepules | telepulesresz | kozternev |
kozterjelleg | hazszam | hazszambetujel | hazszamvege | lepcsohaz | em |
ajto | ajtobetujel

-------+-------+-------+------+-----------+---------------+-----------+--------------+---------+----------------+-------------+-----------+----+------+-------------
     2 | 51611 | 55269 | 2092 | Budakeszi |               |  Foo      | utca
        | 46      |                |             |           |    |      |
(1 row)

The main problem is that there are two records with the same data set.

The real street name was altered with 'Foo' in this bugreport.

I tried to reproduce the issue locally without success. I encountered this
bug in multiple databases with different table and data. I could upgrade the
database to version 11 with "pg_upgradecluster 11 main -m upgrade -k", and
the issue still remain. To make sure it's not a human error, I tried to dump
the data and restore it. I got duplicate key error as expected.


Re: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose

От
Thomas Munro
Дата:
On Tue, Aug 13, 2019 at 8:23 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>     "cimek_irsz_telepules_telepulesresz_kozternev_kozterjelleg_h_key" UNIQUE
> CONSTRAINT, btree (irsz, telepules, telepulesresz, kozternev, kozterjelleg,
> hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel)

> I tried to reproduce the issue locally without success. I encountered this
> bug in multiple databases with different table and data. I could upgrade the
> database to version 11 with "pg_upgradecluster 11 main -m upgrade -k", and
> the issue still remain. To make sure it's not a human error, I tried to dump
> the data and restore it. I got duplicate key error as expected.

How old is the duplicate data?  I guess you are using the Hungarian
locale as the default collation for your database (see "Collate" in
the output of \l, or check for explicit collations on the relevant
columns).  I think that is one of the collations that has moved around
a bit in recent years in glibc.  For example:

https://sourceware.org/bugzilla/show_bug.cgi?id=13547

-- 
Thomas Munro
https://enterprisedb.com



Re: BUG #15952: UNIQUE CONSTRAINT does not fulfill its' purpose

От
Thomas Munro
Дата:
On Tue, Aug 13, 2019 at 10:25 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Aug 13, 2019 at 8:23 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >     "cimek_irsz_telepules_telepulesresz_kozternev_kozterjelleg_h_key" UNIQUE
> > CONSTRAINT, btree (irsz, telepules, telepulesresz, kozternev, kozterjelleg,
> > hazszam, hazszambetujel, hazszamvege, lepcsohaz, em, ajto, ajtobetujel)

> How old is the duplicate data?  I guess you are using the Hungarian
> locale as the default collation for your database (see "Collate" in
> the output of \l, or check for explicit collations on the relevant
> columns).  I think that is one of the collations that has moved around
> a bit in recent years in glibc.  For example:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=13547

... and to give a little context about why that is relevant, please
see these links:

https://lists.debian.org/debian-glibc/2019/03/msg00030.html
https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html

They're discussing a big change that affects almost everyone upgrading
their glibc (and similar things happen with non-GNU OSes' libc too),
but in the past there have been upgrades that affected specific
individual locales.  German and Hungarian are known examples from
recent memory, like the commits referenced in that Bugzilla ticket;
you could probably figure out which glibc versions those landed in and
see if that matches your Ubuntu upgrade history.  For example, you can
see that they changed their minds about the sort order of "ssz" vs
"szsz" (you can probably see that by piping a file containing "kasza"
and "kassza" through the sort command on an old enough and new enough
Ubuntu release with LANG=hu_HU.UTF-8), and the hypothesis is that some
rule change like that caused us to take a wrong turn while descending
a btree and then conclude that your duplicated street name wasn't
already in the index, when in fact it was.  Oops.

As Christoph mentioned on the Debian glibc mailing list, we'd like to
be able to be able to handle this sort of thing better; we're entirely
dependent on the OS vendor maintaining a stable sort order so that we
can keep our indexes in the right order, but POSIX offers us no way to
know when they've changed (a problem I'm contemplating raising with
the Austin Group/POSIX maintainers).  One option is to use ICU
collations, because we have some limited ability to track when
definitions changed, potentially invalidating an index, but we haven't
finished adding ICU support yet: for now you can't use an ICU
collation as a database default.  You can still use it explicitly
though.  It's an unfortunate problem; we looked into whether we could
query the glibc version, but it turned out that some of the
distributions back-patch the locale changes without changing the glibc
version.  About the only easy way you can detect this problem is to
run the 'amcheck' index checker, or just assume the worst and REINDEX,
after an OS upgrade.

-- 
Thomas Munro
https://enterprisedb.com