Обсуждение: B-tree Index corruption

Поиск
Список
Период
Сортировка

B-tree Index corruption

От
Filip Sedlák
Дата:
Hi,
I see a weird behaviour which might be a bug in Postgres but I wasn't 
able to find it online.

We have a simple table with a UNIQUE constraint that doesn't work. I'm 
able to insert one row with a conflicting value (while two instances of 
this value are already in the table). The next insert fails. When using 
the underlying index for lookups, the old values are not retrieved 
(consistent with the allowed insert).

We're on version 15.3 and I see some index misbehaviour fixed in 15.5 
but nothing on btree over text values. I saw some posts about glibc 
upgrade changing collation rules but I don't think it's our case because 
we see it with plain ASCII values. More importantly, we last upgraded 
the major OS version in June but I was able to find a row missing from 
the index that was inserted a few weeks ago.

I don't have a self-contained repro script. I can easily test anything 
on our snapshot. But I don't know where to look further. I'd be happy 
for guidance.

See below the psql session demonstrating the problem.

Best regards
Filip




monitora_snapshot=# SELECT version();
                                                        version 

---------------------------------------------------------------------------------------------------------------------
  PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

monitora_snapshot=# \d social_hashtag

                                      Table "public.social_hashtag"
  Column  |           Type           | Collation | Nullable | 
       Default
---------+--------------------------+-----------+----------+--------------------------------------------
  id      | integer                  |           | not null | 
nextval('social_hashtag_id_seq'::regclass)
  hashtag | character varying(50)    |           | not null |
  created | timestamp with time zone |           | not null |
Indexes:
     "social_hashtag_pkey" PRIMARY KEY, btree (id)
     "social_hashtag_hashtag_eac29120_like" btree (hashtag 
varchar_pattern_ops)
     "social_hashtag_hashtag_key" UNIQUE CONSTRAINT, btree (hashtag)

monitora_snapshot=# EXPLAIN SELECT id, hashtag
     FROM social_hashtag
     WHERE hashtag = '________________';
                                                  QUERY PLAN 

------------------------------------------------------------------------------------------------------------
  Index Scan using social_hashtag_hashtag_eac29120_like on 
social_hashtag  (cost=0.43..2.65 rows=1 width=18)
    Index Cond: ((hashtag)::text = '________________'::text)
(2 rows)

monitora_snapshot=#
monitora_snapshot=# -- OK, so the select uses the other index. It shows
monitora_snapshot=# -- two rows with the "unique" hashtag.
monitora_snapshot=#
monitora_snapshot=# SELECT id, hashtag
     FROM social_hashtag
     WHERE hashtag = '________________';
    id    |     hashtag
---------+------------------
  1565500 | ________________
  6329472 | ________________
(2 rows)

monitora_snapshot=# -- Still, we can insert
monitora_snapshot=# INSERT INTO social_hashtag
     VALUES (DEFAULT, '________________', NOW());
INSERT 0 1

monitora_snapshot=# -- And retrieve the data
monitora_snapshot=#
monitora_snapshot=# SELECT id, hashtag
     FROM social_hashtag
     WHERE hashtag = '________________';
    id    |     hashtag
---------+------------------
  1565500 | ________________
  6329472 | ________________
  6338012 | ________________
(3 rows)


monitora_snapshot=#
monitora_snapshot=# -- But we can't insert for the second time.
monitora_snapshot=#
monitora_snapshot=# INSERT INTO social_hashtag VALUES (DEFAULT, 
'________________', NOW());
ERROR:  duplicate key value violates unique constraint 
"social_hashtag_hashtag_key"
DETAIL:  Key (hashtag)=(________________) already exists.

monitora_snapshot=# DROP INDEX social_hashtag_hashtag_eac29120_like ;
DROP INDEX

monitora_snapshot=# EXPLAIN SELECT id, hashtag
     FROM social_hashtag
     WHERE hashtag = '________________';
                                             QUERY PLAN 

--------------------------------------------------------------------------------------------------
  Index Scan using social_hashtag_hashtag_key on social_hashtag 
(cost=0.43..2.65 rows=1 width=18)
    Index Cond: ((hashtag)::text = '________________'::text)
(2 rows)

monitora_snapshot=#
monitora_snapshot=# -- Dropping the other index means the select now
monitora_snapshot=# -- uses the index that's used for the constraint.
monitora_snapshot=# -- It sees only the last row.
monitora_snapshot=#
    id    |     hashtag
---------+------------------
  6337933 | ________________

(1 row)




-- 
Filip Sedlák



Re: B-tree Index corruption

От
Tom Lane
Дата:
=?UTF-8?Q?Filip_Sedl=C3=A1k?= <filip@sedlakovi.org> writes:
> We're on version 15.3 and I see some index misbehaviour fixed in 15.5 
> but nothing on btree over text values. I saw some posts about glibc 
> upgrade changing collation rules but I don't think it's our case because 
> we see it with plain ASCII values.

Sadly, you're wrong about that; see the example at

https://wiki.postgresql.org/wiki/Locale_data_changes#Testing_collation

If you did an OS upgrade across a glibc collation change, it's best to
assume all your indexes on textual columns are corrupt and REINDEX
them.

> More importantly, we last upgraded 
> the major OS version in June but I was able to find a row missing from 
> the index that was inserted a few weeks ago.

It can take awhile before the results of inconsistent sorting order
become obvious, but the state of the index will progressively degrade
as new entries get inserted into the wrong subtree.

            regards, tom lane



Re: B-tree Index corruption

От
Filip Sedlák
Дата:
> It can take awhile before the results of inconsistent sorting order
> become obvious, but the state of the index will progressively degrade
> as new entries get inserted into the wrong subtree.

This part, I definitely didn't expect. I'm glad that reindex should fix it!

Thank you very much!
Filip Sedlák