Обсуждение: pg_visibility's pg_check_visible() yields false positive when working in parallel with autovacuum

Поиск
Список
Период
Сортировка

pg_visibility's pg_check_visible() yields false positive when working in parallel with autovacuum

От
Daniel Shelepanov
Дата:

Hello everyone!

I've been examining a floating bug in pg_visibility. To reproduce it, you have to:

1) apply a patch to pg_visibility's tests (you can find it attached)

2) run test-icx.sh (attached) from the root of the postgresql's source tree

pg_vis_clean.patch adds one additional test to pg_visibility test suite and normally it is expected to pass. However, when (I assume so) it happens to run in parallel with autovacuum, the test is failed. test-icx.sh runs several parallel installchecks on contrib/pg_visibility tests and there is a chance that some of them will catch this error.

This is the set-up for a bug reproduction:

create table vacuum_test as select 42 i;
vacuum (disable_page_skipping) vacuum_test;

This is the actual test result:

diff -U3 /tmp/icx/ic01-1/contrib/pg_visibility/expected/pg_visibility.out /tmp/icx/ic01-1/contrib/pg_visibility/results/pg_visibility.out
--- /tmp/icx/ic01-1/contrib/pg_visibility/expected/pg_visibility.out    2022-02-17 23:46:26.005500630 +0300
+++ /tmp/icx/ic01-1/contrib/pg_visibility/results/pg_visibility.out 2022-02-17 23:47:36.066838872 +0300
@@ -247,7 +247,7 @@
 select count(*) > 0 from pg_check_visible('vacuum_test');
  ?column?
 ----------
- f
+ t
 (1 row)

 -- cleanup


The thing is that the underlying file is one page long which should contain the only all-visible tuple and pg_check_visible should return nothing. Is this a real bug or do I miss something? Thank you in advance for the response

P. S.: SELECT version(); result: PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit


Вложения

Re: pg_visibility's pg_check_visible() yields false positive when working in parallel with autovacuum

От
Andres Freund
Дата:
Hi,

On 2022-02-18 00:09:48 +0300, Daniel Shelepanov wrote:
> I've been examining a floating bug in pg_visibility. To reproduce it, you
> have to:
> 
> 1) apply a patch to pg_visibility's tests (you can find it attached)
> 
> 2) run test-icx.sh (attached) from the root of the postgresql's source tree

Could you try to minimize the script? A 300 line reproducer is quite long. And
it looks like it won't even work in non postgres-pro tree.


One thing to do would be to modify pg_visibility to elog(PANIC, "something")
when it encounters corruption. Then you would have a chance of inspecting the
state of the tuple/page in that moment.

Greetings,

Andres Freund



Re: pg_visibility's pg_check_visible() yields false positive when working in parallel with autovacuum

От
Andres Freund
Дата:
Hi,

On 2022-02-18 08:56:37 -0800, Andres Freund wrote:
> Could you try to minimize the script? A 300 line reproducer is quite long. And
> it looks like it won't even work in non postgres-pro tree.
>
> One thing to do would be to modify pg_visibility to elog(PANIC, "something")
> when it encounters corruption. Then you would have a chance of inspecting the
> state of the tuple/page in that moment.

Oh, I have been able to reliably reproduce this on HEAD. I modified
record_corrupt_item() to PANIC and then:

psql regression:
  BEGIN ;SELECT txid_current();
  <leave open>

psql postgres
  DROP TABLE IF EXISTS vacuum_test_0;
  create table vacuum_test_0 as select 42 i;
  vacuum (disable_page_skipping) vacuum_test_0;
  select * from pg_check_visible('vacuum_test_0');

At which point there immediately is a crash.


This reproduces in earlier versions too, at least back to 10.


I *think* this is a false positive:

- PGPROC->xmin is computed without regard for the database in which the other
  sessions are running. Due to the the txid_current() session this includes an
  older xid.

- During the VACUUM in vis.sql the only connection to the database pgbench
  connects to is VACUUM and thus ignored when determining horizons (due to
  PROC_IN_VACUUM). Therefore the horizon is computed to latestCompletedXid +
  1.

- But during pg_check_visible(), the current session is *not* marked as
  PROC_IN_VACUUM. So the horizon is the xid from the txid_current().

Boom.

Greetings,

Andres Freund

Вложения