Обсуждение: [PATCH] VACUUM: avoid pre-creation transactions holding back cleanup of newly created relations

Поиск
Список
Период
Сортировка

[PATCH] VACUUM: avoid pre-creation transactions holding back cleanup of newly created relations

От
Mikhail Kharitonov
Дата:
Hi hackers,

Attached is a patch that extends VACUUM’s VISHORIZON_DATA handling by tracking
a relation "birth XID" and using it to compute a relation-specific data horizon.
The goal is to avoid cases where transactions that started before a relation
existed to hold back cleanup of that relation. The patch applies
cleanly to current master.

Right now GetOldestNonRemovableTransactionId() for VISHORIZON_DATA uses
a database-wide horizon. Because of that, a single old transaction can hold
the VACUUM horizon and prevent removal of dead tuples, and may also prevent
VACUUM from truncating the relation, even for a table that was created after
that transaction started although that transaction cannot possibly see
the table.

Reproduction steps (two sessions):

Session 1:

BEGIN;
SELECT txid_current();

Session 2:

CREATE TABLE t (id bigserial, payload text);
INSERT INTO t(payload) SELECT repeat('x', 200) FROM generate_series(1, 200000);
DELETE FROM t WHERE id > 150000;
VACUUM (VERBOSE) t;

Without the patch, VACUUM reports "dead but not yet removable" and oldest xmin
matches the XID from session 1, so the cleanup is held back even though
the relation was created later. With the patch applied, the transaction from
session 1 is ignored for this relation’s VISHORIZON_DATA horizon,
so VACUUM can remove the dead tuples and truncate the relation.

The patch adds pg_class.relminxid, sets it at relation creation, and when
computing VISHORIZON_DATA scans ProcArray but ignores backends whose xid/xmin
are unambiguously older than relminxid, since such snapshots cannot require
keeping tuples of a relation that did not exist yet. A regression test is
included: src/test/recovery/t/121_vacuum_xid_horizons.pl.

I’d be grateful for any comments.
______
Thanks,
Mikhail Kharitonov

Вложения
Hi Mikhail,

Thanks for sending a patch!

I'll start by admitting that I am not sure exactly how to do this
feature correctly. But there are some things about your patch that I
think will definitely not work. I also have some suggestions for how
you can split up the project.

There are two distinct parts of this feature. The first part is
recording the transaction creating the table in pg_class. The second
part is using that xid in vacuum to allow actually pruning/freezing
tuples in the table if it was created after the long-running
transactions holding the horizon back.

I would split the patch into those two parts.

The first patch (having the birth xid of a table in pg_class) could
potentially be used in autovacuum to avoid selecting a table which we
won't be able to clean up anyway (in relation_needs_vacanalyze()).
This is the opposite of what your second patch would do, but it would
save us some useless vacuuming by itself. We don't currently calculate
the horizon in relation_needs_vacanalyze() and I'm not sure if it is
cheap enough to do or if we even can do it there, but doing this could
open more opportunities to avoid useless vacuuming.

Now some comments on the implementation:

You get the current transaction id in InsertPgClassTuple(). This
doesn't look like the right place. It's just a helper function to
assemble the record.

I'm also not convinced you handle the pg_upgrade case correctly. (see
how heap_create_with_catalog() has a special IsBinaryUpgrade path).
What will GetCurrentTransactionId() return in this case?

I presume you did it in InsertPgClassTuple() because you wanted index
relations to also have this xid, but I think you will want to do this
separately in index_create(). Anyway, for the case you are solving,
you don't need index creation xids, but it is probably best to have
those for completeness.

You modify GetOldestNonRemovableTransactionId(). I don't think you
should be this ambitious in this patch. You can change
vacuum_get_cutoffs() to take the newer of the table creation xid and
the value returned from GetOldestNonRemovableTransactionId(). Then you
don't have to figure out if all the cases where
GetOldestNonRemovableTransactionId() is used other than vacuum truly
want the creation xid or not. This would help descope the feature.

I also don't understand why you need to add two fields to
RelationData. RelationData has access to the rd_rel which would have
your creation xid. If it isn't valid, then don't use it in
vacuum_get_cutoffs() (and you should clearly explain in a comment what
the cases are where it may not be valid).

Those are just my initial thoughts. I tried searching the archive for
past efforts to add a table creation xid to pg_class and couldn't find
anything. Hopefully someone who has been around longer will notice
this thread and reply, because I'm very interested to know if it was
tried before, and, if so, why it didn't work out.

- Melanie



Hi,

On 2025-12-09 14:10:24 -0500, Melanie Plageman wrote:
> I'll start by admitting that I am not sure exactly how to do this
> feature correctly.

Isn't the whole idea that it would be safe to allow freezing in this case
incorrect?  Consider the following scenario:

A1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT;
B1: CREATE TABLE foo AS SELECT random();
B2: VACUUM FREEZE foo;
A2: SELECT * FROM foo;

If you allowed freezing of the rows in B2, A2 will see the rows as visible,
despite them not being supposed to be visible.

Note that the catalog lookup during A2 will actually see the catalog data, as
we always use a recent snapshot for catalog lookups (and a lot of stuff would
otherwise be broken).

Greetings,

Andres Freund



On Tue, Dec 9, 2025 at 2:25 PM Andres Freund <andres@anarazel.de> wrote:
>
> Isn't the whole idea that it would be safe to allow freezing in this case
> incorrect?  Consider the following scenario:
>
> A1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT;
> B1: CREATE TABLE foo AS SELECT random();
> B2: VACUUM FREEZE foo;
> A2: SELECT * FROM foo;
>
> If you allowed freezing of the rows in B2, A2 will see the rows as visible,
> despite them not being supposed to be visible.

Is the reason this isn't a problem for COPY FREEZE because the
freezing happens in the same transaction block as creating the table
so A2 wouldn't be able to see the catalog entry for the table?

- Melanie



Hi,

On 2025-12-09 14:31:09 -0500, Melanie Plageman wrote:
> On Tue, Dec 9, 2025 at 2:25 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Isn't the whole idea that it would be safe to allow freezing in this case
> > incorrect?  Consider the following scenario:
> >
> > A1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT;
> > B1: CREATE TABLE foo AS SELECT random();
> > B2: VACUUM FREEZE foo;
> > A2: SELECT * FROM foo;
> >
> > If you allowed freezing of the rows in B2, A2 will see the rows as visible,
> > despite them not being supposed to be visible.
> 
> Is the reason this isn't a problem for COPY FREEZE because the
> freezing happens in the same transaction block as creating the table
> so A2 wouldn't be able to see the catalog entry for the table?

It is a problem for COPY FREEZE - you shouldn't use it unless you're ok with
that. Our docs say:

"Note that all other sessions will immediately be able to see the data once it
has been successfully loaded. This violates the normal rules of MVCC
visibility and users should be aware of the potential problems this might
cause."

Would probably be good to call that out more aggressively.


I don't think that means it's ok for CREATE TABLE to do the same
implicitly. With COPY FREEZE you explicitly opt-in to this behaviour - and
even there it'd be rather nice if we could fix this behaviour.


It might be worth to invent a mechanism that causes errors on table access
under certain visibility conditions. COPY FREEZE isn't the only concerning
command, e.g. rewriting ALTER TABLEs are also problematic (whereas
CLUSTER/VACUUM FULL is very careful to not trigger issues).

Greetings,

Andres Freund