Обсуждение: [PATCH] VACUUM: avoid pre-creation transactions holding back cleanup of newly created relations
[PATCH] VACUUM: avoid pre-creation transactions holding back cleanup of newly created relations
От
Mikhail Kharitonov
Дата:
Hi hackers,
Attached is a patch that extends VACUUM’s VISHORIZON_DATA handling by tracking
a relation "birth XID" and using it to compute a relation-specific data horizon.
The goal is to avoid cases where transactions that started before a relation
existed to hold back cleanup of that relation. The patch applies
cleanly to current master.
Right now GetOldestNonRemovableTransactionId() for VISHORIZON_DATA uses
a database-wide horizon. Because of that, a single old transaction can hold
the VACUUM horizon and prevent removal of dead tuples, and may also prevent
VACUUM from truncating the relation, even for a table that was created after
that transaction started although that transaction cannot possibly see
the table.
Reproduction steps (two sessions):
Session 1:
BEGIN;
SELECT txid_current();
Session 2:
CREATE TABLE t (id bigserial, payload text);
INSERT INTO t(payload) SELECT repeat('x', 200) FROM generate_series(1, 200000);
DELETE FROM t WHERE id > 150000;
VACUUM (VERBOSE) t;
Without the patch, VACUUM reports "dead but not yet removable" and oldest xmin
matches the XID from session 1, so the cleanup is held back even though
the relation was created later. With the patch applied, the transaction from
session 1 is ignored for this relation’s VISHORIZON_DATA horizon,
so VACUUM can remove the dead tuples and truncate the relation.
The patch adds pg_class.relminxid, sets it at relation creation, and when
computing VISHORIZON_DATA scans ProcArray but ignores backends whose xid/xmin
are unambiguously older than relminxid, since such snapshots cannot require
keeping tuples of a relation that did not exist yet. A regression test is
included: src/test/recovery/t/121_vacuum_xid_horizons.pl.
I’d be grateful for any comments.
______
Thanks,
Mikhail Kharitonov
Вложения
Re: [PATCH] VACUUM: avoid pre-creation transactions holding back cleanup of newly created relations
От
Melanie Plageman
Дата:
Hi Mikhail, Thanks for sending a patch! I'll start by admitting that I am not sure exactly how to do this feature correctly. But there are some things about your patch that I think will definitely not work. I also have some suggestions for how you can split up the project. There are two distinct parts of this feature. The first part is recording the transaction creating the table in pg_class. The second part is using that xid in vacuum to allow actually pruning/freezing tuples in the table if it was created after the long-running transactions holding the horizon back. I would split the patch into those two parts. The first patch (having the birth xid of a table in pg_class) could potentially be used in autovacuum to avoid selecting a table which we won't be able to clean up anyway (in relation_needs_vacanalyze()). This is the opposite of what your second patch would do, but it would save us some useless vacuuming by itself. We don't currently calculate the horizon in relation_needs_vacanalyze() and I'm not sure if it is cheap enough to do or if we even can do it there, but doing this could open more opportunities to avoid useless vacuuming. Now some comments on the implementation: You get the current transaction id in InsertPgClassTuple(). This doesn't look like the right place. It's just a helper function to assemble the record. I'm also not convinced you handle the pg_upgrade case correctly. (see how heap_create_with_catalog() has a special IsBinaryUpgrade path). What will GetCurrentTransactionId() return in this case? I presume you did it in InsertPgClassTuple() because you wanted index relations to also have this xid, but I think you will want to do this separately in index_create(). Anyway, for the case you are solving, you don't need index creation xids, but it is probably best to have those for completeness. You modify GetOldestNonRemovableTransactionId(). I don't think you should be this ambitious in this patch. You can change vacuum_get_cutoffs() to take the newer of the table creation xid and the value returned from GetOldestNonRemovableTransactionId(). Then you don't have to figure out if all the cases where GetOldestNonRemovableTransactionId() is used other than vacuum truly want the creation xid or not. This would help descope the feature. I also don't understand why you need to add two fields to RelationData. RelationData has access to the rd_rel which would have your creation xid. If it isn't valid, then don't use it in vacuum_get_cutoffs() (and you should clearly explain in a comment what the cases are where it may not be valid). Those are just my initial thoughts. I tried searching the archive for past efforts to add a table creation xid to pg_class and couldn't find anything. Hopefully someone who has been around longer will notice this thread and reply, because I'm very interested to know if it was tried before, and, if so, why it didn't work out. - Melanie
Hi, On 2025-12-09 14:10:24 -0500, Melanie Plageman wrote: > I'll start by admitting that I am not sure exactly how to do this > feature correctly. Isn't the whole idea that it would be safe to allow freezing in this case incorrect? Consider the following scenario: A1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT; B1: CREATE TABLE foo AS SELECT random(); B2: VACUUM FREEZE foo; A2: SELECT * FROM foo; If you allowed freezing of the rows in B2, A2 will see the rows as visible, despite them not being supposed to be visible. Note that the catalog lookup during A2 will actually see the catalog data, as we always use a recent snapshot for catalog lookups (and a lot of stuff would otherwise be broken). Greetings, Andres Freund
Re: [PATCH] VACUUM: avoid pre-creation transactions holding back cleanup of newly created relations
От
Melanie Plageman
Дата:
On Tue, Dec 9, 2025 at 2:25 PM Andres Freund <andres@anarazel.de> wrote: > > Isn't the whole idea that it would be safe to allow freezing in this case > incorrect? Consider the following scenario: > > A1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT; > B1: CREATE TABLE foo AS SELECT random(); > B2: VACUUM FREEZE foo; > A2: SELECT * FROM foo; > > If you allowed freezing of the rows in B2, A2 will see the rows as visible, > despite them not being supposed to be visible. Is the reason this isn't a problem for COPY FREEZE because the freezing happens in the same transaction block as creating the table so A2 wouldn't be able to see the catalog entry for the table? - Melanie
Hi, On 2025-12-09 14:31:09 -0500, Melanie Plageman wrote: > On Tue, Dec 9, 2025 at 2:25 PM Andres Freund <andres@anarazel.de> wrote: > > > > Isn't the whole idea that it would be safe to allow freezing in this case > > incorrect? Consider the following scenario: > > > > A1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT; > > B1: CREATE TABLE foo AS SELECT random(); > > B2: VACUUM FREEZE foo; > > A2: SELECT * FROM foo; > > > > If you allowed freezing of the rows in B2, A2 will see the rows as visible, > > despite them not being supposed to be visible. > > Is the reason this isn't a problem for COPY FREEZE because the > freezing happens in the same transaction block as creating the table > so A2 wouldn't be able to see the catalog entry for the table? It is a problem for COPY FREEZE - you shouldn't use it unless you're ok with that. Our docs say: "Note that all other sessions will immediately be able to see the data once it has been successfully loaded. This violates the normal rules of MVCC visibility and users should be aware of the potential problems this might cause." Would probably be good to call that out more aggressively. I don't think that means it's ok for CREATE TABLE to do the same implicitly. With COPY FREEZE you explicitly opt-in to this behaviour - and even there it'd be rather nice if we could fix this behaviour. It might be worth to invent a mechanism that causes errors on table access under certain visibility conditions. COPY FREEZE isn't the only concerning command, e.g. rewriting ALTER TABLEs are also problematic (whereas CLUSTER/VACUUM FULL is very careful to not trigger issues). Greetings, Andres Freund