Thread: Defining (and possibly skipping) useless VACUUM operations

Defining (and possibly skipping) useless VACUUM operations

From: Peter Geoghegan
Date:
Robert Haas has written on the subject of useless vacuuming, here:

http://rhaas.blogspot.com/2020/02/useless-vacuuming.html

I'm sure at least a few of us have thought about the problem at some
point. I would like to discuss how we can actually avoid useless
vacuuming, and what our goals should be.

I am currently working on decoupling advancing relfrozenxid from tuple
freezing [1]. That is, I'm teaching VACUUM to keep track of
information that it uses to generate an "optimal value" for the
table's final relfrozenxid: the most recent XID value that might still
be in the table. This patch is based on the observation that we don't
actually have to use the FreezeLimit cutoff for our new
pg_class.relfrozenxid. We need only obey the basic relfrozenxid
invariant, which is that the final value must be <= any extant XID in
the table.  Using FreezeLimit is needlessly conservative.

My draft patch to implement the optimization (which builds on the
patches already posted to [1]) will reliably set pg_class.relfrozenxid
to the same VACUUM's precise original OldestXmin once certain
conditions are met -- reasonably common conditions. For example, the
same precise OldestXmin XID is used for relfrozenxid in the event of a
manual VACUUM (without FREEZE) on a table that was just bulk-loaded,
assuming the system is otherwise idle. Setting relfrozenxid to the
precise lowest safe value happens on a best-effort basis, without
needlessly tying that to things like when or how we freeze tuples.

It now occurs to me to push this patch in another direction, on top of
all that: the OldestXmin behavior hints at a precise, robust way of
defining "useless vacuuming". We can condition skipping a VACUUM (i.e.
whether a VACUUM is considered "definitely won't be useful if allowed
to execute") on whether or not our preexisting pg_class.relfrozenxid
precisely equals our newly-acquired OldestXmin for an about-to-begin
VACUUM operation.  (We'd also want to add an "unchangeable
pg_class.relminmxid" test, I think.)

This definition does seem to be close to ideal: We're virtually
assured that there will be no more useful work for us, in a way that
is grounded in theory but still quite practical. But it's not a slam
dunk. A person could still argue that we shouldn't cancel the VACUUM
before it has begun, even when all these conditions have been met.
This would not be a particularly strong argument, mind you, but it's
still worth taking seriously. We need an exact problem statement that
justifies whatever definition of "useless VACUUM" we settle on.

Here are arguments *against* the skipping behavior I sketched out:

* An aborted transaction might have left behind garbage that still
needs to be cleaned up, and that cleanup should be able to go ahead
despite the unchanged OldestXmin. (I think that this is the argument
with the most merit, by quite a bit.)

* In general index AMs may want to do deferred cleanup, say to place
previously deleted pages in the FSM. Although in practice the criteria
for recycling safety used by nbtree and GiST will make that
impossible, there is no fundamental reason why they need to work that
way (XIDs are used, but only because they provide a conveniently
available notion of "logical time" that is sufficient to implement
what Lanin & Shasha call "the drain technique"). Plus GIN really could
do real work in amvacuumcleanup, for the pending list. There are bound
to be a handful of marginal things like this.

* Who are we to intervene like this, anyway? (Makes much more sense if
we don't limit ourselves to autovacuum worker operations.)

Offhand, I suspect that we should only consider skipping "useless"
anti-wraparound autovacuums (not other kinds of autovacuums, not
manual VACUUMs). The arguments against skipping are weakest for the
anti-wraparound case. And the arguments in favor are particularly
strong: we should specifically avoid starting a useless (and possibly
time-consuming) anti-wraparound autovacuum, because that could easily
block an actually-useful autovacuum launched some time later. We
should aim to be in a position to launch an anti-wraparound autovacuum
that can actually advance relfrozenxid as soon as that becomes
possible (e.g. when the DBA drops an old replication slot that was
holding back each VACUUM's OldestXmin). And so "skipping" makes us
much more responsive, which seems like it might matter a lot in
practice. It minimizes the risk of wraparound failure.

There is also a strong argument for logging our failure to clean up
anything in any autovacuum -- we don't do nearly enough alerting when
stuff like this happens (possibly because "useless" is such a squishy
concept right now?). But even just logging something requires defining
"useless VACUUM operation" in a way that is both reliable and
proportionate -- so the same hard problem has to be solved either way.

[1] https://commitfest.postgresql.org/36/3433/
-- 
Peter Geoghegan



Re: Defining (and possibly skipping) useless VACUUM operations

From: Robert Haas
Date:
On Sun, Dec 12, 2021 at 8:47 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I am currently working on decoupling advancing relfrozenxid from tuple
> freezing [1]. That is, I'm teaching VACUUM to keep track of
> information that it uses to generate an "optimal value" for the
> table's final relfrozenxid: the most recent XID value that might still
> be in the table. This patch is based on the observation that we don't
> actually have to use the FreezeLimit cutoff for our new
> pg_class.relfrozenxid. We need only obey the basic relfrozenxid
> invariant, which is that the final value must be <= any extant XID in
> the table.  Using FreezeLimit is needlessly conservative.

Right.

> It now occurs to me to push this patch in another direction, on top of
> all that: the OldestXmin behavior hints at a precise, robust way of
> defining "useless vacuuming". We can condition skipping a VACUUM (i.e.
> whether a VACUUM is considered "definitely won't be useful if allowed
> to execute") on whether or not our preexisting pg_class.relfrozenxid
> precisely equals our newly-acquired OldestXmin for an about-to-begin
> VACUUM operation.  (We'd also want to add an "unchangeable
> pg_class.relminmxid" test, I think.)

I think this is a reasonable line of thinking, but I think it's a
little imprecise. In general, we could be vacuuming a relation to
advance relfrozenxid, but we could also be vacuuming a relation to
advance relminmxid, or we could be vacuuming a relation to fight
bloat, or set pages all-visible. It is possible that there's no hope
of advancing relfrozenxid but that we can still accomplish one of the
other goals. In that case, the vacuuming is not useless.  I think the
place to put logic around this would be in the triggering logic for
autovacuum. If we're going to force a relation to be vacuumed because
of (M)XID wraparound danger, we could first check whether there seems
to be any hope of advancing relfrozenxid(minmxid). If not, we discount
that as a trigger for vacuum, but may still decide to vacuum if some
other trigger warrants it. In most cases, if there's no hope of
advancing relfrozenxid, there won't be any bloat to remove either, but
aborted transactions are a counterexample. And the XID and MXID
horizons can advance at completely different rates.

One reason I haven't pursued this kind of optimization is that it
doesn't really feel like it's fixing the whole problem. It would be a
little bit sad if we did a perfect job preventing useless vacuuming
but still allowed almost-useless vacuuming. Suppose we have a 1TB
relation and we trigger autovacuum. It cleans up a few things but
relfrozenxid is still old. On the next pass, we see that the
system-wide xmin has not advanced, so we don't trigger autovacuum
again. Then on the pass after that we see that the system-wide xmin
has advanced by 1. Shall we trigger an autovacuum of the whole
relation now, to be able to do relfrozenxid++? Seems dubious.

Part of the problem here is that, for both vacuuming-for-bloat and
vacuuming-for-relfrozenxid-advancement, we would really like to know
the distribution of old XIDs in the table. If we knew that a lot of
the inserts, updates, and deletes that are causing us to vacuum for
bloat containment were in a certain relatively narrow range, then we'd
probably want to not autovacuum for either purpose until the
system-wide xmin has crossed through at least a good chunk of that
range. And once it has fully crossed over that range, an immediate vacuum
looks extremely appealing: we'll both remove a bunch of dead tuples
and reclaim the associated line pointers, and at the same time we'll
be able to advance relfrozenxid. Nice! But we have no such
information.

So I'm not certain of the way forward here. Just because we can't
prevent almost-useless vacuuming is not a sufficient reason to
continue allowing entirely-useless vacuuming that we can prevent. And
it seems like we need a bunch of new bookkeeping to do any better than
that, which seems like a lot of work. So maybe it's the most practical
path forward for the time being, but it feels like more of a
special-purpose kludge than a truly high-quality solution.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Defining (and possibly skipping) useless VACUUM operations

From: Peter Geoghegan
Date:
On Tue, Dec 14, 2021 at 6:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I think this is a reasonable line of thinking, but I think it's a
> little imprecise. In general, we could be vacuuming a relation to
> advance relfrozenxid, but we could also be vacuuming a relation to
> advance relminmxid, or we could be vacuuming a relation to fight
> bloat, or set pages all-visible. It is possible that there's no hope
> of advancing relfrozenxid but that we can still accomplish one of the
> other goals. In that case, the vacuuming is not useless.  I think the
> place to put logic around this would be in the triggering logic for
> autovacuum. If we're going to force a relation to be vacuumed because
> of (M)XID wraparound danger, we could first check whether there seems
> to be any hope of advancing relfrozenxid(minmxid). If not, we discount
> that as a trigger for vacuum, but may still decide to vacuum if some
> other trigger warrants it. In most cases, if there's no hope of
> advancing relfrozenxid, there won't be any bloat to remove either, but
> aborted transactions are a counterexample. And the XID and MXID
> horizons can advance at completely different rates.

I think that you'd agree that the arguments in favor of skipping are
strongest for an aggressive anti-wraparound autovacuum (as opposed to
any other kind of aggressive VACUUM, including aggressive autovacuum).
Aside from the big benefit I pointed out already (avoiding blocking
useful anti-wraparound vacuums that start a little later by not
starting a conflicting useless anti-wraparound vacuum now), there is
also more certainty about downsides. We can know the following things
for sure:

* We only launch an (aggressive) anti-wraparound autovacuum because we
need to advance relfrozenxid. In other words, if we didn't need to
advance relfrozenxid then (for better or worse) we definitely wouldn't
be launching anything.

* Our would-be OldestXmin exactly matches the preexisting
pg_class.relfrozenxid (and pg_class.relminmxid). And so it follows
that we're definitely not going to be able to do the thing that is
ostensibly the whole point of anti-wraparound vacuum (advance
relfrozenxid/relminmxid).

> One reason I haven't pursued this kind of optimization is that it
> doesn't really feel like it's fixing the whole problem. It would be a
> little bit sad if we did a perfect job preventing useless vacuuming
> but still allowed almost-useless vacuuming. Suppose we have a 1TB
> relation and we trigger autovacuum. It cleans up a few things but
> relfrozenxid is still old. On the next pass, we see that the
> system-wide xmin has not advanced, so we don't trigger autovacuum
> again. Then on the pass after that we see that the system-wide xmin
> has advanced by 1. Shall we trigger an autovacuum of the whole
> relation now, to be able to do relfrozenxid++? Seems dubious.

I can see what you mean, but just fixing the most extreme case can be
a useful goal. It's often enough to stop the system from going into a
tailspin, which is the real underlying goal here. Things that approach
the most extreme case (but don't quite hit it) don't have that
quality.

An anti-wraparound vacuum is supposed to be a mechanism that the
system escalates to when nothing else triggers an autovacuum worker to
run (which is aggressive but not anti-wraparound). That's not really
true in practice, of course; anti-wraparound autovacuum often becomes a
routine thing. But I think that it's a good ideal to strive for -- it
should be rare.

The draft patch series now adds opportunistic freezing -- I should be
able to post a new version in a few days time, once I've tied up some
loose ends. My testing shows an interesting effect, when opportunistic
freezing is applied on top of the relfrozenxid thing: every autovacuum
manages to advance relfrozenxid, and so we'll never have to run an
aggressive autovacuum (much less an aggressive anti-wraparound
autovacuum) in practice. And so (for example) when autovacuum runs
against the pgbench_history table, it always sets its relfrozenxid to
a value very close to the OldestXmin -- usually the exact OldestXmin.

Opportunistic freezing makes us avoid setting the all-visible bit for
a heap page without also setting the all-frozen bit -- when we're
about to do that, we go freeze the heap tuples and then set the entire
page all-frozen (so we freeze anything <= OldestXmin, not <=
FreezeLimit). We also freeze based on this more aggressive <=
OldestXmin cutoff when pruning had to delete some tuples.

The patch still needs more polishing, but I think that we can make
anti-wraparound vacuums truly exceptional with this design -- which
would make autovacuum a lot easier to deal with operationally. This
seems like a feasible goal for Postgres 15, even (though still quite
ambitious). The opportunistic freezing stuff isn't free (the WAL
records aren't tiny), but it's still not all that expensive. Plus I
think that the cost can be further reduced, with a little more work.

> Part of the problem here is that, for both vacuuming-for-bloat and
> vacuuming-for-relfrozenxid-advancement, we would really like to know
> the distribution of old XIDs in the table.

What I see with the draft patch series is that the oldest XID just
isn't that old anymore, consistently -- we literally never fail to
advance relfrozenxid, in any autovacuum, for any table. And the value
that we end up with is consistently quite recent. This is something
that I see both with BenchmarkSQL, and pgbench. There is a kind of
virtuous circle, which prevents us from ever getting anywhere near
having any table age in the tens of millions of XIDs.

I guess that that makes avoiding useless vacuuming seem like less of a
priority. ISTM that it should be something that is squarely aimed at
keeping things stable in truly pathological cases.

> So I'm not certain of the way forward here. Just because we can't
> prevent almost-useless vacuuming is not a sufficient reason to
> continue allowing entirely-useless vacuuming that we can prevent. And
> it seems like we need a bunch of new bookkeeping to do any better than
> that, which seems like a lot of work. So maybe it's the most practical
> path forward for the time being, but it feels like more of a
> special-purpose kludge than a truly high-quality solution.

I'm sure that either one of us will be able to poke holes in any
definition of "useless" that is continuous (rather than discrete) --
which, on reflection, pretty much means any definition that is
concerned with bloat. I think that you're right about that: the
question there must be "why are we even launching these
bloat-oriented autovacuums that actually find no bloat?".

-- 
Peter Geoghegan



Re: Defining (and possibly skipping) useless VACUUM operations

From: Robert Haas
Date:
On Tue, Dec 14, 2021 at 1:16 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I think that you'd agree that the arguments in favor of skipping are
> strongest ...

Well I just don't understand why you insist on using the word
"skipping." I think what we're talking about - or at least what we
should be talking about - is whether relation_needs_vacanalyze() sets
*wraparound = true right after the comment that says /* Force vacuum
if table is at risk of wraparound */. And adding some kind of
exception to the logic that's there now.

> What I see with the draft patch series is that the oldest XID just
> isn't that old anymore, consistently -- we literally never fail to
> advance relfrozenxid, in any autovacuum, for any table. And the value
> that we end up with is consistently quite recent. This is something
> that I see both with BenchmarkSQL, and pgbench. There is a kind of
> virtuous circle, which prevents us from ever getting anywhere near
> having any table age in the tens of millions of XIDs.

Yeah, I hadn't thought about it from that perspective, but that does
seem very good. I think it's inevitable that there will be cases where
that doesn't work out - e.g. you can always force the bad case by
holding a table lock until your newborn heads off to college, or just
by overthrottling autovacuum so that it can't get through the database
in any reasonable amount of time - but it will be nice when it does
work out, for sure.

> I guess that that makes avoiding useless vacuuming seem like less of a
> priority. ISTM that it should be something that is squarely aimed at
> keeping things stable in truly pathological cases.

Yes. I think "pathological cases" is a good summary of what's wrong
with autovacuum. When there's nothing too crazy happening, it actually
does pretty well. But, when resources are tight or other corner cases
occur, really dumb things start to happen. So it's reasonable to think
about how we can install guard rails that prevent complete insanity.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Defining (and possibly skipping) useless VACUUM operations

From: Peter Geoghegan
Date:
On Tue, Dec 14, 2021 at 10:47 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Well I just don't understand why you insist on using the word
> "skipping." I think what we're talking about - or at least what we
> should be talking about - is whether relation_needs_vacanalyze() sets
> *wraparound = true right after the comment that says /* Force vacuum
> if table is at risk of wraparound */. And adding some kind of
> exception to the logic that's there now.

Actually, I agree. Skipping is the wrong term, especially because the
phrase "VACUUM skips..." is already too overloaded. Not necessarily in
vacuumlazy.c itself, but certainly on the mailing list.

> Yeah, I hadn't thought about it from that perspective, but that does
> seem very good. I think it's inevitable that there will be cases where
> that doesn't work out - e.g. you can always force the bad case by
> holding a table lock until your newborn heads off to college, or just
> by overthrottling autovacuum so that it can't get through the database
> in any reasonable amount of time - but it will be nice when it does
> work out, for sure.

Right. But when the patch doesn't manage to totally prevent
anti-wraparound VACUUMs, things still work out a lot better than they
would now. I would expect that in practice this will usually only
happen when non-aggressive autovacuums keep getting canceled. And
sure, it's still not ideal that things have come to that. But because
we now do freezing earlier (when it's relatively inexpensive), and
because we set all-frozen bits incrementally, the anti-wraparound
autovacuum will at least be able to reuse any freezing that we manage
to do in all those canceled autovacuums.

I think that this tends to make anti-wraparound VACUUMs mostly about
not being cancelable -- not so much about reliably advancing
relfrozenxid. I mean it doesn't change the basic rules (there is no
change to the definition of aggressive VACUUM), but in practice I
think that it'll just work that way.  Which makes a great deal of
sense. I hope to be able to totally get rid of
vacuum_freeze_table_age.

The freeze map work in PostgreSQL 9.6 was really great, and very
effective. But I think that it had an undesirable interaction with
vacuum_freeze_min_age: if we set a heap page as all-visible (but not
all frozen) before some of its tuples reached that age (which is very
likely), then tuples younger than vacuum_freeze_min_age aren't going to get
frozen until whenever we do an aggressive autovacuum. Very often, this
will only happen when we next do an anti-wraparound VACUUM (at least
before Postgres 13). I suspect we risk running into a "debt cliff" in
the eventual anti-wraparound autovacuum. And so while
vacuum_freeze_min_age kinda made sense prior to 9.6, it now seems to
make a lot less sense.

> > I guess that that makes avoiding useless vacuuming seem like less of a
> > priority. ISTM that it should be something that is squarely aimed at
> > keeping things stable in truly pathological cases.
>
> Yes. I think "pathological cases" is a good summary of what's wrong
> with autovacuum.

This is 100% my focus, in general. The main goal of the patch I'm
working on isn't so much improving performance as making it more
predictable over time. Focussing on freezing while costs are low has a
natural tendency to spread the costs out over time. The system should
never "get in over its head" with debt that vacuum is expected to
eventually deal with.

> When there's nothing too crazy happening, it actually
> does pretty well. But, when resources are tight or other corner cases
> occur, really dumb things start to happen. So it's reasonable to think
> about how we can install guard rails that prevent complete insanity.

Another thing that I really want to stamp out is anything involving a
tiny, seemingly-insignificant adverse event that has the potential to
cause disproportionate impact over time. For example, right now a
non-aggressive VACUUM will never be able to advance relfrozenxid when
it cannot get a cleanup lock on one heap page. It's actually extremely
unlikely that that should have much of any impact, at least when you
determine the new relfrozenxid for the table intelligently. Not
acquiring one cleanup lock on one heap page on a huge table should not
have such an extreme impact.

It's even worse when the systemic impact over time is considered.
Let's say you only have a 20% chance of failing to acquire one or more
cleanup locks during a non-aggressive autovacuum for a given large
table, meaning that you'll fail to advance relfrozenxid in at least
20% of all non-aggressive autovacuums. I think that that might be a
lot worse than it sounds, because the impact compounds over time --
I'm not sure that 20% is much worse than 60%, or much better than 5%
(very hard to model it). But if we make the high-level, abstract idea
of "aggressiveness" more of a continuous thing, and not something
that's defined by sharp (and largely meaningless) XID-based cutoffs,
we have every chance of nipping these problems in the bud (without
needing to model much of anything).

-- 
Peter Geoghegan