Discussion: Defining (and possibly skipping) useless VACUUM operations
Robert Haas has written on the subject of useless vacuuming, here:

http://rhaas.blogspot.com/2020/02/useless-vacuuming.html

I'm sure at least a few of us have thought about the problem at some point. I would like to discuss how we can actually avoid useless vacuuming, and what our goals should be.

I am currently working on decoupling advancing relfrozenxid from tuple freezing [1]. That is, I'm teaching VACUUM to keep track of information that it uses to generate an "optimal value" for the table's final relfrozenxid: the oldest XID value that might still be in the table. This patch is based on the observation that we don't actually have to use the FreezeLimit cutoff for our new pg_class.relfrozenxid. We need only obey the basic relfrozenxid invariant, which is that the final value must be <= any extant XID in the table. Using FreezeLimit is needlessly conservative.

My draft patch to implement the optimization (which builds on the patches already posted to [1]) will reliably set pg_class.relfrozenxid to the same VACUUM's precise original OldestXmin once certain conditions are met -- reasonably common conditions. For example, the precise OldestXmin XID is used for relfrozenxid in the event of a manual VACUUM (without FREEZE) on a table that was just bulk-loaded, assuming the system is otherwise idle. Setting relfrozenxid to the precise lowest safe value happens on a best-effort basis, without needlessly tying it to things like when or how we freeze tuples.

It now occurs to me to push this patch in another direction, on top of all that: the OldestXmin behavior hints at a precise, robust way of defining "useless vacuuming". We can condition skipping a VACUUM (i.e. whether a VACUUM is considered "definitely won't be useful if allowed to execute") on whether or not our preexisting pg_class.relfrozenxid precisely equals our newly-acquired OldestXmin for an about-to-begin VACUUM operation. (We'd also want to add an "unchangeable pg_class.relminmxid" test, I think.)

This definition does seem to be close to ideal: we're virtually assured that there will be no more useful work for us, in a way that is grounded in theory but still quite practical. But it's not a slam dunk. A person could still argue that we shouldn't cancel the VACUUM before it has begun, even when all these conditions have been met. This would not be a particularly strong argument, mind you, but it's still worth taking seriously. We need an exact problem statement that justifies whatever definition of "useless VACUUM" we settle on.

Here are arguments *against* the skipping behavior I sketched out:

* An aborted transaction might need to be cleaned up, which should be able to go ahead despite the unchanged OldestXmin. (I think that this is the argument with the most merit, by quite a bit.)

* In general index AMs may want to do deferred cleanup, say to place previously deleted pages in the FSM. Although in practice the criteria for recycling safety used by nbtree and GiST will make that impossible, there is no fundamental reason why they need to work that way (XIDs are used, but only because they provide a conveniently available notion of "logical time" that is sufficient to implement what Lanin & Shasha call "the drain technique"). Plus GIN really could do real work in amvacuumcleanup, for the pending list. There are bound to be a handful of marginal things like this.

* Who are we to intervene like this, anyway? (Makes much more sense if we don't limit ourselves to autovacuum worker operations.)

Offhand, I suspect that we should only consider skipping "useless" anti-wraparound autovacuums (not other kinds of autovacuums, not manual VACUUMs). The arguments against skipping are weakest for the anti-wraparound case. And the arguments in favor are particularly strong: we should specifically avoid starting a useless (and possibly time-consuming) anti-wraparound autovacuum, because that could easily block an actually-useful autovacuum launched some time later. We should aim to be in a position to launch an anti-wraparound autovacuum that can actually advance relfrozenxid as soon as that becomes possible (e.g. when the DBA drops an old replication slot that was holding back each VACUUM's OldestXmin). And so "skipping" makes us much more responsive, which seems like it might matter a lot in practice. It minimizes the risk of wraparound failure.

There is also a strong argument for logging our failure to clean up anything in any autovacuum -- we don't do nearly enough alerting when stuff like this happens (possibly because "useless" is such a squishy concept right now?). But just logging something still requires defining "useless VACUUM operation" in a way that is both reliable and proportionate. So just logging something necessitates solving that hard problem.

[1] https://commitfest.postgresql.org/36/3433/

--
Peter Geoghegan
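[Editor's note: the proposed "definitely useless" test can be sketched as a tiny standalone model. All names here are invented stand-ins for illustration -- this is plain Python, not PostgreSQL source, and the real TransactionId machinery is far more involved:]

```python
# Standalone model (hypothetical names, not PostgreSQL code) of the
# proposed test for a "definitely useless" anti-wraparound autovacuum.

def xid_precedes(a: int, b: int) -> bool:
    """Modulo-2^32 XID comparison, in the style of TransactionIdPrecedes()."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000

def vacuum_would_be_useless(relfrozenxid: int, oldest_xmin: int,
                            relminmxid: int, oldest_mxact: int) -> bool:
    """An about-to-begin anti-wraparound autovacuum is 'definitely
    useless' only when its newly acquired OldestXmin exactly equals the
    preexisting pg_class.relfrozenxid (and likewise for relminmxid):
    neither field can possibly be advanced by running the VACUUM."""
    # Basic invariant: relfrozenxid must be <= any extant XID.
    assert not xid_precedes(oldest_xmin, relfrozenxid)
    return relfrozenxid == oldest_xmin and relminmxid == oldest_mxact

# Horizon unchanged since the last VACUUM set relfrozenxid: skipping is safe.
assert vacuum_would_be_useless(1000, 1000, 50, 50)
# OldestXmin has advanced: relfrozenxid can move, so run the VACUUM.
assert not vacuum_would_be_useless(1000, 1234, 50, 50)
```

Note that the equality test is deliberately exact: any advancement of the horizon at all, even by one XID, means the VACUUM could do *some* useful relfrozenxid work (whether it is worth doing is a separate question, taken up downthread).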
On Sun, Dec 12, 2021 at 8:47 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I am currently working on decoupling advancing relfrozenxid from tuple
> freezing [1]. That is, I'm teaching VACUUM to keep track of information
> that it uses to generate an "optimal value" for the table's final
> relfrozenxid: the oldest XID value that might still be in the table.
> This patch is based on the observation that we don't actually have to
> use the FreezeLimit cutoff for our new pg_class.relfrozenxid. We need
> only obey the basic relfrozenxid invariant, which is that the final
> value must be <= any extant XID in the table. Using FreezeLimit is
> needlessly conservative.

Right.

> It now occurs to me to push this patch in another direction, on top of
> all that: the OldestXmin behavior hints at a precise, robust way of
> defining "useless vacuuming". We can condition skipping a VACUUM (i.e.
> whether a VACUUM is considered "definitely won't be useful if allowed
> to execute") on whether or not our preexisting pg_class.relfrozenxid
> precisely equals our newly-acquired OldestXmin for an about-to-begin
> VACUUM operation. (We'd also want to add an "unchangeable
> pg_class.relminmxid" test, I think.)

I think this is a reasonable line of thinking, but I think it's a little imprecise. In general, we could be vacuuming a relation to advance relfrozenxid, but we could also be vacuuming a relation to advance relminmxid, or we could be vacuuming a relation to fight bloat, or set pages all-visible. It is possible that there's no hope of advancing relfrozenxid but that we can still accomplish one of the other goals. In that case, the vacuuming is not useless.

I think the place to put logic around this would be in the triggering logic for autovacuum. If we're going to force a relation to be vacuumed because of (M)XID wraparound danger, we could first check whether there seems to be any hope of advancing relfrozenxid(minmxid). If not, we discount that as a trigger for vacuum, but may still decide to vacuum if some other trigger warrants it. In most cases, if there's no hope of advancing relfrozenxid, there won't be any bloat to remove either, but aborted transactions are a counterexample. And the XID and MXID horizons can advance at completely different rates.

One reason I haven't pursued this kind of optimization is that it doesn't really feel like it's fixing the whole problem. It would be a little bit sad if we did a perfect job preventing useless vacuuming but still allowed almost-useless vacuuming. Suppose we have a 1TB relation and we trigger autovacuum. It cleans up a few things but relfrozenxid is still old. On the next pass, we see that the system-wide xmin has not advanced, so we don't trigger autovacuum again. Then on the pass after that we see that the system-wide xmin has advanced by 1. Shall we trigger an autovacuum of the whole relation now, to be able to do relfrozenxid++? Seems dubious.

Part of the problem here is that, for both vacuuming-for-bloat and vacuuming-for-relfrozenxid-advancement, we would really like to know the distribution of old XIDs in the table. If we knew that a lot of the inserts, updates, and deletes that are causing us to vacuum for bloat containment were in a certain relatively narrow range, then we'd probably want to not autovacuum for either purpose until the system-wide xmin has crossed through at least a good chunk of that range. And once it has fully crossed over that range, an immediate vacuum looks extremely appealing: we'll both remove a bunch of dead tuples and reclaim the associated line pointers, and at the same time we'll be able to advance relfrozenxid. Nice! But we have no such information.

So I'm not certain of the way forward here. Just because we can't prevent almost-useless vacuuming is not a sufficient reason to continue allowing entirely-useless vacuuming that we can prevent. And it seems like we need a bunch of new bookkeeping to do any better than that, which seems like a lot of work. So maybe it's the most practical path forward for the time being, but it feels like more of a special-purpose kludge than a truly high-quality solution.

--
Robert Haas
EDB: http://www.enterprisedb.com
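[Editor's note: the triggering approach Robert sketches -- discount the wraparound trigger when relfrozenxid cannot move, but still vacuum if another trigger fires -- can be modeled in a few lines. Every name below is invented for illustration; this is not relation_needs_vacanalyze() or any other actual autovacuum code:]

```python
# Hypothetical, simplified model of the suggested autovacuum triggering
# logic: the wraparound trigger and the bloat trigger are evaluated
# separately, and the wraparound trigger is discounted when there is no
# hope of advancing relfrozenxid.

def should_autovacuum(table_age, autovacuum_freeze_max_age,
                      relfrozenxid, oldest_xmin,
                      dead_tuples, bloat_threshold):
    wraparound = table_age >= autovacuum_freeze_max_age
    if wraparound and oldest_xmin == relfrozenxid:
        # The horizon hasn't moved at all: relfrozenxid cannot advance,
        # so discount wraparound danger as a reason to vacuum...
        wraparound = False
    # ...but still vacuum if some other trigger warrants it, e.g. dead
    # tuples left behind by an aborted transaction.
    bloat = dead_tuples >= bloat_threshold
    return wraparound or bloat

# Horizon stuck and no bloat: don't trigger an autovacuum at all.
assert not should_autovacuum(200_000_000, 200_000_000, 1000, 1000, 0, 50)
# Horizon stuck, but an aborted transaction left dead tuples: vacuum anyway.
assert should_autovacuum(200_000_000, 200_000_000, 1000, 1000, 500, 50)
# Horizon has moved: the wraparound trigger stands.
assert should_autovacuum(200_000_000, 200_000_000, 1000, 2000, 0, 50)
```

The aborted-transaction case in the middle assertion is exactly Robert's counterexample: no hope of relfrozenxid advancement, yet the vacuum is not useless.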
On Tue, Dec 14, 2021 at 6:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I think this is a reasonable line of thinking, but I think it's a
> little imprecise. In general, we could be vacuuming a relation to
> advance relfrozenxid, but we could also be vacuuming a relation to
> advance relminmxid, or we could be vacuuming a relation to fight
> bloat, or set pages all-visible. It is possible that there's no hope
> of advancing relfrozenxid but that we can still accomplish one of the
> other goals. In that case, the vacuuming is not useless. I think the
> place to put logic around this would be in the triggering logic for
> autovacuum. If we're going to force a relation to be vacuumed because
> of (M)XID wraparound danger, we could first check whether there seems
> to be any hope of advancing relfrozenxid(minmxid). If not, we discount
> that as a trigger for vacuum, but may still decide to vacuum if some
> other trigger warrants it. In most cases, if there's no hope of
> advancing relfrozenxid, there won't be any bloat to remove either, but
> aborted transactions are a counterexample. And the XID and MXID
> horizons can advance at completely different rates.

I think that you'd agree that the arguments in favor of skipping are strongest for an aggressive anti-wraparound autovacuum (as opposed to any other kind of aggressive VACUUM, including aggressive autovacuum). Aside from the big benefit I pointed out already (avoiding blocking a useful anti-wraparound vacuum that starts a little later by not starting a conflicting useless anti-wraparound vacuum now), there is also more certainty about downsides. We can know the following things for sure:

* We only launch an (aggressive) anti-wraparound autovacuum because we need to advance relfrozenxid. In other words, if we didn't need to advance relfrozenxid then (for better or worse) we definitely wouldn't be launching anything.

* Our would-be OldestXmin exactly matches the preexisting pg_class.relfrozenxid (and pg_class.relminmxid). And so it follows that we're definitely not going to be able to do the thing that is ostensibly the whole point of anti-wraparound vacuum (advance relfrozenxid/relminmxid).

> One reason I haven't pursued this kind of optimization is that it
> doesn't really feel like it's fixing the whole problem. It would be a
> little bit sad if we did a perfect job preventing useless vacuuming
> but still allowed almost-useless vacuuming. Suppose we have a 1TB
> relation and we trigger autovacuum. It cleans up a few things but
> relfrozenxid is still old. On the next pass, we see that the
> system-wide xmin has not advanced, so we don't trigger autovacuum
> again. Then on the pass after that we see that the system-wide xmin
> has advanced by 1. Shall we trigger an autovacuum of the whole
> relation now, to be able to do relfrozenxid++? Seems dubious.

I can see what you mean, but just fixing the most extreme case can be a useful goal. It's often enough to stop the system from going into a tailspin, which is the real underlying goal here. Things that approach the most extreme case (but don't quite hit it) don't have that quality.

An anti-wraparound vacuum is supposed to be a mechanism that the system escalates to when nothing else triggers an autovacuum worker to run (which is aggressive but not anti-wraparound). That's not really true in practice, of course; anti-wraparound autovacuums often become a routine thing. But I think that it's a good ideal to strive for -- it should be rare.

The draft patch series now adds opportunistic freezing -- I should be able to post a new version in a few days' time, once I've tied up some loose ends. My testing shows an interesting effect, when opportunistic freezing is applied on top of the relfrozenxid thing: every autovacuum manages to advance relfrozenxid, and so we never have to run an aggressive autovacuum (much less an aggressive anti-wraparound autovacuum) in practice. And so (for example) when autovacuum runs against the pgbench_history table, it always sets its relfrozenxid to a value very close to the OldestXmin -- usually the exact OldestXmin.

Opportunistic freezing makes us avoid setting the all-visible bit for a heap page without also setting the all-frozen bit -- when we're about to do that, we go freeze the heap tuples and then set the entire page all-frozen (so we freeze anything <= OldestXmin, not just <= FreezeLimit). We also freeze based on this more aggressive <= OldestXmin cutoff when pruning had to delete some tuples.

The patch still needs more polishing, but I think that we can make anti-wraparound vacuums truly exceptional with this design -- which would make autovacuum a lot easier to deal with operationally. This even seems like a feasible goal for Postgres 15 (though still quite ambitious). The opportunistic freezing stuff isn't free (the WAL records aren't tiny), but it's still not all that expensive. Plus I think that the cost can be further reduced, with a little more work.

> Part of the problem here is that, for both vacuuming-for-bloat and
> vacuuming-for-relfrozenxid-advancement, we would really like to know
> the distribution of old XIDs in the table.

What I see with the draft patch series is that the oldest XID just isn't that old anymore, consistently -- we literally never fail to advance relfrozenxid, in any autovacuum, for any table. And the value that we end up with is consistently quite recent. This is something that I see both with BenchmarkSQL, and pgbench. There is a kind of virtuous circle, which prevents us from ever getting anywhere near having any table age in the tens of millions of XIDs.

I guess that that makes avoiding useless vacuuming seem like less of a priority. ISTM that it should be something that is squarely aimed at keeping things stable in truly pathological cases.

> So I'm not certain of the way forward here. Just because we can't
> prevent almost-useless vacuuming is not a sufficient reason to
> continue allowing entirely-useless vacuuming that we can prevent. And
> it seems like we need a bunch of new bookkeeping to do any better than
> that, which seems like a lot of work. So maybe it's the most practical
> path forward for the time being, but it feels like more of a
> special-purpose kludge than a truly high-quality solution.

I'm sure that either one of us will be able to poke holes in any definition of "useless" that is continuous (rather than discrete) -- which, on reflection, pretty much means any definition that is concerned with bloat. I think that you're right about that: the question there must be "why are we even launching these bloat-orientated autovacuums that actually find no bloat?".

--
Peter Geoghegan
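[Editor's note: the opportunistic freezing rule Peter describes -- never set a page all-visible without also freezing it, so that it can be set all-frozen in the same pass -- can be illustrated with a toy model. The names and the flat-integer XIDs below are simplifications invented for this sketch; the real logic lives in vacuumlazy.c and is considerably more involved:]

```python
# Toy model of visibility-map bits for one heap page whose tuples all
# have committed xmins (XIDs modeled as plain ints, no wraparound).

def page_vm_bits(tuple_xmins, oldest_xmin, freeze_limit, opportunistic):
    # A page can be set all-visible once every tuple is visible to
    # everyone, i.e. every xmin is < OldestXmin.
    all_visible = all(x < oldest_xmin for x in tuple_xmins)
    # Old behavior: freeze only tuples < FreezeLimit. Opportunistic
    # behavior: a page about to be set all-visible freezes everything
    # < OldestXmin instead, so the page becomes all-frozen too.
    cutoff = oldest_xmin if (opportunistic and all_visible) else freeze_limit
    frozen = [x for x in tuple_xmins if x < cutoff]
    all_frozen = all_visible and len(frozen) == len(tuple_xmins)
    return all_visible, all_frozen

# Page with xmins 90, 120, 150; OldestXmin = 200, FreezeLimit = 100.
# Old rule: all-visible but NOT all-frozen (120 and 150 stay unfrozen,
# deferring that work to a future aggressive VACUUM).
assert page_vm_bits([90, 120, 150], 200, 100, opportunistic=False) == (True, False)
# Opportunistic rule: the same page ends up all-visible AND all-frozen.
assert page_vm_bits([90, 120, 150], 200, 100, opportunistic=True) == (True, True)
```

The second case is the mechanism behind the "virtuous circle": pages never linger in the all-visible-but-not-all-frozen state that an aggressive VACUUM would otherwise have to revisit.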
On Tue, Dec 14, 2021 at 1:16 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I think that you'd agree that the arguments in favor of skipping are
> strongest ...

Well I just don't understand why you insist on using the word "skipping." I think what we're talking about - or at least what we should be talking about - is whether relation_needs_vacanalyze() sets *wraparound = true right after the comment that says /* Force vacuum if table is at risk of wraparound */. And adding some kind of exception to the logic that's there now.

> What I see with the draft patch series is that the oldest XID just
> isn't that old anymore, consistently -- we literally never fail to
> advance relfrozenxid, in any autovacuum, for any table. And the value
> that we end up with is consistently quite recent. This is something
> that I see both with BenchmarkSQL, and pgbench. There is a kind of
> virtuous circle, which prevents us from ever getting anywhere near
> having any table age in the tens of millions of XIDs.

Yeah, I hadn't thought about it from that perspective, but that does seem very good. I think it's inevitable that there will be cases where that doesn't work out - e.g. you can always force the bad case by holding a table lock until your newborn heads off to college, or just by overthrottling autovacuum so that it can't get through the database in any reasonable amount of time - but it will be nice when it does work out, for sure.

> I guess that that makes avoiding useless vacuuming seem like less of a
> priority. ISTM that it should be something that is squarely aimed at
> keeping things stable in truly pathological cases.

Yes. I think "pathological cases" is a good summary of what's wrong with autovacuum. When there's nothing too crazy happening, it actually does pretty well. But, when resources are tight or other corner cases occur, really dumb things start to happen. So it's reasonable to think about how we can install guard rails that prevent complete insanity.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Dec 14, 2021 at 10:47 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Well I just don't understand why you insist on using the word
> "skipping." I think what we're talking about - or at least what we
> should be talking about - is whether relation_needs_vacanalyze() sets
> *wraparound = true right after the comment that says /* Force vacuum
> if table is at risk of wraparound */. And adding some kind of
> exception to the logic that's there now.

Actually, I agree. Skipping is the wrong term, especially because the phrase "VACUUM skips..." is already too overloaded. Not necessarily in vacuumlazy.c itself, but certainly on the mailing list.

> Yeah, I hadn't thought about it from that perspective, but that does
> seem very good. I think it's inevitable that there will be cases where
> that doesn't work out - e.g. you can always force the bad case by
> holding a table lock until your newborn heads off to college, or just
> by overthrottling autovacuum so that it can't get through the database
> in any reasonable amount of time - but it will be nice when it does
> work out, for sure.

Right. But when the patch doesn't manage to totally prevent anti-wraparound VACUUMs, things still work out a lot better than they would now. I would expect that in practice this will usually only happen when non-aggressive autovacuums keep getting canceled. And sure, it's still not ideal that things have come to that. But because we now do freezing earlier (when it's relatively inexpensive), and because we set all-frozen bits incrementally, the anti-wraparound autovacuum will at least be able to reuse any freezing that we managed to do in all those canceled autovacuums.

I think that this tends to make anti-wraparound VACUUMs mostly about not being cancelable -- not so much about reliably advancing relfrozenxid. I mean, it doesn't change the basic rules (there is no change to the definition of aggressive VACUUM), but in practice I think that it'll just work that way. Which makes a great deal of sense.

I hope to be able to totally get rid of vacuum_freeze_table_age. The freeze map work in PostgreSQL 9.6 was really great, and very effective. But I think that it had an undesirable interaction with vacuum_freeze_min_age: if we set a heap page as all-visible (but not all-frozen) before some of its tuples reached that age (which is very likely), then tuples < vacuum_freeze_min_age aren't going to get frozen until whenever we do an aggressive autovacuum. Very often, this will only happen when we next do an anti-wraparound VACUUM (at least before Postgres 13). I suspect we risk running into a "debt cliff" in the eventual anti-wraparound autovacuum. And so while vacuum_freeze_min_age kinda made sense prior to 9.6, it now seems to make a lot less sense.

> > I guess that that makes avoiding useless vacuuming seem like less of a
> > priority. ISTM that it should be something that is squarely aimed at
> > keeping things stable in truly pathological cases.
>
> Yes. I think "pathological cases" is a good summary of what's wrong
> with autovacuum.

This is 100% my focus, in general. The main goal of the patch I'm working on isn't so much improving performance as making it more predictable over time. Focusing on freezing while costs are low has a natural tendency to spread the costs out over time. The system should never "get in over its head" with debt that VACUUM is expected to eventually deal with.

> When there's nothing too crazy happening, it actually
> does pretty well. But, when resources are tight or other corner cases
> occur, really dumb things start to happen. So it's reasonable to think
> about how we can install guard rails that prevent complete insanity.

Another thing that I really want to stamp out is anything involving a tiny, seemingly-insignificant adverse event that has the potential to cause disproportionate impact over time. For example, right now a non-aggressive VACUUM will never be able to advance relfrozenxid when it cannot get a cleanup lock on one heap page. It's actually extremely unlikely that that should have much of any impact, at least when you determine the new relfrozenxid for the table intelligently. Not acquiring one cleanup lock on one heap page in a huge table should not have such an extreme impact.

It's even worse when the systemic impact over time is considered. Let's say you only have a 20% chance of failing to acquire one or more cleanup locks during a non-aggressive autovacuum of a given large table, meaning that you'll fail to advance relfrozenxid in at least 20% of all non-aggressive autovacuums. I think that that might be a lot worse than it sounds, because the impact compounds over time -- I'm not sure that 20% is much worse than 60%, or much better than 5% (it's very hard to model). But if we make the high-level, abstract idea of "aggressiveness" more of a continuous thing, and not something that's defined by sharp (and largely meaningless) XID-based cutoffs, we have every chance of nipping these problems in the bud (without needing to model much of anything).

--
Peter Geoghegan
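[Editor's note: the compounding effect Peter describes resists exact modeling, but a deliberately crude sketch gives some intuition. Assume, purely for illustration, that each non-aggressive autovacuum independently fails to advance relfrozenxid with probability p; independence is exactly the assumption real workloads may violate, which is part of why this is hard to model:]

```python
# Crude back-of-envelope model: if each autovacuum fails to advance
# relfrozenxid with independent probability p, the number of autovacuum
# passes until relfrozenxid next advances is geometric with success
# probability 1 - p, so its mean is 1 / (1 - p).

def expected_vacuums_until_advance(p: float) -> float:
    assert 0.0 <= p < 1.0
    return 1.0 / (1.0 - p)

for p in (0.05, 0.20, 0.60):
    print(f"failure rate {p:.0%}: relfrozenxid advances every "
          f"{expected_vacuums_until_advance(p):.2f} autovacuums on average")
```

Under this over-simple model the averages are 1.05, 1.25, and 2.50 passes respectively -- a surprisingly modest spread, loosely consistent with Peter's remark that it's hard to say whether 20% is much worse than 60% or much better than 5%. Correlated failures (the same page staying pinned across many autovacuums) would change the picture entirely, which is the pathological case the continuous-"aggressiveness" idea is aimed at.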