Re: Eager page freeze criteria clarification

From: Andres Freund
Subject: Re: Eager page freeze criteria clarification
Msg-id: 20230908052605.j46kyoamnch53cxh@awork3.anarazel.de
In reply to: Re: Eager page freeze criteria clarification  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi,

On 2023-09-06 16:21:31 -0400, Robert Haas wrote:
> On Wed, Sep 6, 2023 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > If VACUUM freezes too aggressively, then (pretty much by definition)
> > we can be sure that the next VACUUM will scan the same pages -- there
> > may be some scope for VACUUM to "learn from its mistake" when we err
> > in the direction of over-freezing. But when VACUUM makes the opposite
> > mistake (doesn't freeze when it should have), it won't scan those same
> > pages again for a long time, by design. It therefore has no plausible
> > way of "learning from its mistakes" before it becomes an extremely
> > expensive and painful lesson (which happens whenever the next
> > aggressive VACUUM takes place). This is in large part a consequence of
> > the way that VACUUM dutifully sets pages all-visible whenever
> > possible. That behavior interacts badly with many workloads, over
> > time.
>
> I think this is an insightful commentary with which I partially agree.
> As I see it, the difference is that when you make the mistake of
> marking something all-visible or freezing it too aggressively, you
> incur a price that you pay almost immediately. When you make the
> mistake of not marking something all-visible when it would have been
> best to do so, you incur a price that you pay later, when the next
> VACUUM happens. When you make the mistake of not marking something
> all-frozen when it would have been best to do so, you incur a price
> that you pay even later, not at the next VACUUM but at some VACUUM
> further off. So there are different trade-offs. When you pay the price
> for a mistake immediately or nearly immediately, it can potentially
> harm the performance of the foreground workload, if you're making a
> lot of mistakes.

We have to make a *lot* of mistakes to badly harm the foreground workload. As
long as we don't constantly trigger FPIs, the worst case effects of freezing
unnecessarily aren't that large, particularly if we keep the number of
XLogInsert()s constant (via the combining of records that Melanie is working
on).  Once/if we want to opportunistically freeze when it *does* trigger an
FPI, we *do* need to be more certain that it's not pointless work.

We could further bound the worst case overhead by having a range
representation for freeze plans. That should be quite doable: we could add
another flag to xl_heap_freeze_plan.frzflags, indicating that the entries in
the offset array are start/end tids of ranges, rather than individual tids.
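
To make that concrete, here's a rough standalone sketch of what a range
representation could look like. This is not the actual WAL format -
XLH_FREEZE_OFFSET_RANGES and the struct/helper names are invented for
illustration, and the real xl_heap_freeze_plan in heapam_xlog.h has no such
flag today:

#include <stdint.h>
#include <stdio.h>

typedef uint16_t OffsetNumber;

#define XLH_FREEZE_OFFSET_RANGES	0x04	/* hypothetical new frzflags bit */

typedef struct
{
	uint8_t		frzflags;		/* would live in xl_heap_freeze_plan */
	uint16_t	ntuples;		/* number of entries in offsets[] */
	OffsetNumber offsets[16];	/* individual offsets, or start/end pairs */
} FreezePlanSketch;

/* Count how many tuples a plan covers under either representation. */
static int
plan_tuple_count(const FreezePlanSketch *plan)
{
	if (plan->frzflags & XLH_FREEZE_OFFSET_RANGES)
	{
		int			total = 0;

		/* entries are [start, end] pairs, both ends inclusive */
		for (int i = 0; i + 1 < plan->ntuples; i += 2)
			total += plan->offsets[i + 1] - plan->offsets[i] + 1;
		return total;
	}
	return plan->ntuples;		/* one entry per frozen tuple */
}

int
main(void)
{
	/* 200 consecutive tuples encoded as one range instead of 200 offsets */
	FreezePlanSketch plan = {
		.frzflags = XLH_FREEZE_OFFSET_RANGES,
		.ntuples = 2,
		.offsets = {1, 200},
	};

	printf("tuples frozen: %d\n", plan_tuple_count(&plan));
	return 0;
}

With that encoding, freezing 200 consecutive tuples costs two offset entries
instead of 200, which is where the worst-case bound would come from.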


> That sucks. On the other hand, when you defer paying
> the price until some later bulk operation, the costs of all of your
> mistakes get added up and then you pay the whole price all at once,
> which means you can be suddenly slapped with an enormous bill that you
> weren't expecting. That sucks, too, just in a different way.

It's particularly bad in the case of freezing because there's practically no
backpressure against deferring more work than the system can handle. If we had
made foreground processes freeze one page for every unfrozen page they create,
once the table reaches a certain percentage of old unfrozen pages, it'd be a
different story...
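
For illustration, something like the following toy sketch is what I mean -
none of this exists in PostgreSQL, and the names and the 20% threshold are
made up:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
	uint64_t	total_pages;
	uint64_t	old_unfrozen_pages;	/* pages older than some age cutoff */
} TableFreezeStats;

/*
 * Hypothetical backpressure check: once too much of the table consists of
 * old unfrozen pages, a foreground process that creates another unfrozen
 * page would be asked to freeze one old page in return.
 */
static bool
foreground_should_freeze_one_page(const TableFreezeStats *stats)
{
	const double threshold = 0.20;	/* arbitrary example threshold */

	if (stats->total_pages == 0)
		return false;
	return (double) stats->old_unfrozen_pages / stats->total_pages > threshold;
}

int
main(void)
{
	TableFreezeStats stats = {.total_pages = 1000, .old_unfrozen_pages = 300};

	printf("freeze one page in foreground: %d\n",
		   foreground_should_freeze_one_page(&stats));
	return 0;
}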


I think it's important that we prevent exploding the WAL volume by
opportunistically freezing the same page over and over, but as long as we take
care not to do that, I think the impact on the foreground is going to be
small.

I'm sure that we can come up with cases where it's noticeable, e.g. because
the system is already completely bottlenecked by WAL IO throughput and a small
increase in WAL volume is going to push things over the edge. But such systems
are going to be in substantial trouble at the next anti-wraparound vacuum and
will be out of commission for days once they hit the anti-wraparound shutdown
limits.


Some backpressure in the form of a small performance decrease for foreground
work might even be good there.



> > VACUUM simply ignores such second-order effects. Perhaps it would be
> > practical to address some of the issues in this area by avoiding
> > setting pages all-visible without freezing them, in some general
> > sense. That at least creates a kind of symmetry between mistakes in
> > the direction of under-freezing and mistakes in the direction of
> > over-freezing. That might enable VACUUM to course-correct in either
> > direction.
> >
> > Melanie is already planning on combining the WAL records (PRUNE,
> > FREEZE_PAGE, and VISIBLE). Perhaps that'll weaken the argument for
> > setting unfrozen pages all-visible even further.
>
> Yeah, so I think the question here is whether it's ever a good idea to
> mark a page all-visible without also freezing it. If it's not, then we
> should either mark fewer pages all-visible, or freeze more of them.
> Maybe I'm all wet here, but I think it depends on the situation. If a
> page is already dirty and has had an FPI since the last checkpoint,
> then it's pretty appealing to freeze whenever we mark all-visible. We
> still have to consider whether the incremental CPU cost and WAL volume
> are worth it, but assuming those costs are small enough not to be a
> big problem, it seems like a pretty good bet. Making a page
> un-all-visible has some cost, but making a page un-all-frozen really
> doesn't, so cool. On the other hand, if we have a page that isn't
> dirty, hasn't had a recent FPI, and doesn't need pruning, but which
> can be marked all-visible, freezing it is a potentially more
> significant cost, because marking the buffer all-visible doesn't force
> a new FPI, and freezing does.

+1

One thing that bothers me in this area, when using will-trigger-FPI style
logic, is that it makes checksums=on/off have a behavioural impact. If we e.g.
make setting all-visible conditional on not triggering an FPI, there will be
plenty of workloads where a checksums=off system will get an index only scan,
but a checksums=on system won't.

Greetings,

Andres Freund


