Re: Combine Prune and Freeze records emitted by vacuum

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Combine Prune and Freeze records emitted by vacuum
Дата
Msg-id CAAKRu_YZM2XpZDGUQE5ogXU9KnbfuXNB7_Ec0RybJ_w+Tvhrtw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Combine Prune and Freeze records emitted by vacuum  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
Thanks for committing the new WAL format!

On Mon, Mar 25, 2024 at 3:33 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 24/03/2024 18:32, Melanie Plageman wrote:
> > On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >>
> >> In heap_page_prune_and_freeze(), we now do some extra work on each live
> >> tuple, to set the all_visible_except_removable correctly. And also to
> >> update live_tuples, recently_dead_tuples and hastup. When we're not
> >> freezing, that's a waste of cycles, the caller doesn't care. I hope it's
> >> enough that it doesn't matter, but is it?
> >
> > Last year on an early version of the patch set I did some pgbench
> > tpcb-like benchmarks -- since there is a lot of on-access pruning in
> > that workload -- and I don't remember it being a showstopper. The code
> > has changed a fair bit since then. However, I think it might be safer
> > to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
> > the rest of the loop after heap_prune_satisifies_vacuum() when
> > on-access pruning invokes it. I had avoided that because it felt ugly
> > and error-prone, however it addresses a few other of your points as
> > well.
>
> Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the
> argument described what it does, rather than who it's for. For example,
> 'need_all_visible'. If set to true, the function determines
> 'all_visible', otherwise it does not.

I like that way of putting it -- describing what it does instead of
who it is for. However, we now have PruneReason as an argument to
heap_page_prune(), which would be usable for this purpose (for
skipping the rest of the first loop). It is not descriptive of how we
would use it in this scenario, though.

> I started to look closer at the loops in heap_prune_chain() and how they
> update all the various flags and counters. There's a lot going on there.
> We have:
>
> - live_tuples counter
> - recently_dead_tuples counter
> - all_visible[_except_removable]
> - all_frozen
> - visibility_cutoff_xid
> - hastup
> - prstate.frozen array
> - nnewlpdead
> - deadoffsets array
>
> And that doesn't even include all the local variables and the final
> dead/redirected arrays.

Yes, there are a lot of things happening. In an early version, I had
hoped for the first loop to be just getting the visibility information
and then to do most of the other stuff as we went in
heap_prune_chain() as you mention below. I couldn't quite get a
version of that working that looked nice. I agree that the whole thing
feels a bit brittle and error-prone. It's hard to be objective after
fiddling with something over the course of a year. I'm trying to take
a step back now and rethink it.

> Some of those are set in the first loop that initializes 'htsv' for each
> tuple on the page. Others are updated in heap_prune_chain(). Some are
> updated in both. It's hard to follow which are set where.

Yep.

> I think recently_dead_tuples is updated incorrectly, for tuples that are
> part of a completely dead HOT chain. For example, imagine a hot chain
> with two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow
> the chain, see the DEAD tuple at the end of the chain, and mark both
> tuples for pruning. However, we already updated 'recently_dead_tuples'
> in the first loop, which is wrong if we remove the tuple.
>
> Maybe that's the only bug like this, but I'm a little scared. Is there
> something we could do to make this simpler? Maybe move all the new work
> that we added to the first loop, into heap_prune_chain() ? Maybe
> introduce a few more helper heap_prune_record_*() functions, to update
> the flags and counters also for live and insert/delete-in-progress
> tuples and for dead line pointers? Something like
> heap_prune_record_live() and heap_prune_record_lp_dead().

I had discarded previous attempts to get everything done in
heap_prune_chain() because it was hard to make sure I was doing the
right thing given that it visits the line pointers out of order so
making sure you've considered all of them once and only once was hard.
I hadn't thought of the approach you suggested with record_live() --
that might help. I will work on this tomorrow. I had hoped to get
something out today, but I am still in the middle of rebasing the back
20 patches from your v5 over current master and then adding in the
suggestions that I made in the various diffs on the thread.

> > Note that I still don't think we have a resolution on what to
> > correctly update new_relfrozenxid and new_relminmxid to at the end
> > when presult->nfrozen == 0 and presult->all_frozen is true.
> >
> >          if (presult->nfrozen > 0)
> >          {
> >              presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
> >              presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
> >          }
> >          else
> >          {
> >              presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
> >              presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
> >          }
>
> One approach is to take them out of the PageFreezeResult struct again,
> and pass them as pointers:
>
> void
> heap_page_prune_and_freeze(Relation relation, Buffer buffer,
>         ...
>         TransactionId *new_relfrozenxid,
>         MultiXactId *new_relminmxid,
>         ...
> )

What about the question about whether or not we should be using
FreezePageRelfrozenXid when all_frozen is true and nrfrozen == 0. I
was confused about whether or not we had to do this by the comment in
HeapPageFreeze.

> That would be natural for the caller too, as it wouldn't need to set up
> the old values to HeapPageFreeze before each call, nor copy the new
> values to 'vacrel' after the call. I'm thinking that we'd move the
> responsibility of setting up HeapPageFreeze to
> heap_page_prune_and_freeze(), instead of having the caller set it up. So
> the caller would look something like this:
>
>         heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
>            &vacrel->cutoffs, &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid,
>            &presult,
>            PRUNE_VACUUM_SCAN, flags,
>            &vacrel->offnum);
>
> In this setup, heap_page_prune_and_freeze() would update
> *new_relfrozenxid and *new_relminmxid when it has a new value for them,
> and leave them unchanged otherwise.

I do prefer having heap_page_prune_and_freeze() own HeapPageFreeze.

- Melanie



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: make dist using git archive
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: pgsql: Track last_inactive_time in pg_replication_slots.