Re: Combine Prune and Freeze records emitted by vacuum

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Combine Prune and Freeze records emitted by vacuum
Дата
Msg-id 2b2aee64-e78a-44d2-a00f-56fa77d56a6b@iki.fi
обсуждение исходный текст
Ответ на Re: Combine Prune and Freeze records emitted by vacuum  (Melanie Plageman <melanieplageman@gmail.com>)
Ответы Re: Combine Prune and Freeze records emitted by vacuum  (Melanie Plageman <melanieplageman@gmail.com>)
Re: Combine Prune and Freeze records emitted by vacuum  (Melanie Plageman <melanieplageman@gmail.com>)
Список pgsql-hackers
On 24/03/2024 18:32, Melanie Plageman wrote:
> On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> In heap_page_prune_and_freeze(), we now do some extra work on each live
>> tuple, to set the all_visible_except_removable correctly. And also to
>> update live_tuples, recently_dead_tuples and hastup. When we're not
>> freezing, that's a waste of cycles, the caller doesn't care. I hope it's
>> enough that it doesn't matter, but is it?
> 
> Last year on an early version of the patch set I did some pgbench
> tpcb-like benchmarks -- since there is a lot of on-access pruning in
> that workload -- and I don't remember it being a showstopper. The code
> has changed a fair bit since then. However, I think it might be safer
> to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
> the rest of the loop after heap_prune_satisifies_vacuum() when
> on-access pruning invokes it. I had avoided that because it felt ugly
> and error-prone, however it addresses a few other of your points as
> well.

Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the 
argument described what it does, rather than who it's for. For example, 
'need_all_visible'. If set to true, the function determines 
'all_visible', otherwise it does not.

I started to look closer at the loops in heap_prune_chain() and how they 
update all the various flags and counters. There's a lot going on there. 
We have:

- live_tuples counter
- recently_dead_tuples counter
- all_visible[_except_removable]
- all_frozen
- visibility_cutoff_xid
- hastup
- prstate.frozen array
- nnewlpdead
- deadoffsets array

And that doesn't even include all the local variables and the final 
dead/redirected arrays.

Some of those are set in the first loop that initializes 'htsv' for each 
tuple on the page. Others are updated in heap_prune_chain(). Some are 
updated in both. It's hard to follow which are set where.

I think recently_dead_tuples is updated incorrectly, for tuples that are 
part of a completely dead HOT chain. For example, imagine a hot chain 
with two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow 
the chain, see the DEAD tuple at the end of the chain, and mark both 
tuples for pruning. However, we already updated 'recently_dead_tuples' 
in the first loop, which is wrong if we remove the tuple.

Maybe that's the only bug like this, but I'm a little scared. Is there 
something we could do to make this simpler? Maybe move all the new work 
that we added to the first loop, into heap_prune_chain() ? Maybe 
introduce a few more helper heap_prune_record_*() functions, to update 
the flags and counters also for live and insert/delete-in-progress 
tuples and for dead line pointers? Something like 
heap_prune_record_live() and heap_prune_record_lp_dead().

>> The 'frz_conflict_horizon' stuff is still fuzzy to me. (Not necessarily
>> these patches's fault). This at least is wrong, because Max(a, b)
>> doesn't handle XID wraparound correctly:
>>
>>>                        if (do_freeze)
>>>                                conflict_xid = Max(prstate.snapshotConflictHorizon,
>>>                                                                   presult->frz_conflict_horizon);
>>>                        else
>>>                                conflict_xid = prstate.snapshotConflictHorizon;
>>
>> Then there's this in lazy_scan_prune():
>>
>>>                /* Using same cutoff when setting VM is now unnecessary */
>>>                if (presult.all_frozen)
>>>                        presult.frz_conflict_horizon = InvalidTransactionId;
>> This does the right thing in the end, but if all the tuples are frozen
>> shouldn't frz_conflict_horizon already be InvalidTransactionId? The
>> comment says it's "newest xmin on the page", and if everything was
>> frozen, all xmins are FrozenTransactionId. In other words, that should
>> be moved to heap_page_prune_and_freeze() so that it doesn't lie to its
>> caller. Also, frz_conflict_horizon is only set correctly if
>> 'all_frozen==true', would be good to mention that in the comments too.
> 
> Yes, this is a good point. I've spent some time swapping all of this
> back into my head. I think we should change the names of all these
> conflict horizon variables and introduce some local variables again.
> In the attached patch, I've updated the name of the variable in
> PruneFreezeResult to vm_conflict_horizon, as it is only used for
> emitting a VM update record. Now, I don't set it until the end of
> heap_page_prune_and_freeze(). It is only updated from
> InvalidTransactionId if the page is not all frozen. As you say, if the
> page is all frozen, there can be no conflict.

Makes sense.

> I've also changed PruneState->snapshotConflictHorizon to
> PruneState->latest_xid_removed.
> 
> And I introduced the local variables visibility_cutoff_xid and
> frz_conflict_horizon. I think it is important we distinguish between
> the latest xid pruned, the latest xmin of tuples frozen, and the
> latest xid of all live tuples on the page.
>
> Though we end up using visibility_cutoff_xid as the freeze conflict
> horizon if the page is all frozen, I think it is more clear to have
> the three variables and name them what they are.

Agreed.

> Note that I still don't think we have a resolution on what to
> correctly update new_relfrozenxid and new_relminmxid to at the end
> when presult->nfrozen == 0 and presult->all_frozen is true.
> 
>          if (presult->nfrozen > 0)
>          {
>              presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
>              presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
>          }
>          else
>          {
>              presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
>              presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
>          }

One approach is to take them out of the PageFreezeResult struct again, 
and pass them as pointers:

void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
    ...
    TransactionId *new_relfrozenxid,
    MultiXactId *new_relminmxid,
    ...
)

That would be natural for the caller too, as it wouldn't need to set up 
the old values to HeapPageFreeze before each call, nor copy the new 
values to 'vacrel' after the call. I'm thinking that we'd move the 
responsibility of setting up HeapPageFreeze to 
heap_page_prune_and_freeze(), instead of having the caller set it up. So 
the caller would look something like this:

    heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
       &vacrel->cutoffs, &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid,
       &presult,
       PRUNE_VACUUM_SCAN, flags,
       &vacrel->offnum);

In this setup, heap_page_prune_and_freeze() would update 
*new_relfrozenxid and *new_relminmxid when it has a new value for them, 
and leave them unchanged otherwise.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Possibility to disable `ALTER SYSTEM`
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation