Re: Combine Prune and Freeze records emitted by vacuum

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Combine Prune and Freeze records emitted by vacuum
Дата
Msg-id 20240315005658.y7bvbzvnlciqmhd6@liskov
обсуждение исходный текст
Ответ на Re: Combine Prune and Freeze records emitted by vacuum  (Melanie Plageman <melanieplageman@gmail.com>)
Ответы Re: Combine Prune and Freeze records emitted by vacuum  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
On Wed, Mar 13, 2024 at 07:25:56PM -0400, Melanie Plageman wrote:
> On Mon, Mar 11, 2024 at 6:38 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > On 09/03/2024 22:41, Melanie Plageman wrote:
> > > On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > >> Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly?
> > >
> > > Okay, so I thought a lot about this, and I don't understand how
> > > GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId
> > > correctly.
> > >
> > > vacrel->cutoffs.OldestXmin is computed initially from
> > > GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons().
> > > GlobalVisState is updated by ComputeXidHorizons() (when it is
> > > updated).
> > >
> > > I do see that the comment above GlobalVisTestIsRemovableXid() says
> > >
> > >   * It is crucial that this only gets called for xids from a source that
> > >   * protects against xid wraparounds (e.g. from a table and thus protected by
> > >   * relfrozenxid).
> > >
> > > and then in
> > >
> > >       * Convert 32 bit argument to FullTransactionId. We can do so safely
> > >       * because we know the xid has to, at the very least, be between
> > >       * [oldestXid, nextXid), i.e. within 2 billion of xid.
> > >
> > > I'm not sure what oldestXid is here.
> > > It's true that I don't see GlobalVisTestIsRemovableXid() being called
> > > anywhere else with an xmin as an input. I think that hints that it is
> > > not safe with FrozenTransactionId. But I don't see what could go
> > > wrong.
> > >
> > > Maybe it has something to do with converting it to a FullTransactionId?
> > >
> > >     FullTransactionIdFromU64(U64FromFullTransactionId(rel)  + (int32)
> > > (xid - rel_xid));
> > >
> > > Sorry, I couldn't quite figure it out :(
> >
> > I just tested it. No, GlobalVisTestIsRemovableXid does not work for
> > FrozenTransactionId. I just tested it with state->definitely_needed ==
> > {0, 4000000000} and xid == FrozenTransactionid, and it incorrectly
> > returned 'false'. It treats FrozenTransactionId as if was a regular xid '2'.
> 
> I see. Looking at the original code:
>       if (!TransactionIdPrecedes(xmin,
>                                  vacrel->cutoffs.OldestXmin))
> 
> And the source code for TransactionIdPrecedes:
> 
>     if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
>         return (id1 < id2);
> 
>     diff = (int32) (id1 - id2);
>     return (diff < 0);
> 
> In your example, It seems like you mean GlobalVisState->maybe_needed is
> 0 and GlobalVisState->definitely_needed = 4000000000. In this example,
> if vacrel->cutoffs.OldestXmin was 0, we would get a wrong answer also.
> 
> But, I do see that the comparison done by TransactionIdPrecedes() is is
> quite different than that done by FullTransactionIdPrecedes() because of
> the modulo 2**32 arithmetic.
> 
> Solving the handling of FrozenTransactionId specifically by
> GlobalVisTestIsRemovableXid() seems like it would be done easily in our
> case by wrapping it in a function which checks if
> TransactionIdIsNormal() and returns true if it is not normal. But, I'm
> not sure if I am missing the larger problem.

I've made such a wrapper in attached v3.

> > >> The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
> > >> XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
> > >> the case that there's no pruning, just freezing. The record format
> > >> (xl_heap_prune) looks pretty complex as it is, I think it could be made
> > >> both more compact and more clear with some refactoring.
> > >
> > > I'm happy to change up xl_heap_prune format. In its current form,
> > > according to pahole, it has no holes and just 3 bytes of padding at
> > > the end.
> > >
> > > One way we could make it smaller is by moving the isCatalogRel member
> > > to directly after snapshotConflictHorizon and then adding a flags
> > > field and defining flags to indicate whether or not other members
> > > exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED,
> > > HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16
> > > nredirected, nunused, nplans, ndead and just put the number of
> > > redirected/unused/etc at the beginning of the arrays containing the
> > > offsets.
> >
> > Sounds good.
> >
> > > Then I could write various macros for accessing them. That
> > > would make it smaller, but it definitely wouldn't make it less complex
> > > (IMO).
> >
> > I don't know, it might turn out not that complex. If you define the
> > formats of each of those "sub-record types" as explicit structs, per
> > attached sketch, you won't need so many macros. Some care is still
> > needed with alignment though.
> 
> In the attached v2, I've done as you suggested and made all members
> except flags and snapshotConflictHorizon optional in the xl_heap_prune
> struct and obsoleted the xl_heap_freeze struct. I've kept the actual
> xl_heap_freeze_page struct and heap_xlog_freeze_page() function so that
> we can replay previously made XLOG_HEAP2_FREEZE_PAGE records.
> 
> Because we may set line pointers unused during vacuum's first pass, I
> couldn't use the presence of nowunused without redirected or dead items
> to indicate that this was a record emitted by vacuum's second pass. As
> such, I haven't obsoleted the xl_heap_vacuum struct. I was thinking I
> could add a flag that indicates the record was emitted by vacuum's
> second pass? We would want to distinguish this so that we could set the
> items unused without calling heap_page_prune_execute() -- because that
> calls PageRepairFragmentation() which requires a full cleanup lock.

Okay, so I was going to start using xl_heap_prune for vacuum here too,
but I realized it would be bigger because of the
snapshotConflictHorizon. Do you think there is a non-terrible way to
make the snapshotConflictHorizon optional? Like with a flag?

> I introduced a few sub-record types similar to what you suggested --
> they help a bit with alignment, so I think they are worth keeping. There
> are comments around them, but perhaps a larger diagram of the layout of
> a the new XLOG_HEAP2_PRUNE record would be helpful.

I started doing this, but I can't find a way of laying out the diagram
that pgindent doesn't destroy. I thought I remember other diagrams in
the source code showing the layout of something (something with pages
somewhere?) that don't get messed up by pgindent, but I couldn't find
them.

> There is a bit of duplicated code between heap_xlog_prune() and
> heap2_desc() since they both need to deserialize the record. Before the
> code to do this was small and it didn't matter, but it might be worth
> refactoring it that way now.

I have added a helper function to do the deserialization,
heap_xlog_deserialize_prune_and_freeze(). But I didn't start using it in
heap2_desc() because of the way the pg_waldump build file works. Do you
think the helper belongs in any of waldump's existing sources?

    pg_waldump_sources = files(
        'compat.c',
        'pg_waldump.c',
        'rmgrdesc.c',
    )

    pg_waldump_sources += rmgr_desc_sources
    pg_waldump_sources += xlogreader_sources
    pg_waldump_sources += files('../../backend/access/transam/xlogstats.c')

Otherwise, I assume I am suppose to avoid adding some big new include to
waldump.

> On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > I don't think we need XLOG_HEAP2_FREEZE_PAGE as a separate record type
> > anymore. By removing that, you also get rid of the freeze-only codepath
> > near the end of heap_page_prune(), and the
> > heap_freeze_execute_prepared() function.
> >
> > The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
> > XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
> > the case that there's no pruning, just freezing. The record format
> > (xl_heap_prune) looks pretty complex as it is, I think it could be made
> > both more compact and more clear with some refactoring.
> 
> On the point of removing the freeze-only code path from
> heap_page_prune() (now heap_page_prune_and_freeze()): while doing this,
> I realized that heap_pre_freeze_checks() was not being called in the
> case that we decide to freeze because we emitted an FPI while setting
> the hint bit. I've fixed that, however, I've done so by moving
> heap_pre_freeze_checks() into the critical section. I think that is not
> okay? I could move it earlier and not do call it when the hint bit FPI
> leads us to freeze tuples. But, I think that would lead to us doing a
> lot less validation of tuples being frozen when checksums are enabled.
> Or, I could make two critical sections?

I found another approach and just do the pre-freeze checks if we are
considering freezing except for the FPI criteria.

> > HeapPageFreeze has two "trackers", for the "freeze" and "no freeze"
> > cases. heap_page_prune() needs to track both, until it decides whether
> > to freeze or not. But it doesn't make much sense that the caller
> > (lazy_scan_prune()) has to initialize both, and has to choose which of
> > the values to use after the call depending on whether heap_page_prune()
> > froze or not. The two trackers should be just heap_page_prune()'s
> > internal business.
> 
> I've added new_relminmxid and new_relfrozenxid to PruneFreezeResult and
> set them appropriately in heap_page_prune_and_freeze().
> 
> It's a bit sad because if it wasn't for vacrel->skippedallvis,
> vacrel->NewRelfrozenXid and vacrel->NewRelminMxid would be
> vacrel->cutoffs.OldestXmin and vacrel->cutoffs.OldestMxact respectively
> and we could avoid having lazy_scan_prune() initializing the
> HeapPageFreeze altogether and just pass VacuumCutoffs (and
> heap_page_prune_opt() could pass NULL) to heap_page_prune_and_freeze().
> 
> I think it is probably worse to add both of them as additional optional
> arguments, so I've just left lazy_scan_prune() with the job of
> initializing them.
> 
> In heap_page_prune_and_freeze(), I initialize new_relminmxid and
> new_relfrozenxid to InvalidMultiXactId and InvalidTransactionId
> respectively because on-access pruning doesn't have a value to set them
> to. But I wasn't sure if this was okay -- since I don't see validation
> that TransactionIdIsValid() in vac_update_relstats(). It will work now
> -- just worried about future issues. I could add an assert there?

I looked more closely at vac_update_relstats() and it won't update
relfrozenxid unless the proposed value is smaller than the existing one.
And for MultiXactIds, it checks if it is valid. So, this is not an
issue.

I've also optimized the member ordering of PruneFreezeResult. pahole
identified some avoidable holes. I went through each commit and
optimized the PruneResult data structure as members are being added and
removed.

- Melanie

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: broken JIT support on Fedora 40
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Support json_errdetail in FRONTEND builds