Обсуждение: vac_truncate_clog()'s bogus check leads to bogusness

Поиск
Список
Период
Сортировка

vac_truncate_clog()'s bogus check leads to bogusness

От
Andres Freund
Дата:
Hi,

When vac_truncate_clog() returns early, due to one of these paths:

    /*
     * Do not truncate CLOG if we seem to have suffered wraparound already;
     * the computed minimum XID might be bogus.  This case should now be
     * impossible due to the defenses in GetNewTransactionId, but we keep the
     * test anyway.
     */
    if (frozenAlreadyWrapped)
    {
        ereport(WARNING,
                (errmsg("some databases have not been vacuumed in over 2 billion transactions"),
                 errdetail("You might have already suffered transaction-wraparound data loss.")));
        return;
    }

    /* chicken out if data is bogus in any other way */
    if (bogus)
        return;

we haven't released the lwlock that we acquired earlier:

    /* Restrict task to one backend per cluster; see SimpleLruTruncate(). */
    LWLockAcquire(WrapLimitsVacuumLock, LW_EXCLUSIVE);

as this isn't a path raising an error, the lock isn't released during abort.
Until there's some cause for the session to call LWLockReleaseAll(), the lock
is held. Until then neither the process holding the lock, nor any other
process, can finish vacuuming.  We don't even have an assert against a
self-deadlock with an already held lock, oddly enough.


This is somewhat nasty - there's no real way to get out of this without an
immediate restart, and it's hard to pinpoint the problem as well :(.


Ok, the subject line is not the most precise, but it was just too good an
opportunity.


To reproduce (only on a throwaway system please!):

CREATE DATABASE invalid;
UPDATE pg_database SET datfrozenxid = '10002' WHERE datname = 'invalid';
DROP TABLE IF EXISTS foo_tbl; CREATE TABLE foo_tbl(); DROP TABLE foo_tbl; VACUUM FREEZE;
DROP TABLE IF EXISTS foo_tbl; CREATE TABLE foo_tbl(); DROP TABLE foo_tbl; VACUUM FREEZE;
<hang>


Found this while writing a test for the fix for partial dropping of
databases [1].


Separately, I think it's quite bad that we *silently* return from
vac_truncate_clog() when finding a bogus xid. That's a quite severe condition,
we should at least tell the user about it.


Greetings,

Andres Freund

[1] https://postgr.es/m/20230621190204.nsaelabojxppiuix%40awork3.anarazel.de



Re: vac_truncate_clog()'s bogus check leads to bogusness

От
Andres Freund
Дата:
Hi,

On 2023-06-21 15:12:08 -0700, Andres Freund wrote:
> When vac_truncate_clog() returns early, due to one of these paths:
>
> [...]
>
> Separately, I think it's quite bad that we *silently* return from
> vac_truncate_clog() when finding a bogus xid. That's a quite severe condition,
> we should at least tell the user about it.

A related issue is that as far as I can tell the determination of what is
bogus is bogus.

The relevant cutoffs are determined vac_update_datfrozenxid() using:

    /*
     * Identify the latest relfrozenxid and relminmxid values that we could
     * validly see during the scan.  These are conservative values, but it's
     * not really worth trying to be more exact.
     */
    lastSaneFrozenXid = ReadNextTransactionId();
    lastSaneMinMulti = ReadNextMultiXactId();

but doing checks based on thos is bogus, because:

a) a concurrent create table / truncate / vacuum can update
   pg_class.relfrozenxid of some relation in the current database to a newer
   value, after lastSaneFrozenXid already has been determined. If that
   happens, we skip updating pg_database.datfrozenxid.

b) A concurrent vacuum in another database, ending up in vac_truncate_clog(),
   can compute a newer datfrozenxid. In that case the vac_truncate_clog() with
   the outdated lastSaneFrozenXid will not truncate the clog (and also forget
   to release WrapLimitsVacuumLock currently, as reported upthread) and not
   call SetTransactionIdLimit(). The latter is particularly bad, because that
   means we might not come out of "database is not accepting commands" land.

I think in both cases a later call might fix the issue, but that could be some
way out, if autovacuum doesn't see further writes being necessary, and no
further write activity happens, because of ""database is not accepting
commands".


It's not entirely obvious to me how to best fix these. For a second I thought
we just need to acquire a snapshot before determining the sane values, but
that doesn't work, since we update the relevant fields with
heap_inplace_update().

I guess we could just recompute the boundaries before actually believing the
catalog values are bogus?

I think we also add warnings to these paths, so we actually have a chance to
find problems in the field.

Greetings,

Andres Freund



Re: vac_truncate_clog()'s bogus check leads to bogusness

От
Noah Misch
Дата:
On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote:
> When vac_truncate_clog() returns early
...
> we haven't released the lwlock that we acquired earlier

> Until there's some cause for the session to call LWLockReleaseAll(), the lock
> is held. Until then neither the process holding the lock, nor any other
> process, can finish vacuuming.  We don't even have an assert against a
> self-deadlock with an already held lock, oddly enough.

I agree with this finding.  Would you like to add the lwlock releases, or
would you like me to?

The bug has been in all released versions for 2.5 years, yet it escaped
notice.  That tells us something.  Bogus values have gotten rare?  The
affected session tends to get lucky and call LWLockReleaseAll() soon?

On Wed, Jun 21, 2023 at 05:46:37PM -0700, Andres Freund wrote:
> On 2023-06-21 15:12:08 -0700, Andres Freund wrote:
> > Separately, I think it's quite bad that we *silently* return from
> > vac_truncate_clog() when finding a bogus xid. That's a quite severe condition,
> > we should at least tell the user about it.
> 
> A related issue is that as far as I can tell the determination of what is
> bogus is bogus.
> 
> The relevant cutoffs are determined vac_update_datfrozenxid() using:
> 
>     /*
>      * Identify the latest relfrozenxid and relminmxid values that we could
>      * validly see during the scan.  These are conservative values, but it's
>      * not really worth trying to be more exact.
>      */
>     lastSaneFrozenXid = ReadNextTransactionId();
>     lastSaneMinMulti = ReadNextMultiXactId();
> 
> but doing checks based on thos is bogus, because:
> 
> a) a concurrent create table / truncate / vacuum can update
>    pg_class.relfrozenxid of some relation in the current database to a newer
>    value, after lastSaneFrozenXid already has been determined. If that
>    happens, we skip updating pg_database.datfrozenxid.
> 
> b) A concurrent vacuum in another database, ending up in vac_truncate_clog(),
>    can compute a newer datfrozenxid. In that case the vac_truncate_clog() with
>    the outdated lastSaneFrozenXid will not truncate the clog (and also forget
>    to release WrapLimitsVacuumLock currently, as reported upthread) and not
>    call SetTransactionIdLimit(). The latter is particularly bad, because that
>    means we might not come out of "database is not accepting commands" land.

> I guess we could just recompute the boundaries before actually believing the
> catalog values are bogus?

That's how I'd do it.



Re: vac_truncate_clog()'s bogus check leads to bogusness

От
Andres Freund
Дата:
Hi,

On 2023-06-21 21:50:39 -0700, Noah Misch wrote:
> On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote:
> > When vac_truncate_clog() returns early
> ...
> > we haven't released the lwlock that we acquired earlier
> 
> > Until there's some cause for the session to call LWLockReleaseAll(), the lock
> > is held. Until then neither the process holding the lock, nor any other
> > process, can finish vacuuming.  We don't even have an assert against a
> > self-deadlock with an already held lock, oddly enough.
> 
> I agree with this finding.  Would you like to add the lwlock releases, or
> would you like me to?

Happy with either.  I do have code and testcase, so I guess it would make
sense for me to do it?


> The bug has been in all released versions for 2.5 years, yet it escaped
> notice.  That tells us something.  Bogus values have gotten rare?  The
> affected session tends to get lucky and call LWLockReleaseAll() soon?

I am not sure either. I suspect that part of it is that people couldn't even
pinpoint the problem when it happened.  Process exit calls LWLockReleaseAll(),
which I assume would avoid the problem in many cases.

Greetings,

Andres Freund



Re: vac_truncate_clog()'s bogus check leads to bogusness

От
Noah Misch
Дата:
On Thu, Jun 22, 2023 at 09:45:18AM -0700, Andres Freund wrote:
> On 2023-06-21 21:50:39 -0700, Noah Misch wrote:
> > On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote:
> > > When vac_truncate_clog() returns early
> > ...
> > > we haven't released the lwlock that we acquired earlier
> > 
> > > Until there's some cause for the session to call LWLockReleaseAll(), the lock
> > > is held. Until then neither the process holding the lock, nor any other
> > > process, can finish vacuuming.  We don't even have an assert against a
> > > self-deadlock with an already held lock, oddly enough.
> > 
> > I agree with this finding.  Would you like to add the lwlock releases, or
> > would you like me to?
> 
> Happy with either.  I do have code and testcase, so I guess it would make
> sense for me to do it?

Sounds good.  Thanks.



Re: vac_truncate_clog()'s bogus check leads to bogusness

От
Andres Freund
Дата:
Hi,

On 2023-06-21 21:50:39 -0700, Noah Misch wrote:
> On Wed, Jun 21, 2023 at 05:46:37PM -0700, Andres Freund wrote:
> > A related issue is that as far as I can tell the determination of what is
> > bogus is bogus.
> > 
> > The relevant cutoffs are determined vac_update_datfrozenxid() using:
> > 
> >     /*
> >      * Identify the latest relfrozenxid and relminmxid values that we could
> >      * validly see during the scan.  These are conservative values, but it's
> >      * not really worth trying to be more exact.
> >      */
> >     lastSaneFrozenXid = ReadNextTransactionId();
> >     lastSaneMinMulti = ReadNextMultiXactId();
> > 
> > but doing checks based on thos is bogus, because:
> > 
> > a) a concurrent create table / truncate / vacuum can update
> >    pg_class.relfrozenxid of some relation in the current database to a newer
> >    value, after lastSaneFrozenXid already has been determined. If that
> >    happens, we skip updating pg_database.datfrozenxid.
> > 
> > b) A concurrent vacuum in another database, ending up in vac_truncate_clog(),
> >    can compute a newer datfrozenxid. In that case the vac_truncate_clog() with
> >    the outdated lastSaneFrozenXid will not truncate the clog (and also forget
> >    to release WrapLimitsVacuumLock currently, as reported upthread) and not
> >    call SetTransactionIdLimit(). The latter is particularly bad, because that
> >    means we might not come out of "database is not accepting commands" land.
> 
> > I guess we could just recompute the boundaries before actually believing the
> > catalog values are bogus?
> 
> That's how I'd do it.

I was looking at doing that and got confused by the current code. Am I missing
something, or does vac_truncate_clog() have two pretty much identical attempts
at a safety measures?

void
vac_update_datfrozenxid(void)
...
    lastSaneFrozenXid = ReadNextTransactionId();
...
        vac_truncate_clog(newFrozenXid, newMinMulti,
                          lastSaneFrozenXid, lastSaneMinMulti);
}
...
static void
vac_truncate_clog(TransactionId frozenXID,
                  MultiXactId minMulti,
                  TransactionId lastSaneFrozenXid,
                  MultiXactId lastSaneMinMulti)
{
    TransactionId nextXID = ReadNextTransactionId();
...
        /*
         * If things are working properly, no database should have a
         * datfrozenxid or datminmxid that is "in the future".  However, such
         * cases have been known to arise due to bugs in pg_upgrade.  If we
         * see any entries that are "in the future", chicken out and don't do
         * anything.  This ensures we won't truncate clog before those
         * databases have been scanned and cleaned up.  (We will issue the
         * "already wrapped" warning if appropriate, though.)
         */
        if (TransactionIdPrecedes(lastSaneFrozenXid, datfrozenxid) ||
            MultiXactIdPrecedes(lastSaneMinMulti, datminmxid))
            bogus = true;

        if (TransactionIdPrecedes(nextXID, datfrozenxid))
            frozenAlreadyWrapped = true;

lastSaneFrozenXid is a slightly older version of ReadNextTransactionId(),
that's the only difference afaict.


I guess this might be caused by 78db307bb23 adding the check, but using
GetOldestXmin(NULL, true) to determine lastSaneFrozenXid. That was changed
soon after, in 87f830e0ce03.


Am I missing something?


Greetings,

Andres Freund



Re: vac_truncate_clog()'s bogus check leads to bogusness

От
Andres Freund
Дата:
Hi,

On 2023-06-23 18:41:58 -0700, Andres Freund wrote:
> I guess this might be caused by 78db307bb23 adding the check, but using
> GetOldestXmin(NULL, true) to determine lastSaneFrozenXid. That was changed
> soon after, in 87f830e0ce03.

FWIW, the discussion leading up to 87f830e0ce03 is
https://postgr.es/m/4182.1405961004%40sss.pgh.pa.us

Greetings,

Andres Freund



Re: vac_truncate_clog()'s bogus check leads to bogusness

От
Noah Misch
Дата:
On Fri, Jun 23, 2023 at 06:41:58PM -0700, Andres Freund wrote:
> On 2023-06-21 21:50:39 -0700, Noah Misch wrote:
> > On Wed, Jun 21, 2023 at 05:46:37PM -0700, Andres Freund wrote:
> > > A related issue is that as far as I can tell the determination of what is
> > > bogus is bogus.
> > > 
> > > The relevant cutoffs are determined vac_update_datfrozenxid() using:
> > > 
> > >     /*
> > >      * Identify the latest relfrozenxid and relminmxid values that we could
> > >      * validly see during the scan.  These are conservative values, but it's
> > >      * not really worth trying to be more exact.
> > >      */
> > >     lastSaneFrozenXid = ReadNextTransactionId();
> > >     lastSaneMinMulti = ReadNextMultiXactId();
> > > 
> > > but doing checks based on thos is bogus, because:
> > > 
> > > a) a concurrent create table / truncate / vacuum can update
> > >    pg_class.relfrozenxid of some relation in the current database to a newer
> > >    value, after lastSaneFrozenXid already has been determined. If that
> > >    happens, we skip updating pg_database.datfrozenxid.
> > > 
> > > b) A concurrent vacuum in another database, ending up in vac_truncate_clog(),
> > >    can compute a newer datfrozenxid. In that case the vac_truncate_clog() with
> > >    the outdated lastSaneFrozenXid will not truncate the clog (and also forget
> > >    to release WrapLimitsVacuumLock currently, as reported upthread) and not
> > >    call SetTransactionIdLimit(). The latter is particularly bad, because that
> > >    means we might not come out of "database is not accepting commands" land.
> > 
> > > I guess we could just recompute the boundaries before actually believing the
> > > catalog values are bogus?
> > 
> > That's how I'd do it.
> 
> I was looking at doing that and got confused by the current code. Am I missing
> something, or does vac_truncate_clog() have two pretty much identical attempts
> at a safety measures?
> 
> void
> vac_update_datfrozenxid(void)
> ...
>     lastSaneFrozenXid = ReadNextTransactionId();
> ...
>         vac_truncate_clog(newFrozenXid, newMinMulti,
>                           lastSaneFrozenXid, lastSaneMinMulti);
> }
> ...
> static void
> vac_truncate_clog(TransactionId frozenXID,
>                   MultiXactId minMulti,
>                   TransactionId lastSaneFrozenXid,
>                   MultiXactId lastSaneMinMulti)
> {
>     TransactionId nextXID = ReadNextTransactionId();
> ...
>         /*
>          * If things are working properly, no database should have a
>          * datfrozenxid or datminmxid that is "in the future".  However, such
>          * cases have been known to arise due to bugs in pg_upgrade.  If we
>          * see any entries that are "in the future", chicken out and don't do
>          * anything.  This ensures we won't truncate clog before those
>          * databases have been scanned and cleaned up.  (We will issue the
>          * "already wrapped" warning if appropriate, though.)
>          */
>         if (TransactionIdPrecedes(lastSaneFrozenXid, datfrozenxid) ||
>             MultiXactIdPrecedes(lastSaneMinMulti, datminmxid))
>             bogus = true;
> 
>         if (TransactionIdPrecedes(nextXID, datfrozenxid))
>             frozenAlreadyWrapped = true;
> 
> lastSaneFrozenXid is a slightly older version of ReadNextTransactionId(),
> that's the only difference afaict.

I don't think you missed anything.  nextXID and lastSaneFrozenXid are both
just caches of ReadNextTransactionId().  Each can become stale enough to make
those comparisons suggest trouble when all is fine.

> I guess this might be caused by 78db307bb23 adding the check, but using
> GetOldestXmin(NULL, true) to determine lastSaneFrozenXid. That was changed
> soon after, in 87f830e0ce03.

Yeah.  The nextXID check is from 9c54cfb (2002-04), and the newer check
converged with it in 87f830e0ce03 (2014-07).


While less important, some other things look weak in these functions:

- The only non-corruption cause to reach the "don't want to let datfrozenxid
  go backward" code is for GetOldestNonRemovableTransactionId(NULL) to go
  backward, e.g. if a walsender starts up and advertises an xmin.  One could
  eliminate that cause by replacing "newFrozenXid =
  GetOldestNonRemovableTransactionId(NULL)" with initialization from the first
  relfrozenxid, analogous to how vac_truncate_clog() initializes.
  vac_update_datfrozenxid() could then warn if the prevention code intervenes.
  Perhaps, instead of preventing the go-backwards, it should apply the
  go-backward change after warning?  (Unlike datfrozenxid, datminmxid going
  backward already implies corruption.)

- The "some databases have not been vacuumed in over 2 billion transactions"
  message is false more often than not.  More likely, something corrupted a
  frozen ID.  The message is also missing the opportunity to indicate one of
  the affected databases.

- vac_truncate_clog() bogosity checks examine XIDs only, not multis.



Re: vac_truncate_clog()'s bogus check leads to bogusness

От
Andres Freund
Дата:
On 2023-06-22 22:29:12 -0700, Noah Misch wrote:
> On Thu, Jun 22, 2023 at 09:45:18AM -0700, Andres Freund wrote:
> > On 2023-06-21 21:50:39 -0700, Noah Misch wrote:
> > > On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote:
> > > > When vac_truncate_clog() returns early
> > > ...
> > > > we haven't released the lwlock that we acquired earlier
> > > 
> > > > Until there's some cause for the session to call LWLockReleaseAll(), the lock
> > > > is held. Until then neither the process holding the lock, nor any other
> > > > process, can finish vacuuming.  We don't even have an assert against a
> > > > self-deadlock with an already held lock, oddly enough.
> > > 
> > > I agree with this finding.  Would you like to add the lwlock releases, or
> > > would you like me to?
> > 
> > Happy with either.  I do have code and testcase, so I guess it would make
> > sense for me to do it?
> 
> Sounds good.  Thanks.

Done.