Обсуждение: vac_truncate_clog()'s bogus check leads to bogusness
Hi, When vac_truncate_clog() returns early, due to one of these paths: /* * Do not truncate CLOG if we seem to have suffered wraparound already; * the computed minimum XID might be bogus. This case should now be * impossible due to the defenses in GetNewTransactionId, but we keep the * test anyway. */ if (frozenAlreadyWrapped) { ereport(WARNING, (errmsg("some databases have not been vacuumed in over 2 billion transactions"), errdetail("You might have already suffered transaction-wraparound data loss."))); return; } /* chicken out if data is bogus in any other way */ if (bogus) return; we haven't released the lwlock that we acquired earlier: /* Restrict task to one backend per cluster; see SimpleLruTruncate(). */ LWLockAcquire(WrapLimitsVacuumLock, LW_EXCLUSIVE); as this isn't a path raising an error, the lock isn't released during abort. Until there's some cause for the session to call LWLockReleaseAll(), the lock is held. Until then neither the process holding the lock, nor any other process, can finish vacuuming. We don't even have an assert against a self-deadlock with an already held lock, oddly enough. This is somewhat nasty - there's no real way to get out of this without an immediate restart, and it's hard to pinpoint the problem as well :(. Ok, the subject line is not the most precise, but it was just too good an opportunity. To reproduce (only on a throwaway system please!): CREATE DATABASE invalid; UPDATE pg_database SET datfrozenxid = '10002' WHERE datname = 'invalid'; DROP TABLE IF EXISTS foo_tbl; CREATE TABLE foo_tbl(); DROP TABLE foo_tbl; VACUUM FREEZE; DROP TABLE IF EXISTS foo_tbl; CREATE TABLE foo_tbl(); DROP TABLE foo_tbl; VACUUM FREEZE; <hang> Found this while writing a test for the fix for partial dropping of databases [1]. Separately, I think it's quite bad that we *silently* return from vac_truncate_clog() when finding a bogus xid. That's a quite severe condition, we should at least tell the user about it. Greetings, Andres Freund [1] https://postgr.es/m/20230621190204.nsaelabojxppiuix%40awork3.anarazel.de
Hi, On 2023-06-21 15:12:08 -0700, Andres Freund wrote: > When vac_truncate_clog() returns early, due to one of these paths: > > [...] > > Separately, I think it's quite bad that we *silently* return from > vac_truncate_clog() when finding a bogus xid. That's a quite severe condition, > we should at least tell the user about it. A related issue is that as far as I can tell the determination of what is bogus is bogus. The relevant cutoffs are determined vac_update_datfrozenxid() using: /* * Identify the latest relfrozenxid and relminmxid values that we could * validly see during the scan. These are conservative values, but it's * not really worth trying to be more exact. */ lastSaneFrozenXid = ReadNextTransactionId(); lastSaneMinMulti = ReadNextMultiXactId(); but doing checks based on thos is bogus, because: a) a concurrent create table / truncate / vacuum can update pg_class.relfrozenxid of some relation in the current database to a newer value, after lastSaneFrozenXid already has been determined. If that happens, we skip updating pg_database.datfrozenxid. b) A concurrent vacuum in another database, ending up in vac_truncate_clog(), can compute a newer datfrozenxid. In that case the vac_truncate_clog() with the outdated lastSaneFrozenXid will not truncate the clog (and also forget to release WrapLimitsVacuumLock currently, as reported upthread) and not call SetTransactionIdLimit(). The latter is particularly bad, because that means we might not come out of "database is not accepting commands" land. I think in both cases a later call might fix the issue, but that could be some way out, if autovacuum doesn't see further writes being necessary, and no further write activity happens, because of ""database is not accepting commands". It's not entirely obvious to me how to best fix these. For a second I thought we just need to acquire a snapshot before determining the sane values, but that doesn't work, since we update the relevant fields with heap_inplace_update(). I guess we could just recompute the boundaries before actually believing the catalog values are bogus? I think we also add warnings to these paths, so we actually have a chance to find problems in the field. Greetings, Andres Freund
On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote: > When vac_truncate_clog() returns early ... > we haven't released the lwlock that we acquired earlier > Until there's some cause for the session to call LWLockReleaseAll(), the lock > is held. Until then neither the process holding the lock, nor any other > process, can finish vacuuming. We don't even have an assert against a > self-deadlock with an already held lock, oddly enough. I agree with this finding. Would you like to add the lwlock releases, or would you like me to? The bug has been in all released versions for 2.5 years, yet it escaped notice. That tells us something. Bogus values have gotten rare? The affected session tends to get lucky and call LWLockReleaseAll() soon? On Wed, Jun 21, 2023 at 05:46:37PM -0700, Andres Freund wrote: > On 2023-06-21 15:12:08 -0700, Andres Freund wrote: > > Separately, I think it's quite bad that we *silently* return from > > vac_truncate_clog() when finding a bogus xid. That's a quite severe condition, > > we should at least tell the user about it. > > A related issue is that as far as I can tell the determination of what is > bogus is bogus. > > The relevant cutoffs are determined vac_update_datfrozenxid() using: > > /* > * Identify the latest relfrozenxid and relminmxid values that we could > * validly see during the scan. These are conservative values, but it's > * not really worth trying to be more exact. > */ > lastSaneFrozenXid = ReadNextTransactionId(); > lastSaneMinMulti = ReadNextMultiXactId(); > > but doing checks based on thos is bogus, because: > > a) a concurrent create table / truncate / vacuum can update > pg_class.relfrozenxid of some relation in the current database to a newer > value, after lastSaneFrozenXid already has been determined. If that > happens, we skip updating pg_database.datfrozenxid. > > b) A concurrent vacuum in another database, ending up in vac_truncate_clog(), > can compute a newer datfrozenxid. In that case the vac_truncate_clog() with > the outdated lastSaneFrozenXid will not truncate the clog (and also forget > to release WrapLimitsVacuumLock currently, as reported upthread) and not > call SetTransactionIdLimit(). The latter is particularly bad, because that > means we might not come out of "database is not accepting commands" land. > I guess we could just recompute the boundaries before actually believing the > catalog values are bogus? That's how I'd do it.
Hi, On 2023-06-21 21:50:39 -0700, Noah Misch wrote: > On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote: > > When vac_truncate_clog() returns early > ... > > we haven't released the lwlock that we acquired earlier > > > Until there's some cause for the session to call LWLockReleaseAll(), the lock > > is held. Until then neither the process holding the lock, nor any other > > process, can finish vacuuming. We don't even have an assert against a > > self-deadlock with an already held lock, oddly enough. > > I agree with this finding. Would you like to add the lwlock releases, or > would you like me to? Happy with either. I do have code and testcase, so I guess it would make sense for me to do it? > The bug has been in all released versions for 2.5 years, yet it escaped > notice. That tells us something. Bogus values have gotten rare? The > affected session tends to get lucky and call LWLockReleaseAll() soon? I am not sure either. I suspect that part of it is that people couldn't even pinpoint the problem when it happened. Process exit calls LWLockReleaseAll(), which I assume would avoid the problem in many cases. Greetings, Andres Freund
On Thu, Jun 22, 2023 at 09:45:18AM -0700, Andres Freund wrote: > On 2023-06-21 21:50:39 -0700, Noah Misch wrote: > > On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote: > > > When vac_truncate_clog() returns early > > ... > > > we haven't released the lwlock that we acquired earlier > > > > > Until there's some cause for the session to call LWLockReleaseAll(), the lock > > > is held. Until then neither the process holding the lock, nor any other > > > process, can finish vacuuming. We don't even have an assert against a > > > self-deadlock with an already held lock, oddly enough. > > > > I agree with this finding. Would you like to add the lwlock releases, or > > would you like me to? > > Happy with either. I do have code and testcase, so I guess it would make > sense for me to do it? Sounds good. Thanks.
Hi, On 2023-06-21 21:50:39 -0700, Noah Misch wrote: > On Wed, Jun 21, 2023 at 05:46:37PM -0700, Andres Freund wrote: > > A related issue is that as far as I can tell the determination of what is > > bogus is bogus. > > > > The relevant cutoffs are determined vac_update_datfrozenxid() using: > > > > /* > > * Identify the latest relfrozenxid and relminmxid values that we could > > * validly see during the scan. These are conservative values, but it's > > * not really worth trying to be more exact. > > */ > > lastSaneFrozenXid = ReadNextTransactionId(); > > lastSaneMinMulti = ReadNextMultiXactId(); > > > > but doing checks based on thos is bogus, because: > > > > a) a concurrent create table / truncate / vacuum can update > > pg_class.relfrozenxid of some relation in the current database to a newer > > value, after lastSaneFrozenXid already has been determined. If that > > happens, we skip updating pg_database.datfrozenxid. > > > > b) A concurrent vacuum in another database, ending up in vac_truncate_clog(), > > can compute a newer datfrozenxid. In that case the vac_truncate_clog() with > > the outdated lastSaneFrozenXid will not truncate the clog (and also forget > > to release WrapLimitsVacuumLock currently, as reported upthread) and not > > call SetTransactionIdLimit(). The latter is particularly bad, because that > > means we might not come out of "database is not accepting commands" land. > > > I guess we could just recompute the boundaries before actually believing the > > catalog values are bogus? > > That's how I'd do it. I was looking at doing that and got confused by the current code. Am I missing something, or does vac_truncate_clog() have two pretty much identical attempts at a safety measures? void vac_update_datfrozenxid(void) ... lastSaneFrozenXid = ReadNextTransactionId(); ... vac_truncate_clog(newFrozenXid, newMinMulti, lastSaneFrozenXid, lastSaneMinMulti); } ... static void vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti, TransactionId lastSaneFrozenXid, MultiXactId lastSaneMinMulti) { TransactionId nextXID = ReadNextTransactionId(); ... /* * If things are working properly, no database should have a * datfrozenxid or datminmxid that is "in the future". However, such * cases have been known to arise due to bugs in pg_upgrade. If we * see any entries that are "in the future", chicken out and don't do * anything. This ensures we won't truncate clog before those * databases have been scanned and cleaned up. (We will issue the * "already wrapped" warning if appropriate, though.) */ if (TransactionIdPrecedes(lastSaneFrozenXid, datfrozenxid) || MultiXactIdPrecedes(lastSaneMinMulti, datminmxid)) bogus = true; if (TransactionIdPrecedes(nextXID, datfrozenxid)) frozenAlreadyWrapped = true; lastSaneFrozenXid is a slightly older version of ReadNextTransactionId(), that's the only difference afaict. I guess this might be caused by 78db307bb23 adding the check, but using GetOldestXmin(NULL, true) to determine lastSaneFrozenXid. That was changed soon after, in 87f830e0ce03. Am I missing something? Greetings, Andres Freund
Hi, On 2023-06-23 18:41:58 -0700, Andres Freund wrote: > I guess this might be caused by 78db307bb23 adding the check, but using > GetOldestXmin(NULL, true) to determine lastSaneFrozenXid. That was changed > soon after, in 87f830e0ce03. FWIW, the discussion leading up to 87f830e0ce03 is https://postgr.es/m/4182.1405961004%40sss.pgh.pa.us Greetings, Andres Freund
On Fri, Jun 23, 2023 at 06:41:58PM -0700, Andres Freund wrote: > On 2023-06-21 21:50:39 -0700, Noah Misch wrote: > > On Wed, Jun 21, 2023 at 05:46:37PM -0700, Andres Freund wrote: > > > A related issue is that as far as I can tell the determination of what is > > > bogus is bogus. > > > > > > The relevant cutoffs are determined vac_update_datfrozenxid() using: > > > > > > /* > > > * Identify the latest relfrozenxid and relminmxid values that we could > > > * validly see during the scan. These are conservative values, but it's > > > * not really worth trying to be more exact. > > > */ > > > lastSaneFrozenXid = ReadNextTransactionId(); > > > lastSaneMinMulti = ReadNextMultiXactId(); > > > > > > but doing checks based on thos is bogus, because: > > > > > > a) a concurrent create table / truncate / vacuum can update > > > pg_class.relfrozenxid of some relation in the current database to a newer > > > value, after lastSaneFrozenXid already has been determined. If that > > > happens, we skip updating pg_database.datfrozenxid. > > > > > > b) A concurrent vacuum in another database, ending up in vac_truncate_clog(), > > > can compute a newer datfrozenxid. In that case the vac_truncate_clog() with > > > the outdated lastSaneFrozenXid will not truncate the clog (and also forget > > > to release WrapLimitsVacuumLock currently, as reported upthread) and not > > > call SetTransactionIdLimit(). The latter is particularly bad, because that > > > means we might not come out of "database is not accepting commands" land. > > > > > I guess we could just recompute the boundaries before actually believing the > > > catalog values are bogus? > > > > That's how I'd do it. > > I was looking at doing that and got confused by the current code. Am I missing > something, or does vac_truncate_clog() have two pretty much identical attempts > at a safety measures? > > void > vac_update_datfrozenxid(void) > ... > lastSaneFrozenXid = ReadNextTransactionId(); > ... > vac_truncate_clog(newFrozenXid, newMinMulti, > lastSaneFrozenXid, lastSaneMinMulti); > } > ... > static void > vac_truncate_clog(TransactionId frozenXID, > MultiXactId minMulti, > TransactionId lastSaneFrozenXid, > MultiXactId lastSaneMinMulti) > { > TransactionId nextXID = ReadNextTransactionId(); > ... > /* > * If things are working properly, no database should have a > * datfrozenxid or datminmxid that is "in the future". However, such > * cases have been known to arise due to bugs in pg_upgrade. If we > * see any entries that are "in the future", chicken out and don't do > * anything. This ensures we won't truncate clog before those > * databases have been scanned and cleaned up. (We will issue the > * "already wrapped" warning if appropriate, though.) > */ > if (TransactionIdPrecedes(lastSaneFrozenXid, datfrozenxid) || > MultiXactIdPrecedes(lastSaneMinMulti, datminmxid)) > bogus = true; > > if (TransactionIdPrecedes(nextXID, datfrozenxid)) > frozenAlreadyWrapped = true; > > lastSaneFrozenXid is a slightly older version of ReadNextTransactionId(), > that's the only difference afaict. I don't think you missed anything. nextXID and lastSaneFrozenXid are both just caches of ReadNextTransactionId(). Each can become stale enough to make those comparisons suggest trouble when all is fine. > I guess this might be caused by 78db307bb23 adding the check, but using > GetOldestXmin(NULL, true) to determine lastSaneFrozenXid. That was changed > soon after, in 87f830e0ce03. Yeah. The nextXID check is from 9c54cfb (2002-04), and the newer check converged with it in 87f830e0ce03 (2014-07). While less important, some other things look weak in these functions: - The only non-corruption cause to reach the "don't want to let datfrozenxid go backward" code is for GetOldestNonRemovableTransactionId(NULL) to go backward, e.g. if a walsender starts up and advertises an xmin. One could eliminate that cause by replacing "newFrozenXid = GetOldestNonRemovableTransactionId(NULL)" with initialization from the first relfrozenxid, analogous to how vac_truncate_clog() initializes. vac_update_datfrozenxid() could then warn if the prevention code intervenes. Perhaps, instead of preventing the go-backwards, it should apply the go-backward change after warning? (Unlike datfrozenxid, datminmxid going backward already implies corruption.) - The "some databases have not been vacuumed in over 2 billion transactions" message is false more often than not. More likely, something corrupted a frozen ID. The message is also missing the opportunity to indicate one of the affected databases. - vac_truncate_clog() bogosity checks examine XIDs only, not multis.
On 2023-06-22 22:29:12 -0700, Noah Misch wrote: > On Thu, Jun 22, 2023 at 09:45:18AM -0700, Andres Freund wrote: > > On 2023-06-21 21:50:39 -0700, Noah Misch wrote: > > > On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote: > > > > When vac_truncate_clog() returns early > > > ... > > > > we haven't released the lwlock that we acquired earlier > > > > > > > Until there's some cause for the session to call LWLockReleaseAll(), the lock > > > > is held. Until then neither the process holding the lock, nor any other > > > > process, can finish vacuuming. We don't even have an assert against a > > > > self-deadlock with an already held lock, oddly enough. > > > > > > I agree with this finding. Would you like to add the lwlock releases, or > > > would you like me to? > > > > Happy with either. I do have code and testcase, so I guess it would make > > sense for me to do it? > > Sounds good. Thanks. Done.