Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
From | Andres Freund |
---|---|
Subject | Re: BUG #17744: Fail Assert while recoverying from pg_basebackup |
Date | |
Msg-id | 20230201153252.l6kcfum7trdovw2b@alap3.anarazel.de |
In reply to | Re: BUG #17744: Fail Assert while recoverying from pg_basebackup (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
Re: BUG #17744: Fail Assert while recoverying from pg_basebackup |
List | pgsql-bugs |
Hi,

On 2023-01-13 18:36:05 +0900, Kyotaro Horiguchi wrote:
> At Tue, 10 Jan 2023 07:45:45 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> > #2  0x0000000000b378e9 in ExceptionalCondition (
> >     conditionName=0xd13697 "TransactionIdIsValid(initial)",
> >     errorType=0xd12df4 "FailedAssertion", fileName=0xd12de8 "procarray.c",
> >     lineNumber=1750) at assert.c:69
> > #3  0x0000000000962195 in ComputeXidHorizons (h=0x7ffe93de25e0)
> >     at procarray.c:1750
> > #4  0x00000000009628a3 in GetOldestTransactionIdConsideredRunning ()
> >     at procarray.c:2050
> > #5  0x00000000005972bf in CreateRestartPoint (flags=256) at xlog.c:7153
> > #6  0x00000000008cae37 in CheckpointerMain () at checkpointer.c:464
>
> The function requires a valid value in
> ShmemVariableCache->latestCompleteXid. But it is not initialized and
> maintained in this case. The attached quick hack seems working, but
> of course more decent fix is needed.

I might be missing something, but I suspect the problem here is that we
shouldn't have been creating a restart point. Afaict, the setup instructions
provided don't configure a recovery.signal, so we'll just perform crash
recovery. And I don't think it'd ever make sense to create a restart point
during crash recovery?

Except that in this case, it's not pure crash recovery, it's restoring from a
backup label. Due to which it actually might make sense to create restart
points?

If you're doing PITR or such you don't really gain anything by doing
checkpoints until you've reached consistency, unless you want to optimize for
the case that you might need to start/stop the instance multiple times?

So maybe it's the right thing to create restart points? Really not sure.

If we do want to do restartpoints, we definitely shouldn't try to
TruncateSUBTRANS() in the crash-recovery-like-restartpoint case, we've not
even done StartupSUBTRANS(), because that's guarded by
ArchiveRecoveryRequested.

The most obvious (but wrong!) fix would be to change

    if (EnableHotStandby)
        TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());

to

    if (standbyState != STANDBY_DISABLED)
        TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());

except that doesn't work, because we don't have working access to
standbyState. Nor the other relevant variables.

Gah. We've really made a hash out of the state management for
xlog.c. ArchiveRecoveryRequested, InArchiveRecovery, StandbyModeRequested,
StandbyMode, EnableHotStandby, LocalHotStandbyActive, ... :(.

We use InArchiveRecovery = true, even if there's no archiving involved. Afaict
ArchiveRecoveryRequested=false, InArchiveRecovery=true isn't really something
the comments around the variables foresee.

Greetings,

Andres Freund
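
To make the mismatch described in the message concrete, here is a minimal toy model in C. It is not PostgreSQL source: the struct `RecoveryFlagsModel` and the helper functions are hypothetical names invented for illustration, and the two guard conditions are simplified from the email's description (StartupSUBTRANS() guarded by ArchiveRecoveryRequested, the restartpoint's TruncateSUBTRANS() guarded by EnableHotStandby). The sketch only shows that the two guards can disagree in the backup-label-without-recovery.signal case; it does not propose a fix.

```c
#include <stdbool.h>

/*
 * Toy model of three of the recovery flags named in the message.
 * Hypothetical type for illustration only; not PostgreSQL code.
 */
typedef struct RecoveryFlagsModel
{
    bool ArchiveRecoveryRequested; /* assumed: a recovery.signal/standby.signal was configured */
    bool InArchiveRecovery;        /* per the message: true even with no archiving involved */
    bool EnableHotStandby;         /* assumed: mirrors the hot_standby setting */
} RecoveryFlagsModel;

/* Startup side, simplified: subtrans startup happens only if archive recovery was requested. */
static bool
subtrans_started(const RecoveryFlagsModel *f)
{
    return f->ArchiveRecoveryRequested;
}

/* Restartpoint side, simplified: truncation is attempted whenever hot standby is enabled. */
static bool
restartpoint_truncates_subtrans(const RecoveryFlagsModel *f)
{
    return f->EnableHotStandby;
}

/* True when a restartpoint would truncate subtrans state that was never started. */
static bool
guards_disagree(const RecoveryFlagsModel *f)
{
    return restartpoint_truncates_subtrans(f) && !subtrans_started(f);
}
```

With the flags set as in the reported scenario (ArchiveRecoveryRequested=false, InArchiveRecovery=true, EnableHotStandby=true), `guards_disagree()` returns true; with archive recovery requested it returns false.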