Re: BUG #17744: Fail Assert while recoverying from pg_basebackup

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
Дата	1 февраля 2023 г. 15:32:52
Msg-id	20230201153252.l6kcfum7trdovw2b@alap3.anarazel.de обсуждение исходный текст
Ответ на	Re: BUG #17744: Fail Assert while recoverying from pg_basebackup (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы	Re: BUG #17744: Fail Assert while recoverying from pg_basebackup Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
Список	pgsql-bugs

Дерево обсуждения

Hi,

On 2023-01-13 18:36:05 +0900, Kyotaro Horiguchi wrote:
> At Tue, 10 Jan 2023 07:45:45 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in 
> > #2  0x0000000000b378e9 in ExceptionalCondition (
> >     conditionName=0xd13697 "TransactionIdIsValid(initial)", 
> >     errorType=0xd12df4 "FailedAssertion", fileName=0xd12de8 "procarray.c",
> > 
> >     lineNumber=1750) at assert.c:69
> > #3  0x0000000000962195 in ComputeXidHorizons (h=0x7ffe93de25e0)
> >     at procarray.c:1750
> > #4  0x00000000009628a3 in GetOldestTransactionIdConsideredRunning ()
> >     at procarray.c:2050
> > #5  0x00000000005972bf in CreateRestartPoint (flags=256) at xlog.c:7153
> > #6  0x00000000008cae37 in CheckpointerMain () at checkpointer.c:464
> 
> The function requires a valid value in
> ShmemVariableCache->latestCompleteXid. But it is not initialized and
> maintained in this case.  The attached quick hack seems working, but
> of course more decent fix is needed.

I might be missing something, but I suspect the problem here is that we
shouldn't have been creating a restart point. Afaict, the setup
instructions provided don't configure a recovery.signal, so we'll just
perform crash recovery.

And I don't think it'd ever make sense to create a restart point during
crash recovery?

Except that in this case, it's not pure crash recovery, it's restoring
from a backup label. Due to which it actually might make sense to create
restart points?  If you're doing PITR or such you don't really gain
anything by doing checkpoints until you've reached consistency, unless
you want to optimize for the case that you might need to start/stop the
instance multiple times?

So maybe it's the right thing to create restart points? Really not sure.

If we do want to do restartpoints, we definitely shouldn't try to
TruncateSUBTRANS() in the crash-recovery-like-restartpoint case, we've
not even done StartupSUBTRANS(), because that's guarded by
ArchiveRecoveryRequested.

The most obvious (but wrong!), fix would be to change

    if (EnableHotStandby)
        TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
to
    if (standbyState != STANDBY_DISABLED)
        TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
except that doesn't work, because we don't have working access to
standbyState. Nor the other relevant variables. Gah.

We've really made a hash out of the state management for
xlog.c. ArchiveRecoveryRequested, InArchiveRecovery,
StandbyModeRequested, StandbyMode, EnableHotStandby,
LocalHotStandbyActive, ... :(.  We use InArchiveRecovery = true, even if
there's no archiving involved. Afaict ArchiveRecoveryRequested=false,
InArchiveRecovery=true isn't really something the comments around the
variables foresee.

Greetings,

Andres Freund

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #17744: Fail Assert while recoverying from pg_basebackup