Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Thomas Munro
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Msg-id: CA+hUKGLcT4ttqts4ow1=ZF9c+AwU=YfovfPs=r-Y2n0G-BunFA@mail.gmail.com
In reply to: Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-bugs

On Mon, Sep 4, 2023 at 3:54 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Sep 04, 2023 at 03:20:31PM +1200, Thomas Munro wrote:
> > 1.  In the place where we fail to allocate memory for an oversized
> > record, I copied the comment about treating that as a "bogus data"
> > condition.  I suspect that we will soon be converting that to a FATAL
> > error[1], and that'll need to be done in both places.
>
> You mean for the two callers of XLogReadRecordAlloc(), even for the
> case where !allow_oversized?  Using a FATAL on non-FRONTEND would be
> the quickest fix, indeed, but there are arguments for standbys where we
> could let these continue, as well.  That would be an improvement over
> the always-FATAL on OOM, of course.

I just mean the two places where "bogus data" is mentioned in that v5 patch.

> > But if you
> > want to be able to distinguish garbage from out-of-memory, and thereby
> > end-of-wal from a FATAL please-insert-more-RAM condition, I think
> > you'd really need this industrial strength validation in all affected
> > branches, and I'd have more work to do, right?  The weak validation we
> > are fixing here is the *real* underlying problem going back many
> > years, right?
>
> Getting the same validation checks for all the branches would be nice.
> FATAL-ing on OOM to force recovery to happen again is a better option
> than assuming that it is the end of recovery.  I am OK to provide
> patches for all the branches for the sake of this thread, if that
> helps.  Switching to a hard FATAL on OOM for the WAL reader in the
> backend is backpatchable, but I'd rather consider that on a different
> thread once the better checks for the record header are in place.

OK, so it sounds like you want to go back to 12.  Let me see if I can
get this TAP test to work in 12... more tomorrow.
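
To make the garbage-versus-out-of-memory distinction discussed above concrete, here is a minimal, self-contained sketch in C. It is not the patch and does not use PostgreSQL's real structures: DemoRecordHeader, demo_header_is_valid(), demo_prepare_record(), and the limits DEMO_MAX_RMID and DEMO_MAX_RECORD_LEN are all hypothetical names and values invented for illustration. The only point it tries to show is the ordering: if the header is validated before any buffer is allocated, a later allocation failure can reasonably be reported as a real out-of-memory condition rather than being folded into "bogus data" and treated as end-of-WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified header; NOT PostgreSQL's XLogRecord layout. */
typedef struct DemoRecordHeader
{
    uint32_t tot_len;   /* total record length, header included */
    uint8_t  rmid;      /* resource manager id */
    uint64_t prev;      /* position of the previous record */
} DemoRecordHeader;

typedef enum DemoReadResult
{
    DEMO_OK,
    DEMO_END_OF_WAL,    /* header looks like garbage */
    DEMO_OUT_OF_MEMORY  /* header looked sane, allocation failed */
} DemoReadResult;

/* Arbitrary illustrative limits, assumed for this sketch only. */
#define DEMO_MAX_RMID        21
#define DEMO_MAX_RECORD_LEN  (1024u * 1024u * 1024u)

/* Cheap sanity checks that need no memory beyond the header itself. */
static bool
demo_header_is_valid(const DemoRecordHeader *hdr, uint64_t expected_prev)
{
    if (hdr->tot_len < sizeof(DemoRecordHeader))
        return false;
    if (hdr->tot_len > DEMO_MAX_RECORD_LEN)
        return false;
    if (hdr->rmid > DEMO_MAX_RMID)
        return false;
    if (hdr->prev != expected_prev)
        return false;
    return true;
}

/* Stand-in allocator that always fails, to simulate memory exhaustion. */
static void *
demo_failing_alloc(size_t size)
{
    (void) size;
    return NULL;
}

/*
 * Validate first, allocate second.  A NULL from alloc_fn is reported as
 * out-of-memory only because the header has already passed validation.
 */
static DemoReadResult
demo_prepare_record(const DemoRecordHeader *hdr, uint64_t expected_prev,
                    void *(*alloc_fn) (size_t), void **buf)
{
    assert(hdr != NULL && buf != NULL);

    *buf = NULL;
    if (!demo_header_is_valid(hdr, expected_prev))
        return DEMO_END_OF_WAL;

    *buf = alloc_fn(hdr->tot_len);
    if (*buf == NULL)
        return DEMO_OUT_OF_MEMORY;

    return DEMO_OK;
}
```

In this sketch a caller would treat DEMO_END_OF_WAL as the normal end of replay and DEMO_OUT_OF_MEMORY as a hard error. How strong the header checks have to be before that split is safe is exactly the open question in this thread.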


