Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Michael Paquier
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Msg-id: ZNsXBFsFsKcCbP0q@paquier.xyz
In response to: Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-bugs
On Tue, Aug 15, 2023 at 12:00:30PM +0900, Michael Paquier wrote:
> Not sure if that will help, but what I was playing with some stuff in
> the lines of:
> -- Store the length up to page boundary.
> select setting::int - ((pg_current_wal_insert_lsn() - '0/0') %
>   setting::int) as boundary from pg_settings where name = 'wal_block_size'
>   \gset
> -- Generate record up to boundary (56 bytes for base size of the record,
> -- stop at 12 bytes before the end of the page.
> select pg_logical_emit_message(false, '', repeat('a', :boundary - 56 - 12));
>
> Then by injecting some FF's on the last page written and forcing
> replay I am able to force some of the error code paths, so I guess
> that's what you were basically doing?
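
For readers following the arithmetic in the quoted snippet, here is a rough
Python sketch of the same computation.  The function names and the example
LSN are made up for illustration; the 56-byte record base size and the
12-byte gap come from the quoted comments, and 8192 is assumed as the
wal_block_size.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'X/Y' (hex halves) into a byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def payload_length(insert_lsn: int, wal_block_size: int = 8192,
                   record_base: int = 56, gap: int = 12) -> int:
    """Payload size so that the record stops `gap` bytes before the page end.

    Mirrors the quoted SQL: boundary is the distance from the insert
    position to the end of the current WAL page.
    """
    boundary = wal_block_size - (insert_lsn % wal_block_size)
    return boundary - record_base - gap


# Hypothetical insert position, 256 bytes into a WAL page.
lsn = parse_lsn("0/1000100")
print(payload_length(lsn))  # 8192 - 256 - 56 - 12 = 7868
```

If the insert position is within 68 bytes of the page end the result is
negative; the sketch does not handle that case.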

I've been spending some extra time on this one and hacked a TAP test
that reliably reproduces the original issue, using a message similar
to what I mentioned in my previous messages.  I guess that we could
use something like that:
2023-08-15 15:07:03.790 JST [8729] LOG:  redo starts at 0/14EA428
2023-08-15 15:07:03.790 JST [8729] FATAL:  invalid memory alloc request size 4294969740
2023-08-15 15:07:03.791 JST [8726] LOG:  startup process (PID 8729) exited with exit code 1
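
As a side note on the number in the FATAL line, a small Python sketch shows
why the request is rejected: palloc-style allocations are capped at
MaxAllocSize (0x3fffffff, i.e. 1 GB - 1), and the requested size is slightly
above 2^32.  The helper name below is made up, and the sketch does not try
to model how the decoder arrives at that size.

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF  # MaxAllocSize from PostgreSQL's memutils.h


def alloc_size_is_valid(size: int) -> bool:
    """Rough equivalent of the AllocSizeIsValid() check."""
    return 0 <= size <= MAX_ALLOC_SIZE


requested = 4294969740  # size reported in the FATAL message above
print(alloc_size_is_valid(requested))  # False
print(requested - 2**32)               # 2444
```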

The proposed patches pass the test, HEAD does not.  We may want to do
more with page boundaries, and more error patterns, but the idea looks
worth exploring more.  At least this can be used to validate patches.

I've noticed while hacking the test that we don't do an XLogFlush()
after inserting the message's record, so we may lose it on crash.
That makes the test unstable unless an extra record is added after
the logical messages.  The attached patch forces that for the sake of
the test, but I'm spawning a different thread as losing this data
looks like a bug to me.
--
Michael
