Re: BUG #17928: Standby fails to decode WAL on termination of primary
From | Michael Paquier |
---|---|
Subject | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Date | |
Msg-id | ZNsXBFsFsKcCbP0q@paquier.xyz |
In response to | Re: BUG #17928: Standby fails to decode WAL on termination of primary (Michael Paquier <michael@paquier.xyz>) |
Responses | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
List | pgsql-bugs |
On Tue, Aug 15, 2023 at 12:00:30PM +0900, Michael Paquier wrote:
> Not sure if that will help, but what I was playing with some stuff in
> the lines of:
> -- Store the length up to page boundary.
> select setting::int - ((pg_current_wal_insert_lsn() - '0/0') %
>   setting::int) as boundary from pg_settings where name = 'wal_block_size'
> \gset
> -- Generate record up to boundary (56 bytes for base size of the record,
> -- stop at 12 bytes before the end of the page.
> select pg_logical_emit_message(false, '', repeat('a', :boundary - 56 - 12));
>
> Then by injecting some FF's on the last page written and forcing
> replay I am able to force some of the error code paths, so I guess
> that's what you were basically doing?

I've been spending some extra time on this one and hacked a TAP test
that reliably reproduces the original issue, using a message similar
to what I mentioned in my previous messages. I guess that we could
use something like that:
2023-08-15 15:07:03.790 JST [8729] LOG: redo starts at 0/14EA428
2023-08-15 15:07:03.790 JST [8729] FATAL: invalid memory alloc request size 4294969740
2023-08-15 15:07:03.791 JST [8726] LOG: startup process (PID 8729) exited with exit code 1

The proposed patches pass the test, HEAD does not. We may want to do
more with page boundaries, and more error patterns, but the idea looks
worth exploring more. At least this can be used to validate patches.

I've noticed while hacking the test that we don't do a XLogFlush()
after inserting the message's record, so we may lose it on crash.
That makes the test unstable except if an extra record is added after
the logical messages. The attached patch forces that for the sake of
the test, but I'm spawning a different thread as losing this data
looks like a bug to me.
--
Michael
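
The page-boundary arithmetic in the quoted SQL can be checked offline. The following is only a sketch, not part of the patch or the TAP test: the function names are hypothetical, the 8192-byte block size is assumed to be the default `wal_block_size`, and the 56-byte record overhead and 12-byte margin are taken as-is from the quoted snippet. The LSN used in the usage lines is the one from the log above, picked purely as sample input.

```python
# Sketch of the boundary arithmetic from the quoted SQL snippet.
# All names here are hypothetical helpers, not PostgreSQL APIs.

RECORD_BASE_SIZE = 56   # base record size, as stated in the quoted message
END_MARGIN = 12         # stop this many bytes before the end of the page


def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' hexadecimal form to a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def bytes_to_page_boundary(pos: int, block_size: int = 8192) -> int:
    """Mirror of: setting::int - ((lsn - '0/0') % setting::int)."""
    return block_size - (pos % block_size)


def payload_length(pos: int, block_size: int = 8192) -> int:
    """Mirror of: :boundary - 56 - 12.

    The result can be negative when fewer than 68 bytes remain on the page.
    """
    return bytes_to_page_boundary(pos, block_size) - RECORD_BASE_SIZE - END_MARGIN


if __name__ == "__main__":
    pos = parse_lsn("0/14EA428")
    print(bytes_to_page_boundary(pos))  # bytes left on the current WAL page
    print(payload_length(pos))          # length passed to repeat('a', ...)
```

As in the SQL, a position that sits exactly on a page boundary yields the full block size, not zero.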