Re: BUG #17928: Standby fails to decode WAL on termination of primary
From | Noah Misch |
---|---|
Subject | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Date | |
Msg-id | 20230811140008.GB2261449@rfd.leadboat.com |
In reply to | Re: BUG #17928: Standby fails to decode WAL on termination of primary (Michael Paquier <michael@paquier.xyz>) |
Responses | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
List | pgsql-bugs |
On Fri, Aug 11, 2023 at 03:08:26PM +0900, Michael Paquier wrote:
> On Thu, Aug 10, 2023 at 07:58:08PM -0700, Noah Misch wrote:
> > On Thu, Aug 10, 2023 at 04:45:25PM +0900, Michael Paquier wrote:
> >> Good idea to pollute the data with recycled segments. Using a minimal
> >> WAL segment size would help here as well in keeping a test cheap, and
> >> two segments should be enough. The alignment calculations and the
> >> header size can be known, but the standby records are an issue for the
> >> predictability of the test when it comes to adjust the length of the
> >> logical message depending on the 8k WAL page, no?
> >
> > Could be. I expect there would be challenges translating that outline into a
> > real test, but I don't know if that will be a major one. The test doesn't
> > need to be 100% deterministic. If it fails 25% of the time and is not the
> > slowest test in the recovery suite, I'd find that good enough.
>
> FWIW, I'm having a pretty hard time to get something close enough to a
> page border in a reliable way. Perhaps using a larger series of
> records and select only one would be more reliable.. Need to test
> that a bit more.

Interesting. So pg_logical_emit_message(false, 'X', repeat('X', n)) doesn't get
close enough, but s/n/n+1/ spills to the next page? If so, I did not
anticipate that.

> >> FWIW, I came back to this thread while tweaking the error reporting of
> >> xlogreader.c for the sake of this thread and this proposal is an
> >> improvement to be able to make a distinction between an OOM and an
> >> incorrect record:
> >> https://www.postgresql.org/message-id/ZMh/WV+CuknqePQQ@paquier.xyz
> >>
> >> Anyway, agreed that it's an improvement to remove this check out of
> >> allocate_recordbuf(). Noah, are you planning to work more on that?
> >
> > I can push xl_tot_len-validate-v1.patch, particularly given the testing result
> > you reported today. I'm content for my part to stop there.
>
> Okay, fine by me. That's going to help with what I am doing in the
> other thread as I'd need to make a better difference between the OOM
> and the invalid cases for the allocation path.
>
> You are planning for a backpatch to take care of the inconsistency,
> right? The report has been on 15~ where the prefetching was
> introduced. I'd be OK to just do that and not mess up with the stable
> branches more than necessary (aka ~14) if nobody complains, especially
> REL_11_STABLE planned to be EOL'd in the next minor cycle.

I recall earlier messages theorizing that it was just harder to hit in v14, so
I'm disinclined to stop at v15. I think the main choice is whether to stop at
v11 (normal choice) or v12 (worry about breaking the last v11 point release).
I don't have a strong opinion between those.
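
For the page-border question above, the arithmetic involved can be sketched roughly as follows. This is only an illustration, not part of any proposed test: it assumes the default 8 kB WAL page size mentioned in the thread, the helper names `parse_lsn` and `bytes_left_on_page` are hypothetical, and it deliberately ignores page-header and record-header overhead, which a real test would have to subtract when choosing `n`.

```python
# Assumption: default WAL page size (the "8k WAL page" from the thread).
XLOG_BLCKSZ = 8192


def parse_lsn(text: str) -> int:
    """Convert an LSN in its 'hi/lo' hexadecimal text form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def bytes_left_on_page(lsn_text: str, page_size: int = XLOG_BLCKSZ) -> int:
    """Return the number of bytes between the LSN and the end of its WAL page.

    Header overhead is not accounted for; this is the raw distance only.
    """
    offset = parse_lsn(lsn_text) % page_size
    return page_size - offset
```

Given the current insert position as text (for example from pg_current_wal_insert_lsn()), the result bounds how large the next record can be before it spills onto the following page.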