Re: BUG #17928: Standby fails to decode WAL on termination of primary
From | Michael Paquier |
---|---|
Subject | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Date | |
Msg-id | ZQ9zf1QO8CP4TZRO@paquier.xyz |
In response to | Re: BUG #17928: Standby fails to decode WAL on termination of primary (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
List | pgsql-bugs |
On Sun, Sep 24, 2023 at 09:48:42AM +1300, Thomas Munro wrote:
> "grison" has a little more detail -- we see
> pg_comp_crc32c_sb8(len=4294636456). I'm wondering how to reproduce
> this, but among the questions that jump out I have: why was it ever OK
> that we load record->xl_tot_len into total_len, perform header
> validation, determine that total_len < len (= this record is all on
> one page, no reassembly loop needed, so now we're in the single-page
> branch), then call ReadPageInternal() again, then call
> ValidXLogRecord() which internally loads record->xl_tot_len *again*?
> ReadPageInternal() might have changed xl_tot_len, no? That seems to
> be a possible pathway to reading past the end of the buffer in the CRC
> check, no?
>
> If that value didn't change underneath us, I think we'd need an
> explanation for how we finished up in the single-page branch at
> xlogreader.c:842 with a large xl_tot_len, which I'm not seeing yet,
> though it might take more coffee. (Possibly supporting the re-read
> theory is the fact that it's only happening on a few very slow
> computers, though I have no idea why it would only happen on master
> [so far at least].)

Hmm, it looks pretty clear that this is a HEAD-only thing as the
buildfarm shows and as you say, and my primary suspect here would be
71e4cc6b8ec6, I think. Any race condition underneath it would be
easier to see on slower machines. So it's likely possible that this
has messed up the page insertion logic.
--
Michael
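The re-read concern quoted above (validate one load of the length field, re-read the page, then use a second load) can be illustrated with a minimal sketch in C. This is not code from xlogreader.c: `DemoRecord`, `demo_reread_page()`, `demo_len_unsafe()` and `demo_len_safe()` are hypothetical names invented for the illustration, and the thread does not establish that this is what actually happens on the buildfarm. The sketch only shows why a second, unvalidated load could hand an oversized length to a checksum routine, and one possible way to guard against it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a record header; NOT PostgreSQL's XLogRecord. */
typedef struct DemoRecord
{
    uint32_t    tot_len;        /* total length claimed by the header */
} DemoRecord;

/* Load the length field from the start of the buffer. */
static uint32_t
demo_load_len(const unsigned char *buf)
{
    DemoRecord  rec;

    memcpy(&rec, buf, sizeof(rec));
    return rec.tot_len;
}

/* Hypothetical page re-read: overwrites the header in place. */
static void
demo_reread_page(unsigned char *buf, uint32_t new_len)
{
    DemoRecord  rec = {new_len};

    memcpy(buf, &rec, sizeof(rec));
}

/* Header check: the record must fit entirely inside the buffer. */
static int
demo_len_fits(uint32_t len, size_t buf_size)
{
    return len >= sizeof(DemoRecord) && len <= buf_size;
}

/*
 * Unsafe pattern: validate the first load, re-read, then trust a second
 * load.  Returns the length a checksum routine would be asked to process
 * (0 means the record was rejected).
 */
static uint32_t
demo_len_unsafe(unsigned char *buf, size_t buf_size, uint32_t len_after_reread)
{
    uint32_t    total_len;

    assert(buf_size >= sizeof(DemoRecord));
    total_len = demo_load_len(buf);         /* first load, validated below */
    if (!demo_len_fits(total_len, buf_size))
        return 0;
    demo_reread_page(buf, len_after_reread);    /* contents may change here */
    return demo_load_len(buf);              /* second load, never validated */
}

/*
 * Guarded pattern: after the re-read, reject the record if the length no
 * longer matches the value that was validated.
 */
static uint32_t
demo_len_safe(unsigned char *buf, size_t buf_size, uint32_t len_after_reread)
{
    uint32_t    total_len;

    assert(buf_size >= sizeof(DemoRecord));
    total_len = demo_load_len(buf);
    if (!demo_len_fits(total_len, buf_size))
        return 0;
    demo_reread_page(buf, len_after_reread);
    if (demo_load_len(buf) != total_len)
        return 0;                           /* header changed underneath us */
    return total_len;
}
```

With a 64-byte buffer whose header initially claims 32 bytes, the unsafe variant returns whatever length the re-read left behind, including a value such as 4294636456, while the guarded variant rejects the record once the length has changed.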