Re: BUG #17928: Standby fails to decode WAL on termination of primary
От | Thomas Munro |
---|---|
Тема | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Дата | |
Msg-id | CA+hUKGLCkTT7zYjzOxuLGahBdQ=McF=z5ZvrjSOnW4EDhVjT-g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17928: Standby fails to decode WAL on termination of primary (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: BUG #17928: Standby fails to decode WAL on termination of primary
|
Список | pgsql-bugs |
On Mon, Sep 25, 2023 at 5:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sure looks like ValidXLogRecord is assuming that record->xl_tot_len > can be trusted without reservation. I think there is something at least theoretically a bit fishy about the ancient code that re-reads the page, doesn't consider short reads, and assumes that record->xl_tot_len hasn't changed since last time it loaded and validated it. But I think that's also very unlikely to go wrong, and it's not the problem here. The problem is that bae868ca removed something we still need: - /* XXX: more validation should be done here */ - if (total_len < SizeOfXLogRecord) - { - report_invalid_record(state, - "invalid record length at %X/%X: expected at least %u, got %u", - LSN_FORMAT_ARGS(RecPtr), - (uint32) SizeOfXLogRecord, total_len); - goto err; - } + /* We'll validate the header once we have the next page. */ gotheader = false; If you happened to run into zeroes where an xl_tot_len is wanted right at the end of a page (or any value not big enough to get you to the next page), we'll fall through to the single-page branch, and then go directly to the CRC check, but then ValidXLogRecord() subtracts SizeOfXLogRecord and gets a crazy big length. The CRC implementation routines on modern computers happened to use pointer arithmetic that terminates immediately without accessing any memory, which is why nothing was obviously wrong on most systems. The _sb8.c implementation for older ARM, MIPS etc use a length-based loop, and read off into deep space. Draft patch attached, including a new test for 039_end_of_wal.pl that fails on all systems without the above code.
Вложения
В списке pgsql-bugs по дате отправления: