Re: BUG #17928: Standby fails to decode WAL on termination of primary

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: BUG #17928: Standby fails to decode WAL on termination of primary
Дата
Msg-id CA+hUKGLCkTT7zYjzOxuLGahBdQ=McF=z5ZvrjSOnW4EDhVjT-g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
On Mon, Sep 25, 2023 at 5:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sure looks like ValidXLogRecord is assuming that record->xl_tot_len
> can be trusted without reservation.

I think there is something at least theoretically a bit fishy about
the ancient code that re-reads the page, doesn't consider short reads,
and assumes that record->xl_tot_len hasn't changed since last time it
loaded and validated it.  But I think that's also very unlikely to go
wrong, and it's not the problem here.

The problem is that bae868ca removed something we still need:

-               /* XXX: more validation should be done here */
-               if (total_len < SizeOfXLogRecord)
-               {
-                       report_invalid_record(state,
-
"invalid record length at %X/%X: expected at least %u, got %u",
-
LSN_FORMAT_ARGS(RecPtr),
-
(uint32) SizeOfXLogRecord, total_len);
-                       goto err;
-               }
+               /* We'll validate the header once we have the next page. */
                gotheader = false;

If you happened to run into zeroes where an xl_tot_len is wanted right
at the end of a page (or any value not big enough to get you to the
next page), we'll fall through to the single-page branch, and then go
directly to the CRC check, but then ValidXLogRecord() subtracts
SizeOfXLogRecord and gets a crazy big length.  The CRC implementation
routines on modern computers happened to use pointer arithmetic that
terminates immediately without accessing any memory, which is why
nothing was obviously wrong on most systems.  The _sb8.c
implementation for older ARM, MIPS etc use a length-based loop, and
read off into deep space.

Draft patch attached, including a new test for 039_end_of_wal.pl that
fails on all systems without the above code.

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #17928: Standby fails to decode WAL on termination of primary