Re: BUG #17928: Standby fails to decode WAL on termination of primary
От | Michael Paquier |
---|---|
Тема | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
Дата | |
Msg-id | ZNmMqEXJ6ECTQ0Ys@paquier.xyz обсуждение исходный текст |
Ответ на | Re: BUG #17928: Standby fails to decode WAL on termination of primary (Noah Misch <noah@leadboat.com>) |
Ответы |
Re: BUG #17928: Standby fails to decode WAL on termination of primary
|
Список | pgsql-bugs |
On Sat, Aug 12, 2023 at 08:30:34PM -0700, Noah Misch wrote: > On Sun, Aug 13, 2023 at 03:12:34PM +1200, Thomas Munro wrote: >> On Sun, Aug 13, 2023 at 9:13 AM Noah Misch <noah@leadboat.com> wrote: >>> Any user could call pg_logical_emit_message() to silently terminate the WAL >>> stream, which is far worse than the original bug. So far, I'm seeing one way >>> to clearly fix $SUBJECT without that harm. When a record header spans a page >>> boundary, read the next page to reassemble the header. If >>> !ValidXLogRecordHeader() (invalid xl_rmid or xl_prev), treat as end of WAL. >>> Otherwise, read the whole record in chunks, calculating CRC. If CRC is >>> invalid, treat as end of WAL. Otherwise, ereport(FATAL) for the oversized >>> record. A side benefit would be avoiding useless large allocations (1GB >>> backend, 4GB frontend) as discussed upthread. May as well do the xl_rmid and >>> xl_prev checks in all branches, to avoid needless XLogRecordMaxSize-1 >>> allocations. Thoughts? For invalid-length records in v16+, since every such >>> record is end-of-wal or corruption, those versions could skip the CRC. >> >> That sounds quite strong. But... why couldn't the existing >> xlp_rem_len cross-check protect us from this failure mode? If we >> could defer the allocation until after that check (and the usual >> ValidXLogRecordHeader() check), I think we'd know that we're really >> looking at a size that was actually written in both pages (which must >> also have survived xlp_pageaddr check), no? +1 for relying on xlp_rem_len for that. > Hmm, I think that is right. A coincidental match of the 32-bit CRC is more > probable than having all those fields appear valid by coincidence, especially > xlp_pageaddr. Hmm. [.. thinks ..] It seems to me that we should try to also finish the header validation before attempting XLogReadRecordAlloc() on the total_len as well? It looks like the end result would be to move the first ReadPageInternal done for the header with all its cross-page checks before XLogReadRecordAlloc(). That should remove the need of having gotheader from v1. -- Michael
Вложения
В списке pgsql-bugs по дате отправления: