Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
From | Tom Lane |
---|---|
Subject | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |
Date | |
Msg-id | 17450.1389644346@sss.pgh.pa.us |
In reply to | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |
List | pgsql-bugs |
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 01/06/2014 03:48 PM, Andres Freund wrote:
>> There just was another case of this reported on IRC by MatheusOl and for
>> some reason in his case I noticed the pertinent details and it quickly
>> clicked:
>> * page 14833 is the one with the error
>> * we're actually vacuuming page 38538
>> * lastBlockVacuumed is 0
>>
>> In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed
>> and the page vacuumed and acquire a cleanup lock on it. But there isn't
>> any guarantee that the intermediate pages are valid, filled pages,
>> afaics.

> Hmm. So the problem arises if there's an uninitialized page in the
> middle of the b-tree relation for some reason. It's unusual for an
> uninitialized page to be left in the middle of the relation, but it's
> certainly possible, if e.g. you crash just after extending the relation.

Right. This diagnosis is incomplete in itself, because if the slave has a
zeroed page there, shouldn't the master have one too? If the master does
have a zeroed page there, how come vacuum didn't fail on the master?
The answer is that btvacuumpage will skip over all-zero pages without
doing anything more than noting them as free in FSM. When
btree_xlog_vacuum rescans the relation, it will also skip over all-zero
pages without doing anything --- but XLogReadBufferExtended logs such a
page as invalid, and then bitches later when it doesn't see the page
dropped or truncated away.

>> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.

> That'd be horrendously dangerous. It would silently zap any page with
> any error on it. But we could add a new ReadBufferMode that returns
> InvalidBuffer on error, without zeroing the page.

The important point is not just that it not damage the page, but that
it not log it as invalid. I concur that the right fix requires a new
operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
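To make the primary/standby mismatch concrete, here is a toy Python model of the sequence described above. It is not PostgreSQL code: the function names only loosely echo btvacuumpage, btree_xlog_vacuum, XLogReadBufferExtended and the final invalid-page check, pages are 8-byte strings, and a page counts as uninitialized only when every byte is zero. All of these are simplifying assumptions. The primary merely notes the zeroed block as free, while replay of the vacuum record remembers it as invalid, and that leftover entry is fatal unless a truncation later removes it.

```python
ZERO_PAGE = bytes(8)       # toy "uninitialized" page (real pages are much larger)
DATA_PAGE = b"\x01" * 8    # toy page holding b-tree data


def primary_btvacuum_scan(pages):
    """Toy primary-side scan: an all-zero page is only noted as free."""
    return [blkno for blkno, page in enumerate(pages) if page == ZERO_PAGE]


def read_normal(pages, blkno, invalid_pages):
    """Toy RBM_NORMAL read during replay: a missing or all-zero page is
    remembered as invalid and no buffer (None) is returned."""
    if blkno >= len(pages) or pages[blkno] == ZERO_PAGE:
        invalid_pages.add(blkno)
        return None
    return blkno


def replay_btree_vacuum(pages, last_block_vacuumed, target_block, invalid_pages):
    """Toy replay of a b-tree vacuum record: every block strictly between
    last_block_vacuumed and target_block is read and cleanup-locked."""
    locked = []
    for blkno in range(last_block_vacuumed + 1, target_block):
        buf = read_normal(pages, blkno, invalid_pages)
        if buf is not None:
            locked.append(buf)  # stands in for taking a cleanup lock
    return locked


def truncate_relation(pages, nblocks, invalid_pages):
    """Toy truncation replay: pages past the new end are gone, so any
    invalid-page entries for them are forgotten."""
    del pages[nblocks:]
    invalid_pages.difference_update({b for b in invalid_pages if b >= nblocks})


def check_invalid_pages(invalid_pages):
    """Toy final check: any entry still remembered is fatal."""
    if invalid_pages:
        raise RuntimeError("PANIC: WAL contains references to invalid pages")


# Blocks 0-9, with block 4 left zeroed (e.g. after a crash mid-extension).
pages = [DATA_PAGE] * 4 + [ZERO_PAGE] + [DATA_PAGE] * 5
print(primary_btvacuum_scan(pages))               # [4]: primary just notes it free
invalid = set()
print(replay_btree_vacuum(pages, 0, 8, invalid))  # [1, 2, 3, 5, 6, 7]
print(invalid)                                    # {4}: standby remembered it
```

This mirrors the reported case at a small scale (lastBlockVacuumed 0, a much later vacuumed block, an error on a block in between): calling `check_invalid_pages(invalid)` now raises, and only a truncation that removes block 4 would clear the entry.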
I think the spec for this should be that if the page doesn't exist or
contains zeroes, we return InvalidBuffer without logging the page number
as invalid. The doesn't-exist case is justified by the expectation that
there will be a later RBM_NORMAL call for a larger page number, which will
result in a suitable complaint if the page range isn't there.

Will go fix this if there's not any objection to that plan.

regards, tom lane
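The proposed spec can be sketched as a similar toy Python model. In this hypothetical sketch, a NORMAL_ZERO_OK read of a missing or all-zero page returns None (standing in for InvalidBuffer) without recording anything, the intermediate blocks of the vacuum record are read in that mode, and the vacuumed block itself is assumed to be read in normal mode afterwards, so a relation too short to contain the range is still reported. The enum and class are illustrative only; RBM_NORMAL_ZERO_OK is the tentative name from the message, and the 8-byte pages and all-zero test are simplifying assumptions.

```python
from enum import Enum

ZERO_PAGE = bytes(8)      # toy all-zero ("uninitialized") page
DATA_PAGE = b"\x01" * 8   # toy page holding b-tree data


class ReadMode(Enum):
    NORMAL = "RBM_NORMAL"
    NORMAL_ZERO_OK = "RBM_NORMAL_ZERO_OK"  # tentative name from the message


class ToyStandby:
    """Hypothetical model of a standby replaying one b-tree relation."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.invalid_pages = set()

    def read_buffer(self, blkno, mode):
        """Return the block number as a toy buffer, or None for InvalidBuffer."""
        missing = blkno >= len(self.pages)
        if missing or self.pages[blkno] == ZERO_PAGE:
            if mode is ReadMode.NORMAL:
                self.invalid_pages.add(blkno)  # complained about later
            # NORMAL_ZERO_OK: return InvalidBuffer and record nothing
            return None
        return blkno

    def replay_btree_vacuum(self, last_block_vacuumed, target_block):
        locked = []
        for blkno in range(last_block_vacuumed + 1, target_block):
            buf = self.read_buffer(blkno, ReadMode.NORMAL_ZERO_OK)
            if buf is not None:
                locked.append(buf)  # stands in for taking a cleanup lock
        # Assumption: the vacuumed block itself is read in NORMAL mode, so a
        # relation too short to contain the range is still recorded.
        self.read_buffer(target_block, ReadMode.NORMAL)
        return locked

    def check_invalid_pages(self):
        if self.invalid_pages:
            raise RuntimeError("PANIC: WAL contains references to invalid pages")
```

With block 4 zeroed in a ten-block relation, replaying a record for block 8 locks blocks 1-3 and 5-7 and leaves nothing to complain about. If the relation instead ends at block 4, the intermediate reads stay silent and only the normal-mode read of block 8 is recorded as invalid, which is the "suitable complaint" the spec relies on.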