Re: post-recovery amcheck expectations
От | Noah Misch |
---|---|
Тема | Re: post-recovery amcheck expectations |
Дата | |
Msg-id | 20231021035457.cb.nmisch@google.com обсуждение исходный текст |
Ответ на | Re: post-recovery amcheck expectations (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: post-recovery amcheck expectations
|
Список | pgsql-hackers |
On Mon, Oct 09, 2023 at 04:46:26PM -0700, Peter Geoghegan wrote: > On Wed, Oct 4, 2023 at 7:52 PM Noah Misch <noah@leadboat.com> wrote: > Might make sense to test the fix for this issue using a similar > approach: by adding custom code that randomly throws errors at a point > that stresses the implementation. I'm referring to the point at which > VACUUM is between the first and second phase of page deletion: right > before (or directly after) _bt_unlink_halfdead_page() is called. That > isn't fundamentally different to the approach from your TAP test, but > seems like it might add some interesting variation. My initial manual test was like that, actually. > > I lean toward fixing this by > > having amcheck scan left; if left links reach only half-dead or deleted pages, > > that's as good as the present child block being P_LEFTMOST. > > Also my preference. Done mostly that way, except I didn't accept deleted pages. Making this work on !readonly would take more than that, and readonly shouldn't need that. > > There's a > > different error from bt_index_check(), and I've not yet studied how to fix > > that: > > > > ERROR: left link/right link pair in index "not_leftmost_pk" not in agreement > > DETAIL: Block=0 left block=0 left link from block=4294967295. > > This looks like this might be a straightforward case of confusing > P_NONE for InvalidBlockNumber (or vice-versa). They're both used to > indicate "no such block" here. Roughly so. It turned out this scenario was passing leftcurrent=P_NONE to bt_recheck_sibling_links(), causing that function to use BTPageGetOpaque() on the metapage. > > For some other amcheck expectations, the comments suggest reliance on the > > bt_index_parent_check() ShareLock. I haven't tried to make test cases for > > them, but perhaps recovery can trick them the same way. Examples: > > > > errmsg("downlink or sibling link points to deleted block in index \"%s\"", > > errmsg("block %u is not leftmost in index \"%s\"", > > errmsg("block %u is not true root in index \"%s\"", > > These are all much older. They're certainly all from before the > relevant checks were first added (by commit d114cc53), and seem much > less likely to be buggy. After I fixed the original error, the "block %u is not leftmost" surfaced next. The attached patch fixes that, too. I didn't investigate the others. The original test was flaky in response to WAL flush timing, but this one survives thousands of runs.
Вложения
В списке pgsql-hackers по дате отправления: