Re: Corrupted btree index on HEAD because of covering indexes
От | Peter Geoghegan |
---|---|
Тема | Re: Corrupted btree index on HEAD because of covering indexes |
Дата | |
Msg-id | CAH2-WzmRA7eg2QK73d3Oekp9wYucf9L+1VZ5cvJB+2cy4DoLtA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Corrupted btree index on HEAD because of covering indexes (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: Corrupted btree index on HEAD because of covering indexes
|
Список | pgsql-hackers |
On Sat, Apr 21, 2018 at 6:02 PM, Peter Geoghegan <pg@bowt.ie> wrote: > I refined the amcheck enhancement quite a bit today. It will not just > check that a downlink is not missing; It will also confirm that it > wasn't a legitimately interrupted multi-level deletion, by descending > to the leaf page to match the leaf high key pointer to the top most > parent, which should be the target page (the page that lacked a > downlink according to the new Bloom filter). We need to worry about > multi-level deletions that are interrupted by an error or a hard > crash, which presents a legitimate case where there'll be no downlink > for an internal page in its parent. VACUUM is okay with that, so we > must be too. Attached patch lets amcheck detect the issue when bt_index_parent_check() is called, though only when heapallindexed verification was requested (that is, only when bt_index_parent_check() is doing something with a Bloom filter already). The new checks will probably also detect any possible issue with multi-level page deletions. The patch tightens up our general expectations around half-dead and fully deleted pages, which seems necessary but also independently useful. I'm using work_mem to constrain the size of the second Bloom filter, whereas the heapallindexed Bloom filter is constrained by maintenance_work_mem. This seems fine to me, since we have always used an additional work_mem budget for spool2 when building a unique index within nbtsort.c. Besides, it will probably be very common for the downlink Bloom filter to be less than 1% the size of the first Bloom filter when we have adequate memory for both Bloom filters (i.e. very small). I thought about mentioning this work_mem allocation in the docs, but decided that there was no need, since the CREATE INDEX spool2 work_mem stuff isn't documented anywhere either. Note that the "c.relkind = 'i'" change in the docs is about not breaking the amcheck query when local partitioned indexes happen to be in use (when the user changed the sample SQL query to not just look at pg_catalog indexes). See the "Local partitioned indexes and pageinspect" thread I just started for full details. The new P_ISDELETED() test within bt_downlink_missing_check() is what actually detects the corruption that the test case causes, since the fully deleted leaf page still has a sane top parent block number left behind (that is, we don't get as far as testing "if (BTreeTupleGetTopParent(itup) == state->targetblock)"; that's not how the the leaf page can get corrupt in the test case that Michael posted). Note that there are also two new similar P_ISDELETED() tests added to two existing functions (bt_downlink_check() and bt_check_level_from_leftmost()), but those tests won't detect the corruption that we saw. They're really there to nail down how we think about fully deleted pages. -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления: