Re: amcheck (B-Tree integrity checking tool)

Поиск
Список
Период
Сортировка
От Anastasia Lubennikova
Тема Re: amcheck (B-Tree integrity checking tool)
Дата
Msg-id 56E2EFF5.3000809@postgrespro.ru
обсуждение исходный текст
Ответ на amcheck (B-Tree integrity checking tool)  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: amcheck (B-Tree integrity checking tool)  (Michael Paquier <michael.paquier@gmail.com>)
Re: amcheck (B-Tree integrity checking tool)  (Peter Geoghegan <pg@heroku.com>)
Re: amcheck (B-Tree integrity checking tool)  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
01.03.2016 03:09, Peter Geoghegan:
> I was assigned an "action point" during the FOSDEM developer meeting:
> "Post new version of btree consistency checker patch". I attach a new
> WIP version of my consistency checker tool, amcheck. This patch is
> proposed for 9.6, as an extension in contrib -- maybe we can still get
> it in. This is the first time I've added any version of this to a
> commitfest, although I've posted a couple of rough versions of this in
> the past couple of years. The attached version has received a major
> overhaul, and is primarily aimed at finding corruption in production
> systems, although I think it will still have significant value for
> debugging too. Maybe it can help with some of the B-Tree patches in
> the final commitfest, for example. I also have some hope that it will
> become a learning tool for people interested in how B-Tree indexes
> work.
>
> To recap, the extension adds some SQL-callable functions that verify
> certain invariant conditions hold within some particular B-Tree index.
> These are the conditions that index scans rely on always being true.
> The tool's scope may eventually cover other AMs, including heapam, but
> nbtree seems like the best place to start.
>
> Note that no function currently checks that the index is consistent
> with the heap, which would be very useful (that's probably how I'd
> eventually target the heapam, actually).

Hi,
thank you for the contrib module. The patch itself looks great.
It was really useful to have such a tool while working on b-tree patches.

> Invariants
> ========
>
> nbtree invariants that the tool verifies with just an AccessShareLock
> on the relation are:
>
> * That all items are in the correct, opclass order on each page.
>
> * That the page "high key", if any, actually bounds the items on the page.
>
> * That the last item on a page is less than or equal to the first item
> on the next page (the page to its right). The idea here is that the
> key space spans multiple pages, not just one page, so it make sense to
> check the last item where we can.
>
> With an ExclusiveLock + ShareLock, some addition invariants are verified:
>
> * That child pages actually have their parent's downlink as a lower bound.
>
> * Sane right links and left links at each level.
>
> Obviously, this tool is all about distrusting the structure of a
> B-Tree index. That being the case, it's not always totally clear where
> to draw the line. I think I have the balance about right, though.
>
> Interface
> =======
>
> There are only 2 SQL callable functions in the extension, which are
> very similar:
>
> bt_index_check(index regclass)
>
> bt_index_parent_check(index regclass)
>
> The latter is more thorough than the former -- it performs all checks,
> including those checks that I mentioned required an ExclusiveLock. So,
> bt_index_check() only requires an AccessShareLock.
> bt_index_parent_check() requires an ExclusiveLock on the index
> relation, and a ShareLock on its heap relation, almost like REINDEX.
> bt_index_parent_check() performs verification that is a superset of
> the verification performed by bt_index_check() -- mostly, the extra
> verification/work is that it verifies downlinks against child pages.
>
> Both functions raise an error in the event of observing that an
> invariant in a B-Tree was violated, such as items being out of order
> on a page. I've written extensive documentation, which goes into
> practical aspects of using amcheck effectively. It doesn't go into
> significant detail about every invariant that is checked, but gives a
> good idea of roughly what checks are performed.
>
> I could almost justify only having one function with an argument about
> the downlink/child verification, but that would be a significant
> footgun given the varying locking requirements that such a function
> would have.

I've done some testing. And I can say that both announced functions work 
well.
BTW, if you know a good way to corrupt index (and do it reproducible) 
I'd be very glad to see it.

I created an index, then opened the file and changed the order of elements.
And bt_index_check() found the error successfully.

/* Ensure that the data is changed. */
SELECT * FROM bt_page_items('idx', 1);
 itemoffset | ctid  | itemlen | nulls | vars | data
------------+-------+---------+-------+------+-------------------------          1 | (0,3) |      16 | f     | f    |
0300 00 00 00 00 00 00          2 | (0,2) |      16 | f     | f    | 02 00 00 00 00 00 00 00          3 | (0,1) |
16| f     | f    | 01 00 00 00 00 00 00 00
 
(3 rows)


/* Great, amcheck found an error. */
select bt_index_check('idx');
ERROR:  page order invariant violated for index "idx"
DETAIL:  Lower index tid=(1,1) (points to heap tid=(0,3)) higher index 
tid=(1,2) (points to heap tid=(0,2)) page lsn=0/17800D0.

> Locking
> ======
>
> We never rely on something like holding on to a buffer pin as an
> interlock for correctness (the vacuum interlock thing isn't generally
> necessary, because we don't look at the heap at all). We simply pin +
> BT_READ lock a buffer, copy it into local memory allocated by
> palloc(), and then immediately release the buffer lock and drop the
> pin. This is the same in all instances. There is never more than one
> buffer lock or pin held at a time.
>
> We do, on the other hand, have a detailed rationale for why it's okay
> that we don't have an ExclusiveLock on the index relation for checks
> that span the key space of more than one page by following right links
> to compare items across sibling pages. This isn't the same thing as
> having an explicit interlock like a pin -- our interlock is one
> against *recycling* by vacuum, which is based on recentGlobalXmin.
> This rationale requires expert review.

I'm not an expert, but I promise to take a closer look at locking.
I will send another email soon.
> Performance
> ==========
>
> Trying to keep the tool as simple as possible, while still making it
> do verification that is as useful as possible was my priority here,
> not performance. Still, verification completes fairly quickly.
> Certainly, it takes far less time than having to REINDEX the index,
> and doesn't need too much memory. I think that in practice most
> problems that can be detected by the B-Tree checker functions will be
> detected with the lighter variant.

I do not see any performance issues.
I'm sure that if someone wants to check whether the index is corrupted, 
he certainly can wait a minute (.

But I have some concerns about compatibility with my patches.
I've tried to call bt_index_check() over my "including" patch [1] and 
caught a segfault.

LOG:  server process (PID 31794) was terminated by signal 11: 
Segmentation fault
DETAIL:  Failed process was running: select bt_index_check('idx');

I do hope that my patch will be accepted in 9.6, so this conflict looks 
really bad.
I think that error is caused by changes in pages layout. To save some 
space nonkey attributes are truncated
when btree copies the indexed value into inner pages or into high key. 
You can look at index_reform_tuple() calls.

I wonder, whether you can add some check of catalog version to your 
module or this case requires more changes?

With second patch which implements compression [2], amcheck causes 
another error.
postgres=# insert into tbl select 1 from generate_series(0,5);
INSERT 0 6
postgres=# SELECT * FROM bt_page_items('idx', 1); itemoffset |      ctid      | itemlen | nulls | vars | data


------------+----------------+---------+-------+------+----------------------------------------------------------------------------------------
---------------------------------------------------------          1 | (2147483664,6) |      56 | f     | f    | 01 00
0000 00 
 
00 00 00 00 00 00 00 01 00 00 00 00 00 02 00 00 00 00 00 03 00 00 00 00
00 04 00 00 00 00 00 05 00 00 00 00 00 06 00 00 00 00 00

postgres=# select bt_index_check('idx');
ERROR:  cannot check index "idx"
DETAIL:  index is not yet ready for insertions

But I'm sure that it's a problem of my patch. So I'll fix it and try again.

[1] https://commitfest.postgresql.org/9/433/
[2] https://commitfest.postgresql.org/9/494/

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Steele
Дата:
Сообщение: Re: Proposal: BSD Authentication support
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: amcheck (B-Tree integrity checking tool)