Re: Amcheck verification of GiST and GIN

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Amcheck verification of GiST and GIN
Дата
Msg-id CAH2-WzmUuuyZJkAnDUnP_TmfQMsa1JeO5yimcAeb5qczGB-0kg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Amcheck verification of GiST and GIN  (Andrey Borodin <amborodin86@gmail.com>)
Ответы Re: Amcheck verification of GiST and GIN  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Sun, Feb 5, 2023 at 4:45 PM Andrey Borodin <amborodin86@gmail.com> wrote:
> Here's v24 == (v23 + a step for pg_amcheck). There's a lot of
> shotgun-style changes, but I hope next index types will be easy to add
> now.

Some feedback on the GiST patch:

* You forgot to initialize GistCheckState.heaptuplespresent to 0.

It might be better to allocate GistCheckState dynamically, using
palloc0(). That's future proof. "Simple and obvious" is usually the
most important goal for managing memory in amcheck code. It can be a
little inefficient if that makes it simpler.

* ISTM that gist_index_check() should allow the caller to omit a
"heapallindexed" argument by specifying "DEFAULT FALSE", for
consistency with bt_index_check().

(Actually there are two versions of bt_index_check(), with
overloading, but that's just because of the way that the extension
evolved over time).

* What's the point in having a custom memory context that is never reset?

I believe that gistgetadjusted() will leak memory here, so there is a
need for some kind of high level strategy for managing memory. The
strategy within verify_nbtree.c is to call MemoryContextReset() right
after every loop iteration within bt_check_level_from_leftmost() --
which is pretty much once every call to bt_target_page_check(). That
kind of approach is obviously not going to suffer any memory leaks.

Again, "simple and obvious" is good for memory management in amcheck.

* ISTM that it would be clearer if the per-page code within
gist_check_parent_keys_consistency() was broken out into its own
function -- a little like bt_target_page_check()..

That way the control flow would be easier to understand when looking
at the code at a high level.

* ISTM that gist_refind_parent() should throw an error about
corruption in the event of a parent page somehow becoming a leaf page.

Obviously this is never supposed to happen, and likely never will
happen, even with corruption. But it seems like a good idea to make
the most conservative possible assumption by throwing an error. If it
never happens anyway, then the fact that we handle it with an error
won't matter -- so the error is harmless. If it does happen then we'll
want to hear about it as soon as possible -- so the error is useful.

* I suggest using c99 style variable declarations in loops.

Especially for things like "for (OffsetNumber offset =
FirstOffsetNumber; ... ; ... )".

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: improving user.c error messages
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?