Re: Amcheck verification of GiST and GIN

Поиск
Список
Период
Сортировка
От Arseniy Mukhin
Тема Re: Amcheck verification of GiST and GIN
Дата
Msg-id CAE7r3MKUWMxCCh6jq7EKbRpAsoQ_+wPXmk0X-Yee9qwBYmfoTw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Amcheck verification of GiST and GIN  (Andrey Borodin <x4mmm@yandex-team.ru>)
Список pgsql-hackers
On Sun, Jun 15, 2025 at 4:24 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> Hi Arseniy!
>
> Thanks for finding these problems.
> I had several attempts to wrap my head around original patch with fixes, but when it was broken into several steps it
finallybecame easier for me. 
> Here are some thought about patches.
>

Hi Andrey! Thank you for the review.

>
> > On 10 Jun 2025, at 13:18, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
> > <0001-amcheck-Add-gin_index_check-on-a-multicolumn-index.patch>
>
> The test seems harmless and nice to have. I understand that this test is needed to extend coverage.
> Perhaps, we could verify that some code is actually triggered. Personally, I would be happy if we could some add
injectionpoints with notices at tested branches. But, AFAIK, it's too much of a burden to have injection points in
contribextensions. We had very similar problem with sort patch in btree_gist and eventually gave up. elog(DEBUG) was
nota good solution too, because it was unstable. 
> See 'gin-finish-incomplete-split' or 'hash-aggregate-enter-spill-mode' for reference.

I'm not familiar with injections points much, but I think I got the
idea, sounds interesting. Thank you for the references.

>
> Having some TAP tests sounds like a very good idea.
>
> I'm a bit surprised by excluding some letters from random_string(), but perhaps it's fine for this test.
>

Yeah, there is no reason why we can't use vowels here, so I will add
them so that it doesn't look like there is any point in their absence.

> Somewhere here:
> +               INSERT INTO $relname (a) VALUES (('{' || 'pppppppppp' || random_string(1870) ||'}')::text[]);
> I'd like to have a comment explaining number 1870. And, probably, you expect exactly 2 tuples on root page, right?
>

The idea behind "random_string(1870)" was to get split as fast as
possible, but tuples with size > 2kb are toasted, so we have to use
something about 2k here. I think I took 1870 from some other place
where it was necessary, but here we can round it to 1900. So I'll
replace 1870 with 1900 and add a comment about the size. Also gonna
add some comments about datasets in some tests to make it more clear.

> Are we 100% certain that 'rrrrrrrrr' will always be on root page?

I'm not 100% sure. AFAIK the split algorithm is deterministic and the
idea was that if we use very long tuples, then all other factors will
be too small to influence what key we will see on the root page.

> I do not see much value in having variables $relname and $indexname. I'd just substitute its usages with literals.
ButI'm not sure, maybe this structure will be used in your tests later... 

I added variables just because we use index name and table name
several times, but I don't mind getting rid of them.

>
> In this function
> +sub string_replace_block
> I'd suggest a little bit of comments. Also, perhaps, fsync of files, but 001_verify_heapam.pl does not do fsync. So,
maybeit's OK here too. 
>

Will add a comment here.

> Also, I have a wild idea. Maybe add an assert that block size if 8192 and just exit otherwise?

I like the idea. I thought maybe it would be great to have some
function that every TAP test can use if it needs a certain block size?

> And, maybe instead of gin_clean_pending_list() you can just create an index with fastupdate=off.

Yeah, I think we can do it even simpler if we move index creation to
the end as regression tests do.

>
> > On 10 Jun 2025, at 13:18, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
> > <0003-patch-2-gin_check_parent_keys_consistency.patch>
>
> The patch seems correct to me.
> Except this
> +       my $blkno = 5;  # leaf
> in test reads scary. Will it be stable on buildfarm?
>

Not sure, but I thought that blkno should be more or less the same everywhere.

>
> > On 9 May 2025, at 17:43, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
> >
> > 10) README says "Vacuum never deletes tuples or pages from the entry
> > tree." But check assumes that it's possible to have
> > deleted leaf page with 0 entries.
> >
> >    if (GinPageIsDeleted(page))
> >    {
> >       if (!GinPageIsLeaf(page))
> >          ereport(ERROR,
> >                (errcode(ERRCODE_INDEX_CORRUPTED),
> >                 errmsg("index \"%s\" has deleted internal page %u",
> >                      RelationGetRelationName(rel), blockNo)));
> >       if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
> >          ereport(ERROR,
> >                (errcode(ERRCODE_INDEX_CORRUPTED),
> >                 errmsg("index \"%s\" has deleted page %u with tuples",
> >                      RelationGetRelationName(rel), blockNo)));
> >    }
>
> To enforce such an invariant we must be sure that GIN never deleted entry pages in older versions. I do not have
enoughknowledge of the history for this. 

Agree, good point.

>
> > 11) When we compare entry tree max page key with parent key:
> >
> >             if (ginCompareAttEntries(&state, attnum, current_key,
> >                              current_key_category, parent_key_attnum,
> >                                      parent_key, parent_key_category) > 0)
> >             {
> >                /*
> >                 * There was a discrepancy between parent and child
> >                 * tuples. We need to verify it is not a result of
> >                 * concurrent call of gistplacetopage(). So, lock parent
> >                 * and try to find downlink for current page. It may be
> >                 * missing due to concurrent page split, this is OK.
> >                 */
> >                pfree(stack->parenttup);
> >                stack->parenttup = gin_refind_parent(rel, stack->parentblk,
> >                                            stack->blkno, strategy);
> >
> > I think we can remove gin_refind_parent() and do ereport right away here.
> > The same logic as with 3). AFAIK it's impossible to have a child item
> > with a key that is higher than the cached parent key.
> > Parent key bounds what keys we can insert into the child page, so it
> > seems there is no way how they can appear there.
> >
>
> This logic was copied from GiST check. In GiST "Area of responsibility" of internal tuple can be extended in any
direction.That's why we need to lock parent page. 
> If in GIN internal tuple keyspace is never extended - it's OK to avoid gin_refind_parent().
> But reasoning about GIN concurrency is rather difficult. Unfortunately, we do not have such checks in B-tree
verificationwithout ShareLock. Either way we could peep some idea from there. 
>

Got it.


Here is the new version. I fixed some points that Andrey mentioned.
All of them in the TAP test. Several comments were added, filler size
1870 changed to 1900. Also I added vowels to the replace function and
moved index creation after the data filling. Thank you!


Best regards,
Arseniy Mukhin

Вложения

В списке pgsql-hackers по дате отправления: