Обсуждение: amcheck support for BRIN indexes

Поиск
Список
Период
Сортировка

amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
Hi,

It's not for v18, just wanted to share with the community and register
it in the upcoming Commitfest 2025-07.

Here is the patch with amcheck support for BRIN indexes.

Patch uses amcheck common infrastructure that was introduced in [1].
It works and I deleted all the code that
I copied from btree check at first. Great.

During the check we hold ShareUpdateExclusiveLock, so we don't block
regular reads/inserts/updates
and the same time range summarizations/desummarizations are impossible
during the check which simplifies check logic.
While checking we do ereport(ERROR) on the first issue, the same way
as btree, gin checks do.

There are two parts:

First part is 'index structure check':
1) Meta page check
2) Revmap check. Walk revmap and check every valid revmap item points
to the index tuple with the expected range blkno,
and index tuple is consistent with the tuple description. Also it's
not documented, but known from the brin code that
for every empty range we should have allnulls = true, hasnulls =
false. So this is also checked here.
3) Regular pages check. Walk regular pages and check that every index
tuple has a corresponding revmap item that points to it.
We don't check index tuple structures here, because it was already
done in 2 steps.

Regular pages check is optional. Orphan index tuple errors in this
check doesn't necessary mean that index is corrupted,
but AFAIS brin is not supposed to have such orphan index tuples now,
so if we encounter one than probably something
wrong with the index.

Second part is 'all consistent check':
We check all heap tuples are consistent with the index. It's the most
expensive part and it's optional.
Here we call consistent functions for every heap tuple. Also here we
check that fields like 'has_nulls', 'all_nulls',
'empty_range' are consistent with what we see in the heap.


There are two patch files:

0001-brin-refactoring.patch

It's just two tiny changes in the brin code.
1) For index tuple structure check we need to know how on-disk index
tuples look like.
Function that lets you get it 'brtuple_disk_tupdesc' is internal. This
patch makes it extern.
2) Create macros for BRIN_MAX_PAGES_PER_RANGE. For the meta page check.

0002-amechek-brin-support.patch

It's check functionality itself + regression tests + amcheck extension updates.


Some open questions:

1) How to get the correct strategy number for the scan key in "all
heap consistent" check. The consistent function wants
a scan key, and to generate it we have to pick a strategy number. We
can't just use the same strategy number for all
indexes because its meaning differs from opclass to opclass (for
example equal strategy number is 1 in Bloom
and 3 in min_max). We also can't pick an operator and use it for every
check, because opclasses don't have any requirements
about what operators they should support. The solution was to let user
to define check operator
(parameter consistent_operator_names). It's an array as we can have a
multicolumn index. We can use '=' as default check
operator, because AFAIS '=' operator is supported by all core
opclasses except 'box_inclusion_ops', and IMHO it's the
most obvious candidate for such a check. So if param
'consistent_operator_names' is an empty array (param default value),
then for all attributes we use operator '='. In most cases operator
'=' does the job and you don't need to worry about
consistent_operator_names param. Not sure about it, what do you think?

2) The current implementation of "all heap consistent" uses the scan
of the entire heap. So if we have a lot of
unsummarized ranges, we do a lot of wasted work because we can't use
the tuples that belong to the unsummarized ranges.
Instead of having one scan for the entire heap, we can walk the
revmap, take only the summarized ranges, and
scan only the pages that belong to those ranges. So we have one scan
per range. This approach helps us avoid touching
those heap tuples that we can't use for index checks. But I'm not sure
if we should to worry about that because every
autovacuum summarizes all the unsummarized ranges, so don't expect a
lot of unsummarized ranges on average.

TODO list:

1) add TAP tests
2) update SGML docs for amcheck (think it's better to do after patch
is reviewed and more or less finalized)
3) pg_amcheck integration

Big thanks to Tomas Vondra for the first patch idea and initial review.


[1] https://www.postgresql.org/message-id/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru


Best regards,
Arseniy Mukhin

Вложения

Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
Hi,

Here is a new version.

TAP tests were added. Tried to reproduce more or less every violation.
I don't include 2 tests where disk representation ItemIdData needs to
be changed: it works locally, but I don't think these tests are
portable. While writing tests some minor issues were found and fixed.
Also ci compiler warnings were fixed.


Best regards,
Arseniy Mukhin

Вложения

Re: amcheck support for BRIN indexes

От
Tomas Vondra
Дата:
On 6/8/25 14:39, Arseniy Mukhin wrote:
> Hi,
> 
> Here is a new version.
> 
> TAP tests were added. Tried to reproduce more or less every violation.
> I don't include 2 tests where disk representation ItemIdData needs to
> be changed: it works locally, but I don't think these tests are
> portable. While writing tests some minor issues were found and fixed.
> Also ci compiler warnings were fixed.
> 

Thanks. I've added myself as a reviewer, so that I don't forget about
this for the next CF.


regards

-- 
Tomas Vondra




Re: amcheck support for BRIN indexes

От
Andrey Borodin
Дата:
Hi!

Nice feature!

> On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
>
> <v2-0001-brin-refactoring.patch>


+#define BRIN_MAX_PAGES_PER_RANGE    131072

I'd personally prefer some words on where does this limit come from. I'm not a BRIN pro, just curious.
Or, at least, maybe we can use a form 128 * 1024, if it's just a sane limit.



> On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
> <v2-0002-amcheck-brin-support.patch>


A bit more detailed commit message would be very useful.

The test coverage is very decent. The number of possible corruptions in tests is impressive. I don't really have an
experiencewith BRIN to say how different they are, but I want to ask if you are sure that these types of corruption
willbe portable across big-endian machines and such stuff. 

Copyright year in all new files should be 2025.

Some documentation about brin_index_check() would be handy, especially about its parameters. Perhaps, somewhere near
gin_index_check()in amcheck.sgml. 

brin_check_ereport() reports ERRCODE_DATA_CORRUPTED. We should distinguish between ERRCODE_INDEX_CORRUPTED which is a
stormbringerand ERRCODE_DATA_CORRUPTED which is an actual disaster. 

If it's not very difficult - it would be great to use read_stream infrastructure. See btvacuumscan() in nbtree.c
callingread_stream_begin_relation() for example. We cannot use it in logical scans in B-tree\GiST\GIN, but maybe BRIN
cantake some advantage of this new shiny technology. 

+        state->revmapbuf = ReadBufferExtended(idxrel, MAIN_FORKNUM, state->revmapBlk, RBM_NORMAL,
+                                              state->checkstrategy);
+        LockBuffer(state->revmapbuf, BUFFER_LOCK_SHARE);
// usage of state->revmapbuf
+        LockBuffer(state->revmapbuf, BUFFER_LOCK_UNLOCK);
// more usage of state->revmapbuf
+        ReleaseBuffer(state->revmapbuf);

I hope you know what you are doing. BRIN concurrency is not known to me at all.


That's all for first pass through patches. Thanks for working on it!


Best regards, Andrey Borodin.


Re: amcheck support for BRIN indexes

От
Andrey Borodin
Дата:

> On 18 Jun 2025, at 11:33, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
>
> Interesting, I used btree check as reference when started
> writing brin check, and in btree check there 53
> ERRCODE_INDEX_CORRUPTED ereports and only 1 ERRCODE_DATA_CORRUPTED
> ereport. So it was very hard to do, but I managed to pick the wrong
> one. I wonder if this btree check ereport should also be changed to
> ERRCODE_INDEX_CORRUPTED?

It's there in a case of heapallindexes failure. I concur that ERRCODE_INDEX_CORRUPTED is more appropriate in that case
inverify_nbtree.c. 
But I recollect Peter explained this code before somewhere in pgsql-hackers. And the reasoning was something like "if
youlack a tuple in unquie constraints - it's almost certainly subsequent constrain violation and data loss". But I'm
notsure. 
And I could not find this discussion in archives.


Best regards, Andrey Borodin.


Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
Hi,

I would like share some thoughts about 'heap all consistent' part and
one of the open questions:

The idea behind 'heap all consistent' is to use heap data to validate
the index. BRIN doesn't store heap tuples, so there is no
straightforward way to check if every tuple was indexed or not. We
have range data, so we need to do something with every heap tuple and
corresponding range. Something that will tell us if the range data
covers the heap tuple or not. What options I see here:

1) We can use the addValue() function. It returns FALSE if range data
was not changed (in other words range data already covers heap tuple
data that we pass to the function). It's very easy to do, we can use
heap tuples directly. But the problem I see here is that addValue()
can return TRUE even if heap tuple data have been already covered by
the range, but range data itself changed for some reason (some new
algorithm were applied  for instance). So I think we can have some
false positives that we can do nothing about.

2) We can use the consistent() function. It requires ScanKey and
returns true if the range data satisfies ScanKey's condition. So we
need to convert every heap tuple into ScanKey somehow. This approach
is implemented now in the patch, so I tried to describe all details
about heap tuple to ScanKey conversion in the comment:

/*
 * Prepare ScanKey for index attribute.
 *
 * ConsistentFn requires ScanKey, so we need to generate ScanKey for every
 * attribute somehow. We want ScanKey that would result in TRUE for every heap
 * tuple within the range when we use its indexed value as sk_argument.
 * To generate such a ScanKey we need to define the right operand type
and the strategy number.
 * Right operand type is a type of data that index is built on, so
it's 'opcintype'.
 * There is no strategy number that we can always use,
 * because every opclass defines its own set of operators it supports
and strategy number
 * for the same operator can differ from opclass to opclass.
 * So to get strategy number we look up an operator that gives us
desired behavior
 * and which both operand types are 'opcintype' and then retrieve the
strategy number for it.
 * Most of the time we can use '='. We let user define operator name
in case opclass doesn't
 * support '=' operator. Also, if such operator doesn't exist, we
can't proceed with the check.
 *
 * Generated once, and will be reused for all heap tuples.
 * Argument field will be filled for every heap tuple before
 * consistent function invocation, so leave it NULL for a while.
 *
 */

With this approach function brin_check() has optional parameter
'consistent_operator_names' that it seems to me could be very
confusing for users. In general I think this is the most complex
approach both in terms of use and implementation.

3) The approach that seems to me the most clear and straightforward:
to add new optional function to BRIN opclass API. The function that
would say if passed value is covered with the current range data. it's
exactly what we want to know, so we can use heap data directly here.
Something like that:

bool withinRange(BrinDesc *bdesc, BrinValues *column, Datum val, bool isnull)

It could be an optional function that will be implemented for all core
BRIN opclasses. So if somebody wants to use it for some custom opclass
they will need to implement it too, but it's not required. I
understand that adding something to the index opclass API requires
very strong arguments. So the argument here is that it will let to do
brin check very robust (without possible false positives as in the
first approach) and easy to use (no additional parameters in the check
function). Also, the withinRange() function could be written in such a
way that it would be more efficient for our task than addValue() or
consistent().

I'd be glad to hear your thoughts!


Best regards,
Arseniy Mukhin



Re: amcheck support for BRIN indexes

От
Álvaro Herrera
Дата:
On 2025-Jul-06, Arseniy Mukhin wrote:

> Sorry, forget to run a full test run with the new patch version. Some
> tests were unhappy with the new unknown support function. Here the new
> version with the fix.

Hello, I think this patch is probably a good idea.  I don't think it
makes sense to introduce a bunch of code in 0003 only to rewrite it
completely in 0005.  I would ask that you re-split your WITHIN_RANGE
(0004) to appear before the amcheck code, and then write the amcheck
code using that new functionality.

>  /*
>   * Return a tuple descriptor used for on-disk storage of BRIN tuples.
>   */
> -static TupleDesc
> +TupleDesc
>  brtuple_disk_tupdesc(BrinDesc *brdesc)

I think we should give this function a better name if it's going to be
exported.  How about brin_tuple_tupdesc?  (in brin_tuple.h we
seem to distinguish "brin tuples" which are the stored ones, from "brin
mem tuples" which are the ones to be used in memory.)

I didn't read the other patches.

Thanks

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from 
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
 Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php



Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
On Sun, Jul 6, 2025 at 10:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Jul-06, Arseniy Mukhin wrote:
>
> > Sorry, forget to run a full test run with the new patch version. Some
> > tests were unhappy with the new unknown support function. Here the new
> > version with the fix.
>
> Hello, I think this patch is probably a good idea.  I don't think it
> makes sense to introduce a bunch of code in 0003 only to rewrite it
> completely in 0005.  I would ask that you re-split your WITHIN_RANGE
> (0004) to appear before the amcheck code, and then write the amcheck
> code using that new functionality.

Hi, Álvaro!

Thank you for looking into this.

OK, we can easily revert to the version with consistent function if
needed, so let's get rid of it.

>
> >  /*
> >   * Return a tuple descriptor used for on-disk storage of BRIN tuples.
> >   */
> > -static TupleDesc
> > +TupleDesc
> >  brtuple_disk_tupdesc(BrinDesc *brdesc)
>
> I think we should give this function a better name if it's going to be
> exported.  How about brin_tuple_tupdesc?  (in brin_tuple.h we
> seem to distinguish "brin tuples" which are the stored ones, from "brin
> mem tuples" which are the ones to be used in memory.)
>

'brin_tuple_tupdesc' sounds good to me. Done.


So here is a new version. 0001, 0002 - index structure check. 0003,
0004 - all heap indexed using WITHIN_RANGE approach.

Thank you!


Best regards,
Arseniy Mukhin

Вложения

Re: amcheck support for BRIN indexes

От
Tomas Vondra
Дата:

On 7/7/25 13:06, Arseniy Mukhin wrote:
> On Sun, Jul 6, 2025 at 10:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>>
>> On 2025-Jul-06, Arseniy Mukhin wrote:
>>
>>> Sorry, forget to run a full test run with the new patch version. Some
>>> tests were unhappy with the new unknown support function. Here the new
>>> version with the fix.
>>
>> Hello, I think this patch is probably a good idea.  I don't think it
>> makes sense to introduce a bunch of code in 0003 only to rewrite it
>> completely in 0005.  I would ask that you re-split your WITHIN_RANGE
>> (0004) to appear before the amcheck code, and then write the amcheck
>> code using that new functionality.
> 
> Hi, Álvaro!
> 
> Thank you for looking into this.
> 
> OK, we can easily revert to the version with consistent function if
> needed, so let's get rid of it.
> 

Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
I'd probably try to do this using the regular consistent function:

(a) we don't need to add stuff to all BRIN opclasses to support this

(b) it gives us additional testing of the consistent function

(c) building a scan key for equality seems pretty trivial

What do you think?


-- 
Tomas Vondra




Re: amcheck support for BRIN indexes

От
Álvaro Herrera
Дата:
On 2025-Jul-07, Tomas Vondra wrote:

> Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
> I'd probably try to do this using the regular consistent function:
> 
> (a) we don't need to add stuff to all BRIN opclasses to support this
> 
> (b) it gives us additional testing of the consistent function
> 
> (c) building a scan key for equality seems pretty trivial
> 
> What do you think?

Oh yeah, if we can build this on top of the existing primitives, then
I'm all for that.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Jul-07, Tomas Vondra wrote:
>
> > Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
> > I'd probably try to do this using the regular consistent function:
> >
> > (a) we don't need to add stuff to all BRIN opclasses to support this
> >
> > (b) it gives us additional testing of the consistent function
> >
> > (c) building a scan key for equality seems pretty trivial
> >
> > What do you think?
>
> Oh yeah, if we can build this on top of the existing primitives, then
> I'm all for that.

Thank you for the feedback! I agree with the benefits. Speaking of
(с), it seems most of the time to be really trivial to build such a
ScanKey, but not every opclass supports '=' operator. amcheck should
handle these cases somehow then. I see two options here. The first is
to not provide  'heap all indexed' check for such opclasses, which is
sad because even one core opclass (box_inclusion_ops) doesn't support
'=' operator, postgis brin opclasses don't support it too AFAICS. The
second option is to let the user define which operator to use during
the check, which, I think, makes user experience much worse in this
case. So both options look not good from the user POV as for me, so I
don't know. What do you think about it?

And should I revert the patchset to the consistent function version then?


Best regards,
Arseniy Mukhin



Re: amcheck support for BRIN indexes

От
Tomas Vondra
Дата:

On 7/8/25 14:40, Arseniy Mukhin wrote:
> On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>>
>> On 2025-Jul-07, Tomas Vondra wrote:
>>
>>> Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
>>> I'd probably try to do this using the regular consistent function:
>>>
>>> (a) we don't need to add stuff to all BRIN opclasses to support this
>>>
>>> (b) it gives us additional testing of the consistent function
>>>
>>> (c) building a scan key for equality seems pretty trivial
>>>
>>> What do you think?
>>
>> Oh yeah, if we can build this on top of the existing primitives, then
>> I'm all for that.
> 
> Thank you for the feedback! I agree with the benefits. Speaking of
> (с), it seems most of the time to be really trivial to build such a
> ScanKey, but not every opclass supports '=' operator. amcheck should
> handle these cases somehow then. I see two options here. The first is
> to not provide  'heap all indexed' check for such opclasses, which is
> sad because even one core opclass (box_inclusion_ops) doesn't support
> '=' operator, postgis brin opclasses don't support it too AFAICS. The
> second option is to let the user define which operator to use during
> the check, which, I think, makes user experience much worse in this
> case. So both options look not good from the user POV as for me, so I
> don't know. What do you think about it?
> 
> And should I revert the patchset to the consistent function version then?
> 

Yeah, that's a good point. The various opclasses may support different
operators, and we don't know which "strategy" to fill into the scan key.
Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber,
and so on.

I wonder if there's a way to figure this out from the catalogs, but I
can't think of anything. Maybe requiring the user to supply the operator
(and not checking heap if it's not provided)

  SELECT brin_index_check(..., '@>');

would not be too bad ...


Alvaro, any ideas?


-- 
Tomas Vondra




Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
On Tue, Jul 8, 2025 at 4:21 PM Tomas Vondra <tomas@vondra.me> wrote:
>
>
>
> On 7/8/25 14:40, Arseniy Mukhin wrote:
> > On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> >>
> >> On 2025-Jul-07, Tomas Vondra wrote:
> >>
> >>> Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
> >>> I'd probably try to do this using the regular consistent function:
> >>>
> >>> (a) we don't need to add stuff to all BRIN opclasses to support this
> >>>
> >>> (b) it gives us additional testing of the consistent function
> >>>
> >>> (c) building a scan key for equality seems pretty trivial
> >>>
> >>> What do you think?
> >>
> >> Oh yeah, if we can build this on top of the existing primitives, then
> >> I'm all for that.
> >
> > Thank you for the feedback! I agree with the benefits. Speaking of
> > (с), it seems most of the time to be really trivial to build such a
> > ScanKey, but not every opclass supports '=' operator. amcheck should
> > handle these cases somehow then. I see two options here. The first is
> > to not provide  'heap all indexed' check for such opclasses, which is
> > sad because even one core opclass (box_inclusion_ops) doesn't support
> > '=' operator, postgis brin opclasses don't support it too AFAICS. The
> > second option is to let the user define which operator to use during
> > the check, which, I think, makes user experience much worse in this
> > case. So both options look not good from the user POV as for me, so I
> > don't know. What do you think about it?
> >
> > And should I revert the patchset to the consistent function version then?
> >
>
> Yeah, that's a good point. The various opclasses may support different
> operators, and we don't know which "strategy" to fill into the scan key.
> Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber,
> and so on.
>
> I wonder if there's a way to figure this out from the catalogs, but I
> can't think of anything. Maybe requiring the user to supply the operator
> (and not checking heap if it's not provided)
>
>   SELECT brin_index_check(..., '@>');
>
> would not be too bad ...

This doesn't change much, but it seems we need an array of operators
in the case of a multi-column index. And one more thing, it's
theoretically possible that opclass doesn't have the operator we need
at all. I'm not aware of such cases, but perhaps in the future
something like GIN tsvector_ops could be created for BRIN.
tsvector_ops doesn't have an operator that could be used here.


Best regards,
Arseniy Mukhin



Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
Hi,

While reviewing gist amcheck patch [1] I realized that brin amcheck
also must check if current snapshot is OK with index indcheckxmin (as
btree, gist do it). Currently this check is contained in btree amcheck
code, but other AMs need it for heapallindexed as well, so I moved it
from btree to verify_common (0003 patch).

Also I returned a consistentFn approach in heapallindexed as it seems
more preferable. But it's not a big deal to return to the within_range
approach if needed.


[1]
https://www.postgresql.org/message-id/flat/41F2A10C-4577-413B-9140-BE81CCE04A60%40yandex-team.ru#dc22ff33596f63f554cc551958131cde


Best regards,
Arseniy Mukhin

Вложения

Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
Hi,

On Tue, Jul 22, 2025 at 6:43 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
>
> Hi,
>
> While reviewing gist amcheck patch [1] I realized that brin amcheck
> also must check if current snapshot is OK with index indcheckxmin (as
> btree, gist do it). Currently this check is contained in btree amcheck
> code, but other AMs need it for heapallindexed as well, so I moved it
> from btree to verify_common (0003 patch).

There was a compiler warning on CI, so there is a new version with the
fix (adds #include to verify_common).
I also moved it to PG19-2.


Best regards,
Arseniy Mukhin



Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
On Fri, Aug 1, 2025 at 11:22 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
>
> Hi,
>
> On Tue, Jul 22, 2025 at 6:43 PM Arseniy Mukhin
> <arseniy.mukhin.dev@gmail.com> wrote:
> >
> > Hi,
> >
> > While reviewing gist amcheck patch [1] I realized that brin amcheck
> > also must check if current snapshot is OK with index indcheckxmin (as
> > btree, gist do it). Currently this check is contained in btree amcheck
> > code, but other AMs need it for heapallindexed as well, so I moved it
> > from btree to verify_common (0003 patch).
>
> There was a compiler warning on CI, so there is a new version with the
> fix (adds #include to verify_common).
> I also moved it to PG19-2.


Sorry for the noise, forgot to attach the files.

Best regards,
Arseniy Mukhin

Вложения

Re: amcheck support for BRIN indexes

От
Álvaro Herrera
Дата:
On 2025-Jul-08, Tomas Vondra wrote:

> On 7/8/25 14:40, Arseniy Mukhin wrote:

> > Thank you for the feedback! I agree with the benefits. Speaking of
> > (с), it seems most of the time to be really trivial to build such a
> > ScanKey, but not every opclass supports '=' operator. amcheck should
> > handle these cases somehow then. I see two options here. The first is
> > to not provide  'heap all indexed' check for such opclasses, which is
> > sad because even one core opclass (box_inclusion_ops) doesn't support
> > '=' operator, postgis brin opclasses don't support it too AFAICS. The
> > second option is to let the user define which operator to use during
> > the check, which, I think, makes user experience much worse in this
> > case. So both options look not good from the user POV as for me, so I
> > don't know. What do you think about it?
> > 
> > And should I revert the patchset to the consistent function version then?
> 
> Yeah, that's a good point. The various opclasses may support different
> operators, and we don't know which "strategy" to fill into the scan key.
> Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber,
> and so on.

Hmm, maybe we can make the operator argument to the function an optional
argument.  Then, if it's not given, use equality for the cases where
that works; if equality doesn't work for the column in that opclass,
throw an error to request an operator.  That way we support the most
common case in the easy way, and for the other cases the user has to
work a little harder -- but I think it's not too bad.

I think you should have tests with indexes on more than one column.

This syntax looks terrible
  SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '{"@>"}');

the thingy at the end looks like '90s modem line noise.  Can we maybe
use a variadic argument, so that if you have multiple indexed columns
you specify the operators in separate args somehow and avoid the quoting
and array decoration?  I imagine something like

  SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '@>', '=');
or whatever.  (To think about: if I want '=' to be omitted, but I have
the second column using a type that doesn't support the '=', what's a
good syntax to use?)


Regarding 0003: I think the new function should be just
CheckIndexCheckXMin(Relation idxrel, Snapshot snap)
and make the caller responsible for the snapshot handling.  Otherwise
you end up in the weird situation in 0004 where you have to do
  UnregisterSnapshot(RegisterSnapshotAndDoStuff())
instead of the more ordinary
  RegisterSnapshot()
  CheckIndexCheckXMin()
  UnregisterSnapshot()


You need an amcheck.sgml update for the new function.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: amcheck support for BRIN indexes

От
Arseniy Mukhin
Дата:
Hi,

Thank you for looking into it!

On Tue, Aug 5, 2025 at 2:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Jul-08, Tomas Vondra wrote:
>
> > On 7/8/25 14:40, Arseniy Mukhin wrote:
>
> > > Thank you for the feedback! I agree with the benefits. Speaking of
> > > (с), it seems most of the time to be really trivial to build such a
> > > ScanKey, but not every opclass supports '=' operator. amcheck should
> > > handle these cases somehow then. I see two options here. The first is
> > > to not provide  'heap all indexed' check for such opclasses, which is
> > > sad because even one core opclass (box_inclusion_ops) doesn't support
> > > '=' operator, postgis brin opclasses don't support it too AFAICS. The
> > > second option is to let the user define which operator to use during
> > > the check, which, I think, makes user experience much worse in this
> > > case. So both options look not good from the user POV as for me, so I
> > > don't know. What do you think about it?
> > >
> > > And should I revert the patchset to the consistent function version then?
> >
> > Yeah, that's a good point. The various opclasses may support different
> > operators, and we don't know which "strategy" to fill into the scan key.
> > Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber,
> > and so on.
>
> Hmm, maybe we can make the operator argument to the function an optional
> argument.  Then, if it's not given, use equality for the cases where
> that works; if equality doesn't work for the column in that opclass,
> throw an error to request an operator.  That way we support the most
> common case in the easy way, and for the other cases the user has to
> work a little harder -- but I think it's not too bad.

Yes, the operator list is an optional argument now. Like you said, if
it's not passed to the function call, the equality operator is used.

I realized that solving the problem with opclasses without equality
operator by letting user to define operator list has several
drawbacks:

It's not very convenient to call automatically? Because the calls are
different from index to index. You can't just call
brin_index_check('index', true, true) on everything. Maybe I'm wrong,
but it seems like amcheck is a tool that is often used to periodically
check the health of Postgres clusters (and there can be many of them),
so users probably don't want to get into the details of each index.

Also, it seems like we don't want the user to define the operator to
check. We want them to pass in the "correct" operator if there is no
equality operator. So there's no choice, we just want users to figure
out what the correct operator is and pass it in. But we already know
what the "correct" operator is. Maybe we should just implement an
opclass <-> "correct" operator mapping on the database side? We also
need opclass developers to be able to add such a mapping if they want
their opclass to be supported by amcheck. Then during the check we can
look up into the mapping and use the operators. I was thinking about a
new catalog table or maybe adding it to BrinOpcInfo? Probably there is
a better way to do it? If the mapping doesn't have an operator for
opclass - no problem, we can skip the consistentFn call for such
columns and maybe log a message about it. This way we don't have all
these problems with operator list argument and with false positives
when a user fails to realize what the "correct" operator is.

> I think you should have tests with indexes on more than one column.

There is one multicolumn index test, I added another one where the
list of operators is passed. Also added tests with invalid operator
list.

>
> This syntax looks terrible
>   SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '{"@>"}');
>
> the thingy at the end looks like '90s modem line noise.  Can we maybe
> use a variadic argument, so that if you have multiple indexed columns
> you specify the operators in separate args somehow and avoid the quoting
> and array decoration?  I imagine something like
>
>   SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '@>', '=');
> or whatever.  (To think about: if I want '=' to be omitted, but I have
> the second column using a type that doesn't support the '=', what's a
> good syntax to use?)
>

Good idea! It looks better. I changed the operator list to variadic
argument. For now brin_index_check() expects the user to define all or
nothing. I think if we want to allow the user to omit operators for
some columns, then we need to know the indexes of the columns for
which the passed operators are intended. Something like this maybe:

        brin_index_check('brintest_idx', true, true, 1, '@>', 3, '@>');

>
> Regarding 0003: I think the new function should be just
> CheckIndexCheckXMin(Relation idxrel, Snapshot snap)
> and make the caller responsible for the snapshot handling.  Otherwise
> you end up in the weird situation in 0004 where you have to do
>   UnregisterSnapshot(RegisterSnapshotAndDoStuff())
> instead of the more ordinary
>   RegisterSnapshot()
>   CheckIndexCheckXMin()
>   UnregisterSnapshot()
>

Done. BTW gist amcheck also needs 0003 [0]. Maybe we can move it to
the separate thread and commit, so both patches can use it? What do
you think, Andrey?

>
> You need an amcheck.sgml update for the new function.

Done. The operator list is the most controversial part. It was not
easy to figure out how to describe the criteria of the "correct"
operator, I hope it's not very confusing.

And I have a question, is it somehow helpful that 'index structure
check' and 'heap all indexed' are in the different patches? It makes
it a bit more difficult to update the patchset, so if it's not useful
for reviewers I would probably merge it in the next version.

So here is a new version.

Thank you!

[0] https://www.postgresql.org/message-id/flat/5FC1B5B6-FB35-44A2-AB62-632F14E958C5@yandex-team.ru


Best regards,
Arseniy Mukhin

Вложения