Обсуждение: amcheck support for BRIN indexes
Hi, It's not for v18, just wanted to share with the community and register it in the upcoming Commitfest 2025-07. Here is the patch with amcheck support for BRIN indexes. Patch uses amcheck common infrastructure that was introduced in [1]. It works and I deleted all the code that I copied from btree check at first. Great. During the check we hold ShareUpdateExclusiveLock, so we don't block regular reads/inserts/updates and the same time range summarizations/desummarizations are impossible during the check which simplifies check logic. While checking we do ereport(ERROR) on the first issue, the same way as btree, gin checks do. There are two parts: First part is 'index structure check': 1) Meta page check 2) Revmap check. Walk revmap and check every valid revmap item points to the index tuple with the expected range blkno, and index tuple is consistent with the tuple description. Also it's not documented, but known from the brin code that for every empty range we should have allnulls = true, hasnulls = false. So this is also checked here. 3) Regular pages check. Walk regular pages and check that every index tuple has a corresponding revmap item that points to it. We don't check index tuple structures here, because it was already done in 2 steps. Regular pages check is optional. Orphan index tuple errors in this check doesn't necessary mean that index is corrupted, but AFAIS brin is not supposed to have such orphan index tuples now, so if we encounter one than probably something wrong with the index. Second part is 'all consistent check': We check all heap tuples are consistent with the index. It's the most expensive part and it's optional. Here we call consistent functions for every heap tuple. Also here we check that fields like 'has_nulls', 'all_nulls', 'empty_range' are consistent with what we see in the heap. There are two patch files: 0001-brin-refactoring.patch It's just two tiny changes in the brin code. 1) For index tuple structure check we need to know how on-disk index tuples look like. Function that lets you get it 'brtuple_disk_tupdesc' is internal. This patch makes it extern. 2) Create macros for BRIN_MAX_PAGES_PER_RANGE. For the meta page check. 0002-amechek-brin-support.patch It's check functionality itself + regression tests + amcheck extension updates. Some open questions: 1) How to get the correct strategy number for the scan key in "all heap consistent" check. The consistent function wants a scan key, and to generate it we have to pick a strategy number. We can't just use the same strategy number for all indexes because its meaning differs from opclass to opclass (for example equal strategy number is 1 in Bloom and 3 in min_max). We also can't pick an operator and use it for every check, because opclasses don't have any requirements about what operators they should support. The solution was to let user to define check operator (parameter consistent_operator_names). It's an array as we can have a multicolumn index. We can use '=' as default check operator, because AFAIS '=' operator is supported by all core opclasses except 'box_inclusion_ops', and IMHO it's the most obvious candidate for such a check. So if param 'consistent_operator_names' is an empty array (param default value), then for all attributes we use operator '='. In most cases operator '=' does the job and you don't need to worry about consistent_operator_names param. Not sure about it, what do you think? 2) The current implementation of "all heap consistent" uses the scan of the entire heap. So if we have a lot of unsummarized ranges, we do a lot of wasted work because we can't use the tuples that belong to the unsummarized ranges. Instead of having one scan for the entire heap, we can walk the revmap, take only the summarized ranges, and scan only the pages that belong to those ranges. So we have one scan per range. This approach helps us avoid touching those heap tuples that we can't use for index checks. But I'm not sure if we should to worry about that because every autovacuum summarizes all the unsummarized ranges, so don't expect a lot of unsummarized ranges on average. TODO list: 1) add TAP tests 2) update SGML docs for amcheck (think it's better to do after patch is reviewed and more or less finalized) 3) pg_amcheck integration Big thanks to Tomas Vondra for the first patch idea and initial review. [1] https://www.postgresql.org/message-id/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru Best regards, Arseniy Mukhin
Вложения
Hi, Here is a new version. TAP tests were added. Tried to reproduce more or less every violation. I don't include 2 tests where disk representation ItemIdData needs to be changed: it works locally, but I don't think these tests are portable. While writing tests some minor issues were found and fixed. Also ci compiler warnings were fixed. Best regards, Arseniy Mukhin
Вложения
On 6/8/25 14:39, Arseniy Mukhin wrote: > Hi, > > Here is a new version. > > TAP tests were added. Tried to reproduce more or less every violation. > I don't include 2 tests where disk representation ItemIdData needs to > be changed: it works locally, but I don't think these tests are > portable. While writing tests some minor issues were found and fixed. > Also ci compiler warnings were fixed. > Thanks. I've added myself as a reviewer, so that I don't forget about this for the next CF. regards -- Tomas Vondra
Hi! Nice feature! > On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote: > > <v2-0001-brin-refactoring.patch> +#define BRIN_MAX_PAGES_PER_RANGE 131072 I'd personally prefer some words on where does this limit come from. I'm not a BRIN pro, just curious. Or, at least, maybe we can use a form 128 * 1024, if it's just a sane limit. > On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote: > <v2-0002-amcheck-brin-support.patch> A bit more detailed commit message would be very useful. The test coverage is very decent. The number of possible corruptions in tests is impressive. I don't really have an experiencewith BRIN to say how different they are, but I want to ask if you are sure that these types of corruption willbe portable across big-endian machines and such stuff. Copyright year in all new files should be 2025. Some documentation about brin_index_check() would be handy, especially about its parameters. Perhaps, somewhere near gin_index_check()in amcheck.sgml. brin_check_ereport() reports ERRCODE_DATA_CORRUPTED. We should distinguish between ERRCODE_INDEX_CORRUPTED which is a stormbringerand ERRCODE_DATA_CORRUPTED which is an actual disaster. If it's not very difficult - it would be great to use read_stream infrastructure. See btvacuumscan() in nbtree.c callingread_stream_begin_relation() for example. We cannot use it in logical scans in B-tree\GiST\GIN, but maybe BRIN cantake some advantage of this new shiny technology. + state->revmapbuf = ReadBufferExtended(idxrel, MAIN_FORKNUM, state->revmapBlk, RBM_NORMAL, + state->checkstrategy); + LockBuffer(state->revmapbuf, BUFFER_LOCK_SHARE); // usage of state->revmapbuf + LockBuffer(state->revmapbuf, BUFFER_LOCK_UNLOCK); // more usage of state->revmapbuf + ReleaseBuffer(state->revmapbuf); I hope you know what you are doing. BRIN concurrency is not known to me at all. That's all for first pass through patches. Thanks for working on it! Best regards, Andrey Borodin.
> On 18 Jun 2025, at 11:33, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote: > > Interesting, I used btree check as reference when started > writing brin check, and in btree check there 53 > ERRCODE_INDEX_CORRUPTED ereports and only 1 ERRCODE_DATA_CORRUPTED > ereport. So it was very hard to do, but I managed to pick the wrong > one. I wonder if this btree check ereport should also be changed to > ERRCODE_INDEX_CORRUPTED? It's there in a case of heapallindexes failure. I concur that ERRCODE_INDEX_CORRUPTED is more appropriate in that case inverify_nbtree.c. But I recollect Peter explained this code before somewhere in pgsql-hackers. And the reasoning was something like "if youlack a tuple in unquie constraints - it's almost certainly subsequent constrain violation and data loss". But I'm notsure. And I could not find this discussion in archives. Best regards, Andrey Borodin.
Hi, I would like share some thoughts about 'heap all consistent' part and one of the open questions: The idea behind 'heap all consistent' is to use heap data to validate the index. BRIN doesn't store heap tuples, so there is no straightforward way to check if every tuple was indexed or not. We have range data, so we need to do something with every heap tuple and corresponding range. Something that will tell us if the range data covers the heap tuple or not. What options I see here: 1) We can use the addValue() function. It returns FALSE if range data was not changed (in other words range data already covers heap tuple data that we pass to the function). It's very easy to do, we can use heap tuples directly. But the problem I see here is that addValue() can return TRUE even if heap tuple data have been already covered by the range, but range data itself changed for some reason (some new algorithm were applied for instance). So I think we can have some false positives that we can do nothing about. 2) We can use the consistent() function. It requires ScanKey and returns true if the range data satisfies ScanKey's condition. So we need to convert every heap tuple into ScanKey somehow. This approach is implemented now in the patch, so I tried to describe all details about heap tuple to ScanKey conversion in the comment: /* * Prepare ScanKey for index attribute. * * ConsistentFn requires ScanKey, so we need to generate ScanKey for every * attribute somehow. We want ScanKey that would result in TRUE for every heap * tuple within the range when we use its indexed value as sk_argument. * To generate such a ScanKey we need to define the right operand type and the strategy number. * Right operand type is a type of data that index is built on, so it's 'opcintype'. * There is no strategy number that we can always use, * because every opclass defines its own set of operators it supports and strategy number * for the same operator can differ from opclass to opclass. * So to get strategy number we look up an operator that gives us desired behavior * and which both operand types are 'opcintype' and then retrieve the strategy number for it. * Most of the time we can use '='. We let user define operator name in case opclass doesn't * support '=' operator. Also, if such operator doesn't exist, we can't proceed with the check. * * Generated once, and will be reused for all heap tuples. * Argument field will be filled for every heap tuple before * consistent function invocation, so leave it NULL for a while. * */ With this approach function brin_check() has optional parameter 'consistent_operator_names' that it seems to me could be very confusing for users. In general I think this is the most complex approach both in terms of use and implementation. 3) The approach that seems to me the most clear and straightforward: to add new optional function to BRIN opclass API. The function that would say if passed value is covered with the current range data. it's exactly what we want to know, so we can use heap data directly here. Something like that: bool withinRange(BrinDesc *bdesc, BrinValues *column, Datum val, bool isnull) It could be an optional function that will be implemented for all core BRIN opclasses. So if somebody wants to use it for some custom opclass they will need to implement it too, but it's not required. I understand that adding something to the index opclass API requires very strong arguments. So the argument here is that it will let to do brin check very robust (without possible false positives as in the first approach) and easy to use (no additional parameters in the check function). Also, the withinRange() function could be written in such a way that it would be more efficient for our task than addValue() or consistent(). I'd be glad to hear your thoughts! Best regards, Arseniy Mukhin
On 2025-Jul-06, Arseniy Mukhin wrote: > Sorry, forget to run a full test run with the new patch version. Some > tests were unhappy with the new unknown support function. Here the new > version with the fix. Hello, I think this patch is probably a good idea. I don't think it makes sense to introduce a bunch of code in 0003 only to rewrite it completely in 0005. I would ask that you re-split your WITHIN_RANGE (0004) to appear before the amcheck code, and then write the amcheck code using that new functionality. > /* > * Return a tuple descriptor used for on-disk storage of BRIN tuples. > */ > -static TupleDesc > +TupleDesc > brtuple_disk_tupdesc(BrinDesc *brdesc) I think we should give this function a better name if it's going to be exported. How about brin_tuple_tupdesc? (in brin_tuple.h we seem to distinguish "brin tuples" which are the stored ones, from "brin mem tuples" which are the ones to be used in memory.) I didn't read the other patches. Thanks -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "I'm impressed how quickly you are fixing this obscure issue. I came from MS SQL and it would be hard for me to put into words how much of a better job you all are doing on [PostgreSQL]." Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php
On Sun, Jul 6, 2025 at 10:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > > On 2025-Jul-06, Arseniy Mukhin wrote: > > > Sorry, forget to run a full test run with the new patch version. Some > > tests were unhappy with the new unknown support function. Here the new > > version with the fix. > > Hello, I think this patch is probably a good idea. I don't think it > makes sense to introduce a bunch of code in 0003 only to rewrite it > completely in 0005. I would ask that you re-split your WITHIN_RANGE > (0004) to appear before the amcheck code, and then write the amcheck > code using that new functionality. Hi, Álvaro! Thank you for looking into this. OK, we can easily revert to the version with consistent function if needed, so let's get rid of it. > > > /* > > * Return a tuple descriptor used for on-disk storage of BRIN tuples. > > */ > > -static TupleDesc > > +TupleDesc > > brtuple_disk_tupdesc(BrinDesc *brdesc) > > I think we should give this function a better name if it's going to be > exported. How about brin_tuple_tupdesc? (in brin_tuple.h we > seem to distinguish "brin tuples" which are the stored ones, from "brin > mem tuples" which are the ones to be used in memory.) > 'brin_tuple_tupdesc' sounds good to me. Done. So here is a new version. 0001, 0002 - index structure check. 0003, 0004 - all heap indexed using WITHIN_RANGE approach. Thank you! Best regards, Arseniy Mukhin
Вложения
On 7/7/25 13:06, Arseniy Mukhin wrote: > On Sun, Jul 6, 2025 at 10:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: >> >> On 2025-Jul-06, Arseniy Mukhin wrote: >> >>> Sorry, forget to run a full test run with the new patch version. Some >>> tests were unhappy with the new unknown support function. Here the new >>> version with the fix. >> >> Hello, I think this patch is probably a good idea. I don't think it >> makes sense to introduce a bunch of code in 0003 only to rewrite it >> completely in 0005. I would ask that you re-split your WITHIN_RANGE >> (0004) to appear before the amcheck code, and then write the amcheck >> code using that new functionality. > > Hi, Álvaro! > > Thank you for looking into this. > > OK, we can easily revert to the version with consistent function if > needed, so let's get rid of it. > Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE? I'd probably try to do this using the regular consistent function: (a) we don't need to add stuff to all BRIN opclasses to support this (b) it gives us additional testing of the consistent function (c) building a scan key for equality seems pretty trivial What do you think? -- Tomas Vondra
On 2025-Jul-07, Tomas Vondra wrote: > Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE? > I'd probably try to do this using the regular consistent function: > > (a) we don't need to add stuff to all BRIN opclasses to support this > > (b) it gives us additional testing of the consistent function > > (c) building a scan key for equality seems pretty trivial > > What do you think? Oh yeah, if we can build this on top of the existing primitives, then I'm all for that. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > > On 2025-Jul-07, Tomas Vondra wrote: > > > Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE? > > I'd probably try to do this using the regular consistent function: > > > > (a) we don't need to add stuff to all BRIN opclasses to support this > > > > (b) it gives us additional testing of the consistent function > > > > (c) building a scan key for equality seems pretty trivial > > > > What do you think? > > Oh yeah, if we can build this on top of the existing primitives, then > I'm all for that. Thank you for the feedback! I agree with the benefits. Speaking of (с), it seems most of the time to be really trivial to build such a ScanKey, but not every opclass supports '=' operator. amcheck should handle these cases somehow then. I see two options here. The first is to not provide 'heap all indexed' check for such opclasses, which is sad because even one core opclass (box_inclusion_ops) doesn't support '=' operator, postgis brin opclasses don't support it too AFAICS. The second option is to let the user define which operator to use during the check, which, I think, makes user experience much worse in this case. So both options look not good from the user POV as for me, so I don't know. What do you think about it? And should I revert the patchset to the consistent function version then? Best regards, Arseniy Mukhin
On 7/8/25 14:40, Arseniy Mukhin wrote: > On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: >> >> On 2025-Jul-07, Tomas Vondra wrote: >> >>> Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE? >>> I'd probably try to do this using the regular consistent function: >>> >>> (a) we don't need to add stuff to all BRIN opclasses to support this >>> >>> (b) it gives us additional testing of the consistent function >>> >>> (c) building a scan key for equality seems pretty trivial >>> >>> What do you think? >> >> Oh yeah, if we can build this on top of the existing primitives, then >> I'm all for that. > > Thank you for the feedback! I agree with the benefits. Speaking of > (с), it seems most of the time to be really trivial to build such a > ScanKey, but not every opclass supports '=' operator. amcheck should > handle these cases somehow then. I see two options here. The first is > to not provide 'heap all indexed' check for such opclasses, which is > sad because even one core opclass (box_inclusion_ops) doesn't support > '=' operator, postgis brin opclasses don't support it too AFAICS. The > second option is to let the user define which operator to use during > the check, which, I think, makes user experience much worse in this > case. So both options look not good from the user POV as for me, so I > don't know. What do you think about it? > > And should I revert the patchset to the consistent function version then? > Yeah, that's a good point. The various opclasses may support different operators, and we don't know which "strategy" to fill into the scan key. Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber, and so on. I wonder if there's a way to figure this out from the catalogs, but I can't think of anything. Maybe requiring the user to supply the operator (and not checking heap if it's not provided) SELECT brin_index_check(..., '@>'); would not be too bad ... Alvaro, any ideas? -- Tomas Vondra
On Tue, Jul 8, 2025 at 4:21 PM Tomas Vondra <tomas@vondra.me> wrote: > > > > On 7/8/25 14:40, Arseniy Mukhin wrote: > > On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > >> > >> On 2025-Jul-07, Tomas Vondra wrote: > >> > >>> Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE? > >>> I'd probably try to do this using the regular consistent function: > >>> > >>> (a) we don't need to add stuff to all BRIN opclasses to support this > >>> > >>> (b) it gives us additional testing of the consistent function > >>> > >>> (c) building a scan key for equality seems pretty trivial > >>> > >>> What do you think? > >> > >> Oh yeah, if we can build this on top of the existing primitives, then > >> I'm all for that. > > > > Thank you for the feedback! I agree with the benefits. Speaking of > > (с), it seems most of the time to be really trivial to build such a > > ScanKey, but not every opclass supports '=' operator. amcheck should > > handle these cases somehow then. I see two options here. The first is > > to not provide 'heap all indexed' check for such opclasses, which is > > sad because even one core opclass (box_inclusion_ops) doesn't support > > '=' operator, postgis brin opclasses don't support it too AFAICS. The > > second option is to let the user define which operator to use during > > the check, which, I think, makes user experience much worse in this > > case. So both options look not good from the user POV as for me, so I > > don't know. What do you think about it? > > > > And should I revert the patchset to the consistent function version then? > > > > Yeah, that's a good point. The various opclasses may support different > operators, and we don't know which "strategy" to fill into the scan key. > Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber, > and so on. > > I wonder if there's a way to figure this out from the catalogs, but I > can't think of anything. Maybe requiring the user to supply the operator > (and not checking heap if it's not provided) > > SELECT brin_index_check(..., '@>'); > > would not be too bad ... This doesn't change much, but it seems we need an array of operators in the case of a multi-column index. And one more thing, it's theoretically possible that opclass doesn't have the operator we need at all. I'm not aware of such cases, but perhaps in the future something like GIN tsvector_ops could be created for BRIN. tsvector_ops doesn't have an operator that could be used here. Best regards, Arseniy Mukhin
Hi, While reviewing gist amcheck patch [1] I realized that brin amcheck also must check if current snapshot is OK with index indcheckxmin (as btree, gist do it). Currently this check is contained in btree amcheck code, but other AMs need it for heapallindexed as well, so I moved it from btree to verify_common (0003 patch). Also I returned a consistentFn approach in heapallindexed as it seems more preferable. But it's not a big deal to return to the within_range approach if needed. [1] https://www.postgresql.org/message-id/flat/41F2A10C-4577-413B-9140-BE81CCE04A60%40yandex-team.ru#dc22ff33596f63f554cc551958131cde Best regards, Arseniy Mukhin
Вложения
Hi, On Tue, Jul 22, 2025 at 6:43 PM Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote: > > Hi, > > While reviewing gist amcheck patch [1] I realized that brin amcheck > also must check if current snapshot is OK with index indcheckxmin (as > btree, gist do it). Currently this check is contained in btree amcheck > code, but other AMs need it for heapallindexed as well, so I moved it > from btree to verify_common (0003 patch). There was a compiler warning on CI, so there is a new version with the fix (adds #include to verify_common). I also moved it to PG19-2. Best regards, Arseniy Mukhin
On Fri, Aug 1, 2025 at 11:22 PM Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote: > > Hi, > > On Tue, Jul 22, 2025 at 6:43 PM Arseniy Mukhin > <arseniy.mukhin.dev@gmail.com> wrote: > > > > Hi, > > > > While reviewing gist amcheck patch [1] I realized that brin amcheck > > also must check if current snapshot is OK with index indcheckxmin (as > > btree, gist do it). Currently this check is contained in btree amcheck > > code, but other AMs need it for heapallindexed as well, so I moved it > > from btree to verify_common (0003 patch). > > There was a compiler warning on CI, so there is a new version with the > fix (adds #include to verify_common). > I also moved it to PG19-2. Sorry for the noise, forgot to attach the files. Best regards, Arseniy Mukhin
Вложения
On 2025-Jul-08, Tomas Vondra wrote: > On 7/8/25 14:40, Arseniy Mukhin wrote: > > Thank you for the feedback! I agree with the benefits. Speaking of > > (с), it seems most of the time to be really trivial to build such a > > ScanKey, but not every opclass supports '=' operator. amcheck should > > handle these cases somehow then. I see two options here. The first is > > to not provide 'heap all indexed' check for such opclasses, which is > > sad because even one core opclass (box_inclusion_ops) doesn't support > > '=' operator, postgis brin opclasses don't support it too AFAICS. The > > second option is to let the user define which operator to use during > > the check, which, I think, makes user experience much worse in this > > case. So both options look not good from the user POV as for me, so I > > don't know. What do you think about it? > > > > And should I revert the patchset to the consistent function version then? > > Yeah, that's a good point. The various opclasses may support different > operators, and we don't know which "strategy" to fill into the scan key. > Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber, > and so on. Hmm, maybe we can make the operator argument to the function an optional argument. Then, if it's not given, use equality for the cases where that works; if equality doesn't work for the column in that opclass, throw an error to request an operator. That way we support the most common case in the easy way, and for the other cases the user has to work a little harder -- but I think it's not too bad. I think you should have tests with indexes on more than one column. This syntax looks terrible SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '{"@>"}'); the thingy at the end looks like '90s modem line noise. Can we maybe use a variadic argument, so that if you have multiple indexed columns you specify the operators in separate args somehow and avoid the quoting and array decoration? I imagine something like SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '@>', '='); or whatever. (To think about: if I want '=' to be omitted, but I have the second column using a type that doesn't support the '=', what's a good syntax to use?) Regarding 0003: I think the new function should be just CheckIndexCheckXMin(Relation idxrel, Snapshot snap) and make the caller responsible for the snapshot handling. Otherwise you end up in the weird situation in 0004 where you have to do UnregisterSnapshot(RegisterSnapshotAndDoStuff()) instead of the more ordinary RegisterSnapshot() CheckIndexCheckXMin() UnregisterSnapshot() You need an amcheck.sgml update for the new function. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Hi, Thank you for looking into it! On Tue, Aug 5, 2025 at 2:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > > On 2025-Jul-08, Tomas Vondra wrote: > > > On 7/8/25 14:40, Arseniy Mukhin wrote: > > > > Thank you for the feedback! I agree with the benefits. Speaking of > > > (с), it seems most of the time to be really trivial to build such a > > > ScanKey, but not every opclass supports '=' operator. amcheck should > > > handle these cases somehow then. I see two options here. The first is > > > to not provide 'heap all indexed' check for such opclasses, which is > > > sad because even one core opclass (box_inclusion_ops) doesn't support > > > '=' operator, postgis brin opclasses don't support it too AFAICS. The > > > second option is to let the user define which operator to use during > > > the check, which, I think, makes user experience much worse in this > > > case. So both options look not good from the user POV as for me, so I > > > don't know. What do you think about it? > > > > > > And should I revert the patchset to the consistent function version then? > > > > Yeah, that's a good point. The various opclasses may support different > > operators, and we don't know which "strategy" to fill into the scan key. > > Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber, > > and so on. > > Hmm, maybe we can make the operator argument to the function an optional > argument. Then, if it's not given, use equality for the cases where > that works; if equality doesn't work for the column in that opclass, > throw an error to request an operator. That way we support the most > common case in the easy way, and for the other cases the user has to > work a little harder -- but I think it's not too bad. Yes, the operator list is an optional argument now. Like you said, if it's not passed to the function call, the equality operator is used. I realized that solving the problem with opclasses without equality operator by letting user to define operator list has several drawbacks: It's not very convenient to call automatically? Because the calls are different from index to index. You can't just call brin_index_check('index', true, true) on everything. Maybe I'm wrong, but it seems like amcheck is a tool that is often used to periodically check the health of Postgres clusters (and there can be many of them), so users probably don't want to get into the details of each index. Also, it seems like we don't want the user to define the operator to check. We want them to pass in the "correct" operator if there is no equality operator. So there's no choice, we just want users to figure out what the correct operator is and pass it in. But we already know what the "correct" operator is. Maybe we should just implement an opclass <-> "correct" operator mapping on the database side? We also need opclass developers to be able to add such a mapping if they want their opclass to be supported by amcheck. Then during the check we can look up into the mapping and use the operators. I was thinking about a new catalog table or maybe adding it to BrinOpcInfo? Probably there is a better way to do it? If the mapping doesn't have an operator for opclass - no problem, we can skip the consistentFn call for such columns and maybe log a message about it. This way we don't have all these problems with operator list argument and with false positives when a user fails to realize what the "correct" operator is. > I think you should have tests with indexes on more than one column. There is one multicolumn index test, I added another one where the list of operators is passed. Also added tests with invalid operator list. > > This syntax looks terrible > SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '{"@>"}'); > > the thingy at the end looks like '90s modem line noise. Can we maybe > use a variadic argument, so that if you have multiple indexed columns > you specify the operators in separate args somehow and avoid the quoting > and array decoration? I imagine something like > > SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '@>', '='); > or whatever. (To think about: if I want '=' to be omitted, but I have > the second column using a type that doesn't support the '=', what's a > good syntax to use?) > Good idea! It looks better. I changed the operator list to variadic argument. For now brin_index_check() expects the user to define all or nothing. I think if we want to allow the user to omit operators for some columns, then we need to know the indexes of the columns for which the passed operators are intended. Something like this maybe: brin_index_check('brintest_idx', true, true, 1, '@>', 3, '@>'); > > Regarding 0003: I think the new function should be just > CheckIndexCheckXMin(Relation idxrel, Snapshot snap) > and make the caller responsible for the snapshot handling. Otherwise > you end up in the weird situation in 0004 where you have to do > UnregisterSnapshot(RegisterSnapshotAndDoStuff()) > instead of the more ordinary > RegisterSnapshot() > CheckIndexCheckXMin() > UnregisterSnapshot() > Done. BTW gist amcheck also needs 0003 [0]. Maybe we can move it to the separate thread and commit, so both patches can use it? What do you think, Andrey? > > You need an amcheck.sgml update for the new function. Done. The operator list is the most controversial part. It was not easy to figure out how to describe the criteria of the "correct" operator, I hope it's not very confusing. And I have a question, is it somehow helpful that 'index structure check' and 'heap all indexed' are in the different patches? It makes it a bit more difficult to update the patchset, so if it's not useful for reviewers I would probably merge it in the next version. So here is a new version. Thank you! [0] https://www.postgresql.org/message-id/flat/5FC1B5B6-FB35-44A2-AB62-632F14E958C5@yandex-team.ru Best regards, Arseniy Mukhin