Обсуждение: Re: bt_index_parent_check and concurrently build indexes
> On 9 Dec 2024, at 23:51, Michail Nikolaev <michail.nikolaev@gmail.com> wrote: > > * Modify bt_index_parent_check to use an MVCC snapshot for the heap scan Hi! Interesting bug. It's amazing how long it stand, giving that it would be triggered by almost any check after updating a table. From my POV correct fix direction is to use approach similar to index building. E.i. remove "if (!state->readonly)" check. Are there any known downsides of this? Best regards, Andrey Borodin.
> On 13 Dec 2024, at 04:59, Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
> <v3-0001-amcheck-Fix-bt_index_parent_check-behavior-with-C.patch>
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
I think usually write only commit year. Something tells me you can safely write 2025 there.
+Test::More->builder->todo_start('filesystem bug')
+ if PostgreSQL::Test::Utils::has_wal_read_bug;
Can't wrap my head why do you need this?
+# it fails, because it is expect to find the deleted row in index
I think this comment describes behavior before the fix in present tense.
- if (snapshot != SnapshotAny)
- UnregisterSnapshot(snapshot);
Snapshot business seems incorrect to me here...
Best regards, Andrey Borodin.
Hi Michail
> It turns out that bt_index_parent_check is not suitable for validating indexes built concurrently.
Your finding is right on point! We recently used bt_index_parent_check to verify concurrently built indexes in a concurrent workload, bt_index_parent_check often gave such false positive error.
- indexinfo->ii_Concurrent = !state->readonly;
+ indexinfo->ii_Concurrent = true;
+ indexinfo->ii_Concurrent = true;
One suggestion to this change is that we might need to update the amcheck doc to reflect that
"This consists of a “dummy” CREATE INDEX CONCURRENTLY operation" rather than "CREATE INDEX" operation.
Regards,
Donghang
On Mon, Jun 02, 2025 at 05:40:18PM -0700, Donghang Lin wrote: > Your finding is right on point! We recently used bt_index_parent_check to > verify concurrently built indexes in a concurrent workload, > bt_index_parent_check often gave such false positive error. Good thing is that this is tracked in the CF app: https://commitfest.postgresql.org/patch/5438/ Peter, could you look at that? amcheck and btree are both in your area of expertise. Getting this error because of routine CIC or REINDEX CONCURRENTLY runs is annoying. -- Michael
Вложения
Hello, Donghang! > One suggestion to this change is that we might need to update the amcheck doc to reflect that > "This consists of a “dummy” CREATE INDEX CONCURRENTLY operation" rather than "CREATE INDEX" operation. +1, done. Also fixed some typos in the commit message. Best regards, Mikhail.
Вложения
Hello, everyone! Rebased. Best regards, Mikhail.
Вложения
> On 21 Jun 2025, at 21:10, Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote: > > Rebased IMO the patch is RfC, I've just updated the status of the CF iteam. Thanks. Best regards, Andrey Borodin.
Hello, everyone! Could someone please take a look? We have a confirmed bug, which happens in production [0], probably needs to be backported and has a simple patch to address for about 9 months already.... I am not trying to blame someone, just to highlight how it sometimes feels from the non-committer side. Best regards, Mikhail. [0]: https://www.postgresql.org/message-id/flat/CAA%3DD8a2Q23miHJtHRDk_TSQKvd6oHk8wGpkQt99B%3D9yd-oqnfg%40mail.gmail.com#e563629249821b5f61e44a635b758ad1
Hi, I pushed this to 17 and up. Thanks! -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Selbst das größte Genie würde nicht weit kommen, wenn es alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe) Ni aún el genio más grande llegaría muy lejos si quisiera sacarlo todo de su propio interior.
> Hi, I pushed this to 17 and up. Thanks! Thanks! I'll update my CIC patch then - now it is not required to have any other patch inbuilt.
Hi Álvaro,
Thanks for looking at it. The test in the commit still fails back to 14.
I think it should be backported to 14 and up.
Thanks,
Donghang
On Thu, Dec 4, 2025 at 9:47 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
Hi, I pushed this to 17 and up. Thanks!
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Selbst das größte Genie würde nicht weit kommen, wenn es
alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe)
Ni aún el genio más grande llegaría muy lejos si
quisiera sacarlo todo de su propio interior.
Hello, On 2025-Dec-04, Donghang Lin wrote: > Hi Álvaro, > > Thanks for looking at it. The test in the commit still fails back to 14. > I think it should be backported to 14 and up. You're correct that the bug is older. I'm not sure about backporting the fix to 16 and earlier though, because it depends on BtreeCheckState having the snapshot member which was only added in commit 5ae2087202af. It's not a difficult patch -- attached. I was being cautious and under the impression that the bug had never been actually encountered by users, but I just noticed that in your reply from 2 Jun 2025 you mentioned hitting this problem. So maybe we should bite the bullet ... Is anybody against backpatching this to 14-16? (One thing I realized is that the comment for BtreeCheckState->snapshot says that it's used for the uniqueness test only, but that's no longer the case. So that needs fixed ...) -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Вложения
Hi Álvaro
You're correct that the bug is older. I'm not sure about backporting
the fix to 16 and earlier though, because it depends on BtreeCheckState
having the snapshot member which was only added in commit 5ae2087202af.
It's not a difficult patch -- attached.
The issue itself in fact doesn't depend on BtreeCheckState's snapshot member.
So the fix for 14-16 can make snapshot a local variable in the bt_check_every_level
function to make the scope small. Do you think it's worth changing it to a local variable?
I don't think the issue itself is related to 5ae2087202af but actually 7f563c09f890 where
the heapallindexed flag is introduced for bt_index_check. The committed test
also doesn't check the uniqueness of an index. Should the commit message be restated
to reflect this and also restate it's backport to 14 and up?
I applied the patch in 14 and noticed that it needs to fix the test number to get the test passed.
15 and 16 do not need this diff.
-use Test::More tests => 3;
+use Test::More tests => 4;
+use Test::More tests => 4;
(One thing I realized is that the comment for BtreeCheckState->snapshot
says that it's used for the uniqueness test only, but that's no longer
the case. So that needs fixed ...)
I think we'll have another chance to reflect that the issue is since 7f563c09f890 but not 5ae2087202af
in master for this fix. I can give it a try.
Thanks,
Donghang
On 2025-Dec-06, Donghang Lin wrote: > The issue itself in fact doesn't depend on BtreeCheckState's snapshot > member. So the fix for 14-16 can make snapshot a local variable in > the bt_check_every_level function to make the scope small. Do you > think it's worth changing it to a local variable? Ah, excellent observation. Yes, it's definitely worth it. > I don't think the issue itself is related to 5ae2087202af but actually > 7f563c09f890 where the heapallindexed flag is introduced for > bt_index_check. Hmm, interesting. I wonder what made me think it was about uniqueness. > The committed test also doesn't check the uniqueness of an index. > Should the commit message be restated to reflect this and also restate > it's backport to 14 and up? Sure. > I think we'll have another chance to reflect that the issue is since > 7f563c09f890 but not 5ae2087202af in master for this fix. I can give > it a try. Go ahead. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ Tom: There seems to be something broken here. Teodor: I'm in sackcloth and ashes... Fixed. http://postgr.es/m/482D1632.8010507@sigaev.ru