Re: Avoiding another needless ERROR during nbtree page deletion
From | Peter Geoghegan |
---|---|
Subject | Re: Avoiding another needless ERROR during nbtree page deletion |
Date | |
Msg-id | CAH2-Wzn5PjqCT5OyBUDE_zyhqxvDiRmh5F_1QhogfXL9Zf=F4g@mail.gmail.com |
In response to | Re: Avoiding another needless ERROR during nbtree page deletion (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Avoiding another needless ERROR during nbtree page deletion
Re: Avoiding another needless ERROR during nbtree page deletion |
List | pgsql-hackers |
On Sun, May 21, 2023 at 11:51 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Any idea what might cause this corruption?

Not really, no. As far as I know the specific case that was brought to my attention (that put me on the path to writing this patch) was just an isolated incident. The interesting detail (if any) is that it was a relatively recent version of Postgres (13), and that there were no other known problems.

This means that there is a plausible remaining gap in the defensive checks in nbtree VACUUM on recent versions -- we might have expected to avoid a hard ERROR in some other way, from one of the earlier checks, but that didn't happen on at least one occasion.

You can find several references to the "right sibling's left-link doesn't match:" error message by googling. Most of them are clearly from the page split ERROR. But there are some from VACUUM, too:

https://stackoverflow.com/questions/49307292/error-in-postgresql-right-siblings-left-link-doesnt-match-block-5-links-to-8

Granted, that was from a 9.2 database -- before your 9.4 work that made this whole area much more robust.

> This comment notes that this is similar to what we did with the left
> sibling, but there isn't really any mention at the left sibling code
> about avoiding hard ERRORs. Feels a bit backwards. Maybe move the
> comment about avoiding the hard ERROR to where the left sibling is
> handled. Or explain it in the function comment and just have short
> "shouldn't happen, but avoid hard ERROR if the index is corrupt" comment
> here.

Good point. Will do it that way.

> > Also attached is a bugfix for a minor issue in amcheck's
> > bt_index_parent_check() function, which I noticed in passing, while I
> > tested the first patch.
>
> You could check that the left sibling is indeed a half-dead page.

It's very hard to see, but...I think that we do. Sort of. Since bt_recheck_sibling_links() is prepared to check that the left sibling's right link points back to the target page. One problem with that is that it only happens in the AccessShareLock case, whereas we're concerned with fixing an issue in the ShareLock case. Another problem is that it's awkward and complicated to explain. It's not obvious that it's worth trying to explain all this and/or making sure that it happens in the ShareLock case, so that we have everything covered. I'm unsure.

> ERRCODE_NO_DATA doesn't look right. Let's just leave out the errcode.

Agreed.

--
Peter Geoghegan
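
For readers following the thread, here is a minimal sketch of the general idea discussed above: cross-check sibling links before unlinking a page, and on a mismatch log and skip the deletion rather than raising a hard ERROR. It is a toy model in Python, not the patch and not PostgreSQL's C code. The names `Page`, `siblings_link_back`, and `try_delete_page` are hypothetical, the log text only imitates the message quoted in the email, and checking both siblings in one helper is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List

P_NONE = 0  # "no sibling"; block 0 is the metapage in nbtree, so it is never a sibling


@dataclass
class Page:
    """Toy page: only the two sibling links (hypothetical model)."""
    left: int = P_NONE
    right: int = P_NONE


def siblings_link_back(pages: Dict[int, Page], target: int, log: List[str]) -> bool:
    """Return True if both siblings point back at target.

    On a mismatch, append a log line and return False instead of raising,
    which is the toy analogue of avoiding a hard ERROR.
    """
    page = pages[target]
    ok = True
    if page.right != P_NONE:
        sib = pages.get(page.right)
        if sib is None or sib.left != target:
            found = "nothing" if sib is None else sib.left
            log.append(
                f"right sibling's left-link doesn't match: block {page.right} "
                f"links to {found} instead of expected {target}"
            )
            ok = False
    if page.left != P_NONE:
        sib = pages.get(page.left)
        if sib is None or sib.right != target:
            found = "nothing" if sib is None else sib.right
            log.append(
                f"left sibling's right-link doesn't match: block {page.left} "
                f"links to {found} instead of expected {target}"
            )
            ok = False
    return ok


def try_delete_page(pages: Dict[int, Page], target: int, log: List[str]) -> bool:
    """Unlink target from its level, or leave it in place if links look corrupt."""
    if not siblings_link_back(pages, target, log):
        return False  # skip this page; the caller can move on to the next one
    page = pages[target]
    if page.left != P_NONE:
        pages[page.left].right = page.right
    if page.right != P_NONE:
        pages[page.right].left = page.left
    del pages[target]
    return True
```

With a consistent chain 1 <-> 2 <-> 3, `try_delete_page(pages, 2, log)` unlinks block 2 and leaves the log empty. If block 3's left-link instead points at some other block, the call returns `False`, block 2 stays in place, and the log holds one message.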