Re: BUG #17245: Index corruption involving deduplicated entries
From | Andres Freund |
---|---|
Subject | Re: BUG #17245: Index corruption involving deduplicated entries |
Date | |
Msg-id | 20211029011923.utmolntkasenzreh@alap3.anarazel.de |
In response to | Re: BUG #17245: Index corruption involving deduplicated entries (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: BUG #17245: Index corruption involving deduplicated entries |
List | pgsql-bugs |
Hi,

It's not the cause of this problem, but I did find a minor issue: the
retry path in lazy_scan_prune() loses track of the deleted tuple count
when retrying.

The retry codepath also made me wonder if there could be problems if we
do FreezeMultiXactId() multiple times due to retry. I think we can end up
creating multiple multixactids for the same tuple (if the members change,
which is likely in the retry path). But that should be fine, I think.

On 2021-10-28 16:04:44 -0700, Peter Geoghegan wrote:
> > Didn't 14 change the logic when index vacuums are done? That could cause
> > previously existing issues to manifest with a higher likelihood.
>
> I don't follow. The new logic that skips index vacuuming kicks in 1)
> in an anti-wraparound vacuum emergency, and 2) when there are very few
> LP_DEAD line pointers in the heap. We can rule 1 out, I think, because
> the XIDs we see are in the low millions, and our starting point was a
> database that was upgraded via a dump and reload.

Right.

> The second criteria for skipping index vacuuming (the "less than 2% of
> heap pages have any LP_DEAD items" thing) might well have been hit on
> these tables -- it is after all very common. But I don't see how that
> could matter. We're never going to get to a code path inside
> vacuumlazy.c that sets LP_DEAD items from VACUUM's dead_tuples array
> to LP_UNUSED (how could we reach such a code path without also index
> vacuuming, given the way things are set up inside lazy_vacuum()?).
> We're always going to have the opportunity to do index vacuuming with
> any left-behind LP_DEAD line pointers in the next VACUUM -- right
> after the later VACUUM successfully returns from
> lazy_vacuum_all_indexes().

Shrug. It doesn't seem that hard to believe that repeatedly trying to prune
the same page could unearth some bugs. E.g. via the
heap_prune_record_unused() path in heap_prune_chain().

Hm. I assume somebody checked and verified that old_snapshot_threshold is
not in use? Seems unlikely, but wrongly entering that
heap_prune_record_unused() path could certainly cause issues like we're
observing.

Greetings,

Andres Freund
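
For illustration only, here is a toy model of the counting issue mentioned at the top of the message. It is not PostgreSQL code: the function name and its inputs are hypothetical, and it assumes the per-page count is assigned from each prune attempt's result rather than accumulated, so whatever an earlier attempt removed is forgotten once the page is retried.

```python
def prune_with_retry(deleted_per_attempt, accumulate):
    """Toy model of a prune step that may be retried on the same page.

    deleted_per_attempt lists how many tuples each successive prune
    attempt removed; every attempt except the last is followed by a retry.
    (Hypothetical helper, not part of PostgreSQL.)
    """
    total = 0
    for deleted in deleted_per_attempt:
        if accumulate:
            total += deleted  # carry the count across retries
        else:
            total = deleted   # overwrite: earlier attempts are forgotten
    return total


# First attempt removes 3 tuples, the retry removes 1 more.
print(prune_with_retry([3, 1], accumulate=False))  # prints 1
print(prune_with_retry([3, 1], accumulate=True))   # prints 4
```

Without a retry the two variants agree, which would be consistent with this being a minor accounting issue rather than the cause of the corruption.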