Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
От | Andres Freund |
---|---|
Тема | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Дата | |
Msg-id | 20211211045710.ljtuu4gfloh754rs@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Список | pgsql-bugs |
Hi, On 2021-11-22 14:34:36 -0800, Andres Freund wrote: > I think I'm actually just not sure what the better approach for the > backbranches is. I'm worried about the size of your patch, I'm worried about > the craziness of the current architecture (if you can call it that) of > heap_page_prune() with my patch. Given the lack of further discussion, let's go with the "minimal" fix and your stuff later. Attached is further polished patch - no significant changes, plenty copy-editing stuff. 0001 is the bugfix 0002 adds the additional assertions - I'm wondering about only committing that in HEAD? I think your patch should easily be able to use prstate->htsv? 0003 removes the redundant RelationGetNumberOfBlocks() calls. As 0001 does not change the total number of RelationGetNumberOfBlocks() calls I'm inclined to just apply this to HEAD? 0004 is a patch that tries to address your point about the GlobalVisTestFor() placement, sort of My earlier point that moving it to earlier would be a bad idea because it'd make the horizon "older" was bogus - GlobalVisTestFor() doesn't itself do any horizon determination. However moving the GlobalVisTestFor() earlier still seems wrong - imo the vacuum_set_xid_limits() call should be moved to later. The attached patch moves both, wrapped in a new function, to just before the scan in lazy_scan_heap(). This clearly is HEAD only material - I'm only bringing it up here because the issue of the GlobalVisTestFor() placement was raised in here... > > > I think it's worth to clean up the regression test I wrote and use it > > > regardless of which version of the fix we end up choosing. But I'm a bit bit > > > on the fence - it's quite complicated. > > > > +1 to the idea of keeping it somewhere. Without necessarily running it > > on the BF. > > > > > OTOH, I think with the framework in place it'd not be too hard to write a few > > > more tests for odd pruning scenarios... > > > > Can we commit a test like this without running it by default, at all? > > It's not like it has never been done before. > > Is there a reason not to run it if we commit it? IME tests not run by default > tend to bitrot very quickly. The only issue that I can think of is that it > might be hard to make the test useful and robust in the presence of > debug_discard_caches != 0. I tried pretty hard to get the test to be reliable enough to commit it. But in the end I think using the index locking as a "control" mechanism just isn't great - as evidenced by 0003 breaking the test entirely. I think unfortunately there's not much hope for testing this unless we get wait points - which seems not too close :(. Greetings, Andres Freund
Вложения
В списке pgsql-bugs по дате отправления: