Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
From | Peter Geoghegan |
---|---|
Subject | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Date | |
Msg-id | CAH2-WzkpG9KLQF5sYHaOO_dSVdOjM+dv=nTEn85oNfMUTk836Q@mail.gmail.com |
In response to | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
List | pgsql-bugs |
On Tue, Nov 9, 2021 at 3:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is a WIP fix for the bug. The idea here is to follow all HOT
> chains in an initial pass over the page, while even following LIVE
> heap-only tuples. Any heap-only tuples that we don't determine are
> part of some valid HOT chain (following an initial pass over the whole
> heap page) will now be processed in a second pass over the page.

I realized that I could easily go further than in v1, and totally get rid of the "marked" array (which tracks whether we have decided to mark an item as LP_DEAD/LP_UNUSED/a new LP_REDIRECT/newly pointed to by another LP_REDIRECT). In my v1 from earlier today we already had an array that records whether or not each item is part of any known valid chain, which is strictly better than knowing whether or not they were "marked" earlier. So why bother with the "marked" array at all, even for assertions? It is less robust (not to mention less efficient) than just using the new "fromvalidchain" array.

Attached is v2, which gets rid of the "marked" array as described. It also has better worked out comments and assertions.

The patch has stood up to a fair amount of stress-testing. I repeated Alexander's original test case for over an hour with this. Getting the test case to cause an assertion failure would usually take about 5 minutes without any fix.

I have yet to do any work on validating the performance of this patch, though that definitely needs to happen.

Anybody have any thoughts on how far this should be backpatched? We'll probably need to do that for Postgres 14. Less sure about other branches, which haven't been directly demonstrated to be affected by the bug so far. Haven't tried to break earlier branches with Alexander's test case, though I will note again that Alexander couldn't do that when he tried.

--
Peter Geoghegan
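
The two-pass idea described in the message can be illustrated with a toy model. The sketch below is not the attached patch and does not use PostgreSQL's real data structures: `ToyItem`, `ItemKind`, `MAX_ITEMS`, and `toy_two_pass` are hypothetical names invented for illustration, and only the `fromvalidchain` array name comes from the message. It also leaves out the visibility and xmin/xmax checks that real HOT chain following would need. Pass 1 walks every chain from its root and records each member in `fromvalidchain`; pass 2 collects heap-only tuples that no chain reached, so they can be handled separately.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ITEMS 16            /* hypothetical page capacity for this toy model */

typedef enum { ITEM_UNUSED, ITEM_NORMAL, ITEM_REDIRECT, ITEM_DEAD } ItemKind;

typedef struct
{
    ItemKind kind;
    bool     heap_only;         /* only meaningful for ITEM_NORMAL */
    int      next;              /* next chain member or redirect target; 0 = none */
} ToyItem;

/*
 * Items live at offsets 1..nitems (index 0 is unused). Fills
 * fromvalidchain[] and orphans[], and returns the number of heap-only
 * tuples that were not reached from any chain root.
 */
int
toy_two_pass(const ToyItem *items, int nitems,
             bool *fromvalidchain, int *orphans)
{
    int norphans = 0;

    assert(nitems >= 0 && nitems < MAX_ITEMS);

    for (int off = 0; off <= nitems; off++)
        fromvalidchain[off] = false;

    /* Pass 1: follow every chain from its root, including live members. */
    for (int root = 1; root <= nitems; root++)
    {
        const ToyItem *it = &items[root];
        bool is_root = it->kind == ITEM_REDIRECT ||
                       (it->kind == ITEM_NORMAL && !it->heap_only);
        int  off;

        if (!is_root)
            continue;

        fromvalidchain[root] = true;
        off = it->next;

        /* Stop at the end of the chain, a bad link, or an item already seen. */
        while (off >= 1 && off <= nitems &&
               items[off].kind == ITEM_NORMAL && items[off].heap_only &&
               !fromvalidchain[off])
        {
            fromvalidchain[off] = true;
            off = items[off].next;
        }
    }

    /* Pass 2: heap-only tuples that no chain reached get separate handling. */
    for (int off = 1; off <= nitems; off++)
    {
        if (items[off].kind == ITEM_NORMAL && items[off].heap_only &&
            !fromvalidchain[off])
            orphans[norphans++] = off;
    }

    return norphans;
}
```

In this model a single boolean per item answers "was this item reached from a valid chain?", which is the property the message argues makes a separate "marked" array unnecessary.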