Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
От | Matthias van de Meent |
---|---|
Тема | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
Дата | |
Msg-id | CAEze2Wj7O5tnM_U151Baxr5ObTJafwH=71_JEmgJV+6eBgjL7g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
|
Список | pgsql-bugs |
On Fri, 29 Oct 2021 at 20:17, Peter Geoghegan <pg@bowt.ie> wrote: > > On Fri, Oct 29, 2021 at 6:30 AM Alexander Lakhin <exclusion@gmail.com> wrote: > > I can propose the debugging patch to reproduce the issue that replaces > > the hang with the assert and modifies a pair of crash-causing test > > scripts to simplify the reproducing. (Sorry, I have no time now to prune > > down the scripts further as I have to leave for a week.) > > This bug is similar to the one fixed in commit d9d8aa9b. And so I > wonder if code like GlobalVisTestFor() is missing something that it > needs for partitioned tables. Without `autovacuum = off; fsync = off` I could not replicate the issue in the configured 10m time window; with those options I did get the reported trace in minutes. I think that I also have found the culprit, which is something we talked about in [0]: GlobalVisState->maybe_needed was not guaranteed to never move backwards when recalculated, and because vacuum can update its snapshot bounds (heap_prune_satisfies_vacuum -> GlobalVisTestIsRemovableFullXid -> GlobalVisUpdate) this maybe_needed could move backwards, resulting in the observed behaviour. It was my understanding based on the mail conversation that Andres would fix this observed issue too while fixing [0] (whose fix was included with beta 2), but apparently I was wrong; I can't find the code for 'maybe_needed'-won't-move-backwards-in-a-backend. I (again) propose the attached patch, which ensures that this maybe_needed field will not move backwards for a backend. It is based on 14, but should be applied on head as well, because it's lacking there as well. Another alternative would be to replace the use of vacrel->OldestXmin with `vacrel->vistest->maybe_needed` in lazy_scan_prune, but I believe that is not legal in how vacuum works (we cannot unilaterally decide that we want to retain tuples < OldestXmin). Note: After fixing the issue with retreating maybe_needed I also hit your segfault, and I'm still trying to find out what the source of that issue might be. I do think it is an issue seperate from stuck vacuum, though. Kind regards, Matthias van de Meent [0] https://www.postgresql.org/message-id/flat/20210609184506.rqm5rikoikm47csf%40alap3.anarazel.de#e9d55b5cfff34238a24dc85c8c75a46f
Вложения
В списке pgsql-bugs по дате отправления: