Re: BUG #18031: Segmentation fault after deadlock within VACUUM's parallel worker
От | Masahiko Sawada |
---|---|
Тема | Re: BUG #18031: Segmentation fault after deadlock within VACUUM's parallel worker |
Дата | |
Msg-id | CAD21AoAVW54YBBPCXKubODCBX9+NS0XW5MDuFNZ9SRt_GBHs5Q@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #18031: Segmentation fault after deadlock within VACUUM's parallel worker (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: BUG #18031: Segmentation fault after deadlock within VACUUM's parallel worker
|
Список | pgsql-bugs |
Hi, On Tue, Jul 25, 2023 at 12:39 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > Thanks for finding / debugging this issue! > > On 2023-07-21 17:01:11 +0900, Masahiko Sawada wrote: > > I've reproduced the issue in my environment with the provided script. > > The crashed process is not a parallel vacuum worker actually but a > > parallel worker for rebuilding the index. The scenario seems that when > > detecting a deadlock, the process removes itself from the wait queue > > by RemoveFromWaitQueue() (called by CheckDeadLock()), and then > > RemoveFromWaitQueue() is called again by LockErrorCleanup() while > > aborting the transaction. With commit 5764f611e, in > > RemoveFromWaitQueue() we remove the process from the wait queue using > > dclist_delete_from(): > > > > /* Remove proc from lock's wait queue */ > > dclist_delete_from(&waitLock->waitProcs, &proc->links); > > : > > /* Clean up the proc's own state, and pass it the ok/fail signal */ > > proc->waitLock = NULL; > > proc->waitProcLock = NULL; > > proc->waitStatus = PROC_WAIT_STATUS_ERROR; > > > > However, since dclist_delete_from() doesn't clear proc->links, in > > LockErrorCleanup(), dlist_node_is_detached() still returns false: > > > > if (!dlist_node_is_detached(&MyProc->links)) > > { > > /* We could not have been granted the lock yet */ > > RemoveFromWaitQueue(MyProc, lockAwaited->hashcode); > > } > > Indeed :( > > > > leading to calling RemoveFromWaitQueue() again. I think we should use > > dclist_delete_from_thoroughly() instead. With the attached patch, the > > issue doesn't happen in my environment. > > Yep. Do you want to push that fix, or should I? Thank you for your confirmation. I can push the fix if you're okay. I've attached the patch. > > > > Another thing I noticed is that the Assert(waitLock) in > > RemoveFromWaitQueue() is useless actually, since we access *waitLock > > before that: > > > > I think we should fix it as well. This fix is also included in the > > attached patch. > > Don't really have an opinion on that. It's been this way for longer, afaict. True. I would leave this part alone. We can fix it in a separate commit later if necessary. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Вложения
В списке pgsql-bugs по дате отправления: