Re: [sqlsmith] Unpinning error in parallel worker
From | Thomas Munro |
---|---|
Subject | Re: [sqlsmith] Unpinning error in parallel worker |
Date | |
Msg-id | CAEepm=0Y_nw=kp-YZtnkvhySYxu4PPONWeF4ap=M05g-STgKbg@mail.gmail.com |
In response to | Re: [sqlsmith] Unpinning error in parallel worker (Jonathan Rudenberg <jonathan@titanous.com>) |
Responses | Re: [sqlsmith] Unpinning error in parallel worker |
List | pgsql-hackers |

On Wed, Apr 25, 2018 at 2:21 AM, Jonathan Rudenberg <jonathan@titanous.com> wrote:
> This issue happened again in production, here are the stack traces from three we grabbed before nuking the >400 hanging backends.
>
> [...]
> #4 0x000055fccb93b21c in LWLockAcquire+188() at /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233
> #5 0x000055fccb925fa7 in dsm_create+151() at /usr/lib/postgresql/10/bin/postgres at dsm.c:493
> #6 0x000055fccb6f2a6f in InitializeParallelDSM+511() at /usr/lib/postgresql/10/bin/postgres at parallel.c:266
> [...]

Thank you. These stacks are all blocked trying to acquire DynamicSharedMemoryControlLock. My theory is that they can't because one backend -- the one that emitted the error "FATAL: cannot unpin a segment that is not pinned" -- is deadlocked against itself. After emitting that error, you can see from Andreas's "seabisquit" stack that shmem_exit() runs dsm_backend_shutdown(), which runs dsm_detach(), which tries to acquire DynamicSharedMemoryControlLock again, even though we already hold it at that point.

I'll write a patch to fix that unpleasant symptom. While holding DynamicSharedMemoryControlLock we shouldn't raise any errors without releasing it first, because the error handling path will try to acquire it again. That's a horrible failure mode, as you have discovered.

But that isn't the root problem: we shouldn't be raising that error at all, and I'd love to see the stack of the one process that did that and then self-deadlocked. I will have another go at trying to reproduce it here today.

--
Thomas Munro
http://www.enterprisedb.com
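
The hazard described above (raising an error while holding a non-reentrant lock that the error-cleanup path also needs) can be illustrated with a toy model. This is only a minimal sketch and not PostgreSQL code: `ToyLock`, `toy_acquire`, `toy_release`, `cleanup_on_error`, `raise_fatal`, `unpin_buggy` and `unpin_fixed` are all hypothetical names, and the toy lock merely counts a re-acquisition attempt where a real LWLock would block forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a non-reentrant lock. */
typedef struct ToyLock {
    bool held;
} ToyLock;

static ToyLock control_lock = { false };
static int self_deadlocks = 0;  /* attempts to re-acquire a lock we already hold */

/* Returns true if the lock was taken. A real non-reentrant lock would
 * block forever instead of returning false. */
static bool toy_acquire(ToyLock *lock)
{
    if (lock->held) {
        self_deadlocks++;
        return false;
    }
    lock->held = true;
    return true;
}

static void toy_release(ToyLock *lock)
{
    assert(lock->held);
    lock->held = false;
}

/* Stand-in for cleanup that runs on the error path and needs the same lock. */
static void cleanup_on_error(void)
{
    if (!toy_acquire(&control_lock))
        return;                 /* in real life: hung here */
    /* ... detach bookkeeping would go here ... */
    toy_release(&control_lock);
}

/* Stand-in for reporting a fatal error, which triggers cleanup. */
static void raise_fatal(const char *msg)
{
    fprintf(stderr, "FATAL: %s\n", msg);
    cleanup_on_error();
}

/* Buggy shape: the error is raised while the lock is still held. */
static void unpin_buggy(bool pinned)
{
    (void) toy_acquire(&control_lock);
    if (!pinned) {
        raise_fatal("cannot unpin a segment that is not pinned");
        return;
    }
    toy_release(&control_lock);
}

/* Safer shape: release the lock first, then raise the error. */
static void unpin_fixed(bool pinned)
{
    (void) toy_acquire(&control_lock);
    if (!pinned) {
        toy_release(&control_lock);
        raise_fatal("cannot unpin a segment that is not pinned");
        return;
    }
    toy_release(&control_lock);
}
```

In this model, calling `unpin_fixed(false)` leaves `self_deadlocks` at zero, while `unpin_buggy(false)` increments it, which corresponds to the backend hanging while every other backend queues up behind the same lock.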