Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query

From	Thomas Munro
Subject	Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
Date	24 January 2019, 19:56:00
Msg-id	CAEepm=3ynb5nBhKQRts0bNETA1HzNxz6-3RTPOzCbM8oQ9yPdg@mail.gmail.com
In reply to	Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query (Sergei Kornilov <sk@zsrv.org>)
Responses	Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
List	pgsql-bugs

On Thu, Jan 24, 2019 at 11:56 PM Sergei Kornilov <sk@zsrv.org> wrote:
> We should not call dsm_backend_shutdown twice in same process, right? So we tried call dsm_detach on same segment
> 0x5624578710c8 twice, but this is unexpected behavior and refcnt would be incorrect. And seems we can not LWLockAcquire
> lock and then LWLockAcquire same lock again without release. And here we have infinite waiting.

Yeah, I think your analysis is right.  dsm_unpin_segment() shouldn't
raise an error while holding the lock; it should perhaps release the
lock first, something like:

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 36904d2676..b989c0b94a 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -924,9 +924,15 @@ dsm_unpin_segment(dsm_handle handle)
         * called on a segment which is pinned.
         */
        if (control_slot == INVALID_CONTROL_SLOT)
+       {
+               LWLockRelease(DynamicSharedMemoryControlLock);
                elog(ERROR, "cannot unpin unknown segment handle");
+       }
        if (!dsm_control->item[control_slot].pinned)
+       {
+               LWLockRelease(DynamicSharedMemoryControlLock);
                elog(ERROR, "cannot unpin a segment that is not pinned");
+       }
        Assert(dsm_control->item[control_slot].refcnt > 1);

        /*

I have contemplated that before, but not done it because I'm not sure
about the state of the system after that; we just shouldn't be in this
situation, because if we are, it means that we can error out while
later segments (in the array dsa_release_in_place() loops through)
are still pinned, so they remain pinned forever and we'll leak memory
and run out of DSM slots.
Segment pinning is opting out of resource owner control, which means
the client code is responsible for not screwing it up.  Perhaps that
suggests we should PANIC, or perhaps just LOG and continue, but I'm
not sure.
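
To make the trade-off concrete, here is a toy, single-process C model.
It is not PostgreSQL code: every toy_* name is made up for
illustration, the "lock" is a plain flag, and the place where a real
non-reentrant lock would wait forever is represented by a return code.
It assumes only what is described above: an error path that may or may
not release the lock, and a loop that stops at the first failure.

```c
/* Toy model only: none of these names are PostgreSQL APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS 3

enum toy_result { TOY_OK = 0, TOY_ERROR = -1, TOY_WOULD_DEADLOCK = -2 };

static bool toy_lock_held = false;   /* stand-in for a non-reentrant lock */
static bool toy_pinned[TOY_NSLOTS];  /* stand-in for per-segment pinned flags */

/* Put the model into a clean state: lock free, every slot pinned. */
static void toy_reset(void)
{
    toy_lock_held = false;
    for (size_t i = 0; i < TOY_NSLOTS; i++)
        toy_pinned[i] = true;
}

/* Returns false where a real non-reentrant lock would wait forever. */
static bool toy_lock_acquire(void)
{
    if (toy_lock_held)
        return false;
    toy_lock_held = true;
    return true;
}

static void toy_lock_release(void)
{
    assert(toy_lock_held);           /* releasing an unheld lock is a bug */
    toy_lock_held = false;
}

/*
 * Unpin one slot.  release_on_error selects whether the error path drops
 * the lock before reporting failure (the behavior the patch would add).
 */
static enum toy_result toy_unpin(size_t slot, bool release_on_error)
{
    if (!toy_lock_acquire())
        return TOY_WOULD_DEADLOCK;
    if (slot >= TOY_NSLOTS || !toy_pinned[slot])
    {
        if (release_on_error)
            toy_lock_release();
        return TOY_ERROR;            /* without the release, the lock leaks */
    }
    toy_pinned[slot] = false;
    toy_lock_release();
    return TOY_OK;
}

/*
 * Unpin every slot in order, giving up at the first failure, the way an
 * error exit in the middle of a loop would.  Returns how many slots are
 * still pinned afterwards.
 */
static size_t toy_release_all(bool release_on_error)
{
    size_t left = 0;

    for (size_t i = 0; i < TOY_NSLOTS; i++)
        if (toy_unpin(i, release_on_error) != TOY_OK)
            break;
    for (size_t i = 0; i < TOY_NSLOTS; i++)
        if (toy_pinned[i])
            left++;
    return left;
}
```

If slot 1 is already unpinned when toy_release_all() runs, the loop
stops there.  With release_on_error set to false the lock stays held,
so the next toy_unpin() call reports TOY_WOULD_DEADLOCK.  With it set
to true the lock is free again, but slot 2 is still pinned, which is
the leak described above.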

I think the root cause is earlier and in a different process (see
ProcessInterrupts() in the stack).  Presumably one that reported
"dsa_area could not attach to segment" is closer to the point where
things go wrong.  If you are in a position to reproduce this on a
modified source tree, it'd be good to see the back trace for that, to
figure out which of a couple of possible code paths reach it.  Perhaps
you could do that by enabling core files and changing this:

-                       elog(ERROR, "dsa_area could not attach to segment");
+                       elog(PANIC, "dsa_area could not attach to segment");

I have so far not succeeded in reaching that condition.

--
Thomas Munro
http://www.enterprisedb.com
