Re: Unexpected "shared memory block is still in use"
От | Tom Lane |
---|---|
Тема | Re: Unexpected "shared memory block is still in use" |
Дата | |
Msg-id | 17642.1557442078@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Unexpected "shared memory block is still in use" (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Unexpected "shared memory block is still in use"
|
Список | pgsql-hackers |
I wrote: > However, I have a new theory, after noticing that c09850992 moved the > check for shm_nattch == 0. Previously, if a shmem segment had zero attach > count, it was unconditionally considered not-a-threat. Now, we'll try > shmat() anyway, and if that fails for any reason other than EACCES, we say > SHMSTATE_ANALYSIS_FAILURE which leads to the described error report. > So I suspect that what we hit was a race condition whereby some other > parallel test was using the same shmem ID and we managed to see its > segment successfully in shmctl but then it was gone by the time we did > shmat. This leads me to think that EINVAL and EIDRM failures from > shmat had better be considered SHMSTATE_ENOENT not > SHMSTATE_ANALYSIS_FAILURE. > In principle this is a longstanding race condition, but I wonder > whether we made it more probable by moving the shm_nattch check. Hah --- this is a real race condition, and I can demonstrate it very easily by inserting a sleep right there, as in the attached for-testing-only patch. The particular parallelism level I use is make -s check-world -j4 PROVE_FLAGS='-j4 --quiet --nocolor --nocount' on a dual-socket 4-cores-per-socket Xeon machine. With that command and this patch, I frequently get multiple failures per run, and they all report either EINVAL or EIDRM. The patch generally reports that nattch had been 1, so my thought that that change might've made it worse seems unfounded. But we have absolutely got a hittable race condition here. The real fix should be on the order of if (errno == EACCES) return SHMSTATE_FOREIGN; + else if (errno == EINVAL || errno == EIDRM) + return SHMSTATE_ENOENT; else return SHMSTATE_ANALYSIS_FAILURE; (plus comments of course). regards, tom lane diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index a9d7bf9..d390ba4 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -387,6 +387,8 @@ PGSharedMemoryAttach(IpcMemoryId shmId, if (stat(DataDir, &statbuf) < 0) return SHMSTATE_ANALYSIS_FAILURE; /* can't stat; be conservative */ + pg_usleep(100000); + /* * Attachment fails if we have no write permission. Since that will never * happen with Postgres IPCProtection, such a failure shows the segment is @@ -398,8 +400,9 @@ PGSharedMemoryAttach(IpcMemoryId shmId, { if (errno == EACCES) return SHMSTATE_FOREIGN; - else - return SHMSTATE_ANALYSIS_FAILURE; + elog(LOG, "shmat(0x%lx) failed: %m, nattch had been %ld", + (long) shmId, (long) shmStat.shm_nattch); + return SHMSTATE_ANALYSIS_FAILURE; } *addr = hdr;
В списке pgsql-hackers по дате отправления: