Re: Unexpected "shared memory block is still in use"
От | Noah Misch |
---|---|
Тема | Re: Unexpected "shared memory block is still in use" |
Дата | |
Msg-id | 20190510072213.GA1098093@rfd.leadboat.com обсуждение исходный текст |
Ответ на | Re: Unexpected "shared memory block is still in use" (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Unexpected "shared memory block is still in use"
|
Список | pgsql-hackers |
On Thu, May 09, 2019 at 06:47:58PM -0400, Tom Lane wrote: > I wrote: > > However, I have a new theory, after noticing that c09850992 moved the > > check for shm_nattch == 0. Previously, if a shmem segment had zero attach > > count, it was unconditionally considered not-a-threat. Now, we'll try > > shmat() anyway, and if that fails for any reason other than EACCES, we say > > SHMSTATE_ANALYSIS_FAILURE which leads to the described error report. > > So I suspect that what we hit was a race condition whereby some other > > parallel test was using the same shmem ID and we managed to see its > > segment successfully in shmctl but then it was gone by the time we did > > shmat. This leads me to think that EINVAL and EIDRM failures from > > shmat had better be considered SHMSTATE_ENOENT not > > SHMSTATE_ANALYSIS_FAILURE. > > In principle this is a longstanding race condition, but I wonder > > whether we made it more probable by moving the shm_nattch check. > > Hah --- this is a real race condition, and I can demonstrate it very > easily by inserting a sleep right there, as in the attached > for-testing-only patch. > > The particular parallelism level I use is > > make -s check-world -j4 PROVE_FLAGS='-j4 --quiet --nocolor --nocount' > > on a dual-socket 4-cores-per-socket Xeon machine. With that command and > this patch, I frequently get multiple failures per run, and they all > report either EINVAL or EIDRM. > > The patch generally reports that nattch had been 1, so my thought that > that change might've made it worse seems unfounded. But we have > absolutely got a hittable race condition here. The real fix should > be on the order of > > if (errno == EACCES) > return SHMSTATE_FOREIGN; > + else if (errno == EINVAL || errno == EIDRM) > + return SHMSTATE_ENOENT; > else > return SHMSTATE_ANALYSIS_FAILURE; > > (plus comments of course). Looks good. That is basically a defect in commit c09850992; the race passed from irrelevance to relevance when that commit subjected more segments to the test. Thanks for diagnosing it.
В списке pgsql-hackers по дате отправления: