Ancient lock bug figured out
От | Tom Lane |
---|---|
Тема | Ancient lock bug figured out |
Дата | |
Msg-id | 12872.975367809@sss.pgh.pa.us обсуждение исходный текст |
Список | pgsql-hackers |
proc.c has the following code --- unchanged since Postgres95 --- in HandleDeadlock(): /* --------------------- * Check to see if we've been awoken by anyone in the interim. * * If we have we canreturn and resume our transaction -- happy day. * Before we are awoken the process releasing the lock grants it to * us so we know that we don't have to wait anymore. * * Damn these names are LONG! -mer * --------------------- */ if (IpcSemaphoreGetCount(MyProc->sem.semId, MyProc->sem.semNum) == IpcSemaphoreDefaultStartValue) { UnlockLockTable(); return; } /* * you would think this would be unnecessary, but... * * this also means we've been removed already. in someports (e.g., * sparc and aix) the semop(2) implementation is such that we can * actually end up in this handlerafter someone has removed us from * the queue and bopped the semaphore *but the test above fails to * detectthe semaphore update* (presumably something weird having to * do with the order in which the semaphore wakeup signaland SIGALRM * get handled). */ if (MyProc->links.prev == INVALID_OFFSET || MyProc->links.next == INVALID_OFFSET) { UnlockLockTable(); return; } Well, the reason control can get to the "apparently unnecessary" part is not some weird portability glitch; it is that the first test is WRONG. IpcSemaphoreGetCount() calls semctl(GETNCNT), which returns not the current value of the semaphore as this code expects, but the number of waiters on the semaphore. Since the process doing this test is the only one that'll ever wait on that semaphore, it is impossible for IpcSemaphoreGetCount() to return anything but zero, and so the first if() has never ever succeeded in the entire history of Postgres. IpcSemaphoreGetCount is used nowhere else. I see no particularly good reason to have it at all, and certainly not to have it with a name so easily mistaken for IpcSemaphoreGetValue. It's about to be toast. Next question is whether to recode the first test "correctly" with IpcSemaphoreGetValue(), or just remove it. Since we clearly have done just fine with testing our internal link pointers to detect removal from the queue, I'm inclined to remove the first test and thus save an unnecessary kernel call. Comments? regards, tom lane
В списке pgsql-hackers по дате отправления: