Questionable coding in proc.c & lock.c
От | Tom Lane |
---|---|
Тема | Questionable coding in proc.c & lock.c |
Дата | |
Msg-id | 9030.964728860@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
RE: Questionable coding in proc.c & lock.c
RE: Questionable coding in proc.c & lock.c |
Список | pgsql-hackers |
I spent some time looking around for possible causes of the recently reported deadlock conditions. I couldn't find any smoking gun, but I found a couple of things that look fishy: 1. I notice that ProcReleaseAll does not act quite the same way that ProcRelease does. Look in particular at the logic for handling wakeupNeeded in each one. There is some useless code in ProcRelease, but the bottom line is that it *will* call ProcLockWakeup() unless it finds there are no remaining holders (and deletes the lock instead). OTOH, ProcReleaseAll will only call ProcLockWakeup() if this test succeeds for some lockmode: if (!wakeupNeeded && xidLook->holders[i] > 0 && lockMethodTable->ctl->conflictTab[i] & lock->waitMask) wakeupNeeded = true; I spent some time trying to see a way that this might fail to wake up a waiter who needs to be woken up. I don't see one, but either this code is broken or ProcRelease is doing unnecessary work. 2. The loop in ProcLockWakeup() has been broken since the day it was written. It's trying to scan through the list of waiting processes and awaken all that can legitimately be awoken. However, as soon as it finds one that cannot be woken, it sets last_locktype = proc->token and then continues the loop *without advancing proc*. All subsequent iterations will compare proc->token == last_locktype, find they are equal, and then continue --- again without advancing proc. It's not an infinite loop because there's a list-length counter, but none of the proc entries beyond the first unawakenable one will be examined. The net effect is that the behavior is the same as it was in 6.3 or so, when the loop was just "while (LockResolveConflicts(...) succeeds) do remove-and-awaken-first-entry". So the question is, is there ever a case where it is necessary to wake processes that are in the list beyond one that can't be woken? I haven't been able to conjure up an example, but I am suspicious. This error cannot explain any new hangs seen in 7.0, since the effective behavior of ProcLockWakeup() hasn't changed since Postgre95. But perhaps someone made a change elsewhere relying on the assumption that ProcLockWakeup() would wake up any available waiter. Comments anyone? regards, tom lane
В списке pgsql-hackers по дате отправления: