Re: Apparent deadlock 7.0.1
From: Tom Lane
Subject: Re: Apparent deadlock 7.0.1
Date:
Msg-id: 14084.960432230@sss.pgh.pa.us
In reply to: Re: Apparent deadlock 7.0.1 (Michael Simms <grim@ewtoo.org>)
List: pgsql-hackers
Michael Simms <grim@ewtoo.org> writes:
>>>> I have noticed a deadlock happening on 7.0.1 on updates.
>>>> The backends just lock, and take up as much CPU as they can. I kill
>>>> the postmaster, and the backends stay alive, using CPU at the highest
>>>> rate possible. The operations aren't that expensive, just a single
>>>> line of update.
>>>> Anyone else seen this? Anyone dealing with this?
>>
>> News to me. What sort of hardware are you running on? It sort of
>> sounds like the spinlock code not working as it should --- and since
>> spinlocks are done with platform-dependent assembler, it matters...

> The hardware/software is:
> Linux kernel 2.2.15 (SMP kernel)
> Glibc 2.1.1
> Dual Intel PIII/500

Dual CPUs huh? I have heard of motherboards that have (misdesigned)
memory caching such that the two CPUs don't reliably see each other's
updates to a shared memory location. Naturally that plays hell with
the spinlock code :-(. It might be necessary to insert some kind of
cache-flushing instruction into the spinlock wait loop to ensure that
the CPUs see each other's changes to the lock.

This is all theory at this point, and a hole in the theory is that the
backends ought to give up with a "stuck spinlock" error after a minute
or two of not being able to grab the lock. I assume you have let them
go at it for longer than that without seeing such an error?

Anyway, the next step is to "kill -ABRT" some of the stuck processes
and get backtraces from their coredumps to see where they are stuck.
If you find they are inside s_lock() then it's definitely some kind of
spinlock problem. If not...

			regards, tom lane
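[For illustration, here is a minimal C sketch of the kind of
test-and-set spinlock wait loop and "stuck spinlock" timeout being
discussed above. It is NOT PostgreSQL's actual s_lock.c; tas(),
s_lock_sketch(), and SPINS_BEFORE_ABORT are illustrative names and the
spin bound is arbitrary. On x86, XCHG with a memory operand is
implicitly bus-locked, which is what should keep the two CPUs' views
of the lock byte coherent on correctly designed hardware.]

    /*
     * Minimal sketch of a test-and-set spinlock, for illustration only.
     * Assumes x86 and GCC inline assembler.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sched.h>

    typedef volatile unsigned char slock_t;

    /*
     * Atomically set the lock byte to 1 and return its previous value.
     * XCHG with a memory operand is implicitly LOCKed on x86, so both
     * CPUs are forced to agree on the lock byte's value.
     */
    static int
    tas(slock_t *lock)
    {
        unsigned char res = 1;

        __asm__ __volatile__("xchgb %0, %1"
                             : "+q"(res), "+m"(*lock)
                             :
                             : "memory");
        return (int) res;
    }

    #define SPINS_BEFORE_ABORT 1000000  /* arbitrary bound for the sketch */

    /*
     * Spin until the lock is acquired, or report a stuck spinlock,
     * analogous to the error the real backends ought to raise after
     * a minute or two of failing to grab the lock.
     */
    static void
    s_lock_sketch(slock_t *lock)
    {
        long spins = 0;

        while (tas(lock))
        {
            if (++spins > SPINS_BEFORE_ABORT)
            {
                fprintf(stderr, "FATAL: stuck spinlock detected\n");
                abort();
            }
            sched_yield();      /* yield the CPU between attempts */
        }
    }

    static void
    s_unlock_sketch(slock_t *lock)
    {
        *lock = 0;              /* a plain store releases the lock on x86 */
    }

    int
    main(void)
    {
        slock_t lock = 0;

        s_lock_sketch(&lock);
        printf("lock acquired\n");
        s_unlock_sketch(&lock);
        return 0;
    }

[A backtrace that ends inside a loop like this one --- s_lock() in the
real code --- is exactly what the "kill -ABRT" suggestion above is
meant to reveal.]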