Tom Lane <tgl@sss.pgh.pa.us> writes:
> > In the contended case you'll want a task switch anyway, so the futex
> > management should not matter.
>
> No, we DON'T want a task switch. That's the entire point: in a
> multiprocessor, it's a good bet that the spinlock is held by a task
> running on another processor, and doing a task switch will take orders
> of magnitude longer than just spinning until the lock is released.
> You should yield only after spinning long enough to make it a strong
> probability that the spinlock is held by a process that's lost the
> CPU and needs to be rescheduled.

Does the futex code make any attempt to record the CPU of the process grabbing
the lock? Clearly it wouldn't be a guarantee of anything, but if it's only used
for short-lived spinlocks while acquiring longer-lived locks, then maybe it
would be a useful hint?

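For concreteness, the spin-then-yield policy Tom describes above might be
sketched roughly like this. It is only an illustration using C11 atomics, not
PostgreSQL's actual spinlock code; the sty_* names and the spin count of 1000
are made up for the example, and a real implementation would tune the
threshold per platform.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning value: how long to spin before assuming the
 * holder has lost its CPU and needs to be rescheduled. */
#define SPINS_BEFORE_YIELD 1000

typedef struct { atomic_flag held; } sty_lock_t;

/* Try once; returns true if we acquired the lock. */
static bool sty_trylock(sty_lock_t *l)
{
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin first (cheap if the holder is running on another CPU); yield
 * only after spinning long enough that the holder is probably not
 * running at all. */
static void sty_lock(sty_lock_t *l)
{
    int spins = 0;

    while (!sty_trylock(l)) {
        if (++spins >= SPINS_BEFORE_YIELD) {
            sched_yield();   /* give the holder a chance to run */
            spins = 0;
        }
    }
}

static void sty_unlock(sty_lock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock would be declared as "sty_lock_t l = { ATOMIC_FLAG_INIT };" and the
critical section wrapped in sty_lock(&l) / sty_unlock(&l).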
> No; that page still says specifically "So a process calling
> sched_yield() now must wait until all other runnable processes in the
> system have used up their time slices before it will get the processor
> again." I can prove that that is NOT what happens, at least not on
> a multi-CPU Opteron with current FC4 kernel. However, if the newer
> kernels penalize a process calling sched_yield as heavily as this page
> claims, then it's not what we want anyway ...

Well, it would be no worse than select() or any other random I/O syscall.
It seems to me what you've found is an outright bug in the Linux scheduler.
Perhaps posting it to linux-kernel would be worthwhile.
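
If it helps for a report, something along these lines could serve as a
starting point for a reproducer. This is just a guess at a useful measurement,
not the test Tom actually ran: it times how long sched_yield() takes to return
on average, which one would run alongside some CPU-bound competing processes
and compare against the documented behavior. The function name avg_yield_ns is
invented for the example.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <sched.h>
#include <time.h>

/* Average wall-clock nanoseconds per sched_yield() call over `iters`
 * calls. If a yielding process really had to wait for every other
 * runnable process to use up its time slice, this number would be
 * very large under load. */
static double avg_yield_ns(int iters)
{
    struct timespec t0, t1;
    double ns;

    assert(iters > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        sched_yield();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (double) (t1.tv_sec - t0.tv_sec) * 1e9
        + (double) (t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```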
--
greg