Re: Adjustment of spinlock sleep delays
From | Mike Mascari |
---|---|
Subject | Re: Adjustment of spinlock sleep delays |
Date | |
Msg-id | 3F3019A5.70102@mascari.com |
In reply to | Adjustment of spinlock sleep delays (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Adjustment of spinlock sleep delays |
List | pgsql-hackers |
Tom Lane wrote:

> I've been thinking about Ludwig Lim's recent report of a "stuck
> spinlock" failure on a heavily loaded machine. Although I originally
> found this hard to believe, there is a scenario which makes it
> plausible. Suppose that we have a bunch of recently-started backends
> as well as one or more that have been running a long time --- long
> enough that the scheduler has niced them down a priority level or two.
> Now suppose that one of the old-timers gets interrupted while holding
> a spinlock (an event of small but nonzero probability), and that before
> it can get scheduled again, several of the newer, higher-priority
> backends all start trying to acquire the same spinlock. The "acquire"
> code looks like "try to grab the spinlock a few times, then sleep for
> 10 msec, then try again; give up after 1 minute". If there are enough
> backends trying this that cycling through all of them takes at least
> 10 msec, then the lower-priority backend will never get scheduled, and
> after a minute we get the dreaded "stuck spinlock".
>
> To forestall this scenario, I'm thinking of introducing backoff into the
> sleep intervals --- that is, after first failure to get the spinlock,
> sleep 10 msec; after the second, sleep 20 msec, then 40, etc, with a
> maximum sleep time of maybe a second. The number of iterations would be
> reduced so that we still time out after a minute's total delay.
>
> Comments?

Should there be any correlation between the manner by which the
backoff occurs and the number of active backends?

Mike Mascari
mascarm@mascari.com
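For reference, the backoff schedule described in the quoted proposal can be sketched in C roughly as follows. This is only an illustration of the arithmetic, not PostgreSQL's actual spinlock code: the function names and the three constants are hypothetical, and the one-second cap is taken from the "maybe a second" in the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants taken from the figures in the proposal. */
#define MIN_DELAY_MSEC   10      /* first sleep after a failed acquire */
#define MAX_DELAY_MSEC   1000    /* assumed cap ("maybe a second") */
#define TIMEOUT_MSEC     60000   /* give up after about one minute */

/* Double the current delay, but never exceed the cap. */
static int
next_delay_msec(int cur_delay)
{
    int next;

    assert(cur_delay > 0);
    next = cur_delay * 2;
    return (next > MAX_DELAY_MSEC) ? MAX_DELAY_MSEC : next;
}

/*
 * Count how many sleeps are needed before the cumulative delay
 * reaches TIMEOUT_MSEC.  If total_out is not NULL, the cumulative
 * delay in msec is stored there.
 */
static int
sleeps_until_timeout(long *total_out)
{
    int  delay = MIN_DELAY_MSEC;
    long total = 0;
    int  nsleeps = 0;

    while (total < TIMEOUT_MSEC)
    {
        total += delay;              /* time that would be slept */
        nsleeps++;
        delay = next_delay_msec(delay);
    }

    if (total_out != NULL)
        *total_out = total;
    return nsleeps;
}
```

Under these assumed constants the delays run 10, 20, 40, ..., 640, then 1000 msec repeatedly, so the one-minute limit is reached after 66 sleeps (60270 msec in total), compared with 6000 sleeps at a fixed 10 msec.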