Adjustment of spinlock sleep delays
From | Tom Lane |
---|---|
Subject | Adjustment of spinlock sleep delays |
Date | |
Msg-id | 12131.1060114288@sss.pgh.pa.us |
Responses |
Re: Adjustment of spinlock sleep delays
Re: Adjustment of spinlock sleep delays |
List | pgsql-hackers |
I've been thinking about Ludwig Lim's recent report of a "stuck spinlock" failure on a heavily loaded machine. Although I originally found this hard to believe, there is a scenario which makes it plausible.

Suppose that we have a bunch of recently-started backends as well as one or more that have been running a long time --- long enough that the scheduler has niced them down a priority level or two. Now suppose that one of the old-timers gets interrupted while holding a spinlock (an event of small but nonzero probability), and that before it can get scheduled again, several of the newer, higher-priority backends all start trying to acquire the same spinlock. The "acquire" code looks like "try to grab the spinlock a few times, then sleep for 10 msec, then try again; give up after 1 minute". If there are enough backends trying this that cycling through all of them takes at least 10 msec, then the lower-priority backend will never get scheduled, and after a minute we get the dreaded "stuck spinlock".

To forestall this scenario, I'm thinking of introducing backoff into the sleep intervals --- that is, after the first failure to get the spinlock, sleep 10 msec; after the second, sleep 20 msec, then 40, etc., with a maximum sleep time of maybe a second. The number of iterations would be reduced so that we still time out after a minute's total delay.

Comments?

regards, tom lane
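
For illustration only, the backoff schedule described in the message could be sketched roughly as below. This is not code from the message or from the PostgreSQL source: the names `MIN_DELAY_MSEC`, `MAX_DELAY_MSEC`, `TIMEOUT_MSEC`, `next_delay_msec` and `sleeps_until_timeout` are hypothetical, the constants are simply the figures mentioned above (10 msec first sleep, a cap of "maybe a second", a one-minute timeout), and the time spent spinning between sleeps is ignored.

```c
/* Illustrative constants only; the values come from the figures in the message. */
#define MIN_DELAY_MSEC  10      /* sleep after the first failed acquire */
#define MAX_DELAY_MSEC  1000    /* assumed cap ("maybe a second") */
#define TIMEOUT_MSEC    60000   /* give up after about a minute of sleeping */

/* Hypothetical helper: double the previous sleep, but never exceed the cap. */
static int
next_delay_msec(int cur_delay_msec)
{
    int next = cur_delay_msec * 2;

    return (next > MAX_DELAY_MSEC) ? MAX_DELAY_MSEC : next;
}

/*
 * Hypothetical helper: count how many sleeps the schedule performs before
 * the accumulated sleep time reaches TIMEOUT_MSEC.  The accumulated time
 * is stored in *total_msec if the pointer is non-NULL.
 */
static int
sleeps_until_timeout(int *total_msec)
{
    int delay = MIN_DELAY_MSEC;
    int total = 0;
    int nsleeps = 0;

    while (total < TIMEOUT_MSEC)
    {
        total += delay;         /* stands in for actually sleeping */
        nsleeps++;
        delay = next_delay_msec(delay);
    }

    if (total_msec)
        *total_msec = total;
    return nsleeps;
}
```

Under these assumptions the sleeps run 10, 20, 40, 80, 160, 320, 640 msec and then stay at 1000 msec, so `sleeps_until_timeout` counts 66 sleeps for about 60.3 seconds of total delay, compared with roughly 6000 sleeps when every sleep is a fixed 10 msec.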