Re: An example of bugs for Hot Standby

From	Simon Riggs
Subject	Re: An example of bugs for Hot Standby
Date	17 December 2009, 21:54:39
Msg-id	1261090456.634.4975.camel@ebony
In reply to	Re: An example of bugs for Hot Standby (Simon Riggs <simon@2ndQuadrant.com>)
Responses	Re: An example of bugs for Hot Standby (Hiroyuki Yamada <yamada@kokolink.net>)
List	pgsql-hackers

On Wed, 2009-12-16 at 14:05 +0000, Simon Riggs wrote:
> On Wed, 2009-12-16 at 10:33 +0000, Simon Riggs wrote:
> > On Tue, 2009-12-15 at 20:25 +0900, Hiroyuki Yamada wrote:
> > > Hot Standby node can freeze when startup process calls LockBufferForCleanup().
> > > This bug can be reproduced by the following procedure.
> > 
> > Interesting. Looks like this can happen, which is a shame cos I just
> > removed the wait checking code after not ever having seen a wait.
> > 
> > Thanks for the report.
> > 
> > Must-fix item for HS.
> 
> So this deadlock can happen at two places:
> 
> 1. When a relation lock waits behind an AccessExclusiveLock and then
> Startup runs LockBufferForCleanup()
> 
> 2. When Startup is a pin count waiter and a lock acquire begins to wait
> on a relation lock
> 
> So we must put in direct deadlock detection in both places. We can't use
> the normal deadlock detector because in case (1) the backend might
> already have exceeded deadlock_timeout.
> 
> Proposal:

Better proposal
 * It's possible for 3-way deadlocks to occur in Hot Standby mode.
 * If a user backend sleeps on a lock while it holds a buffer pin that
 * leaves open the risk of deadlock. The user backend will only sleep
 * if it waits behind an AccessExclusiveLock held by Startup process.
 * If the Startup process then tries to access any buffer that is pinned
 * then it too will sleep and neither process will ever wake.
 *
 * We need to make a deadlock check in two places: in the user backend
 * when we sleep on a lock, and in the Startup process when we sleep
 * on a buffer pin. We need both checks because the deadlock can occur
 * from both directions.
 *
 * Just before a user backend sleeps on a lock, we accumulate a list of
 * buffers pinned by the backend. We then grab an LWlock
 * and then check each of the buffers to see if the Startup process is
 * waiting on them. If so, we release the lock and throw deadlock error.
 * If Startup process is not waiting we then record the pinned buffers
 * in the BufferDeadlockRisk data structure and release the lock.
 * When we later get the lock we remove the deadlock risk.
 *
 * When the Startup process is about to wait on a buffer pin it checks
 * the buffer it is about to pin in the BufferDeadlockRisk list. If the
 * buffer is already held by one or more lock waiters then we send a
 * conflict cancel to them and wait for them to die before rechecking
 * the buffer lock.
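
To make the two checks in the comment above concrete, here is a minimal single-process sketch of the bookkeeping. It is not patch code. Only the name BufferDeadlockRisk comes from the proposal; the struct layout, the fixed-size limits, and every function name are hypothetical. A real implementation would presumably keep the structure in shared memory and hold an LWLock around each of these calls, which this sketch only marks with comments.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BACKENDS 8      /* hypothetical limits, for the sketch only */
#define MAX_PINS     4
#define NO_BUFFER    (-1)

/* One slot per user backend that is currently sleeping on a lock. */
typedef struct RiskEntry
{
    bool in_use;
    int  npinned;
    int  pinned[MAX_PINS];  /* buffers the sleeping backend has pinned */
} RiskEntry;

typedef struct BufferDeadlockRisk
{
    int       startup_waiting_on;   /* buffer Startup sleeps on, or NO_BUFFER */
    RiskEntry backends[MAX_BACKENDS];
} BufferDeadlockRisk;

void
risk_init(BufferDeadlockRisk *risk)
{
    risk->startup_waiting_on = NO_BUFFER;
    for (int b = 0; b < MAX_BACKENDS; b++)
    {
        risk->backends[b].in_use = false;
        risk->backends[b].npinned = 0;
    }
}

/*
 * User backend, just before it sleeps on a lock.  Returns false if the
 * Startup process is already waiting on one of the pinned buffers (the
 * caller would throw a deadlock error); otherwise records the pins.
 */
bool
backend_before_lock_sleep(BufferDeadlockRisk *risk, int backend,
                          const int *pinned, int npinned)
{
    assert(backend >= 0 && backend < MAX_BACKENDS);
    assert(npinned >= 0 && npinned <= MAX_PINS);

    /* real code: acquire the LWLock here */
    for (int i = 0; i < npinned; i++)
        if (pinned[i] == risk->startup_waiting_on)
            return false;           /* real code: release lock, raise error */

    risk->backends[backend].in_use = true;
    risk->backends[backend].npinned = npinned;
    for (int i = 0; i < npinned; i++)
        risk->backends[backend].pinned[i] = pinned[i];
    return true;                    /* real code: release the LWLock */
}

/* User backend got its lock (or was cancelled): remove the deadlock risk. */
void
backend_lock_wait_over(BufferDeadlockRisk *risk, int backend)
{
    risk->backends[backend].in_use = false;
    risk->backends[backend].npinned = 0;
}

/*
 * Startup process, just before it sleeps on a buffer pin.  Fills victims[]
 * (room for MAX_BACKENDS entries) with the lock waiters that pin buf and
 * returns how many there are.  The caller cancels them, waits for them to
 * go away, and calls again.  When no victims remain, Startup is recorded
 * as waiting on buf.
 */
int
startup_before_pin_wait(BufferDeadlockRisk *risk, int buf, int *victims)
{
    int nvictims = 0;

    for (int b = 0; b < MAX_BACKENDS; b++)
    {
        const RiskEntry *e = &risk->backends[b];

        if (!e->in_use)
            continue;
        for (int i = 0; i < e->npinned; i++)
            if (e->pinned[i] == buf)
            {
                victims[nvictims++] = b;
                break;
            }
    }
    if (nvictims == 0)
        risk->startup_waiting_on = buf;
    return nvictims;
}

/* Startup process obtained the pin it was waiting for. */
void
startup_pin_wait_over(BufferDeadlockRisk *risk)
{
    risk->startup_waiting_on = NO_BUFFER;
}
```

In this sketch both directions are covered: a backend that pins buffer 9 and goes to sleep first is reported as a victim when Startup later wants buffer 9, and a backend that pins buffer 9 and tries to sleep after Startup has registered its wait on buffer 9 is refused immediately.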

This way we only cancel direct deadlocks.

It doesn't solve the general problem of buffer waits, but that may be
solvable by a different mechanism.

-- Simon Riggs           www.2ndQuadrant.com
