Re: Inadequate thought about buffer locking during hot standby replay

From	Simon Riggs
Subject	Re: Inadequate thought about buffer locking during hot standby replay
Date	10 November 2012, 17:05:40
Msg-id	CA+U5nM+p6ze4PMd6YvcMDrVru01_OcYZ=43T7EzsoqQgEpt8eQ@mail.gmail.com
In reply to	Inadequate thought about buffer locking during hot standby replay (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Inadequate thought about buffer locking during hot standby replay
List	pgsql-hackers

On 9 November 2012 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> During normal running, operations such as btree page splits are
> extremely careful about the order in which they acquire and release
> buffer locks, if they're doing something that concurrently modifies
> multiple pages.
>
> During WAL replay, that all goes out the window.  Even if an individual
> WAL-record replay function does things in the right order for "standard"
> cases, RestoreBkpBlocks has no idea what it's doing.  So if one or more
> of the referenced pages gets treated as a full-page image, we are left
> with no guarantee whatsoever about what order the pages are restored in.
> That never mattered when the code was originally designed, but it sure
> matters during Hot Standby when other queries might be able to see the
> intermediate states.
>
> I can't prove that this is the cause of bug #7648, but it's fairly easy
> to see that it could explain the symptom.  You only need to assume that
> the page-being-split had been handled as a full-page image, and that the
> new right-hand page had gotten allocated by extending the relation.
> Then there will be an interval just after RestoreBkpBlocks does its
> thing where the updated left-hand sibling is in the index and is not
> locked in any way, but its right-link points off the end of the index.
> If a few indexscans come along before the replay process gets to
> continue, you'd get exactly the reported errors.
>
> I'm inclined to think that we need to fix this by getting rid of
> RestoreBkpBlocks per se, and instead having the per-WAL-record restore
> routines dictate when each full-page image is restored (and whether or
> not to release the buffer lock immediately).  That's not going to be a
> small change unfortunately :-(

No, but it looks like a clear bug scenario and a clear resolution also.

I'll start looking at it.
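
To make the quoted scenario concrete, here is a toy sketch of the two replay orders for a page split. It is only a model: the names (Index, observe, replay_split_blanket, replay_split_per_record) are hypothetical and are not PostgreSQL functions, and buffer locks are reduced to a set of block numbers. observe() stands in for an index scan arriving at that moment and reports any unlocked page whose right-link points past the end of the index.

```python
# Toy model only: every name here is hypothetical, none is a PostgreSQL API.

class Index:
    def __init__(self):
        self.pages = {0: {"right": None}}  # block number -> page contents
        self.locked = set()                # blocks locked by the replay process
        self.observations = []             # what a concurrent scan would see

    def observe(self):
        """Record unlocked pages whose right-link points off the end of the index."""
        dangling = [
            blk for blk, page in self.pages.items()
            if blk not in self.locked
            and page["right"] is not None
            and page["right"] not in self.pages
        ]
        self.observations.append(dangling)


def replay_split_blanket(index, left_blk, left_image, right_blk, right_page):
    """Restore the full-page image up front and unlock it, then do the rest."""
    index.locked.add(left_blk)
    index.pages[left_blk] = dict(left_image)
    index.locked.discard(left_blk)   # lock released before the right page exists
    index.observe()                  # a scan arriving here sees the bad right-link
    index.pages[right_blk] = dict(right_page)
    index.observe()


def replay_split_per_record(index, left_blk, left_image, right_blk, right_page):
    """The record's own routine decides when to restore and when to unlock."""
    index.locked.add(left_blk)
    index.pages[left_blk] = dict(left_image)
    index.observe()                  # left page is still locked, nothing visible
    index.pages[right_blk] = dict(right_page)
    index.locked.discard(left_blk)   # unlock only once the right page exists
    index.observe()


blanket = Index()
replay_split_blanket(blanket, 0, {"right": 1}, 1, {"right": None})

per_record = Index()
replay_split_per_record(per_record, 0, {"right": 1}, 1, {"right": None})

print("blanket restore:   ", blanket.observations)
print("per-record restore:", per_record.observations)
```

In this model the blanket order exposes block 0 with a dangling right-link at the first observation point, while the per-record order never does. That is the whole difference the proposed change is after; the real fix of course has to deal with every WAL record type, not just splits.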

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
