Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC

From	Thomas Munro
Subject	Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC
Date	April 12, 2024, 11:14:31
Msg-id	CA+hUKGL8iy7TYgCh_RgWFiAT81MhsA5DyGP4_cWbTdc0CMm2-g@mail.gmail.com
In reply to	Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC
List	pgsql-bugs

On Fri, Apr 12, 2024 at 6:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Apr 11, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> > 10.04.2024 14:00, PG Bug reporting form wrote:
> > > The following bug has been logged on the website:
> > >
> > > Bug reference:      18426
> > > ...
> > > A demo test for the issue to follow...
>
> I didn't try your test but your explanation seems clear.
> RelationTruncate() logs first, then calls smgrtruncate() which drops
> buffers and then truncates files.  The dropping-the-buffers phase is
> now interruptible, since commit d87251048a0f.  If you interrupt it
> there, the situation is bad: you have logged the truncation, but left
> (1) buffers and (2) untruncated files on the primary.  Relation size
> being out of sync is a recipe for that PANIC next time the WAL
> mentions blocks past the (primary's) end.  First thought is that that
> particular wait might need to hold interrupts.  Hmm.  The comments for
> RelationTruncate() contemplate but reject a critical section.
> Presumably it's waiting for another backend to flush data, and that
> other backend will eventually finish doing that or fail/crash.

That surely needs fixing, but while thinking about the difference
between holding interrupts and declaring a critical section, I'm
wondering if the lack of the latter has other pre-existing nasty
failure modes:

1.  We throw away potentially dirty buffers, and then we ereport while
trying to truncate a file: now what stops some old ghost block
contents from coming back to life (read from disk in the untruncated
file)?
2.  We already told downstream servers to truncate.  Now the sizes are
out of sync, so what stops us logging more references to the ghost
pages and panicking replicas?  (Same as this interruption issue).
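
To make the two failure modes concrete, here is a toy model in Python. It is only a sketch of the ordering described above (log, drop buffers, truncate the file), not PostgreSQL code: `ToyNode`, `toy_relation_truncate`, `toy_replay` and `run_scenario` are hypothetical names invented for illustration, and the standby here simply raises when a record mentions a block past its end, which stands in for the PANIC rather than describing how recovery actually detects the problem.

```python
class TruncateFailed(Exception):
    """Stands in for an ereport(ERROR) or a cancel between the steps."""


class ToyNode:
    """Hypothetical server: a file on disk plus a cache of dirty buffers."""

    def __init__(self, nblocks):
        self.disk = [f"v1-block{b}" for b in range(nblocks)]
        self.buffers = {}  # block number -> dirty contents not yet flushed

    def write(self, block, contents):
        self.buffers[block] = contents

    def read(self, block):
        if block in self.buffers:
            return self.buffers[block]
        return self.disk[block]

    def drop_buffers(self, new_size):
        self.buffers = {b: c for b, c in self.buffers.items() if b < new_size}

    def truncate_file(self, new_size):
        del self.disk[new_size:]


def toy_relation_truncate(primary, wal, new_size, fail_before_file_truncate):
    wal.append(("truncate", new_size))   # 1. log the truncation
    primary.drop_buffers(new_size)       # 2. discard buffers, dirty or not
    if fail_before_file_truncate:
        raise TruncateFailed("failed before shortening the file")
    primary.truncate_file(new_size)      # 3. shorten the file


def toy_replay(standby, wal):
    for kind, arg in wal:
        if kind == "truncate":
            standby.drop_buffers(arg)
            standby.truncate_file(arg)
        elif kind == "update" and arg >= len(standby.disk):
            raise RuntimeError(f"PANIC: WAL references block {arg} past end")


def run_scenario():
    primary, standby, wal = ToyNode(4), ToyNode(4), []
    primary.write(3, "v2-block3")        # newer contents, only in a dirty buffer
    try:
        toy_relation_truncate(primary, wal, 2, fail_before_file_truncate=True)
    except TruncateFailed:
        pass                             # the error is reported; the session carries on

    ghost = primary.read(3)              # failure mode 1: stale on-disk contents
    primary.write(3, "v3-block3")
    wal.append(("update", 3))            # failure mode 2: WAL mentions block 3

    try:
        toy_replay(standby, wal)
        panic = None
    except RuntimeError as exc:
        panic = str(exc)
    return ghost, panic


if __name__ == "__main__":
    print(run_scenario())
```

In this model the read of block 3 after the failed truncation returns the old "v1" contents from the untruncated file (the ghost block of point 1), and replaying the WAL on the standby fails on the later record for block 3 because the standby did truncate (point 2). Moving the failure point to after `truncate_file` makes both symptoms disappear, which is the property a critical section around steps 2 and 3 would be meant to give.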
