Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC

From Thomas Munro
Subject Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC
Date
Msg-id CA+hUKGL8iy7TYgCh_RgWFiAT81MhsA5DyGP4_cWbTdc0CMm2-g@mail.gmail.com
In response to Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
On Fri, Apr 12, 2024 at 6:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Apr 11, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> > 10.04.2024 14:00, PG Bug reporting form wrote:
> > > The following bug has been logged on the website:
> > >
> > > Bug reference:      18426
> > > ...
> > > A demo test for the issue to follow...
>
> I didn't try your test but your explanation seems clear.
> RelationTruncate() logs first, then calls smgrtruncate() which drops
> buffers and then truncates files.  The dropping-the-buffers phase is
> now interruptible, since commit d87251048a0f.  If you interrupt it
> there, the situation is bad: you have logged the truncation, but left
> (1) buffers and (2) untruncated files on the primary.  Relation size
> being out of sync is a recipe for that PANIC next time the WAL
> mentions blocks past the (primary's) end.  First thought is that that
> particular wait might need to hold interrupts.  Hmm.  The comments for
> RelationTruncate() contemplate but reject a critical section.
> Presumably it's waiting for another backend to flush data, and that
> other backend will eventually finish doing that or fail/crash.

That surely needs fixing, but while thinking about the difference
between holding interrupts and declaring a critical section, I'm
wondering if the lack of the latter has other pre-existing nasty
failure modes:

1.  We throw away potentially dirty buffers, and then we ereport while
trying to truncate a file: now what stops some old ghost block
contents from coming back to life (read from disk in the untruncated
file)?
2.  We already told downstream servers to truncate.  Now the sizes are
out of sync, so what stops us logging more references to the ghost
pages and panicking replicas?  (Same as this interruption issue).
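
The two failure modes above can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: the class Primary, the function replay() and the fail_at parameter are hypothetical names invented for the illustration, and the standby is assumed to panic on any update record that references a block past its own end of file, which simplifies the real replay logic. The model follows the order described in the thread: log the truncation, drop buffers, then truncate the file.

```python
class Interrupted(Exception):
    """Stands in for a cancel or ereport(ERROR) during truncation."""


class StandbyPanic(Exception):
    """Stands in for the standby PANIC on a reference to a missing block."""


class Primary:
    def __init__(self, blocks):
        self.disk = list(blocks)  # on-disk block contents
        self.buffers = {}         # blkno -> dirty contents not yet flushed
        self.wal = []             # records shipped to the standby

    def read(self, blkno):
        # Buffer pool first, then disk.
        if blkno in self.buffers:
            return self.buffers[blkno]
        return self.disk[blkno]

    def update(self, blkno, data):
        if blkno >= len(self.disk):
            raise IndexError("block past end of relation")
        self.buffers[blkno] = data
        self.wal.append(("update", blkno))

    def truncate(self, nblocks, fail_at=None):
        # Step 1: log the truncation first.
        self.wal.append(("truncate", nblocks))
        if fail_at == "drop_buffers":
            raise Interrupted("cancelled while dropping buffers")
        # Step 2: drop buffers past the new end, dirty or not.
        self.buffers = {b: d for b, d in self.buffers.items() if b < nblocks}
        if fail_at == "truncate_file":
            raise Interrupted("error while truncating file")
        # Step 3: truncate the file.
        del self.disk[nblocks:]


def replay(nblocks, wal):
    """Replay WAL on a standby that starts with nblocks blocks."""
    for kind, arg in wal:
        if kind == "truncate":
            nblocks = min(nblocks, arg)
        elif arg >= nblocks:
            raise StandbyPanic(f"block {arg} is past end ({nblocks} blocks)")
    return nblocks


def ghost_block_scenario():
    # Failure mode 1: the dirty buffer is discarded, the file is not
    # truncated, and the stale on-disk contents become visible again.
    p = Primary(["old0", "old1", "old2", "old3"])
    p.update(3, "new3")
    try:
        p.truncate(2, fail_at="truncate_file")
    except Interrupted:
        pass
    return p.read(3)


def standby_panic_scenario():
    # Failure mode 2: the standby truncates, the primary does not, and a
    # later record for block 3 cannot be replayed.
    p = Primary(["old0", "old1", "old2", "old3"])
    try:
        p.truncate(2, fail_at="drop_buffers")
    except Interrupted:
        pass
    p.update(3, "newer3")
    try:
        replay(4, p.wal)
    except StandbyPanic:
        return True
    return False


if __name__ == "__main__":
    print(ghost_block_scenario())
    print(standby_panic_scenario())
```

In this model, both scenarios come from the same root cause: the WAL record is already out when the later steps fail, so nothing brings the primary back in line with what it told the standby.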


