Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC
From | Thomas Munro |
---|---|
Subject | Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC |
Date | |
Msg-id | CA+hUKGL8iy7TYgCh_RgWFiAT81MhsA5DyGP4_cWbTdc0CMm2-g@mail.gmail.com |
In response to | Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC |
List | pgsql-bugs |
On Fri, Apr 12, 2024 at 6:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Apr 11, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> > 10.04.2024 14:00, PG Bug reporting form wrote:
> > > The following bug has been logged on the website:
> > >
> > > Bug reference: 18426
> > > ...
> > > A demo test for the issue to follow...
>
> I didn't try your test but your explanation seems clear.
> RelationTruncate() logs first, then calls smgrtruncate() which drops
> buffers and then truncates files. The dropping-the-buffers phase is
> now interruptible, since commit d87251048a0f. If you interrupt it
> there, the situation is bad: you have logged the truncation, but left
> (1) buffers and (2) untruncated files on the primary. Relation size
> being out of sync is a recipe for that PANIC next time the WAL
> mentions blocks past the (primary's) end. First thought is that that
> particular wait might need to hold interrupts. Hmm. The comments for
> RelationTruncate() contemplate but reject a critical section.
> Presumably it's waiting for another backend to flush data, and that
> other backend will eventually finish doing that or fail/crash.

That surely needs fixing, but while thinking about the difference
between holding interrupts and declaring a critical section, I'm
wondering if the lack of the latter has other pre-existing nasty
failure modes:

1. We throw away potentially dirty buffers, and then we ereport while
trying to truncate a file: now what stops some old ghost block
contents from coming back to life (read from disk in the untruncated
file)?

2. We already told downstream servers to truncate. Now the sizes are
out of sync, so what stops us logging more references to the ghost
pages and panicking replicas? (Same as this interruption issue).
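
To make the ordering problem concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: `Primary`, `replay`, `run`, and the `holdoff` counter are hypothetical names made up for illustration, not PostgreSQL code, and the standby's PANIC is simplified to "a replayed record references a block past the standby's end". It models only the cancel-during-buffer-drop case, not an error raised while truncating the file itself.

```python
# Toy model only: every name here is hypothetical, not PostgreSQL code.

class Interrupted(Exception):
    """Stands in for a query cancel processed at an interrupt check."""


class Primary:
    def __init__(self, nblocks):
        self.nblocks = nblocks        # size of the file on disk, in blocks
        self.wal = []                 # records already shipped to the standby
        self.cancel_pending = False
        self.holdoff = 0              # loosely modelled on an interrupt holdoff count

    def check_for_interrupts(self):
        # A pending cancel is only acted on when interrupts are not held.
        if self.cancel_pending and self.holdoff == 0:
            self.cancel_pending = False
            raise Interrupted()

    def truncate(self, new_nblocks, hold_interrupts):
        self.wal.append(("truncate", new_nblocks))   # 1. log first
        if hold_interrupts:
            self.holdoff += 1
        try:
            self.check_for_interrupts()              # 2. drop buffers (may wait)
            self.nblocks = new_nblocks               # 3. truncate the file
        finally:
            if hold_interrupts:
                self.holdoff -= 1

    def write_block(self, blkno):
        assert blkno < self.nblocks
        self.wal.append(("write", blkno))


def replay(wal, nblocks):
    """Replay wal on a standby that starts with nblocks blocks."""
    for kind, arg in wal:
        if kind == "truncate":
            nblocks = arg
        elif arg >= nblocks:
            return "PANIC"            # record references a block past the end
    return "ok"


def run(hold_interrupts):
    p = Primary(nblocks=10)
    p.cancel_pending = True           # cancel arrives during the truncation
    try:
        p.truncate(2, hold_interrupts)
    except Interrupted:
        pass                          # truncation is logged but not performed
    p.write_block(p.nblocks - 1)      # later write to the primary's last block
    return replay(p.wal, 10)
```

In this model `run(False)` leaves the primary at 10 blocks after logging a truncation to 2, so the later write to block 9 trips the standby, while `run(True)` defers the cancel until the file has been truncated and replay stays consistent. Holding interrupts does nothing for failure mode 1 above, where the error comes from the file truncation itself.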