Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
От | Thomas Munro |
---|---|
Тема | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows |
Дата | |
Msg-id | CA+hUKG+-2rjGZC2kwqr2NMLBcEBp4uf59QT1advbWYF_uc+0Aw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
|
Список | pgsql-bugs |
On Fri, Oct 6, 2023 at 9:18 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Fri, Oct 6, 2023 at 8:55 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > ... DELAY_CHKPT_COMPLETE ... > > About that... If the lights go out after the truncation and the > delayed logging of the checkpoint, how do we know the truncation has > actually reached the disk? mdtruncate() enqueues fsync() calls, but > if we were already in phase 2 (see proc.h) of a checkpoint at that > moment, they might be processed by the *next* checkpoint, not the one > whose phase 3 we've carefully delayed there, no? I didn't look into this for very long so I might be missing something here, but I think there could be at least one bad sequence. If you insert a couple of sleeps to jinx the scheduling, and hack RelationTruncate() to request a checkpoint at a carefully chosen wrong moment (see attached), then: postgres=# create table t (i int); CREATE TABLE postgres=# insert into t select 1 from generate_series(1, 100); INSERT 0 100 postgres=# checkpoint; -- checkpoint #1 just puts some data on disk CHECKPOINT postgres=# delete from t; DELETE 100 postgres=# vacuum freeze t; -- truncates, starts unlucky checkpoint #2 VACUUM If you trace the checkpointer's system calls you will see that base/5/16384 (or whatever t's relfilenode is for you) is *not* fsync'd by checkpoint #2. The following checkpoint #3 might eventually do it, but if the kernel loses power after checkpoint #2 completes and there is no checkpoint #3, the kernel might forget the truncation, and yet replay starts too late to redo it. I think that bad sequence looks like this: P1: log truncate P2: choose redo LSN P1: DropRelationBuffers() P2: CheckPointBuffers() P2: ProcessSyncRequests() P1: ftruncate() P1: RegisterSyncRequest() P2: log checkpoint *** system loses power *** I realise it is a different problem than the one reported, but it's close. My initial thought is that perhaps we shouldn't allow a redo LSN to be chosen until the sync request is registered, which is also fairly close to the critical section boundaries being discussed for ftruncate() error case. But that's not a phase the checkpoint delay machinery currently knows how to delay. And there may well be better ways...
Вложения
В списке pgsql-bugs по дате отправления: