Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows

Поиск

Список

Период

Сортировка

От	Thomas Munro
Тема	Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Дата	6 октября 2023 г. 04:46:23
Msg-id	CA+hUKG+-2rjGZC2kwqr2NMLBcEBp4uf59QT1advbWYF_uc+0Aw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows (Thomas Munro <thomas.munro@gmail.com>)
Ответы	Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Список	pgsql-bugs

Дерево обсуждения

On Fri, Oct 6, 2023 at 9:18 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Oct 6, 2023 at 8:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > ... DELAY_CHKPT_COMPLETE ...
>
> About that...  If the lights go out after the truncation and the
> delayed logging of the checkpoint, how do we know the truncation has
> actually reached the disk?  mdtruncate() enqueues fsync() calls, but
> if we were already in phase 2 (see proc.h) of a checkpoint at that
> moment, they might be processed by the *next* checkpoint, not the one
> whose phase 3 we've carefully delayed there, no?

I didn't look into this for very long so I might be missing something
here, but I think there could be at least one bad sequence.  If you
insert a couple of sleeps to jinx the scheduling, and hack
RelationTruncate() to request a checkpoint at a carefully chosen wrong
moment (see attached), then:

postgres=# create table t (i int);
CREATE TABLE
postgres=# insert into t select 1 from generate_series(1, 100);
INSERT 0 100
postgres=# checkpoint; -- checkpoint #1 just puts some data on disk
CHECKPOINT
postgres=# delete from t;
DELETE 100
postgres=# vacuum freeze t; -- truncates, starts unlucky checkpoint #2
VACUUM

If you trace the checkpointer's system calls you will see that
base/5/16384 (or whatever t's relfilenode is for you) is *not* fsync'd
by checkpoint #2.  The following checkpoint #3 might eventually do it,
but if the kernel loses power after checkpoint #2 completes and there
is no checkpoint #3, the kernel might forget the truncation, and yet
replay starts too late to redo it.  I think that bad sequence looks
like this:

P1: log truncate
P2:                        choose redo LSN
P1: DropRelationBuffers()
P2:                        CheckPointBuffers()
P2:                        ProcessSyncRequests()
P1: ftruncate()
P1: RegisterSyncRequest()
P2:                        log checkpoint
             *** system loses power ***

I realise it is a different problem than the one reported, but it's
close.  My initial thought is that perhaps we shouldn't allow a redo
LSN to be chosen until the sync request is registered, which is also
fairly close to the critical section boundaries being discussed for
ftruncate() error case.  But that's not a phase the checkpoint delay
machinery currently knows how to delay.  And there may well be better
ways...

Вложения

hack.diff

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows

Вложения