Re: Unlogged relation copy is not fsync'd

Поиск

Список

Период

Сортировка

От	Melanie Plageman
Тема	Re: Unlogged relation copy is not fsync'd
Дата	4 сентября 2023 г. 13:59:14
Msg-id	CAAKRu_b=mvoY+mAnVgK4FDy+BquLMTFxhcyAa8sd9FmPnfHxGA@mail.gmail.com обсуждение исходный текст
Ответ на	Unlogged relation copy is not fsync'd (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы	Re: Unlogged relation copy is not fsync'd
Список	pgsql-hackers

Дерево обсуждения

On Fri, Aug 25, 2023 at 8:47 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> I noticed another missing fsync() with unlogged tables, similar to the
> one at [1].
>
> RelationCopyStorage does this:
>
> >       /*
> >        * When we WAL-logged rel pages, we must nonetheless fsync them.  The
> >        * reason is that since we're copying outside shared buffers, a CHECKPOINT
> >        * occurring during the copy has no way to flush the previously written
> >        * data to disk (indeed it won't know the new rel even exists).  A crash
> >        * later on would replay WAL from the checkpoint, therefore it wouldn't
> >        * replay our earlier WAL entries. If we do not fsync those pages here,
> >        * they might still not be on disk when the crash occurs.
> >        */
> >       if (use_wal ||  copying_initfork)
> >               smgrimmedsync(dst, forkNum);
>
> That 'copying_initfork' condition is wrong. The first hint of that is
> that 'use_wal' is always true for an init fork. I believe this was meant
> to check for 'relpersistence == RELPERSISTENCE_UNLOGGED'. Otherwise,
> this bad thing can happen:
>
> 1. Create an unlogged table
> 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls
> RelationCopyStorage
> 3. a checkpoint happens while the command is running
> 4. After the ALTER TABLE has finished, shut down postgres cleanly.
> 5. Lose power
>
> When you recover, the unlogged table is not reset, because it was a
> clean postgres shutdown. But the part of the file that was copied after
> the checkpoint at step 3 was never fsync'd, so part of the file is lost.
> I was able to reproduce with a VM that I kill to simulate step 5.
>
> This is the same scenario that the smgrimmedsync() call above protects
> from for WAL-logged relations. But we got the condition wrong: instead
> of worrying about the init fork, we need to call smgrimmedsync() for all
> *other* forks of an unlogged relation.
>
> Fortunately the fix is trivial, see attached. I suspect we have similar
> problems in a few other places, though. end_heap_rewrite(), _bt_load(),
> and gist_indexsortbuild have a similar-looking smgrimmedsync() calls,
> with no consideration for unlogged relations at all. I haven't tested
> those, but they look wrong to me.

The patch looks reasonable to me. Is this [1] case in hash index build
that I reported but didn't take the time to reproduce similar?

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_bPc81M121pOEU7W%3D%2BwSWEebiLnrie4NpaFC%2BkWATFtSA%40mail.gmail.com

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Unlogged relation copy is not fsync'd