Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process

From	Andres Freund
Subject	Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process
Date	June 6, 2014, 23:12:04
Msg-id	20140606231149.GA24880@awork2.anarazel.de
In response to	Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-bugs


On 2014-06-06 18:21:45 -0400, Tom Lane wrote:
> Also, there are a bunch of fsync_fname() calls inside critical sections in
> replication/slot.c.  Seems at best pretty damn risky; what's more, the
> critical sections cover only the fsyncs and not anything else, which is
> flat out broken.  If it was okay to fail just before calling the fsync,
> why is it critical to not fail inside it?  Somebody was not thinking
> clearly there.

No, it actually makes sense. If:
* the open, write or fsync to the temp file fails: no permanent state
  has changed. We can gracefully error out.
* rename(tmpfile, realname) fails: we know (by POSIX) that the file
  hasn't been renamed. The old state is still valid.
* the fsync() of the new file fails (damn unlikely): we don't know
  which state is valid. So if we crashed at that moment we might lose
  our reservation on resources (e.g. catalog xmin), and might start to
  decode with the wrong catalog state. Bad. On startup we'll try to
  fsync the slot files again, so we won't start up until that's clear.
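
To make the three cases above concrete, here is a minimal sketch in plain
POSIX C. It is not the code in replication/slot.c: save_state() and the
".tmp" suffix are hypothetical names chosen for illustration, and abort()
merely stands in for the PANIC that an error inside a critical section
would produce. A fuller version would also fsync the containing directory.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): durably replace "path" with
 * "data".  Returns 0 on success and -1 on a recoverable failure, in which
 * case the old file is still the valid state.  Calls abort() when the
 * on-disk state would be ambiguous.
 */
static int
save_state(const char *path, const char *data, size_t len)
{
    char        tmppath[1024];
    int         n;
    int         fd;

    n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    /* Case 1: open/write/fsync of the temp file.  Nothing permanent yet. */
    fd = open(tmppath, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* Case 2: if rename() fails, the old file is untouched. */
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /*
     * Case 3: the new name is visible but possibly not durable.  Failing
     * here leaves it unclear which state survives a crash, so this is the
     * only part treated as "critical" (abort() stands in for PANIC).
     */
    fd = open(path, O_RDWR);
    if (fd < 0 || fsync(fd) != 0)
        abort();
    close(fd);

    return 0;
}
```

Only the last step is unrecoverable in this sketch; everything before it
can report an ordinary error because the previous file is still intact.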

Why is it that risky? We fdatasync() files while inside a critical
section all the time. And we've done the space allocation (the fsync on
the old filename) and the rename() outside the critical section.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
