Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process
From | Andres Freund |
---|---|
Subject | Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process |
Date | |
Msg-id | 20140606231149.GA24880@awork2.anarazel.de |
In reply to | Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
On 2014-06-06 18:21:45 -0400, Tom Lane wrote:
> Also, there are a bunch of fsync_fname() calls inside critical sections in
> replication/slot.c. Seems at best pretty damn risky; what's more, the
> critical sections cover only the fsyncs and not anything else, which is
> flat out broken. If it was okay to fail just before calling the fsync,
> why is it critical to not fail inside it? Somebody was not thinking
> clearly there.

No, it actually makes sense. If:
* the open, write or fsync to the temp file fails: no permanent state has
  changed. We can gracefully error out.
* rename(tmpfile, realname) fails: we know (by POSIX) that the file hasn't
  been renamed. The old state is still valid.
* the fsync() to the new file fails (damn unlikely): we don't know which
  state is valid. So if we'd crash in that moment we might lose our
  reservation on resources (e.g. catalog xmin), and might start to decode
  with the wrong catalog state. Bad. On startup we'll try to fsync the slot
  files again, so we won't start up until that's clear.

Why is it that risky? We fdatasync() files while inside a critical section
all the time. And we've done the space allocation (the fsync on the old
filename) and the rename() outside the critical section.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
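
The three failure cases in the message above can be sketched in plain POSIX C. This is an illustrative sketch only, not the code in replication/slot.c: the function name `save_state`, the `.tmp` suffix, and the use of `abort()` as a stand-in for a PANIC inside a critical section are all assumptions made for the example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): replace the file at `path`
 * with `data`.  Returns 0 on success and -1 on a recoverable failure, in
 * which case the old file is still the valid state.  If the final fsync
 * fails we cannot tell which state is on disk, so we abort(), standing in
 * for what a critical section would turn into a PANIC.
 */
int
save_state(const char *path, const char *data, size_t len)
{
    char tmppath[1024];
    int  fd;
    int  n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);

    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    /* Case 1: open, write or fsync of the temp file fails.
     * No permanent state has changed, so report an ordinary error. */
    fd = open(tmppath, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* Case 2: rename fails.  POSIX leaves the old file in place,
     * so the old state is still valid and we can error out. */
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* Case 3: fsync of the file under its new name fails.
     * The valid state is now unknown; this is the "critical section". */
    fd = open(path, O_RDWR);
    if (fd < 0 || fsync(fd) != 0)
        abort();
    close(fd);

    return 0;
}
```

A caller would treat a -1 return as an ordinary error and retry or report it, since the previous file contents are still in effect. The sketch leaves out syncing the containing directory, which a production implementation would likely also need for the rename itself to be durable.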