Re: Changeset Extraction v7.0 (was logical changeset generation)
From | Andres Freund |
---|---|
Subject | Re: Changeset Extraction v7.0 (was logical changeset generation) |
Date | |
Msg-id | 20140123120503.GB7182@awork2.anarazel.de |
In response to | Re: Changeset Extraction v7.0 (was logical changeset generation) (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Changeset Extraction v7.0 (was logical changeset generation) |
List | pgsql-hackers |
Hi,

On 2014-01-22 13:00:44 -0500, Robert Haas wrote:
> Well, apparently, one is going to PANIC and reinitialize the system.
> I presume that upon reinitialization we'll decide that the slot is
> gone, and thus won't recreate it in shared memory.

Yea, and if it's half-gone we'll continue deletion. And since yesterday
evening we'll even fsync things during startup to handle scenarios
similar to 20140122162115.GL21170@alap3.anarazel.de .

> Of course, if the entire system suffers a hard power failure after
> that and before the directory is successfully fsync'd, then the slot
> could reappear on the next startup. Which is also exactly what would
> happen if we removed the slot from shared memory after doing the
> unlink, and then the system suffered a hard power failure before the
> directory contents made it to disk. Except that we also panicked.

Yes, but that could only happen as long as no relevant data has been
lost since we hold relevant locks during this.

> In the case of shared buffers, the way we handle fsync failures is by
> not allowing the system to checkpoint until all of the fsyncs succeed.

I don't think shared buffers fsyncs are the apt comparison. It's more
something like UpdateControlFile(). Which PANICs.

I really don't get why you fight PANICs in general that much. There are
some nasty PANICs in postgres which can happen in legitimate situations,
which should be made to fail more gracefully, but this surely isn't one
of them. We're doing rename(), unlink() and rmdir(). That's it.
We should concentrate on the ones that legitimately can happen, not the
ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or
mount -o remount,ro /. We don't increase reliability by a bit adding
codepaths that will never get tested.

> If there's an OS-level reset before that happens, WAL replay will
> perform the same buffer modifications over again and the next
> checkpoint will again try to flush them to disk and will not complete
> unless it does. That forms a closed system where we never advance the
> redo pointer over the covering WAL record until the changes it covers
> are on the disk. But I don't think this code has any similar
> interlock; if it does, I missed it.

No, it doesn't (until the first rename() at least), but the number of
failure scenarios is far smaller.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
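For readers following the thread, the drop sequence discussed above (rename(), unlink(), rmdir(), plus directory fsyncs and continuing a half-finished deletion at startup) can be sketched roughly as below. This is an illustrative Python sketch, not the patch's actual code: the function names, the one-directory-per-slot layout and the `.tmp` suffix are all assumptions made for the example. Any failing call raises `OSError` and is deliberately not handled, which loosely mirrors the "just PANIC" position argued in the message.

```python
import os


def fsync_dir(path):
    """Flush a directory's entries to disk (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def drop_slot_dir(parent, name):
    """Remove parent/name so that a crash at any point can be resumed.

    Hypothetical layout: each slot is a directory holding plain files.
    """
    live = os.path.join(parent, name)
    doomed = os.path.join(parent, name + ".tmp")  # assumed marker suffix

    if os.path.isdir(live):
        # Step 1: rename() marks the slot as "being dropped".
        os.rename(live, doomed)
        fsync_dir(parent)

    if os.path.isdir(doomed):
        # Step 2: unlink() the contents, then rmdir() the directory.
        for entry in os.listdir(doomed):
            os.unlink(os.path.join(doomed, entry))
        os.rmdir(doomed)
        fsync_dir(parent)


def cleanup_at_startup(parent):
    """Continue any half-finished drop left behind by a crash."""
    for entry in os.listdir(parent):
        if entry.endswith(".tmp"):
            drop_slot_dir(parent, entry[: -len(".tmp")])
```

Because the rename happens first, a directory carrying the marker suffix is always safe to delete on the next startup, which is what makes the "half-gone" case resumable in this sketch.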