Re: [PATCHES] Infrastructure changes for recovery
From | Tom Lane |
---|---|
Subject | Re: [PATCHES] Infrastructure changes for recovery |
Date | |
Msg-id | 22856.1222650961@sss.pgh.pa.us |
In reply to | Re: [PATCHES] Infrastructure changes for recovery (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: [PATCHES] Infrastructure changes for recovery
|
List | pgsql-hackers |
Simon Riggs <simon@2ndQuadrant.com> writes:
>> It does nothing AFAICS for the
>> problem that when restarting archive recovery from a restartpoint,
>> it's not clear when it is safe to start letting in backends. You need
>> to get past the highest LSN that has made it out to disk, and there is
>> no good way to know what that is.

> AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
> the controlfile, so whenever we restart we can see that it has been set
> previously and now we are beyond it.

Right ...

> So if we crash during recovery and
> then restart *after* we reached minRecoveryLoc then we resume in safe
> mode almost immediately.

Wrong.

What minRecoveryLoc is is an upper bound for the LSNs that might be
on-disk in the filesystem backup that an archive recovery starts from.
(Defined as such, it never changes during a restartpoint crash/restart.)
Once you pass that, the on-disk state as modified by any dirty buffers
inside the recovery process represents a consistent database state.
However, the on-disk state alone is not guaranteed consistent. As you
flush some (not all) of your shared buffers you enter other
not-certainly-consistent on-disk states. If we crash in such a state,
we know how to use the last restartpoint plus WAL replay to recover to
another state in which disk + dirty buffers are consistent. However,
we reach such a state only when we have read WAL to beyond the highest
LSN that has reached disk --- and in recovery mode there is no clean
way to determine what that was.

Perhaps a solution is to make XLogFlush not be a no-op in recovery mode,
but have it scribble a highest-LSN somewhere on stable storage (maybe
scribble on pg_control itself, or maybe better someplace else). I'm
not totally sure about that. But I am sure that doing nothing will be
unreliable.

regards, tom lane
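The closing proposal in this message can be illustrated with a small C sketch. It is not PostgreSQL code. `XLogRecPtr` is modeled here as a plain 64-bit integer for simplicity, and `RecoveryState`, `recovery_flush()` and `safe_to_admit_backends()` are hypothetical names. Writing the value to pg_control or another file is only indicated in a comment. The sketch shows the rule described above: after a crash and restart from a restartpoint, backends may be let in only once WAL replay has reached both minRecoveryLoc and the highest LSN that recovery is known to have written to disk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position, modeled as a 64-bit integer. */
typedef uint64_t XLogRecPtr;

/* Hypothetical state assumed to live on stable storage (pg_control or elsewhere). */
typedef struct RecoveryState
{
    XLogRecPtr min_recovery_loc;    /* upper bound of LSNs in the base backup; never changes */
    XLogRecPtr highest_flushed_lsn; /* highest page LSN written out during recovery */
} RecoveryState;

/*
 * Sketch of a recovery-mode XLogFlush: instead of being a no-op, it advances
 * a high-water mark of page LSNs that are about to reach disk.  A real
 * implementation would have to make this value durable (e.g. fsync) before
 * the data page carrying page_lsn is written.
 */
static void recovery_flush(RecoveryState *state, XLogRecPtr page_lsn)
{
    if (page_lsn > state->highest_flushed_lsn)
        state->highest_flushed_lsn = page_lsn;
}

/*
 * After a restart, disk + dirty buffers are consistent only once replay has
 * reached the larger of minRecoveryLoc and the recorded high-water mark.
 * Assumes a page LSN marks the end of the last WAL record applied to it.
 */
static bool safe_to_admit_backends(const RecoveryState *state,
                                   XLogRecPtr replayed_upto)
{
    XLogRecPtr target = state->min_recovery_loc;

    if (state->highest_flushed_lsn > target)
        target = state->highest_flushed_lsn;
    return replayed_upto >= target;
}
```

The high-water mark plays the same role for data pages that WAL-before-data plays in normal operation: if a page could reach disk before its LSN was recorded, a crash could leave on-disk state newer than the recorded bound, and the admission check above would fire too early.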