Simon Riggs wrote:
>> We could avoid that by performing a good old startup checkpoint, but I
>> quite like the fast failover time we get without it.
>
> ISTM it's either slow failover or (fast failover, but restart archive
> recovery if it crashes).
>
> I would suggest that at end of recovery we write the last LSN to the
> control file, so if we crash recover then we will always end archive
> recovery at the same place again should we re-enter it. So we would have
> a recovery_target_lsn that overrides recovery_target_xid etc.

Hmm, we don't actually want to end recovery at the same point again: if
there are some updates right after the database came up, but before the
first checkpoint and the crash, those actions need to be replayed too.
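
A toy Python model of that objection, for illustration only: LSNs are plain integers, WAL is an in-memory list, and the replay() helper and all LSN values are hypothetical. It assumes no startup checkpoint is taken, so after a crash, redo restarts from a point earlier than where archive recovery ended. Stopping at the recorded end-of-recovery LSN then silently drops the commits made after the server came up.

```python
# Toy model only: LSNs are plain integers and WAL is an in-memory list.
def replay(wal, start_lsn, target_lsn=None):
    """Return the LSNs of the WAL records that recovery would replay.

    wal is a list of (lsn, description) tuples in LSN order; replay begins
    at start_lsn (the redo point of the last checkpoint). If target_lsn is
    given, replay stops after the last record at or below it, modelling the
    recovery_target_lsn idea quoted above.
    """
    replayed = []
    for lsn, _description in wal:
        if lsn < start_lsn:
            continue  # older than the redo point: already safely on disk
        if target_lsn is not None and lsn > target_lsn:
            break     # target reached: every later record is skipped
        replayed.append(lsn)
    return replayed


wal = [
    (200, "redo point of the last restartpoint"),
    (300, "last record replayed by archive recovery"),
    (310, "commit made after the server came up"),
    (320, "another commit, still before the first checkpoint"),
]
end_of_archive_recovery = 300

# Crash before the first checkpoint: crash recovery restarts from LSN 200.
print(replay(wal, 200, target_lsn=end_of_archive_recovery))  # → [200, 300]
print(replay(wal, 200))  # → [200, 300, 310, 320]
```

With the fixed target, the records at 310 and 320 are never replayed, even though those transactions were reported as committed.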
> Given where we are, I would suggest we go for the slow failover option
> in this release.
Agreed. We could do it for crash recovery, but I'd rather not have two
different ways to do it. It's not as important for crash recovery, either.

>>> We should continue to measure performance of recovery in the light of
>>> these changes. I still feel that fsyncing the control file on each
>>> XLogFileRead() will give a noticeable performance penalty, mostly
>>> because we know doing exactly the same thing in normal running caused a
>>> performance penalty. But that is easily changed and cannot be done with
>>> any certainty without wider feedback, so no reason to delay code commit.
>>
>> I've changed the way minRecoveryPoint is updated now anyway, so it no
>> longer happens every XLogFileRead().
>
> Care to elucidate?
I got rid of minSafeStartPoint, advancing minRecoveryPoint instead. And
it's advanced in XLogFlush instead of XLogFileRead. I'll post an updated
patch soon.
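
A simplified Python model of the scheme described above, not the patch itself: ControlFile and xlog_flush_in_recovery() are hypothetical names, LSNs are plain integers, and the fsync is only counted. It assumes minRecoveryPoint is advanced exactly to the LSN of the data page being written out; the actual code may advance it to a later point, such as the current replay position.

```python
from dataclasses import dataclass


@dataclass
class ControlFile:
    """Hypothetical stand-in for the control file: only the field we need."""
    min_recovery_point: int = 0   # an LSN, modelled as a plain integer
    fsync_count: int = 0          # how many times it would have been fsynced


def xlog_flush_in_recovery(control: ControlFile, page_lsn: int) -> None:
    """Simplified model of XLogFlush() while in recovery.

    No WAL is written during recovery. Instead, writing out a data page
    stamped with page_lsn means the data directory may now contain changes
    up to page_lsn, so recovery must not be considered consistent before
    that point. The control file is rewritten (and fsynced) only when the
    minimum recovery point actually moves forward.
    """
    if page_lsn > control.min_recovery_point:
        control.min_recovery_point = page_lsn
        control.fsync_count += 1  # placeholder for write + fsync of the control file


control = ControlFile()
# LSNs of data pages evicted during recovery, in eviction order (made-up values)
for lsn in [100, 250, 180, 250, 400]:
    xlog_flush_in_recovery(control, lsn)

print(control.min_recovery_point, control.fsync_count)  # → 400 3
```

Writing a page with an older LSN leaves the value untouched, so the control file is fsynced only when the recovery point genuinely moves forward (three times for five page writes here), rather than once per WAL segment opened, as with the XLogFileRead() approach.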

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com