Re: BUG #8686: Standby could not restart.
От | Heikki Linnakangas |
---|---|
Тема | Re: BUG #8686: Standby could not restart. |
Дата | |
Msg-id | 52CABFD8.8090101@vmware.com обсуждение исходный текст |
Ответ на | Re: BUG #8686: Standby could not restart. (Tomonari Katsumata <t.katsumata1122@gmail.com>) |
Ответы |
Re: BUG #8686: Standby could not restart.
|
Список | pgsql-bugs |
On 12/23/2013 08:15 AM, Tomonari Katsumata wrote: >> /* >>> * Initialize shared replayEndRecPtr, >>> lastReplayedEndRecPtr, and >>> * recoveryLastXTime. >>> * >>> * This is slightly confusing if we're starting from an >>> online >>> * checkpoint; we've just read and replayed the >>> checkpoint record, but >>> * we're going to start replay from its redo pointer, >>> which precedes >>> * the location of the checkpoint record itself. So even >>> though the >>> * last record we've replayed is indeed ReadRecPtr, we >>> haven't >>> * replayed all the preceding records yet. That's OK for >>> the current >>> * use of these variables. >>> */ >>> SpinLockAcquire(&xlogctl->info_lck); >>> xlogctl->replayEndRecPtr = ReadRecPtr; >>> xlogctl->lastReplayedEndRecPtr = EndRecPtr; >>> xlogctl->recoveryLastXTime = 0; >>> xlogctl->currentChunkStartTime = 0; >>> xlogctl->recoveryPause = false; >>> SpinLockRelease(&xlogctl->info_lck); >>> >> >> I think we need to fix that confusion. Your patch will do it by not >> setting EndRecPtr yet; that fixes the bug, but leaves those variables in a >> slightly strange state; I'm not sure what EndRecPtr points to in that case >> (0 ?), but ReadRecPtr would be set I guess. >> > Yes, the values were set like below. > ReadRecPtr:1/8E7F0B0 > EndRecPtr:0/0 > > > >> >> Perhaps we should reset replayEndRecPtr and lastReplayedEndRecPtr to the >> REDO point here, instead of ReadRecPtr/EndRecPtr. > > I made another patch. > I added a ReadRecord to make sure the REDO location is present or not. > The similar process are done when we use backup_label. > > Because the ReadRecord returns a record already read, > I set ReadRecPtr of the record to EndRecPtr. > And also I set record->xl_prev to ReadRecPtr. > As you said, it also worked fine. > > I'm not sure we should do same thing when crash recovery occurs, but now I > added the process when archive recovery is needed. > > Please see attached patch. Hmm. That would still initialize lastReplayedEndRecPtr to the checkpoint record, when you do crash recovery (or begin archive recovery from a filesystem snapshot, where you perform crash recovery before starting to read the archive). I'm not very comfortable with that, even though I don't see an immediate problem with it. I also noticed that CheckRecoveryConsistency() compares backupEndPoint with EndRecPtr, but minRecoveryPoint with lastReplayedEndRecPtr. That's correct as the code stands, but it's an accident waiting to happen: if you called CheckRecoveryConsistency() after reading a record with ReadRecord(), but before fully replaying it, it might conclude that it's reached the backup end location one record too early. And it's inconsistent, anyway. I propose the attached patch. I haven't tested it, but I think it's a slightly more robust fix. - Heikki
Вложения
В списке pgsql-bugs по дате отправления: