Re: Use of backup_label not noted in log
От | Andres Freund |
---|---|
Тема | Re: Use of backup_label not noted in log |
Дата | |
Msg-id | 20231118214915.4vylhux6qq6mhzqr@awork3.anarazel.de обсуждение исходный текст |
Ответ на | Re: Use of backup_label not noted in log (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: Use of backup_label not noted in log
Re: Use of backup_label not noted in log |
Список | pgsql-hackers |
Hi, On 2023-11-18 10:01:42 -0800, Andres Freund wrote: > > What about adding it to the "redo starts at" message, something like > > > > redo starts at 12/12345678, taken from control file > > > > or > > > > redo starts at 12/12345678, taken from backup label > > I think it'd make sense to log use of backup_label earlier than that - the > locations from backup_label might end up not being available in the archive, > the primary or locally, and we'll error out with "could not locate a valid > checkpoint record". > > I'd probably just do it within the if (read_backup_label()) block in > InitWalRecovery(), *before* the ReadCheckpointRecord(). Not enamored with the phrasing of the log messages, but here's a prototype: When starting up with backup_label present: LOG: starting from base backup with redo LSN A/34100028, checkpoint LSN A/34100080 on timeline ID 1 When restarting before reaching the end of the backup, but after backup_label has been removed: LOG: continuing to start from base backup with redo LSN A/34100028 LOG: entering standby mode LOG: redo starts at A/3954B958 Note that the LSN in the "continuing" case is the one the backup started at, not where recovery will start. I've wondered whether it's worth also adding an explicit message just after ReachedEndOfBackup(), but it seems far less urgent due to the existing "consistent recovery state reached at %X/%X" message. We are quite inconsistent about how we spell LSNs. Sometimes with LSN preceding, sometimes not. Sometimes with (LSN). Etc. > I do like the idea of expanding the "redo starts at" message > though. E.g. including minRecoveryLSN, ControlFile->backupStartPoint, > ControlFile->backupEndPoint would provide information about when the node > might become consistent. Playing around with this a bit, I'm wondering if we instead should remove that message, and emit something more informative earlier on. If there's a problem, you kinda want the information before we finally get to the loop in PerformWalLRecovery(). If e.g. there's no WAL you'll only get LOG: invalid checkpoint record PANIC: could not locate a valid checkpoint record which is delightfully lacking in details. There also are some other oddities: If the primary is down when starting up, and we'd need WAL from the primary for the first record, the "redo start at" message is delayed until that happens, because we emit the message not before we read the first record, but after. That's just plain odd. And sometimes we'll start referencing the LSN at which we are starting recovery before the "redo starts at" message. If e.g. we shut down at a restart point, we'll emit LOG: consistent recovery state reached at ... before LOG: redo starts at ... But that's all clearly just material for HEAD. Greetings, Andres Freund
Вложения
В списке pgsql-hackers по дате отправления: