Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
От | Michael Paquier |
---|---|
Тема | Re: Requiring recovery.signal or standby.signal when recovering with a backup_label |
Дата | |
Msg-id | ZRPY9VwlNYygle4o@paquier.xyz обсуждение исходный текст |
Ответ на | Re: Requiring recovery.signal or standby.signal when recovering with a backup_label (Bowen Shi <zxwsbg12138@gmail.com>) |
Список | pgsql-hackers |
On Thu, Sep 21, 2023 at 11:45:06AM +0800, Bowen Shi wrote: > First I encountered the problem " FATAL: could not find > recovery.signal or standby.signal when recovering with backup_label ", > then I deleted the backup_label file and started the instance > successfully. Doing that is equal to corrupting your instance as recovery would begin from the latest redo LSN stored in the control file, but the physical relation files included in the backup may include blocks that require records that are needed before this redo LSN and the LSN stored in the backup_label. >> Delete a backup_label from a fresh base backup can easily lead to data >> corruption, as the startup process would pick up as LSN to start >> recovery from the control file rather than the backup_label file. >> This would happen if a checkpoint updates the redo LSN in the control >> file while a backup happens and the control file is copied after the >> checkpoint, for instance. If one wishes to deploy a new primary from >> a base backup, recovery.signal is the way to go, making sure that the >> new primary is bumped into a new timeline once recovery finishes, on >> top of making sure that the startup process starts recovery from a >> position where the cluster would be able to achieve a consistent >> state. And that's what I mean here. In more details. So, really, don't do that. > ereport(FATAL, > (errmsg("could not find redo location referenced by checkpoint record"), > errhint("If you are restoring from a backup, touch > \"%s/recovery.signal\" and add required recovery options.\n" > "If you are not restoring from a backup, try removing the file > \"%s/backup_label\".\n" > "Be careful: removing \"%s/backup_label\" will result in a corrupt > cluster if restoring from a backup.", > DataDir, DataDir, DataDir))); > > There are two similar error messages in xlogrecovery.c. Maybe we can > modify the error messages to be similar. The patch adds the following message, which is written this way to be consistent with the two others, already: + ereport(FATAL, + (errmsg("could not find recovery.signal or standby.signal when recovering with backup_label"), + errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" or \"%s/standby.signal\" and addrequired recovery options.", + DataDir, DataDir))); But you have an interesting point here, why isn't standby.signal also mentioned in the two existing comments? Depending on what's wanted by the user this can be equally useful to report back. Attached is a slightly updated patch, where I have also removed the check on ArchiveRecoveryRequested because the FATAL generated for !ArchiveRecoveryRequested makes sure that it is useless after reading the backup_label file. This patch has been around for a few months now. Do others have opinions about the direction taken here to make the presence of recovery.signal or standby.signal a hard requirement when a backup_label file is found (HEAD only)? -- Michael
Вложения
В списке pgsql-hackers по дате отправления: