Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

Поиск

Список

Период

Сортировка

От	Michael Paquier
Тема	Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
Дата	27 сентября 2023 г. 07:25:41
Msg-id	ZRPY9VwlNYygle4o@paquier.xyz обсуждение исходный текст
Ответ на	Re: Requiring recovery.signal or standby.signal when recovering with a backup_label (Bowen Shi <zxwsbg12138@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Thu, Sep 21, 2023 at 11:45:06AM +0800, Bowen Shi wrote:
> First I encountered the problem " FATAL:  could not find
> recovery.signal or standby.signal when recovering with backup_label ",
> then I deleted the backup_label file and started the instance
> successfully.

Doing that is equal to corrupting your instance as recovery would
begin from the latest redo LSN stored in the control file, but the
physical relation files included in the backup may include blocks that
require records that are needed before this redo LSN and the LSN
stored in the backup_label.

>> Delete a backup_label from a fresh base backup can easily lead to data
>> corruption, as the startup process would pick up as LSN to start
>> recovery from the control file rather than the backup_label file.
>> This would happen if a checkpoint updates the redo LSN in the control
>> file while a backup happens and the control file is copied after the
>> checkpoint, for instance. If one wishes to deploy a new primary from
>> a base backup, recovery.signal is the way to go, making sure that the
>> new primary is bumped into a new timeline once recovery finishes, on
>> top of making sure that the startup process starts recovery from a
>> position where the cluster would be able to achieve a consistent
>> state.

And that's what I mean here.  In more details.  So, really, don't do
that.

> ereport(FATAL,
> (errmsg("could not find redo location referenced by checkpoint record"),
> errhint("If you are restoring from a backup, touch
> \"%s/recovery.signal\" and add required recovery options.\n"
> "If you are not restoring from a backup, try removing the file
> \"%s/backup_label\".\n"
> "Be careful: removing \"%s/backup_label\" will result in a corrupt
> cluster if restoring from a backup.",
> DataDir, DataDir, DataDir)));
>
> There are two similar error messages in xlogrecovery.c. Maybe we can
> modify the error messages to be similar.

The patch adds the following message, which is written this way to be
consistent with the two others, already:

+    ereport(FATAL,
+            (errmsg("could not find recovery.signal or standby.signal when recovering with backup_label"),
+             errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" or \"%s/standby.signal\" and
addrequired recovery options.", 
+                     DataDir, DataDir)));

But you have an interesting point here, why isn't standby.signal also
mentioned in the two existing comments?  Depending on what's wanted by
the user this can be equally useful to report back.

Attached is a slightly updated patch, where I have also removed the
check on ArchiveRecoveryRequested because the FATAL generated for
!ArchiveRecoveryRequested makes sure that it is useless after reading
the backup_label file.

This patch has been around for a few months now.  Do others have
opinions about the direction taken here to make the presence of
recovery.signal or standby.signal a hard requirement when a
backup_label file is found (HEAD only)?
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

Вложения