Re: Add checks in pg_rewind to abort if backup_label file is present

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Add checks in pg_rewind to abort if backup_label file is present
Дата
Msg-id 1f299164-f70b-499d-9863-f0e9406ace4a@iki.fi
обсуждение исходный текст
Ответ на Add checks in pg_rewind to abort if backup_label file is present  (Krishnakumar R <kksrcv001@gmail.com>)
Список pgsql-hackers
On 05/12/2023 19:36, Krishnakumar R wrote:
> Hi,
> 
> Please find the attached patch for $subject and associated test. Please review.

Thanks for picking up this long-standing TODO!

> +/*
> + * Check if a file is present using the connection to the
> + * database.
> + */
> +static bool
> +libpq_is_file_present(rewind_source *source, const char *path)
> +{
> +    PGconn       *conn = ((libpq_source *) source)->conn;
> +    PGresult   *res;
> +    const char *paramValues[1];
> +
> +    paramValues[0] = path;
> +    res = PQexecParams(conn, "SELECT pg_stat_file($1)",
> +                       1, NULL, paramValues, NULL, NULL, 1);
> +    if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +        return false;
> +
> +    return true;
> +}

The backup_label file cannot be present when the server is running. No 
need to check for that when connected to a live server.

> --- a/src/bin/pg_rewind/pg_rewind.c
> +++ b/src/bin/pg_rewind/pg_rewind.c
> @@ -729,7 +729,11 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
>  static void
>  sanityChecks(void)
>  {
> -    /* TODO Check that there's no backup_label in either cluster */
> +    if (source->is_file_present(source, "backup_label"))
> +        pg_fatal("The backup_label file is present in source cluster");
> +
> +    if (is_file_present(datadir_target, "backup_label"))
> +        pg_fatal("The backup_label file is present in target  cluster");
>  
>      /* Check system_identifier match */
>      if (ControlFile_target.system_identifier != ControlFile_source.system_identifier)

The error message isn't very user friendly. It's pretty dangerous 
actually: I think a lot of users would just delete the backup_label file 
when they see that message. Because then the file is no longer present 
and problem solved, right?

The point of checking for backup_label is that if it's present, the 
cluster wasn't really shut down cleanly. The correct fix is to start it, 
let WAL recovery finish, and shut it down again. The error message 
should make that clear. Perhaps make it similar to the existing "target 
server must be shut down cleanly" message.

I think today if you try to run pg_rewind on a cluster that was restored 
from a backup, so that backup_label is present, you get the "target 
server must be shut down cleanly" message. But we don't have any tests 
for it. We do have a test for when the server is still running, but not 
for the restored-from-backup case. Would be nice to add one.

Perhaps move the backup_label check later in sanityChecks(), after 
checking the state in the control file. That way, you still normally hit 
the "target server must be shut down cleanly" case, and the backup_label 
check would be just an additional "can't happen" sanity check.

In createBackupLabel() we have this:

>     /* TODO: move old file out of the way, if any. */
>     open_target_file("backup_label", true); /* BACKUP_LABEL_FILE */
>     write_target_range(buf, 0, len);
>     close_target_file();

That TODO comment needs to go away now. And we probably should complain 
if the file already exists. With your patch, we already checked earlier 
that it doesn't exists, so if it exists when we reach that code, 
something's gone wrong.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: backtrace_on_internal_error
Следующее
От: Robert Haas
Дата:
Сообщение: Re: UBSan pointer overflow in xlogreader.c