Обсуждение: Add system identifier to backup manifest

Поиск
Список
Период
Сортировка

Add system identifier to backup manifest

От
Amul Sul
Дата:
Hi All,

With the attached patch, the backup manifest will have a new key item as
"System-Identifier" 64-bit integer whose value is derived from pg_control while
generating it, and the manifest version bumps to 2.

This helps to identify the correct database server and/or backup for the
subsequent backup operations.  pg_verifybackup validates the manifest system
identifier against the backup control file and fails if they don’t match.
Similarly, pg_basebackup increment backup will fail if the manifest system
identifier does not match with the server system identifier.  The
pg_combinebackup is already a bit smarter -- checks the system identifier from
the pg_control of all the backups, with this patch the manifest system
identifier also validated. 

For backward compatibility, the manifest system identifier validation will be
skipped for version 1.

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
Вложения

Re: Add system identifier to backup manifest

От
Alvaro Herrera
Дата:
On 2024-Jan-17, Amul Sul wrote:

> This helps to identify the correct database server and/or backup for the
> subsequent backup operations.  pg_verifybackup validates the manifest system
> identifier against the backup control file and fails if they don’t match.
> Similarly, pg_basebackup increment backup will fail if the manifest system
> identifier does not match with the server system identifier.  The
> pg_combinebackup is already a bit smarter -- checks the system identifier
> from
> the pg_control of all the backups, with this patch the manifest system
> identifier also validated.

Hmm, okay, but what if I take a full backup from a primary server and
later I want an incremental from a standby, or the other way around?
Will this prevent me from using such a combination?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."              (Andrew Sullivan)
https://postgr.es/m/20050809113420.GD2768@phlogiston.dyndns.org



Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Wed, Jan 17, 2024 at 5:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2024-Jan-17, Amul Sul wrote:

> This helps to identify the correct database server and/or backup for the
> subsequent backup operations.  pg_verifybackup validates the manifest system
> identifier against the backup control file and fails if they don’t match.
> Similarly, pg_basebackup increment backup will fail if the manifest system
> identifier does not match with the server system identifier.  The
> pg_combinebackup is already a bit smarter -- checks the system identifier
> from
> the pg_control of all the backups, with this patch the manifest system
> identifier also validated.

Hmm, okay, but what if I take a full backup from a primary server and
later I want an incremental from a standby, or the other way around?
Will this prevent me from using such a combination?
 
Yes, that worked for me where the system identifier was the same on
master as well standby.

Regards,
Amul

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Wed, Jan 17, 2024 at 6:45 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Hmm, okay, but what if I take a full backup from a primary server and
> later I want an incremental from a standby, or the other way around?
> Will this prevent me from using such a combination?

The system identifier had BETTER match in such cases. If it doesn't,
somebody's run pg_resetwal on your standby since it was created... and
in that case, no incremental backup for you!

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Wed, Jan 17, 2024 at 6:31 AM Amul Sul <sulamul@gmail.com> wrote:
> With the attached patch, the backup manifest will have a new key item as
> "System-Identifier" 64-bit integer whose value is derived from pg_control while
> generating it, and the manifest version bumps to 2.
>
> This helps to identify the correct database server and/or backup for the
> subsequent backup operations.  pg_verifybackup validates the manifest system
> identifier against the backup control file and fails if they don’t match.
> Similarly, pg_basebackup increment backup will fail if the manifest system
> identifier does not match with the server system identifier.  The
> pg_combinebackup is already a bit smarter -- checks the system identifier from
> the pg_control of all the backups, with this patch the manifest system
> identifier also validated.

Thanks for working on this. Without this, I think what happens is that
you can potentially take an incremental backup from the "wrong"
server, if the states of the systems are such that all of the other
sanity checks pass. When you run pg_combinebackup, it'll discover the
problem and tell you, but you ideally want to discover such errors at
backup time rather than at restore time. This addresses that. And,
overall, I think it's a pretty good patch. But I nonetheless have a
bunch of comments.

-      The associated value is always the integer 1.
+      The associated value is the integer, either 1 or 2.

is an integer. Beginning in <productname>PostgreSQL</productname> 17,
it is 2; in older versions, it is 1.

+ context.identity_cb = manifest_process_identity;

I'm not really on board with calling the system identifier "identity"
throughout the patch. I think it should just say system_identifier. If
we were going to abbreviate, I'd prefer something like "sysident" that
looks like it's derived from "system identifier" rather than
"identity" which is a different word altogether. But I don't think we
should abbreviate unless doing so creates *ridiculously* long
identifier names.

+static void
+manifest_process_identity(JsonManifestParseContext *context,
+   int manifest_version,
+   uint64 manifest_system_identifier)
+{
+ uint64 system_identifier;
+
+ /* Manifest system identifier available in version 2 or later */
+ if (manifest_version == 1)
+ return;

I think you've got the wrong idea here. I think this function would
only get called if System-Identifier is present in the manifest, so if
it's a v1 manifest, this would never get called, so this if-statement
would not ever do anything useful. I think what you should do is (1)
if the client supplies a v1 manifest, reject it, because surely that's
from an older server version that doesn't support incremental backup;
but do that when the version is parsed rather than here; and (2) also
detect and reject the case when it's supposedly a v2 manifest but this
is absent.

(1) should really be done when the version number is parsed, so I
suspect you may need to add manifest_version_cb.

+static void
+combinebackup_identity_cb(JsonManifestParseContext *context,
+   int manifest_version,
+   uint64 manifest_system_identifier)
+{
+ parser_context *private_context = context->private_data;
+ uint64 system_identifier = private_context->system_identifier;
+
+ /* Manifest system identifier available in version 2 or later */
+ if (manifest_version == 1)
+ return;

Very similar to the above case. Just reject a version 1 manifest as
soon as we see the version number. In this function, set a flag
indicating we saw the system identifier; if at the end of parsing that
flag is not set, kaboom.

- parse_manifest_file(manifest_path, &context.ht, &first_wal_range);
+ parse_manifest_file(manifest_path, &context.ht, &first_wal_range,
+ context.backup_directory);

Don't do this! parse_manifest_file() should just record everything
found in the manifest in the context object. Syntax validation should
happen while parsing the manifest (e.g. "CAT/DOG" is not a valid LSN
and we should reject that at this stage) but semantic validation
should happen later (e.g. "0/0" can't be a the correct backup end LSN
but we don't figure that out while parsing, but rather later). I think
you should actually move validation of the system identifier to the
point where the directory walk encounters the control file (and update
the docs and tests to match that decision). Imagine if you wanted to
validate a tar-format backup; then you wouldn't have random access to
the directory. You'd see the manifest file first, and then all the
files in a random order, with one chance to look at each one.

(This is, in fact, a feature I think we should implement.)

- if (strcmp(token, "1") != 0)
+ parse->manifest_version = atoi(token);
+ if (parse->manifest_version != 1 && parse->manifest_version != 2)
  json_manifest_parse_failure(parse->context,
  "unexpected manifest version");

Please either (a) don't do a string-to-integer conversion and just
strcmp() twice or (b) use strtol so that you can check that it
succeeded. I don't want to accept manifest version 1a as 1.

+/*
+ * Validate manifest system identifier against the database server system
+ * identifier.
+ */

This comment assumes you know what the callback is going to do, but
you don't. This should be more like the comment for
json_manifest_finalize_file or json_manifest_finalize_wal_range.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Sravan Kumar
Дата:
I have also done a review of the patch and some testing. The patch looks good, and I agree with Robert's comments.

On Wed, Jan 17, 2024 at 8:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 6:31 AM Amul Sul <sulamul@gmail.com> wrote:
> > With the attached patch, the backup manifest will have a new key item as
> > "System-Identifier" 64-bit integer whose value is derived from pg_control while
> > generating it, and the manifest version bumps to 2.
> >
> > This helps to identify the correct database server and/or backup for the
> > subsequent backup operations.  pg_verifybackup validates the manifest system
> > identifier against the backup control file and fails if they don’t match.
> > Similarly, pg_basebackup increment backup will fail if the manifest system
> > identifier does not match with the server system identifier.  The
> > pg_combinebackup is already a bit smarter -- checks the system identifier from
> > the pg_control of all the backups, with this patch the manifest system
> > identifier also validated.
>
> Thanks for working on this. Without this, I think what happens is that
> you can potentially take an incremental backup from the "wrong"
> server, if the states of the systems are such that all of the other
> sanity checks pass. When you run pg_combinebackup, it'll discover the
> problem and tell you, but you ideally want to discover such errors at
> backup time rather than at restore time. This addresses that. And,
> overall, I think it's a pretty good patch. But I nonetheless have a
> bunch of comments.
>
> -      The associated value is always the integer 1.
> +      The associated value is the integer, either 1 or 2.
>
> is an integer. Beginning in <productname>PostgreSQL</productname> 17,
> it is 2; in older versions, it is 1.
>
> + context.identity_cb = manifest_process_identity;
>
> I'm not really on board with calling the system identifier "identity"
> throughout the patch. I think it should just say system_identifier. If
> we were going to abbreviate, I'd prefer something like "sysident" that
> looks like it's derived from "system identifier" rather than
> "identity" which is a different word altogether. But I don't think we
> should abbreviate unless doing so creates *ridiculously* long
> identifier names.
>
> +static void
> +manifest_process_identity(JsonManifestParseContext *context,
> +   int manifest_version,
> +   uint64 manifest_system_identifier)
> +{
> + uint64 system_identifier;
> +
> + /* Manifest system identifier available in version 2 or later */
> + if (manifest_version == 1)
> + return;
>
> I think you've got the wrong idea here. I think this function would
> only get called if System-Identifier is present in the manifest, so if
> it's a v1 manifest, this would never get called, so this if-statement
> would not ever do anything useful. I think what you should do is (1)
> if the client supplies a v1 manifest, reject it, because surely that's
> from an older server version that doesn't support incremental backup;
> but do that when the version is parsed rather than here; and (2) also
> detect and reject the case when it's supposedly a v2 manifest but this
> is absent.
>
> (1) should really be done when the version number is parsed, so I
> suspect you may need to add manifest_version_cb.
>
> +static void
> +combinebackup_identity_cb(JsonManifestParseContext *context,
> +   int manifest_version,
> +   uint64 manifest_system_identifier)
> +{
> + parser_context *private_context = context->private_data;
> + uint64 system_identifier = private_context->system_identifier;
> +
> + /* Manifest system identifier available in version 2 or later */
> + if (manifest_version == 1)
> + return;
>
> Very similar to the above case. Just reject a version 1 manifest as
> soon as we see the version number. In this function, set a flag
> indicating we saw the system identifier; if at the end of parsing that
> flag is not set, kaboom.
>
> - parse_manifest_file(manifest_path, &context.ht, &first_wal_range);
> + parse_manifest_file(manifest_path, &context.ht, &first_wal_range,
> + context.backup_directory);
>
> Don't do this! parse_manifest_file() should just record everything
> found in the manifest in the context object. Syntax validation should
> happen while parsing the manifest (e.g. "CAT/DOG" is not a valid LSN
> and we should reject that at this stage) but semantic validation
> should happen later (e.g. "0/0" can't be a the correct backup end LSN
> but we don't figure that out while parsing, but rather later). I think
> you should actually move validation of the system identifier to the
> point where the directory walk encounters the control file (and update
> the docs and tests to match that decision). Imagine if you wanted to
> validate a tar-format backup; then you wouldn't have random access to
> the directory. You'd see the manifest file first, and then all the
> files in a random order, with one chance to look at each one.
>
> (This is, in fact, a feature I think we should implement.)
>
> - if (strcmp(token, "1") != 0)
> + parse->manifest_version = atoi(token);
> + if (parse->manifest_version != 1 && parse->manifest_version != 2)
>   json_manifest_parse_failure(parse->context,
>   "unexpected manifest version");
>
> Please either (a) don't do a string-to-integer conversion and just
> strcmp() twice or (b) use strtol so that you can check that it
> succeeded. I don't want to accept manifest version 1a as 1.
>
> +/*
> + * Validate manifest system identifier against the database server system
> + * identifier.
> + */
>
> This comment assumes you know what the callback is going to do, but
> you don't. This should be more like the comment for
> json_manifest_finalize_file or json_manifest_finalize_wal_range.

This comment caught me off-guard too. After some testing and detailed review I found that this is
called by pg_verifybackup and pg_combinebackup both of which do not validate against any
running database system.
 
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
>


--
Thanks & Regards,
Sravan Velagandula
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add system identifier to backup manifest

От
Michael Paquier
Дата:
On Wed, Jan 17, 2024 at 08:46:09AM -0500, Robert Haas wrote:
> On Wed, Jan 17, 2024 at 6:45 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Hmm, okay, but what if I take a full backup from a primary server and
>> later I want an incremental from a standby, or the other way around?
>> Will this prevent me from using such a combination?
>
> The system identifier had BETTER match in such cases. If it doesn't,
> somebody's run pg_resetwal on your standby since it was created... and
> in that case, no incremental backup for you!

There is an even stronger check than that at replay as we also store
the system identifier in XLogLongPageHeaderData and cross-check it
with the contents of the control file.  Having a field in the backup
manifest makes for a much faster detection, even if that's not the
same as replaying things, it can avoid a lot of problems when
combining backup pieces.  I'm +1 for Amul's patch concept.
--
Michael

Вложения

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Wed, Jan 17, 2024 at 8:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 17, 2024 at 6:31 AM Amul Sul <sulamul@gmail.com> wrote:
> With the attached patch, the backup manifest will have a new key item as
> "System-Identifier" 64-bit integer whose value is derived from pg_control while
> generating it, and the manifest version bumps to 2.
>
> This helps to identify the correct database server and/or backup for the
> subsequent backup operations.  pg_verifybackup validates the manifest system
> identifier against the backup control file and fails if they don’t match.
> Similarly, pg_basebackup increment backup will fail if the manifest system
> identifier does not match with the server system identifier.  The
> pg_combinebackup is already a bit smarter -- checks the system identifier from
> the pg_control of all the backups, with this patch the manifest system
> identifier also validated.

Thanks for working on this. Without this, I think what happens is that
you can potentially take an incremental backup from the "wrong"
server, if the states of the systems are such that all of the other
sanity checks pass. When you run pg_combinebackup, it'll discover the
problem and tell you, but you ideally want to discover such errors at
backup time rather than at restore time. This addresses that. And,
overall, I think it's a pretty good patch. But I nonetheless have a
bunch of comments.

Thank you for the review.
 

-      The associated value is always the integer 1.
+      The associated value is the integer, either 1 or 2.

is an integer. Beginning in <productname>PostgreSQL</productname> 17,
it is 2; in older versions, it is 1.

Ok, 


+ context.identity_cb = manifest_process_identity;

I'm not really on board with calling the system identifier "identity"
throughout the patch. I think it should just say system_identifier. If
we were going to abbreviate, I'd prefer something like "sysident" that
looks like it's derived from "system identifier" rather than
"identity" which is a different word altogether. But I don't think we
should abbreviate unless doing so creates *ridiculously* long
identifier names.

Ok, used "system identifier" at all the places.  
 
+static void
+manifest_process_identity(JsonManifestParseContext *context,
+   int manifest_version,
+   uint64 manifest_system_identifier)
+{
+ uint64 system_identifier;
+
+ /* Manifest system identifier available in version 2 or later */
+ if (manifest_version == 1)
+ return;

I think you've got the wrong idea here. I think this function would
only get called if System-Identifier is present in the manifest, so if
it's a v1 manifest, this would never get called, so this if-statement
would not ever do anything useful. I think what you should do is (1)
if the client supplies a v1 manifest, reject it, because surely that's
from an older server version that doesn't support incremental backup;
but do that when the version is parsed rather than here; and (2) also
detect and reject the case when it's supposedly a v2 manifest but this
is absent.

(1) should really be done when the version number is parsed, so I
suspect you may need to add manifest_version_cb.

+static void
+combinebackup_identity_cb(JsonManifestParseContext *context,
+   int manifest_version,
+   uint64 manifest_system_identifier)
+{
+ parser_context *private_context = context->private_data;
+ uint64 system_identifier = private_context->system_identifier;
+
+ /* Manifest system identifier available in version 2 or later */
+ if (manifest_version == 1)
+ return;

Very similar to the above case. Just reject a version 1 manifest as
soon as we see the version number. In this function, set a flag
indicating we saw the system identifier; if at the end of parsing that
flag is not set, kaboom.

Ok, I added a version_cb callback. Using this pg_combinebackup & pg_basebackup
will report an error for manifest version 1, whereas pg_verifybackup
doesn't (not needed IIUC).
 

- parse_manifest_file(manifest_path, &context.ht, &first_wal_range);
+ parse_manifest_file(manifest_path, &context.ht, &first_wal_range,
+ context.backup_directory);

Don't do this! parse_manifest_file() should just record everything
found in the manifest in the context object. Syntax validation should
happen while parsing the manifest (e.g. "CAT/DOG" is not a valid LSN
and we should reject that at this stage) but semantic validation
should happen later (e.g. "0/0" can't be a the correct backup end LSN
but we don't figure that out while parsing, but rather later). I think
you should actually move validation of the system identifier to the
point where the directory walk encounters the control file (and update
the docs and tests to match that decision). Imagine if you wanted to
validate a tar-format backup; then you wouldn't have random access to
the directory. You'd see the manifest file first, and then all the
files in a random order, with one chance to look at each one.


Agree.  I have moved the system identifier validation after manifest parsing.
But, not in the directory walkthrough since in pg_combinebackup, we don't
really needed to open the pg_control file to get the system identifier, which we
have from the check_control_files(). 
 
(This is, in fact, a feature I think we should implement.)

- if (strcmp(token, "1") != 0)
+ parse->manifest_version = atoi(token);
+ if (parse->manifest_version != 1 && parse->manifest_version != 2)
  json_manifest_parse_failure(parse->context,
  "unexpected manifest version");

Please either (a) don't do a string-to-integer conversion and just
strcmp() twice or (b) use strtol so that you can check that it
succeeded. I don't want to accept manifest version 1a as 1.
 
Understood, corrected in the attached version.  
 
+/*
+ * Validate manifest system identifier against the database server system
+ * identifier.
+ */

This comment assumes you know what the callback is going to do, but
you don't. This should be more like the comment for
json_manifest_finalize_file or json_manifest_finalize_wal_range.
 
Ok.

Updated version is attached.

Regards,
Amul
Вложения

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:


On Thu, Jan 18, 2024 at 6:39 AM Sravan Kumar <sravanvcybage@gmail.com> wrote:
I have also done a review of the patch and some testing. The patch looks good, and I agree with Robert's comments.

Thank you for your review, testing and the offline discussion.  

Regards,
Amul

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:


On Fri, Jan 19, 2024 at 10:36 PM Amul Sul <sulamul@gmail.com> wrote:
On Wed, Jan 17, 2024 at 8:40 PM Robert Haas <robertmhaas@gmail.com> wrote:


Updated version is attached.

Another updated version attached -- fix missing manifest version check in
pg_verifybackup before system identifier validation.

Regards,
Amul
Вложения

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:


On Mon, Jan 22, 2024 at 10:08 AM Amul Sul <sulamul@gmail.com> wrote:


On Fri, Jan 19, 2024 at 10:36 PM Amul Sul <sulamul@gmail.com> wrote:
On Wed, Jan 17, 2024 at 8:40 PM Robert Haas <robertmhaas@gmail.com> wrote:


Updated version is attached.

Another updated version attached -- fix missing manifest version check in
pg_verifybackup before system identifier validation.

Thinking a bit more on this, I realized parse_manifest_file() has many out
parameters. Instead parse_manifest_file() should simply return manifest data
like load_backup_manifest().  Attached 0001 patch doing the same, and removed
parser_context structure, and added manifest_data, and did the required
adjustments to pg_verifybackup code.

Regards,
Amul

Вложения

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Mon, Jan 22, 2024 at 2:22 AM Amul Sul <sulamul@gmail.com> wrote:
> Thinking a bit more on this, I realized parse_manifest_file() has many out
> parameters. Instead parse_manifest_file() should simply return manifest data
> like load_backup_manifest().  Attached 0001 patch doing the same, and removed
> parser_context structure, and added manifest_data, and did the required
> adjustments to pg_verifybackup code.

        InitializeBackupManifest(&manifest, opt->manifest,
-
opt->manifest_checksum_type);
+
opt->manifest_checksum_type,
+                                                        GetSystemIdentifier());

InitializeBackupManifest() can just call GetSystemIdentifier() itself,
instead of passing another parameter, I think.

+       if (manifest_version == 1)
+               context->error_cb(context,
+                                                 "%s: backup manifest
doesn't support incremental changes",
+
private_context->backup_directory);

I think this is weird. First, there doesn't seem to be any reason to
bounce through error_cb() here. You could just call pg_fatal(), as we
do elsewhere in this file. Second, there doesn't seem to be any need
to include the backup directory in this error message. We include the
file name (not the directory name) in errors that pertain to the file
itself, like if we can't open or read it. But we don't do that for
semantic errors about the manifest contents (cf.
combinebackup_per_file_cb). This file would need a lot fewer charges
if you didn't feed the backup directory name through here. Third, the
error message is not well-chosen, because a user who looks at it won't
understand WHY the manifest doesn't support incremental changes. I
suggest "backup manifest version 1 does not support incremental
backup".

+       /* Incremental backups supported on manifest version 2 or later */
+       if (manifest_version == 1)
+               context->error_cb(context,
+                                                 "incremental backups
cannot be taken for this backup");

Let's use the same error message as in the previous case here also.

+       for (i = 0; i < n_backups; i++)
+       {
+               if (manifests[i]->system_identifier != system_identifier)
+               {
+                       char       *controlpath;
+
+                       controlpath = psprintf("%s/%s",
prior_backup_dirs[i], "global/pg_control");
+
+                       pg_fatal("manifest is from different database
system: manifest database system identifier is %llu, %s system
identifier is %llu",
+                                        (unsigned long long)
manifests[i]->system_identifier,
+                                        controlpath,
+                                        (unsigned long long)
system_identifier);
+               }
+       }

check_control_files() already verifies that all of the control files
contain the same system identifier as each other, so what we're really
checking here is that the backup manifest in each directory has the
same system identifier as the control file in that same directory. One
problem is that backup manifests are optional here, as per the comment
in load_backup_manifests(), so you need to skip over NULL entries
cleanly to avoid seg faulting if one is missing. I also think the
error message should be changed. How about "%s: manifest system
identifier is %llu, but control file has %llu"?

+               context->error_cb(context,
+                                                 "manifest is from
different database system: manifest database system identifier is
%llu, pg_control database system identifier is %llu",
+                                                 (unsigned long long)
manifest_system_identifier,
+                                                 (unsigned long long)
system_identifier);

And here, while I'm kibitzing, how about "manifest system identifier
is %llu, but this system's identifier is %llu"?

-       qr/could not open directory/,
+       qr/could not open file/,

I don't think that the expected error message here should be changing.
Does it really, with the latest patch version? Why? Can we fix that?

+                       else if (!parse->saw_system_identifier_field &&
+
strcmp(parse->manifest_version, "1") != 0)

I don't think this has any business testing the manifest version.
That's something to sort out at some later stage.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Wed, Jan 24, 2024 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 22, 2024 at 2:22 AM Amul Sul <sulamul@gmail.com> wrote:
> Thinking a bit more on this, I realized parse_manifest_file() has many out
> parameters. Instead parse_manifest_file() should simply return manifest data
> like load_backup_manifest().  Attached 0001 patch doing the same, and removed
> parser_context structure, and added manifest_data, and did the required
> adjustments to pg_verifybackup code.

        InitializeBackupManifest(&manifest, opt->manifest,
-
opt->manifest_checksum_type);
+
opt->manifest_checksum_type,
+                                                        GetSystemIdentifier());

InitializeBackupManifest() can just call GetSystemIdentifier() itself,
instead of passing another parameter, I think.

Ok.
 

+       if (manifest_version == 1)
+               context->error_cb(context,
+                                                 "%s: backup manifest
doesn't support incremental changes",
+
private_context->backup_directory);

I think this is weird. First, there doesn't seem to be any reason to
bounce through error_cb() here. You could just call pg_fatal(), as we
do elsewhere in this file. Second, there doesn't seem to be any need
to include the backup directory in this error message. We include the
file name (not the directory name) in errors that pertain to the file
itself, like if we can't open or read it. But we don't do that for
semantic errors about the manifest contents (cf.
combinebackup_per_file_cb). This file would need a lot fewer charges
if you didn't feed the backup directory name through here. Third, the
error message is not well-chosen, because a user who looks at it won't
understand WHY the manifest doesn't support incremental changes. I
suggest "backup manifest version 1 does not support incremental
backup".

+       /* Incremental backups supported on manifest version 2 or later */
+       if (manifest_version == 1)
+               context->error_cb(context,
+                                                 "incremental backups
cannot be taken for this backup");

Let's use the same error message as in the previous case here also.

Ok.


+       for (i = 0; i < n_backups; i++)
+       {
+               if (manifests[i]->system_identifier != system_identifier)
+               {
+                       char       *controlpath;
+
+                       controlpath = psprintf("%s/%s",
prior_backup_dirs[i], "global/pg_control");
+
+                       pg_fatal("manifest is from different database
system: manifest database system identifier is %llu, %s system
identifier is %llu",
+                                        (unsigned long long)
manifests[i]->system_identifier,
+                                        controlpath,
+                                        (unsigned long long)
system_identifier);
+               }
+       }

check_control_files() already verifies that all of the control files
contain the same system identifier as each other, so what we're really
checking here is that the backup manifest in each directory has the
same system identifier as the control file in that same directory. One
problem is that backup manifests are optional here, as per the comment
in load_backup_manifests(), so you need to skip over NULL entries
cleanly to avoid seg faulting if one is missing. I also think the
error message should be changed. How about "%s: manifest system
identifier is %llu, but control file has %llu"?

Ok.


+               context->error_cb(context,
+                                                 "manifest is from
different database system: manifest database system identifier is
%llu, pg_control database system identifier is %llu",
+                                                 (unsigned long long)
manifest_system_identifier,
+                                                 (unsigned long long)
system_identifier);

And here, while I'm kibitzing, how about "manifest system identifier
is %llu, but this system's identifier is %llu"?

I used "database system identifier" instead of "this system's identifier " like
we are using in WalReceiverMain() and libpqrcv_identify_system().


-       qr/could not open directory/,
+       qr/could not open file/,

I don't think that the expected error message here should be changing.
Does it really, with the latest patch version? Why? Can we fix that?
 
Because, we were trying to access pg_control to check the system identifier
before any other backup directory/file validation.


+                       else if (!parse->saw_system_identifier_field &&
+
strcmp(parse->manifest_version, "1") != 0)

I don't think this has any business testing the manifest version.
That's something to sort out at some later stage.

That is for backward compatibility, otherwise, we would have an "expected
system identifier" error for manifest version 1.

Thank you for the review-comments, updated version attached.

Regards,
Amul
Вложения

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Thu, Jan 25, 2024 at 2:52 AM Amul Sul <sulamul@gmail.com> wrote:
> Thank you for the review-comments, updated version attached.

I generally agree with 0001. I spent a long time thinking about your
decision to make verifier_context contain a pointer to manifest_data
instead of, as it does currently, a pointer to manifest_files_hash. I
don't think that's a horrible idea, but it also doesn't seem to be
used anywhere currently. One advantage of the current approach is that
we know that none of the code downstream of verify_backup_directory()
or verify_backup_checksums() actually cares about anything other than
the manifest_files_hash. That's kind of nice. If we didn't change this
as you have done here, then we would need to continue passing the WAL
ranges to parse_required_walI() and the system identifier would have
to be passed explicitly to the code that checks the system identifier,
but that's not such a bad thing, either. It makes it clear which
functions are using which information.

But before you go change anything there, exactly when should 0002 be
checking the system identifier in the control file? What happens now
is that we first walk over the directory tree and make sure we have
the files (verify_backup_directory) and then go through and verify
checksums in a second pass (verify_backup_checksums). We do this
because it lets us report problems that can be detected cheaply --
like missing files -- relatively quickly, and problems that are more
expensive to detect -- like mismatching checksums -- only after we've
reported all the cheap-to-detect problems. At what stage should we
verify the control file? I don't really like verifying it first, as
you've done, because I think the error message change in
004_options.pl is a clear regression. When the whole directory is
missing, it's much more pleasant to complain about the directory being
missing than some file inside the directory being missing.

What I'd be inclined to suggest is that you have verify_backup_file()
notice when the file it's being asked to verify is the control file,
and have it check the system identifier at that stage. I think if you
do that, then the error message change in 004_options.pl goes away.
Now, to do that, you'd need to have the whole manifest_data available
from the context, not just the manifest_files_hash, so that you can
see the expected system identifier. And, interestingly, if you take
this approach, then it appears to me that 0001 is correct as-is and
doesn't need any changes.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Thu, Feb 1, 2024 at 3:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 25, 2024 at 2:52 AM Amul Sul <sulamul@gmail.com> wrote:
> Thank you for the review-comments, updated version attached.

I generally agree with 0001. I spent a long time thinking about your
decision to make verifier_context contain a pointer to manifest_data
instead of, as it does currently, a pointer to manifest_files_hash. I
don't think that's a horrible idea, but it also doesn't seem to be
used anywhere currently. One advantage of the current approach is that
we know that none of the code downstream of verify_backup_directory()
or verify_backup_checksums() actually cares about anything other than
the manifest_files_hash. That's kind of nice. If we didn't change this
as you have done here, then we would need to continue passing the WAL
ranges to parse_required_walI() and the system identifier would have
to be passed explicitly to the code that checks the system identifier,
but that's not such a bad thing, either. It makes it clear which
functions are using which information.
 
I intended to minimize the out param of parse_manifest_file(), which currently
returns manifest_files_hash and manifest_wal_range, and I need two more --
manifest versions and the system identifier.

But before you go change anything there, exactly when should 0002 be
checking the system identifier in the control file? What happens now
is that we first walk over the directory tree and make sure we have
the files (verify_backup_directory) and then go through and verify
checksums in a second pass (verify_backup_checksums). We do this
because it lets us report problems that can be detected cheaply --
like missing files -- relatively quickly, and problems that are more
expensive to detect -- like mismatching checksums -- only after we've
reported all the cheap-to-detect problems. At what stage should we
verify the control file? I don't really like verifying it first, as
you've done, because I think the error message change in
004_options.pl is a clear regression. When the whole directory is
missing, it's much more pleasant to complain about the directory being
missing than some file inside the directory being missing.

What I'd be inclined to suggest is that you have verify_backup_file()
notice when the file it's being asked to verify is the control file,
and have it check the system identifier at that stage. I think if you
do that, then the error message change in 004_options.pl goes away.
Now, to do that, you'd need to have the whole manifest_data available
from the context, not just the manifest_files_hash, so that you can
see the expected system identifier. And, interestingly, if you take
this approach, then it appears to me that 0001 is correct as-is and
doesn't need any changes.
 
Yeah, we can do that, but I think it is a bit inefficient to have strcmp()
check for the pg_control file on each verify_backup_file() call, despite, we
know that path.  Also, I think, we need additional handling to ensure that the
system identifier has been verified in verify_backup_file(), what if the
pg_control file itself missing from the backup -- might be a rare case, but
possible.

For now, we can do the system identifier validation after
verify_backup_directory().

Regards,
Amul

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Thu, Feb 1, 2024 at 2:18 AM Amul Sul <sulamul@gmail.com> wrote:
> I intended to minimize the out param of parse_manifest_file(), which currently
> returns manifest_files_hash and manifest_wal_range, and I need two more --
> manifest versions and the system identifier.

Sure, but you could do context.ht = manifest_data->files instead of
context.manifest = manifest_data. The question isn't whether you
should return the whole manifest_data from parse_manifest_file -- I
agree with that decision -- but rather whether you should feed the
whole thing through into the context, or just the file hash.

> Yeah, we can do that, but I think it is a bit inefficient to have strcmp()
> check for the pg_control file on each verify_backup_file() call, despite, we
> know that path.  Also, I think, we need additional handling to ensure that the
> system identifier has been verified in verify_backup_file(), what if the
> pg_control file itself missing from the backup -- might be a rare case, but
> possible.
>
> For now, we can do the system identifier validation after
> verify_backup_directory().

Yes, that's another option, but I don't think it's as good.

Suppose you do it that way. Then what will happen when the file is
altogether missing or inaccessible? I think verify_backup_file() will
complain, and then you'll have to do something ugly to avoid having
verify_system_identifier() emit the same complaint all over again.
Remember, unless you find some complicated way of passing data around,
it won't know whether verify_backup_file() emitted a warning or not --
it can redo the stat() and see what happens, but it's not absolutely
guaranteed to be the same as what happened before. Making sure that
you always emit any given complaint once rather than twice or zero
times is going to be tricky.

It seems more natural to me to just accept the (small) cost of a
strcmp() in verify_backup_file(). If the initial stat() fails, it
emits whatever complaint is appropriate and returns and the logic to
check the system identifier is never reached. If it succeeds, you can
proceed to try to open the file and do what you need to do.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Fri, Feb 2, 2024 at 12:03 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Feb 1, 2024 at 2:18 AM Amul Sul <sulamul@gmail.com> wrote:
> I intended to minimize the out param of parse_manifest_file(), which currently
> returns manifest_files_hash and manifest_wal_range, and I need two more --
> manifest versions and the system identifier.

Sure, but you could do context.ht = manifest_data->files instead of
context.manifest = manifest_data. The question isn't whether you
should return the whole manifest_data from parse_manifest_file -- I
agree with that decision -- but rather whether you should feed the
whole thing through into the context, or just the file hash.

> Yeah, we can do that, but I think it is a bit inefficient to have strcmp()
> check for the pg_control file on each verify_backup_file() call, despite, we
> know that path.  Also, I think, we need additional handling to ensure that the
> system identifier has been verified in verify_backup_file(), what if the
> pg_control file itself missing from the backup -- might be a rare case, but
> possible.
>
> For now, we can do the system identifier validation after
> verify_backup_directory().

Yes, that's another option, but I don't think it's as good.

Suppose you do it that way. Then what will happen when the file is
altogether missing or inaccessible? I think verify_backup_file() will
complain, and then you'll have to do something ugly to avoid having
verify_system_identifier() emit the same complaint all over again.
Remember, unless you find some complicated way of passing data around,
it won't know whether verify_backup_file() emitted a warning or not --
it can redo the stat() and see what happens, but it's not absolutely
guaranteed to be the same as what happened before. Making sure that
you always emit any given complaint once rather than twice or zero
times is going to be tricky.

It seems more natural to me to just accept the (small) cost of a
strcmp() in verify_backup_file(). If the initial stat() fails, it
emits whatever complaint is appropriate and returns and the logic to
check the system identifier is never reached. If it succeeds, you can
proceed to try to open the file and do what you need to do.
 
Ok, I did that way in the attached version, I have passed the control file's
full path as a second argument to verify_system_identifier() what we gets in
verify_backup_file(), but that is not doing any useful things with it, since we
were using get_controlfile() to open the control file, which takes the
directory as an input and computes the full path on its own.

Regards,
Amul

Вложения

Re: Add system identifier to backup manifest

От
Michael Paquier
Дата:
On Wed, Feb 14, 2024 at 12:29:07PM +0530, Amul Sul wrote:
> Ok, I did that way in the attached version, I have passed the control file's
> full path as a second argument to verify_system_identifier() what we gets in
> verify_backup_file(), but that is not doing any useful things with it,
> since we
> were using get_controlfile() to open the control file, which takes the
> directory as an input and computes the full path on its own.

I've read through the patch, and that's pretty cool.

-static void
-parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p,
-                    manifest_wal_range **first_wal_range_p)
+static manifest_data *
+parse_manifest_file(char *manifest_path)

In 0001, should the comment describing this routine be updated as
well?

+   identifier with pg_control of the backup directory or fails verification

This is missing a <filename> markup here.

+      <productname>PostgreSQL</productname> 17, it is 2; in older versions,
+      it is 1.

Perhaps a couple of <literal>s here.

+    if (strcmp(parse->manifest_version, "1") != 0 &&
+        strcmp(parse->manifest_version, "2") != 0)
+        json_manifest_parse_failure(parse->context,
+                                    "unexpected manifest version");
+
+    /* Parse version. */
+    version = strtoi64(parse->manifest_version, &ep, 10);
+    if (*ep)
+        json_manifest_parse_failure(parse->context,
+                                    "manifest version not an integer");
+
+    /* Invoke the callback for version */
+    context->version_cb(context, version);

Shouldn't these two checks be reversed?  And is there actually a need
for the first check at all knowing that the version callback should be
in charge of performing the validation vased on the version received?

+my $node2;
+{
+    local $ENV{'INITDB_TEMPLATE'} = undef;

Not sure that it is a good idea to duplicate this pattern twice.
Shouldn't this be something we'd want to control with an option in the
init() method instead?

+static void
+verify_system_identifier(verifier_context *context, char *controlpath)

Relying both on controlpath, being a full path to the control file
including the data directory, and context->backup_directory to read
the contents of the control file looks a bit weird.  Wouldn't it be
cleaner to just use one of them?
--
Michael

Вложения

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Thu, Feb 15, 2024 at 7:18 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Feb 14, 2024 at 12:29:07PM +0530, Amul Sul wrote:
> Ok, I did that way in the attached version, I have passed the control file's
> full path as a second argument to verify_system_identifier() what we gets in
> verify_backup_file(), but that is not doing any useful things with it,
> since we
> were using get_controlfile() to open the control file, which takes the
> directory as an input and computes the full path on its own.

I've read through the patch, and that's pretty cool.

Thank you for looking into this.


-static void
-parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p,
-                                       manifest_wal_range **first_wal_range_p)
+static manifest_data *
+parse_manifest_file(char *manifest_path)

In 0001, should the comment describing this routine be updated as
well?

Ok, updated in the attached version.
 

+   identifier with pg_control of the backup directory or fails verification

This is missing a <filename> markup here.

Done, in the attached version.
 

+      <productname>PostgreSQL</productname> 17, it is 2; in older versions,
+      it is 1.

Perhaps a couple of <literal>s here.
Done. 


+       if (strcmp(parse->manifest_version, "1") != 0 &&
+               strcmp(parse->manifest_version, "2") != 0)
+               json_manifest_parse_failure(parse->context,
+                                                                       "unexpected manifest version");
+
+       /* Parse version. */
+       version = strtoi64(parse->manifest_version, &ep, 10);
+       if (*ep)
+               json_manifest_parse_failure(parse->context,
+                                                                       "manifest version not an integer");
+
+       /* Invoke the callback for version */
+       context->version_cb(context, version);

Shouldn't these two checks be reversed?  And is there actually a need
for the first check at all knowing that the version callback should be
in charge of performing the validation vased on the version received?

Make sense, reversed the order.

I think, particular allowed versions should be placed at the central place, and
the callback can check and react on the versions suitable to them, IMHO.
 
+my $node2;
+{
+       local $ENV{'INITDB_TEMPLATE'} = undef;

Not sure that it is a good idea to duplicate this pattern twice.
Shouldn't this be something we'd want to control with an option in the
init() method instead?
 
Yes, I did that in a separate patch, see 0001 patch.


+static void
+verify_system_identifier(verifier_context *context, char *controlpath)

Relying both on controlpath, being a full path to the control file
including the data directory, and context->backup_directory to read
the contents of the control file looks a bit weird.  Wouldn't it be
cleaner to just use one of them?
 
Well, yes, I had to have the same feeling, how about having another function
that can accept a full path of pg_control?

I tried in the 0002 patch, where the original function is renamed to
get_dir_controlfile(), which accepts the data directory path as before, and
get_controlfile() now accepts the full path of the pg_control file.

Kindly have a look at the attached version.

Regards,
Amul

Вложения

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Thu, Feb 15, 2024 at 3:05 PM Amul Sul <sulamul@gmail.com> wrote:
> Kindly have a look at the attached version.

IMHO, 0001 looks fine, except probably the comment could be phrased a
bit more nicely. That can be left for whoever commits this to
wordsmith. Michael, what are your plans?

0002 seems like a reasonable approach, but there's a hunk in the wrong
patch: 0004 modifies pg_combinebackup's check_control_files to use
get_dir_controlfile() rather than git_controlfile(), but it looks to
me like that should be part of 0002.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Michael Paquier
Дата:
On Thu, Feb 15, 2024 at 05:41:46PM +0530, Robert Haas wrote:
> On Thu, Feb 15, 2024 at 3:05 PM Amul Sul <sulamul@gmail.com> wrote:
> > Kindly have a look at the attached version.
>
> IMHO, 0001 looks fine, except probably the comment could be phrased a
> bit more nicely.

And the new option should be documented at the top of the init()
routine for perldoc.

> That can be left for whoever commits this to
> wordsmith. Michael, what are your plans?

Not much, so feel free to not wait for me.  I've just read through the
patch because I like the idea/feature :)

> 0002 seems like a reasonable approach, but there's a hunk in the wrong
> patch: 0004 modifies pg_combinebackup's check_control_files to use
> get_dir_controlfile() rather than git_controlfile(), but it looks to
> me like that should be part of 0002.

I'm slightly concerned about 0002 that silently changes the meaning of
get_controlfile().  That would impact extension code without people
knowing about it when compiling, just when they run their stuff under
17~.
--
Michael

Вложения

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:

On Mon, Feb 19, 2024 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Feb 15, 2024 at 05:41:46PM +0530, Robert Haas wrote:
> On Thu, Feb 15, 2024 at 3:05 PM Amul Sul <sulamul@gmail.com> wrote:
> > Kindly have a look at the attached version.
>
> IMHO, 0001 looks fine, except probably the comment could be phrased a
> bit more nicely.

And the new option should be documented at the top of the init()
routine for perldoc.

Added in the attached version.


> That can be left for whoever commits this to
> wordsmith. Michael, what are your plans?

Not much, so feel free to not wait for me.  I've just read through the
patch because I like the idea/feature :)
 
Thank you, that helped a lot.

> 0002 seems like a reasonable approach, but there's a hunk in the wrong
> patch: 0004 modifies pg_combinebackup's check_control_files to use
> get_dir_controlfile() rather than git_controlfile(), but it looks to
> me like that should be part of 0002.
 
Fixed in the attached version. 


I'm slightly concerned about 0002 that silently changes the meaning of
get_controlfile().  That would impact extension code without people
knowing about it when compiling, just when they run their stuff under
17~.

Agreed, now they will have an error as _could not read file "<DataDir>": Is a
directory_. But, IIUC, that what usually happens with the dev version, and the
extension needs to be updated for compatibility with the newer version for the
same reason.

Regards,
Amul




Вложения

Re: Add system identifier to backup manifest

От
Michael Paquier
Дата:
On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote:
> On Mon, Feb 19, 2024 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:
>> And the new option should be documented at the top of the init()
>> routine for perldoc.
>
> Added in the attached version.

I've done some wordsmithing on 0001 and it is OK, so I've applied it
to move the needle.  Hope that helps.
--
Michael

Вложения

Re: Add system identifier to backup manifest

От
Michael Paquier
Дата:
On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote:
> Agreed, now they will have an error as _could not read file "<DataDir>": Is
> a directory_. But, IIUC, that what usually happens with the dev version, and
> the extension needs to be updated for compatibility with the newer version for
> the same reason.

I was reading through the remaining pieces of the patch set, and are
you sure that there is a need for 0002 at all?  The only reason why
get_dir_controlfile() is introduced is to be able to get the contents
of a control file with a full path to it, and not a data folder.  Now,
if I look closely, with 0002~0004 applied, the only two callers of
get_controlfile() are pg_combinebackup.c and pg_verifybackup.c.  Both
of them have an access to the backup directories, which point to the
root of the data folder.  pg_combinebackup can continue to use
backup_dirs[i].  pg_verifybackup has an access to the backup directory
in the context data, if I'm reading this right, so you could just use
that in verify_system_identifier().
--
Michael

Вложения

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:


On Wed, Feb 21, 2024 at 10:01 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote:
> On Mon, Feb 19, 2024 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:
>> And the new option should be documented at the top of the init()
>> routine for perldoc.
>
> Added in the attached version.

I've done some wordsmithing on 0001 and it is OK, so I've applied it
to move the needle.  Hope that helps.

Thank you very much.

Regards,
Amul

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Fri, Mar 1, 2024 at 11:28 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote:
> Agreed, now they will have an error as _could not read file "<DataDir>": Is
> a directory_. But, IIUC, that what usually happens with the dev version, and
> the extension needs to be updated for compatibility with the newer version for
> the same reason.

I was reading through the remaining pieces of the patch set, and are
you sure that there is a need for 0002 at all?  The only reason why
get_dir_controlfile() is introduced is to be able to get the contents
of a control file with a full path to it, and not a data folder.  Now,
if I look closely, with 0002~0004 applied, the only two callers of
get_controlfile() are pg_combinebackup.c and pg_verifybackup.c.  Both
of them have an access to the backup directories, which point to the
root of the data folder.  pg_combinebackup can continue to use
backup_dirs[i].  pg_verifybackup has an access to the backup directory
in the context data, if I'm reading this right, so you could just use
that in verify_system_identifier().

Yes, you are correct. Both the current caller of get_controlfile() has
access to the root directory.

I have dropped the 0002 patch -- I don't have a very strong opinion to refactor
get_controlfile() apart from saying that it might be good to have both versions :) .

Regards,
Amul



Вложения

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Mon, Mar 4, 2024 at 12:35 AM Amul Sul <sulamul@gmail.com> wrote:
> Yes, you are correct. Both the current caller of get_controlfile() has
> access to the root directory.
>
> I have dropped the 0002 patch -- I don't have a very strong opinion to refactor
> get_controlfile() apart from saying that it might be good to have both versions :) .

I don't have an enormously strong opinion on what the right thing to
do is here either, but I am not convinced that the change proposed by
Michael is an improvement. After all, that leaves us with the
situation where we know the path to the control file in three
different places. First, verify_backup_file() does a strcmp() against
the string "global/pg_control" to decide whether to call
verify_backup_file(). Then, verify_system_identifier() uses that
string to construct a pathname to the file that it will be read. Then,
get_controlfile() reconstructs the same pathname using it's own logic.
That's all pretty disagreeable. Hard-coded constants are hard to avoid
completely, but here it looks an awful lot like we're trying to
hardcode the same constant into as many different places as we can.
The now-dropped patch seems like an effort to avoid this, and while
it's possible that it wasn't the best way to avoid this, I still think
avoiding it somehow is probably the right idea.

I get a compiler warning with 0002, too:

../pgsql/src/backend/backup/basebackup_incremental.c:960:22: warning:
call to undeclared function 'GetSystemIdentifier'; ISO C99 and later
do not support implicit function declarations
[-Wimplicit-function-declaration]
        system_identifier = GetSystemIdentifier();
                            ^
1 warning generated.

But I've committed 0001.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Mon, Mar 4, 2024 at 2:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I don't have an enormously strong opinion on what the right thing to
> do is here either, but I am not convinced that the change proposed by
> Michael is an improvement. After all, that leaves us with the
> situation where we know the path to the control file in three
> different places. First, verify_backup_file() does a strcmp() against
> the string "global/pg_control" to decide whether to call
> verify_backup_file(). Then, verify_system_identifier() uses that
> string to construct a pathname to the file that it will be read. Then,
> get_controlfile() reconstructs the same pathname using it's own logic.
> That's all pretty disagreeable. Hard-coded constants are hard to avoid
> completely, but here it looks an awful lot like we're trying to
> hardcode the same constant into as many different places as we can.
> The now-dropped patch seems like an effort to avoid this, and while
> it's possible that it wasn't the best way to avoid this, I still think
> avoiding it somehow is probably the right idea.

So with that in mind, here's my proposal. This is an adjustment of
Amit's previous refactoring patch. He renamed the existing
get_controlfile() to get_dir_controlfile() and made a new
get_controlfile() that accepted the path to the control file itself. I
chose to instead leave the existing get_controlfile() alone and add a
new get_controlfile_by_exact_path(). I think this is better, because
most of the existing callers find it more convenient to pass the path
to the data directory rather than the path to the controlfile, so the
patch is smaller this way, and less prone to cause headaches for
people back-patching or maintaining out-of-core code. But it still
gives us a way to avoid repeatedly constructing the same pathname.

If nobody objects, I plan to commit this version.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: Add system identifier to backup manifest

От
Michael Paquier
Дата:
On Wed, Mar 06, 2024 at 11:05:36AM -0500, Robert Haas wrote:
> So with that in mind, here's my proposal. This is an adjustment of
> Amit's previous refactoring patch. He renamed the existing
> get_controlfile() to get_dir_controlfile() and made a new
> get_controlfile() that accepted the path to the control file itself. I
> chose to instead leave the existing get_controlfile() alone and add a
> new get_controlfile_by_exact_path(). I think this is better, because
> most of the existing callers find it more convenient to pass the path
> to the data directory rather than the path to the controlfile, so the
> patch is smaller this way, and less prone to cause headaches for
> people back-patching or maintaining out-of-core code. But it still
> gives us a way to avoid repeatedly constructing the same pathname.

Yes, that was my primary concern with the previous versions of the
patch.

> If nobody objects, I plan to commit this version.

You are not changing silently the internals of get_controlfile(), so
no objections here.  The name of the new routine could be shorter, but
being short of ideas what you are proposing looks fine by me.
--
Michael

Вложения

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Thu, Mar 7, 2024 at 9:37 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Mar 06, 2024 at 11:05:36AM -0500, Robert Haas wrote:
> So with that in mind, here's my proposal. This is an adjustment of
> Amit's previous refactoring patch. He renamed the existing
> get_controlfile() to get_dir_controlfile() and made a new
> get_controlfile() that accepted the path to the control file itself. I
> chose to instead leave the existing get_controlfile() alone and add a
> new get_controlfile_by_exact_path(). I think this is better, because
> most of the existing callers find it more convenient to pass the path
> to the data directory rather than the path to the controlfile, so the
> patch is smaller this way, and less prone to cause headaches for
> people back-patching or maintaining out-of-core code. But it still
> gives us a way to avoid repeatedly constructing the same pathname.

Yes, that was my primary concern with the previous versions of the
patch.

> If nobody objects, I plan to commit this version.

You are not changing silently the internals of get_controlfile(), so
no objections here.  The name of the new routine could be shorter, but
being short of ideas what you are proposing looks fine by me.

Could be get_controlfile_by_path() ?

Regards,
Amul

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Wed, Mar 6, 2024 at 11:22 PM Amul Sul <sulamul@gmail.com> wrote:
>> You are not changing silently the internals of get_controlfile(), so
>> no objections here.  The name of the new routine could be shorter, but
>> being short of ideas what you are proposing looks fine by me.
>
> Could be get_controlfile_by_path() ?

It could. I just thought this was clearer. I agree that it's a bit
long, but I don't think this is worth bikeshedding very much. If at a
later time somebody feels strongly that it needs to be changed, so be
it. Right now, getting on with the business at hand is more important,
IMHO.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Thu, Mar 7, 2024 at 9:16 AM Robert Haas <robertmhaas@gmail.com> wrote:
> It could. I just thought this was clearer. I agree that it's a bit
> long, but I don't think this is worth bikeshedding very much. If at a
> later time somebody feels strongly that it needs to be changed, so be
> it. Right now, getting on with the business at hand is more important,
> IMHO.

Here's a new version of the patch set, rebased over my version of 0001
and with various other corrections:

* Tidy up grammar in documentation.
* In manifest_process_version, the test checked whether the manifest
version == 1, but the comment talked about versions >= 2. Make the
comment match the code.
* In load_backup_manifest, avoid changing the existing placement of a
variable declaration.
* Rename verify_system_identifier to verify_control_file because if we
were verifying multiple things about the control file we'd still want
to only read it one.
* Tweak the coding of verify_backup_file and verify_control_file to
avoid repeated path construction.
* Remove saw_system_identifier_field. This looks like it's trying to
enforce a rule that the system identifier must immediately follow the
version, but we don't insist on anything like that for files or wal
ranges, so there seems to be no reason to do it here.
* Remove bogus "unrecognized top-level field" test from
005_bad_manifest.pl. The JSON included here doesn't include any
unrecognized top-level field, so the fact that we were getting that
error message was wrong. After removing saw_system_identifier_field,
we no longer get the wrong error message any more, so the test started
failing.
* Remove "expected system identifier" test from 005_bad_manifest.pl.
This was basically a test that saw_system_identifier_field was
working.
* Header comment adjustment for
json_manifest_finalize_system_identifier. The last sentence was
cut-and-pasted from somewhere that it made sense to here, where it
doesn't. There's only ever one system identifier.

I plan to commit this, barring objections.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: Add system identifier to backup manifest

От
Amul Sul
Дата:
On Fri, Mar 8, 2024 at 1:22 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 7, 2024 at 9:16 AM Robert Haas <robertmhaas@gmail.com> wrote:
> It could. I just thought this was clearer. I agree that it's a bit
> long, but I don't think this is worth bikeshedding very much. If at a
> later time somebody feels strongly that it needs to be changed, so be
> it. Right now, getting on with the business at hand is more important,
> IMHO.

Here's a new version of the patch set, rebased over my version of 0001
and with various other corrections:

* Tidy up grammar in documentation.
* In manifest_process_version, the test checked whether the manifest
version == 1, but the comment talked about versions >= 2. Make the
comment match the code.
* In load_backup_manifest, avoid changing the existing placement of a
variable declaration.
* Rename verify_system_identifier to verify_control_file because if we
were verifying multiple things about the control file we'd still want
to only read it one.
* Tweak the coding of verify_backup_file and verify_control_file to
avoid repeated path construction.
* Remove saw_system_identifier_field. This looks like it's trying to
enforce a rule that the system identifier must immediately follow the
version, but we don't insist on anything like that for files or wal
ranges, so there seems to be no reason to do it here.
* Remove bogus "unrecognized top-level field" test from
005_bad_manifest.pl. The JSON included here doesn't include any
unrecognized top-level field, so the fact that we were getting that
error message was wrong. After removing saw_system_identifier_field,
we no longer get the wrong error message any more, so the test started
failing.
* Remove "expected system identifier" test from 005_bad_manifest.pl.
This was basically a test that saw_system_identifier_field was
working.
* Header comment adjustment for
json_manifest_finalize_system_identifier. The last sentence was
cut-and-pasted from somewhere that it made sense to here, where it
doesn't. There's only ever one system identifier.


Thank you for the improvement. 

The caller of verify_control_file() has the full path of the control file that
can pass it and avoid recomputing. With this change, it doesn't really need
verifier_context argument -- only the manifest's system identifier is enough
along with the control file path.  Did the same in the attached delta patch
for v11-0002 patch, please have a look, thanks.

Regards,
Amul
 
Вложения

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Fri, Mar 8, 2024 at 12:14 AM Amul Sul <sulamul@gmail.com> wrote:
> Thank you for the improvement.
>
> The caller of verify_control_file() has the full path of the control file that
> can pass it and avoid recomputing. With this change, it doesn't really need
> verifier_context argument -- only the manifest's system identifier is enough
> along with the control file path.  Did the same in the attached delta patch
> for v11-0002 patch, please have a look, thanks.

Those seem like sensible changes. I incorporated them and committed. I also:

* ran pgindent, which changed a bit of your formatting
* changed some BAIL_OUT calls to die; I think we should hardly ever be
using BAIL_OUT, as that terminates the entire TAP test run, not just
the current file

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add system identifier to backup manifest

От
Amul Sul
Дата:


On Thu, Mar 14, 2024 at 12:48 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 8, 2024 at 12:14 AM Amul Sul <sulamul@gmail.com> wrote:
> Thank you for the improvement.
>
> The caller of verify_control_file() has the full path of the control file that
> can pass it and avoid recomputing. With this change, it doesn't really need
> verifier_context argument -- only the manifest's system identifier is enough
> along with the control file path.  Did the same in the attached delta patch
> for v11-0002 patch, please have a look, thanks.

Those seem like sensible changes. I incorporated them and committed. I also:

* ran pgindent, which changed a bit of your formatting
* changed some BAIL_OUT calls to die; I think we should hardly ever be
using BAIL_OUT, as that terminates the entire TAP test run, not just
the current file

Thank you, Robert.

Regards,
Amul

Re: Add system identifier to backup manifest

От
Robert Haas
Дата:
On Thu, Mar 14, 2024 at 11:05 AM Amul Sul <sulamul@gmail.com> wrote:
> Thank you, Robert.

Thanks for the patch!

--
Robert Haas
EDB: http://www.enterprisedb.com