Обсуждение: Add a different archive_command example for Linux / Unix

Поиск
Список
Период
Сортировка

Add a different archive_command example for Linux / Unix

От
PG Doc comments form
Дата:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/16/bug-reporting.html
Description:

Hello,
in
https://www.postgresql.org/docs/16/continuous-archiving.html#BACKUP-ARCHIVING-WAL
the example given could be improved for Linux / Unix environment.

As cp command is buffered it means it could return success although the data
didn't reach the disk
which puts backups at risk.

I propose to use dd command with its fsync option.

So the actual equivalent example would be :

archive_command = 'dd if=%p of=/mnt/server/archivedir/%f bs=1M
conv=fsync,excl status=none' # Unix

What do you think ?

Regards
Gilles

Re: Add a different archive_command example for Linux / Unix

От
Stephen Frost
Дата:
Greetings,

* PG Doc comments form (noreply@postgresql.org) wrote:
> Hello,
> in
> https://www.postgresql.org/docs/16/continuous-archiving.html#BACKUP-ARCHIVING-WAL
> the example given could be improved for Linux / Unix environment.
>
> As cp command is buffered it means it could return success although the data
> didn't reach the disk
> which puts backups at risk.

Yup.

> I propose to use dd command with its fsync option.
>
> So the actual equivalent example would be :
>
> archive_command = 'dd if=%p of=/mnt/server/archivedir/%f bs=1M
> conv=fsync,excl status=none' # Unix
>
> What do you think ?

This doesn't fsync the directory though, for one thing, and there are
other considerations beyond that when having archive_command run and
more generally when doing backups with PG.  In short, the example in the
documentation is not to ever be used but is intended to show how the
replacement is done when the command is called, so that backup tool
authors know how it works.

In reality though, to write backup software for PG, you really do need
to know PG in much more detail than the documentation provides, which
means reading the source- for example, backup software should be
checking the pg_control file's CRC as it's possible to read it just as
it's being written and end up with an invalid pg_control file in the
backup, making the backup invalid.  There's been some discussion about
how to improve this situation but nothing exists today from PG, so
backup authors have to handle it.  This is just one example, there are
lots of others- unlogged table handling, temporary file handling, etc,
etc.

I'd strongly recommend using one of the existing well maintained backup
tools which have been written specifically for PG for your backups.
Writing a new backup tool for PG is a good bit of work and isn't
really reasonable to do with shell scripts or simple unix commands.

I do feel that we could improve the documentation around this by
dropping comments like "using any convenient file-system-backup tool
such as tar or cpio" as those don't, for example, support any way to
reasonably deal with unlogged tables by themselves.  Technically you
could scan the data directory and provide an exclude file, or not
include unlogged table files in the list of files to include, but then
you're starting to get into things like how to tell if a file is
associated with an unlogged table or not and while that's deep in the
documentation, we don't make any mention or reference to unlogged tables
in the backup documentation.  Perhaps an addition to the low-level
documentation under 'Backing Up The Data Directory' along these lines
would be helpful:

#########
You should omit any unlogged relation files (other than the 'init' fork)
as they will be reset to be empty upon recovery and backing them up will
simply increase your backup size (potentially significantly) and slow
down the restore process.  Unlogged tables have an init fork (link to
storage-init.html) which is a file with the same filenode number as the
relation but with a suffix added of '_init' (link to
storage-file-layout.html).  When an '_init' fork exists for a given
relation, the '_init' file should be included in the backup, but all
other files for that relation (the 'main' fork, which does not have a
suffix, and all other forks which exist other than the 'init' fork)
should be excluded from the backup.
#########

There's also no way for tar or cpio to directly validate that the copy
of pg_control that they copied is valid.  Worse, on a restore, they'll
restore pg_control more-or-less whenever and then if the restore doesn't
complete for whatever reason, you might end up with a cluster that can
be started, run for a while, but be missing whole tables.  While it was
only demonstrated relatively recently that the pg_control file can, in
fact, be invalid when read during a backup, it's a real issue that's
been around for, probably, forever, and there isn't really a good way to
address it today.  Still, perhaps we should include in the
documentation, again under the 'Backup Up The Data Directory', something
like:

#########
You must be sure to check that the copy of pg_control which was
copied is valid.  The pg_controldata tool can be used for this purpose-
after making the copy of the 'data_dir/global/pg_control' file which
will be included in the backup, restore it into a new directory
'test_data_dir/global' and then run 'pg_controldata' on 'test_data_dir'
and ensure that no error is returned.  If an error is returned, attempt
to re-copy the 'data_dir/global/pg_control' file and test again (during
the backup), or fail the backup entirely.  A backup without a valid
pg_control file is not able to be restored.

Further, on restore, it is strongly recommended to restore the
pg_control file last and only after the successfully restoring the rest
of the database, to prevent the database from being started either while
the restore is happening or in the event that not all of the files which
were part of the backup were able to be restored.
#########

Of course, there's more, such as the lack of any discussion about making
sure to have a separate manifest of all of the files that were copied as
part of the backup to allow you to make sure that, for example, the tar
file that was created for the backup didn't just get truncated somewhere
along the way.  The above alludes to this idea, at least.

I will note that while we don't talk about all of these things in the
documentation, pg_basebackup and pg_receivewal do handle things like
doing an fsync() after the backup has completed or when receiving WAL,
including of directories, skipping unlogged relations and temporary
files, includes a manifest with a per-file checksum which the
pg_verifybackup tool can be used to check, etc.  While pg_basebackup
doesn't yet handle the issue of an invalid read of pg_control, there's
ongoing discussion about how to address that and patches have been
proposed to deal with it.  Hopefully something will be done soon there
as pg_basebackup, particularly when run against a replica where the
pg_control file is rewritten much more frequently, in certain cases,
seems to be at particular risk.  This isn't to say it's very likely-
it's not, just that it is a thing that could happen and could lead to an
invalid backup.

Really, rewriting the entire "how to perform a low-level backup" and
re-focusing that section of the documentation to be for backup tool
authors rather than having it seem like it's a unix administrator or
general DBA's documentation for performing backups would really go a
long way towards improving the situation, for my ... slightly more than
2 cents (or your favorite currency), anyway.

Thanks!

Stephen

Вложения

Re: Add a different archive_command example for Linux / Unix

От
gparc@free.fr
Дата:
Thanks Stephen for your detailed reply and broad perspective.
But I see the cp example command used **as is** most of the time.

Regards
Gilles

----- Mail original -----
> De: "Stephen Frost" <sfrost@snowman.net>
> À: "gparc" <gparc@online.fr>, "pgsql-docs" <pgsql-docs@lists.postgresql.org>
> Envoyé: Jeudi 8 Février 2024 13:00:13
> Objet: Re: Add a different archive_command example for Linux / Unix

> Greetings,
>
> * PG Doc comments form (noreply@postgresql.org) wrote:
>> Hello,
>> in
>> https://www.postgresql.org/docs/16/continuous-archiving.html#BACKUP-ARCHIVING-WAL
>> the example given could be improved for Linux / Unix environment.
>>
>> As cp command is buffered it means it could return success although the data
>> didn't reach the disk
>> which puts backups at risk.
>
> Yup.
>
>> I propose to use dd command with its fsync option.
>>
>> So the actual equivalent example would be :
>>
>> archive_command = 'dd if=%p of=/mnt/server/archivedir/%f bs=1M
>> conv=fsync,excl status=none' # Unix
>>
>> What do you think ?
>
> This doesn't fsync the directory though, for one thing, and there are
> other considerations beyond that when having archive_command run and
> more generally when doing backups with PG.  In short, the example in the
> documentation is not to ever be used but is intended to show how the
> replacement is done when the command is called, so that backup tool
> authors know how it works.
>
> In reality though, to write backup software for PG, you really do need
> to know PG in much more detail than the documentation provides, which
> means reading the source- for example, backup software should be
> checking the pg_control file's CRC as it's possible to read it just as
> it's being written and end up with an invalid pg_control file in the
> backup, making the backup invalid.  There's been some discussion about
> how to improve this situation but nothing exists today from PG, so
> backup authors have to handle it.  This is just one example, there are
> lots of others- unlogged table handling, temporary file handling, etc,
> etc.
>
> I'd strongly recommend using one of the existing well maintained backup
> tools which have been written specifically for PG for your backups.
> Writing a new backup tool for PG is a good bit of work and isn't
> really reasonable to do with shell scripts or simple unix commands.
>
> I do feel that we could improve the documentation around this by
> dropping comments like "using any convenient file-system-backup tool
> such as tar or cpio" as those don't, for example, support any way to
> reasonably deal with unlogged tables by themselves.  Technically you
> could scan the data directory and provide an exclude file, or not
> include unlogged table files in the list of files to include, but then
> you're starting to get into things like how to tell if a file is
> associated with an unlogged table or not and while that's deep in the
> documentation, we don't make any mention or reference to unlogged tables
> in the backup documentation.  Perhaps an addition to the low-level
> documentation under 'Backing Up The Data Directory' along these lines
> would be helpful:
>
> #########
> You should omit any unlogged relation files (other than the 'init' fork)
> as they will be reset to be empty upon recovery and backing them up will
> simply increase your backup size (potentially significantly) and slow
> down the restore process.  Unlogged tables have an init fork (link to
> storage-init.html) which is a file with the same filenode number as the
> relation but with a suffix added of '_init' (link to
> storage-file-layout.html).  When an '_init' fork exists for a given
> relation, the '_init' file should be included in the backup, but all
> other files for that relation (the 'main' fork, which does not have a
> suffix, and all other forks which exist other than the 'init' fork)
> should be excluded from the backup.
> #########
>
> There's also no way for tar or cpio to directly validate that the copy
> of pg_control that they copied is valid.  Worse, on a restore, they'll
> restore pg_control more-or-less whenever and then if the restore doesn't
> complete for whatever reason, you might end up with a cluster that can
> be started, run for a while, but be missing whole tables.  While it was
> only demonstrated relatively recently that the pg_control file can, in
> fact, be invalid when read during a backup, it's a real issue that's
> been around for, probably, forever, and there isn't really a good way to
> address it today.  Still, perhaps we should include in the
> documentation, again under the 'Backup Up The Data Directory', something
> like:
>
> #########
> You must be sure to check that the copy of pg_control which was
> copied is valid.  The pg_controldata tool can be used for this purpose-
> after making the copy of the 'data_dir/global/pg_control' file which
> will be included in the backup, restore it into a new directory
> 'test_data_dir/global' and then run 'pg_controldata' on 'test_data_dir'
> and ensure that no error is returned.  If an error is returned, attempt
> to re-copy the 'data_dir/global/pg_control' file and test again (during
> the backup), or fail the backup entirely.  A backup without a valid
> pg_control file is not able to be restored.
>
> Further, on restore, it is strongly recommended to restore the
> pg_control file last and only after the successfully restoring the rest
> of the database, to prevent the database from being started either while
> the restore is happening or in the event that not all of the files which
> were part of the backup were able to be restored.
> #########
>
> Of course, there's more, such as the lack of any discussion about making
> sure to have a separate manifest of all of the files that were copied as
> part of the backup to allow you to make sure that, for example, the tar
> file that was created for the backup didn't just get truncated somewhere
> along the way.  The above alludes to this idea, at least.
>
> I will note that while we don't talk about all of these things in the
> documentation, pg_basebackup and pg_receivewal do handle things like
> doing an fsync() after the backup has completed or when receiving WAL,
> including of directories, skipping unlogged relations and temporary
> files, includes a manifest with a per-file checksum which the
> pg_verifybackup tool can be used to check, etc.  While pg_basebackup
> doesn't yet handle the issue of an invalid read of pg_control, there's
> ongoing discussion about how to address that and patches have been
> proposed to deal with it.  Hopefully something will be done soon there
> as pg_basebackup, particularly when run against a replica where the
> pg_control file is rewritten much more frequently, in certain cases,
> seems to be at particular risk.  This isn't to say it's very likely-
> it's not, just that it is a thing that could happen and could lead to an
> invalid backup.
>
> Really, rewriting the entire "how to perform a low-level backup" and
> re-focusing that section of the documentation to be for backup tool
> authors rather than having it seem like it's a unix administrator or
> general DBA's documentation for performing backups would really go a
> long way towards improving the situation, for my ... slightly more than
> 2 cents (or your favorite currency), anyway.
>
> Thanks!
>
> Stephen



Re: Add a different archive_command example for Linux / Unix

От
Stephen Frost
Дата:
Greetings,

* gparc@free.fr (gparc@free.fr) wrote:
> Thanks Stephen for your detailed reply and broad perspective.
> But I see the cp example command used **as is** most of the time.

In those cases- how would changing it to be a dd command be helpful?
The directory still wouldn't be fsync'd and there's a very good chance
that the rest of the documentation isn't followed or understood either,
leading almost certainly to broken backup setups.  This wouldn't be the
only issue in any case, to be sure.

This comes back to my earlier suggestion that perhaps we should just
change it to something like:

archive_command = 'backup_tool %p /mnt/server/archivedir/%f'

and not talk about specific tools that exist but don't perform in the
manner we actually expect from an archive command that we're using.  We
already make it pretty clear to anyone who knows the tools mentioned
that the 'example' command won't work, if you read everything under that
section.

Alternatively, we could actually document the tools we're aware of that
do work and which do strive, at least, to try and be good backup tools
and good archive commands for PG.  That would certainly be a service to
our users and might result in far fewer misconfigured systems using the
examples because they thought (despite the explicit note in our
documentation) that they were recommendations.

Thanks,

Stephen

Вложения

Re: Add a different archive_command example for Linux / Unix

От
gparc@free.fr
Дата:
Hello Stephen,

For the missing fsync directory, in case of a system crash which I had in mind using this command,
I thought that fsck will fixed the discrepancy.

I support your proposal i.e. archive_command = 'backup_tool %p /mnt/server/archivedir/%f'
as at least people will investigate what choices they have for backup tools.

Regards
Gilles



----- Mail original -----
> De: "Stephen Frost" <sfrost@snowman.net>
> À: "gparc" <gparc@free.fr>
> Cc: "gparc" <gparc@online.fr>, "pgsql-docs" <pgsql-docs@lists.postgresql.org>
> Envoyé: Jeudi 8 Février 2024 22:54:29
> Objet: Re: Add a different archive_command example for Linux / Unix

> Greetings,
>
> * gparc@free.fr (gparc@free.fr) wrote:
>> Thanks Stephen for your detailed reply and broad perspective.
>> But I see the cp example command used **as is** most of the time.
>
> In those cases- how would changing it to be a dd command be helpful?
> The directory still wouldn't be fsync'd and there's a very good chance
> that the rest of the documentation isn't followed or understood either,
> leading almost certainly to broken backup setups.  This wouldn't be the
> only issue in any case, to be sure.
>
> This comes back to my earlier suggestion that perhaps we should just
> change it to something like:
>
> archive_command = 'backup_tool %p /mnt/server/archivedir/%f'
>
> and not talk about specific tools that exist but don't perform in the
> manner we actually expect from an archive command that we're using.  We
> already make it pretty clear to anyone who knows the tools mentioned
> that the 'example' command won't work, if you read everything under that
> section.
>
> Alternatively, we could actually document the tools we're aware of that
> do work and which do strive, at least, to try and be good backup tools
> and good archive commands for PG.  That would certainly be a service to
> our users and might result in far fewer misconfigured systems using the
> examples because they thought (despite the explicit note in our
> documentation) that they were recommendations.
>
> Thanks,
>
> Stephen