Обсуждение: pg_basebackup and pg_receivewal timing, missing WAL files

Поиск
Список
Период
Сортировка

pg_basebackup and pg_receivewal timing, missing WAL files

От
Tim
Дата:
Hello all,

I'm attempting to create a long term, stand alone backup archive script for a pretty active and large database (3TB) which will store backups going back a few months. 
Using the combination of pg_basebackup in TAR format, with the `Xn` flag, and running pg_receivewal in the background, I'm ending up with a backup that is missing quite a few WAL files.

Trying to figure out where my understanding of the process is incorrect, and what I need to do to correct this. 

Backup process:
    * Start pg_receivewal to archive location
    * Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn --checkpoint=fast | ...`
    * This backup is taken from a standby node 
    * Once pg_basebackup finishes, send SIGINT to pg_receivewal

Recovery Process:
    * Untar basebackup to data directory 
    * Copy archived WAL files to a separate directory on the same drive
    * settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
                     recovery_target = immediate

Once I start the DB, I immediately get 
cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No such file or directory
    
WAL FILES: 
0000001200003CF80000003F << First WAL file the DB looks for 
0000001200003CF700000075 << The next WAL file 
----------------------------------------  << Missing WAL files from pg_receivewal archive
...
0000001200003CF8000000C8 << First WAL file in pg_receivewal archive 

backup_label file:
START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075)
CHECKPOINT LOCATION: 3CF8/3F1139C0
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 2022-06-08 12:51:52 EDT
LABEL: pg_basebackup base backup
START TIMELINE: 18

I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the basebackup starts to recover from, but how come these are not in my pg_receivewal archive, if I'm starting it before starting the pg_basebackup process? How can I ensure they are part of this archive? 

Since this is a recent backup, I have a separate WAL archive from where I can just recover those using a different restore_command, and once that missing gap of WAL files is recovered from this archive, stopping the server 
and switching back to `restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'` and recovering the rest of the WAL files from the pg_receivewal archive works fine. The DB recovers and accepts connections shortly after. 

The whole point of this is to have a several months old, stand alone backup, I need to have all the WAL files available in that original WAL archive created with pg_receivewal.  


Appreciate the help, thank you in advance!

Re: pg_basebackup and pg_receivewal timing, missing WAL files

От
Stephen Frost
Дата:
Greetings,

* Tim (timfosho@gmail.com) wrote:
> I'm attempting to create a long term, stand alone backup archive script for
> a pretty active and large database (3TB) which will store backups going
> back a few months.
> Using the combination of pg_basebackup in TAR format, with the `Xn` flag,
> and running pg_receivewal in the background, I'm ending up with a backup
> that is missing quite a few WAL files.
>
> Trying to figure out where my understanding of the process is incorrect,
> and what I need to do to correct this.
>
> *Backup *process:
>
>     * Start pg_receivewal to archive location
>     * Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn
> --checkpoint=fast | ...`

What's wrong with using -Xs instead..?  Basically does exactly what you
seem to be trying to do here.

>     * This backup is taken from a *standby node*

Where is the pg_receivewal run?  Against the primary or the replica?

>     * Once pg_basebackup finishes, send SIGINT to pg_receivewal
>
> *Recovery* Process:
>
>     * Untar basebackup to data directory
>     * Copy archived WAL files to a separate directory on the same drive
>     * settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
>                      recovery_target = immediate
>
>
> Once I start the DB, I immediately get
>
> cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No
> such file or directory
>
>
>
> *WAL FILES: *
>
> 0000001200003CF80000003F << First WAL file the DB looks for
> 0000001200003CF700000075 << The next WAL file
> ----------------------------------------  << Missing WAL files from
> pg_receivewal archive
> ...
> 0000001200003CF8000000C8 << First WAL file in pg_receivewal archive
>
> *backup_label *file:
>
> START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075)
> CHECKPOINT LOCATION: 3CF8/3F1139C0
> BACKUP METHOD: streamed
> BACKUP FROM: standby
> START TIME: 2022-06-08 12:51:52 EDT
> LABEL: pg_basebackup base backup
> START TIMELINE: 18
>
>
> I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the
> basebackup starts to recover from, but how come these are not in my
> pg_receivewal archive, if I'm starting it before starting the pg_basebackup
> process? How can I ensure they are part of this archive?

Well, using -Xs should do that.

> Since this is a recent backup, I have a separate WAL archive from where I
> can just recover those using a different restore_command, and once that
> missing gap of WAL files is recovered from this archive, stopping the
> server
> and switching back to `restore_command = 'cp /var/lib/pgsql/wal_archive/%f
> %p'` and recovering the rest of the WAL files from the pg_receivewal
> archive works fine. The DB recovers and accepts connections shortly after.
>
> The whole point of this is to have a several months old, stand alone
> backup, I need to have all the WAL files available in that original WAL
> archive created with pg_receivewal.

The use-case generally makes sense, in pgbackrest we have --archive-copy
for more-or-less the same kind of thing, though you can now just use a
separate newly created repo that you back up the standalone backup to
(when multiple repos are configured, archive-push will archive to all of
them) and then archive that copy.  The latter is what we'd generally
recommend these days, but with archive-copy and a full backup, you can
just grab the full backup directory from inside the repo and it'll have
everything.

Thanks,

Stephen

Вложения

Re: pg_basebackup and pg_receivewal timing, missing WAL files

От
Tim
Дата:
We currently use pgbackrest as our primary backup tool actually, and it works great! The issue I ran into with trying to set it up that way with pgbackrest, is when configuring the second repo, pgbackrest will have to archive all WAL files to it, going back to the oldest backup. And we have this (admittedly arbitrary) legal policy requiring a single instance of a 6 month old backup for auditing purposes. So pgbackrest has to keep WALs that far back, the main repo only needs 4 weeks of WALs for PITR, its a bit overkill archiving WALs twice and also storing them an extra 5 months. So I opted to write a custom backup script.

Though I was not aware of the `--archive-copy` option, I wonder if I can just turn that option on for the main repo, and then extract the particular backup I need to another location. We use Azure Blobs for backups.

 Unfortunately `-Xs` cannot be used with pg_basebackup in TAR format, I'm using the AzCopy utility, by piping pg_basebackup into it, to stream the basebackup directly into Azure Blob storage, without storing it anywhere first (Which pgbackrest does natively as well!). So there is no other way without using the TAR format option.

Nevertheless, it seems like I need the contents of the pg_wal directory at the start of pg_basebackup, since pg_receivewal only streams WAL generated after it starts, the restored DB is missing all the files contained within, I've adjusted my script to copy them as well and testing currently. 

On Thu, Jun 9, 2022 at 2:36 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Tim (timfosho@gmail.com) wrote:
> I'm attempting to create a long term, stand alone backup archive script for
> a pretty active and large database (3TB) which will store backups going
> back a few months.
> Using the combination of pg_basebackup in TAR format, with the `Xn` flag,
> and running pg_receivewal in the background, I'm ending up with a backup
> that is missing quite a few WAL files.
>
> Trying to figure out where my understanding of the process is incorrect,
> and what I need to do to correct this.
>
> *Backup *process:
>
>     * Start pg_receivewal to archive location
>     * Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn
> --checkpoint=fast | ...`

What's wrong with using -Xs instead..?  Basically does exactly what you
seem to be trying to do here.

>     * This backup is taken from a *standby node*

Where is the pg_receivewal run?  Against the primary or the replica?

>     * Once pg_basebackup finishes, send SIGINT to pg_receivewal
>
> *Recovery* Process:
>
>     * Untar basebackup to data directory
>     * Copy archived WAL files to a separate directory on the same drive
>     * settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
>                      recovery_target = immediate
>
>
> Once I start the DB, I immediately get
>
> cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No
> such file or directory
>
>
>
> *WAL FILES: *
>
> 0000001200003CF80000003F << First WAL file the DB looks for
> 0000001200003CF700000075 << The next WAL file
> ----------------------------------------  << Missing WAL files from
> pg_receivewal archive
> ...
> 0000001200003CF8000000C8 << First WAL file in pg_receivewal archive
>
> *backup_label *file:
>
> START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075)
> CHECKPOINT LOCATION: 3CF8/3F1139C0
> BACKUP METHOD: streamed
> BACKUP FROM: standby
> START TIME: 2022-06-08 12:51:52 EDT
> LABEL: pg_basebackup base backup
> START TIMELINE: 18
>
>
> I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the
> basebackup starts to recover from, but how come these are not in my
> pg_receivewal archive, if I'm starting it before starting the pg_basebackup
> process? How can I ensure they are part of this archive?

Well, using -Xs should do that.

> Since this is a recent backup, I have a separate WAL archive from where I
> can just recover those using a different restore_command, and once that
> missing gap of WAL files is recovered from this archive, stopping the
> server
> and switching back to `restore_command = 'cp /var/lib/pgsql/wal_archive/%f
> %p'` and recovering the rest of the WAL files from the pg_receivewal
> archive works fine. The DB recovers and accepts connections shortly after.
>
> The whole point of this is to have a several months old, stand alone
> backup, I need to have all the WAL files available in that original WAL
> archive created with pg_receivewal.

The use-case generally makes sense, in pgbackrest we have --archive-copy
for more-or-less the same kind of thing, though you can now just use a
separate newly created repo that you back up the standalone backup to
(when multiple repos are configured, archive-push will archive to all of
them) and then archive that copy.  The latter is what we'd generally
recommend these days, but with archive-copy and a full backup, you can
just grab the full backup directory from inside the repo and it'll have
everything.

Thanks,

Stephen

Re: pg_basebackup and pg_receivewal timing, missing WAL files

От
Stephen Frost
Дата:
Greetings,

* Tim (timfosho@gmail.com) wrote:
> We currently use pgbackrest as our primary backup tool actually, and it
> works great! The issue I ran into with trying to set it up that way with
> pgbackrest, is when configuring the second repo, pgbackrest will have to
> archive all WAL files to it, going back to the oldest backup. And we have

Eh?  Why is that?  I think there's some confusion here regarding how
pgbackrest multi-repo works.  We won't archive back to the 'oldest
backup' when you add a second repo, we'll just archive *new* WAL to it
after it's been configured.

> this (admittedly arbitrary) legal policy requiring a single instance of a 6
> month old backup for auditing purposes. So pgbackrest has to keep WALs that
> far back, the main repo only needs 4 weeks of WALs for PITR, its a bit
> overkill archiving WALs twice and also storing them an extra 5 months. So I
> opted to write a custom backup script.

You just need to handle the modification of the pgbackrest config to
add/remove the second repo (and run the backup to the second repo when
it's enabled) when you want a new backup of it.

> Though I was not aware of the `--archive-copy` option, I wonder if I can
> just turn that option on for the main repo, and then extract the particular
> backup I need to another location. We use Azure Blobs for backups.

Probably this would also work, yes.

>  Unfortunately `-Xs` cannot be used with pg_basebackup in TAR format, I'm
> using the AzCopy utility, by piping pg_basebackup into it, to stream the
> basebackup directly into Azure Blob storage, without storing it anywhere
> first (Which pgbackrest does natively as well!). So there is no other way
> without using the TAR format option.

... yikes, that sounds horribly rickety.

> Nevertheless, it seems like I need the contents of the pg_wal directory at
> the start of pg_basebackup, since pg_receivewal only streams WAL generated
> after it starts, the restored DB is missing all the files contained
> within, I've adjusted my script to copy them as well and testing currently.

I have to say that I strongly recommend NOT trying to go down this
direction, as it seems terribly risky, not to mention that the WAL files
in pg_wal are still being written to...

Thanks,

Stephen

Вложения

Re: pg_basebackup and pg_receivewal timing, missing WAL files

От
David Steele
Дата:
On 6/9/22 5:24 PM, Stephen Frost wrote:
> 
> * Tim (timfosho@gmail.com) wrote:
>> We currently use pgbackrest as our primary backup tool actually, and it
>> works great! The issue I ran into with trying to set it up that way with
>> pgbackrest, is when configuring the second repo, pgbackrest will have to
>> archive all WAL files to it, going back to the oldest backup. And we have
> 
> Eh?  Why is that?  I think there's some confusion here regarding how
> pgbackrest multi-repo works.  We won't archive back to the 'oldest
> backup' when you add a second repo, we'll just archive *new* WAL to it
> after it's been configured.

By default pgbackrest will keep all WAL since the start of the oldest 
backup in a repo. Each repo can have its own retention settings so this 
might be a *different* oldest WAL per repo.

However, you can use repo-retention-archive and 
repo-retention-archive-type to change this behavior, see 
https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive.

Basically, if you need to keep a number of full backups for compliance 
you can set repo-retention-archive-type=full and 
repo-retention-archive-type=1 in that repo and WAL in between the 
backups will only be kept after the last backup. WAL required to make 
backups consistent is always kept until the backup expires.

We've had requests for repo-retention-archive-type=0, i.e. only archive 
WAL during a backup, but we have not had time to implement that. Some 
users have implemented a work-around based on checking the backup lock. 
There is an issue that describes the method but I was not able to find 
it after a brief search. This is safer than it sounds because pgbackrest 
will check that all WAL required to make the backup consistent reached 
the archive. So, if the wrapper is broken you'll get errors.

Regards,
-- 
-David
david@pgmasters.net



Re: pg_basebackup and pg_receivewal timing, missing WAL files

От
David Steele
Дата:
Ugh, typos.

On 6/9/22 6:34 PM, David Steele wrote:
> On 6/9/22 5:24 PM, Stephen Frost wrote:
> 
> Basically, if you need to keep a number of full backups for compliance 
> you can set repo-retention-archive-type=full and 
> repo-retention-archive-type=1 in that repo and WAL in between the 

Here I meant repo-retention-archive-type=full and repo-retention-archive=1.

> 
> We've had requests for repo-retention-archive-type=0, i.e. only archive 

And here repo-retention-archive=0.

Regards,
-- 
-David
david@pgmasters.net



Re: pg_basebackup and pg_receivewal timing, missing WAL files

От
Tim
Дата:
However, you can use repo-retention-archive and
repo-retention-archive-type to change this behavior, see
https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive.
Basically, if you need to keep a number of full backups for compliance
you can set repo-retention-archive-type=full and
repo-retention-archive-type=1 in that repo and WAL in between the
backups will only be kept after the last backup. WAL required to make
backups consistent is always kept until the backup expires.

So as I understand it, it would keep WALs from the most current backup up to the most current WALs. 
In my case, the only purpose of this 2nd repo is to keep monthly backups that are 2-6 months old (and occasionally test restores) for legal purposes. 
The  most recent backup in this repo would still be 2 months old, so it would be keeping an unnecessary 2 months of WALs. 

We've had requests for repo-retention-archive-type=0, i.e. only archive
WAL during a backup, but we have not had time to implement that. Some
users have implemented a work-around based on checking the backup lock.
There is an issue that describes the method but I was not able to find
it after a brief search. This is safer than it sounds because pgbackrest
will check that all WAL required to make the backup consistent reached
the archive. So, if the wrapper is broken you'll get errors.

This seems a bit tricky to get right (and will take some work to convince my team that its legit), also doesn't look immediately possible to only omit WAL archiving on one repo out of multiple. 

Definitely a +1 here for implementing that feature (repo-retention-archive=0). pgbackrest's ability to stream into Azure storage & simultaneously compress 
using parallel CPUs is absolutely killer. It's also limiting us from using it in our non production databases, as  
we run those in Docker containers, and don't want any WAL archiving. Being able to quickly configure and store 1 off backups
 for an often changing environment would be great. 

Stephen Frost wrote:
You just need to handle the modification of the pgbackrest config to
add/remove the second repo (and run the backup to the second repo when
it's enabled) when you want a new backup of it.

I will have to try this method of dynamically tweaking the configuration, did not consider that. 


Thank you for all your help everyone!

On Thu, Jun 9, 2022 at 6:37 PM David Steele <david@pgmasters.net> wrote:
Ugh, typos.

On 6/9/22 6:34 PM, David Steele wrote:
> On 6/9/22 5:24 PM, Stephen Frost wrote:
>
> Basically, if you need to keep a number of full backups for compliance
> you can set repo-retention-archive-type=full and
> repo-retention-archive-type=1 in that repo and WAL in between the

Here I meant repo-retention-archive-type=full and repo-retention-archive=1.

>
> We've had requests for repo-retention-archive-type=0, i.e. only archive

And here repo-retention-archive=0.

Regards,
--
-David
david@pgmasters.net

Re: pg_basebackup and pg_receivewal timing, missing WAL files

От
Stephen Frost
Дата:
Greetings,

* Tim (timfosho@gmail.com) wrote:
> > However, you can use repo-retention-archive and
> > repo-retention-archive-type to change this behavior, see
> >
> > https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive
> > .
> > Basically, if you need to keep a number of full backups for compliance
> > you can set repo-retention-archive-type=full and
> > repo-retention-archive-type=1 in that repo and WAL in between the
> > backups will only be kept after the last backup. WAL required to make
> > backups consistent is always kept until the backup expires.
>
> So as I understand it, it would keep WALs from the most current backup up
> to the most current WALs.
> In my case, the only purpose of this 2nd repo is to keep monthly backups
> that are 2-6 months old (and occasionally test restores)
> for legal purposes.
> The  most recent backup in this repo would still be 2 months old, so it
> would be keeping an unnecessary 2 months of WALs.

You don't need to have the second repo be enabled all the time, just
enable it when you do the next full backup into the new repo, and
disable it afterwards, so you don't continue to push WAL to it.

> We've had requests for repo-retention-archive-type=0, i.e. only archive
> > WAL during a backup, but we have not had time to implement that. Some
> > users have implemented a work-around based on checking the backup lock.
> > There is an issue that describes the method but I was not able to find
> > it after a brief search. This is safer than it sounds because pgbackrest
> > will check that all WAL required to make the backup consistent reached
> > the archive. So, if the wrapper is broken you'll get errors.
>
> I'm guessing you're referring to: optional wal archiving · Issue #900 ·
> pgbackrest/pgbackrest (github.com)
> <https://github.com/pgbackrest/pgbackrest/issues/900>,

This was from before we had multi-repo support and I'd think simply
enabling/disabling that second repo would be the simpler approach.

I see that it sounds like you're going to take a look at trying that
approach- would be great to get your feedback on how hard it was and how
well it did or didn't work out.

Thanks,

Stephen

Вложения