Discussion: Add notes to pg_combinebackup docs
Hello,

While doing some work/research on the new incremental backup feature, I noticed that some limitations are not listed in the docs. Mainly the fact that pg_combinebackup works with plain format and not tar.

Around the same time, Tomas Vondra tested incremental backups with a cluster where he enabled checksums after taking the previous full backup. After combining the backups, the synthetic backup had pages with checksums and other pages without checksums, which resulted in checksum errors.

I've attached two patches. The first one is just nit-picking things I found when I first read the docs. The second has a note on the two limitations listed above. As for the limitation on incremental backups of a cluster that had checksums enabled after the previous backup, I was not sure if that should go in the pg_basebackup or the pg_combinebackup reference documentation. Or maybe somewhere else.

Kind regards, Martín

--
Martín Marqués
It’s not that I have something to hide, it’s that I have nothing I want you to see
Attachments
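For readers new to the feature, the plain-format workflow the docs describe amounts to something like the following transcript (directory paths are placeholders chosen for the example; tar-format backups cannot be fed to pg_combinebackup):

```shell
# Full backup, plain format (the default -F p)
pg_basebackup -D /backups/full

# Incremental backup, driven by the previous backup's manifest
pg_basebackup -D /backups/incr1 --incremental=/backups/full/backup_manifest

# Reconstruct a synthetic full backup from the chain
pg_combinebackup /backups/full /backups/incr1 -o /restore/data
```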
On 4/9/24 09:59, Martín Marqués wrote:
> Hello,
>
> While doing some work/research on the new incremental backup feature
> some limitations were not listed in the docs. Mainly the fact that
> pg_combinebackup works with plain format and not tar.

Right. The docs mostly imply this by talking about output directory and backup directories, but making it more explicit would not hurt.

FWIW it'd be great if we could make incremental backups work with tar format in the future too. People probably don't want to keep around the expanded data directory, and having to extract everything before combining the backups is not very convenient. Reading and writing the tar directly would make this simpler.

> Around the same time, Tomas Vondra tested incremental backups with a
> cluster where he enabled checksums after taking the previous full
> backup. After combining the backups the synthetic backup had pages
> with checksums and other pages without checksums which ended in
> checksum errors.

I'm not sure just documenting this limitation is sufficient. We can't make the incremental backups work in this case (it's as if someone messes with the cluster without writing stuff into WAL), but I think we should do better than silently producing (seemingly) corrupted backups.

I say seemingly, because the backup is actually fine; the only problem is it has checksums enabled in the control file, but the pages from the full backup (and the early incremental backups) have no checksums.

What we could do is detect this in pg_combinebackup, and either just disable checksums with a warning and a hint to maybe enable them again, or just print that the user needs to disable them.

I was thinking maybe we could detect this while taking the backups, and force taking a full backup if checksums got enabled since the last backup. But we can't do that, because we only have the manifest from the last backup, and the manifest does not include info about checksums.

It's a bit unfortunate we don't have a way to enable checksums online. That'd not have this problem IIRC, because it writes proper WAL. Maybe it's time to revive that idea ... I recall there were some concerns about tracking progress to allow resuming, but not having anything just because in some (rare?) cases it'd need to do more work does not seem like a great trade-off.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/9/24 19:44, Tomas Vondra wrote:
> On 4/9/24 09:59, Martín Marqués wrote:
>> Hello,
>>
>> While doing some work/research on the new incremental backup feature
>> some limitations were not listed in the docs. Mainly the fact that
>> pg_combinebackup works with plain format and not tar.
>
> Right. The docs mostly imply this by talking about output directory and
> backup directories, but making it more explicit would not hurt.
>
> FWIW it'd be great if we could make incremental backups work with tar
> format in the future too. People probably don't want to keep around the
> expanded data directory, and having to extract everything before
> combining the backups is not very convenient. Reading and writing the
> tar would make this simpler.

I have a hard time seeing this feature as being very useful, especially for large databases, until pg_combinebackup works on tar (and compressed tar). Right now restoring an incremental requires at least twice the space of the original cluster, which is going to take a lot of users by surprise.

I know you have made some improvements here for COW filesystems, but my experience is that Postgres is generally not run on such filesystems, though that is changing a bit.

>> Around the same time, Tomas Vondra tested incremental backups with a
>> cluster where he enabled checksums after taking the previous full
>> backup. After combining the backups the synthetic backup had pages
>> with checksums and other pages without checksums which ended in
>> checksum errors.
>
> I'm not sure just documenting this limitation is sufficient. We can't
> make the incremental backups work in this case (it's as if someone
> messes with the cluster without writing stuff into WAL), but I think we
> should do better than silently producing (seemingly) corrupted backups.
>
> I say seemingly, because the backup is actually fine, the only problem
> is it has checksums enabled in the control file, but the pages from the
> full backup (and the early incremental backups) have no checksums.
>
> What we could do is detect this in pg_combinebackup, and either just
> disable checksums with a warning and hint to maybe enable them again. Or
> maybe just print that the user needs to disable them.
>
> I was thinking maybe we could detect this while taking the backups, and
> force taking a full backup if checksums got enabled since the last
> backup. But we can't do that because we only have the manifest from the
> last backup, and the manifest does not include info about checksums.

I'd say making a new full backup is the right thing to do in this case. It should be easy enough to store the checksum state of the cluster in the manifest.

Regards, -David
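For illustration, the kind of addition David suggests might look like this in the manifest. The "Data-Checksums" key is hypothetical and does not exist in the current manifest format; the other fields are trimmed down to a bare skeleton:

```json
{
  "PostgreSQL-Backup-Manifest-Version": 1,
  "Data-Checksums": "on",
  "Files": [],
  "WAL-Ranges": []
}
```

With such a field recorded, pg_basebackup could compare the previous manifest's checksum flag against the current cluster and force a full backup (or fail) on mismatch.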
On 4/11/24 02:01, David Steele wrote:
> On 4/9/24 19:44, Tomas Vondra wrote:
>> On 4/9/24 09:59, Martín Marqués wrote:
>>> Hello,
>>>
>>> While doing some work/research on the new incremental backup feature
>>> some limitations were not listed in the docs. Mainly the fact that
>>> pg_combinebackup works with plain format and not tar.
>>
>> Right. The docs mostly imply this by talking about output directory and
>> backup directories, but making it more explicit would not hurt.
>>
>> FWIW it'd be great if we could make incremental backups work with tar
>> format in the future too. People probably don't want to keep around the
>> expanded data directory, and having to extract everything before
>> combining the backups is not very convenient. Reading and writing the
>> tar would make this simpler.
>
> I have a hard time seeing this feature as being very useful, especially
> for large databases, until pg_combinebackup works on tar (and compressed
> tar). Right now restoring an incremental requires at least twice the
> space of the original cluster, which is going to take a lot of users by
> surprise.

I do agree it'd be nice if pg_combinebackup worked with .tar directly, without having to extract the directories first. No argument there, but as I said in the other thread, I believe that's something we can add later. That's simply how incremental development works.

I can certainly imagine other ways to do pg_combinebackup, e.g. by "merging" the increments into the data directory, instead of creating a copy. But again, I don't think that has to be in v1.

> I know you have made some improvements here for COW filesystems, but my
> experience is that Postgres is generally not run on such filesystems,
> though that is changing a bit.

I'd say XFS is a pretty common choice, for example. And it's one of the filesystems that work great with pg_combinebackup.

However, who says this has to be the filesystem the Postgres instance runs on? Who in their right mind puts backups on the same volume as the instance anyway? At which point it can be a different filesystem, even if it's not ideal for running the database.

FWIW I think it's fine to tell users that to minimize the disk space requirements, they should use a CoW filesystem and --copy-file-range. The docs don't say that currently, that's true.

All of this also depends on how people do the restore. With the CoW stuff they can do a quick (and small) copy on the backup server, and then copy the result to the actual instance. Or they can do the restore on the target directly (e.g. by mounting a r/o volume with backups), in which case the CoW won't really help.

But yeah, having to keep the backups as expanded directories is not great, I'd love to have .tar. Not necessarily because of the disk space (in my experience the compression in filesystems works quite well for this purpose), but mostly because it's more compact and allows working with backups as a single piece of data (e.g. it's much clearer what the checksum of a single .tar is, compared to a directory).

>>> Around the same time, Tomas Vondra tested incremental backups with a
>>> cluster where he enabled checksums after taking the previous full
>>> backup. After combining the backups the synthetic backup had pages
>>> with checksums and other pages without checksums which ended in
>>> checksum errors.
>>
>> I'm not sure just documenting this limitation is sufficient. We can't
>> make the incremental backups work in this case (it's as if someone
>> messes with the cluster without writing stuff into WAL), but I think we
>> should do better than silently producing (seemingly) corrupted backups.
>>
>> I say seemingly, because the backup is actually fine, the only problem
>> is it has checksums enabled in the control file, but the pages from the
>> full backup (and the early incremental backups) have no checksums.
>>
>> What we could do is detect this in pg_combinebackup, and either just
>> disable checksums with a warning and hint to maybe enable them again. Or
>> maybe just print that the user needs to disable them.
>>
>> I was thinking maybe we could detect this while taking the backups, and
>> force taking a full backup if checksums got enabled since the last
>> backup. But we can't do that because we only have the manifest from the
>> last backup, and the manifest does not include info about checksums.
>
> I'd say making a new full backup is the right thing to do in this case.
> It should be easy enough to store the checksum state of the cluster in
> the manifest.

Agreed.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/11/24 20:51, Tomas Vondra wrote:
> On 4/11/24 02:01, David Steele wrote:
>>
>> I have a hard time seeing this feature as being very useful, especially
>> for large databases, until pg_combinebackup works on tar (and compressed
>> tar). Right now restoring an incremental requires at least twice the
>> space of the original cluster, which is going to take a lot of users by
>> surprise.
>
> I do agree it'd be nice if pg_combinebackup worked with .tar directly,
> without having to extract the directories first. No argument there, but
> as I said in the other thread, I believe that's something we can add
> later. That's simply how incremental development works.

OK, sure, but if the plan is to make it practical later doesn't that make the feature something to be avoided now?

>> I know you have made some improvements here for COW filesystems, but my
>> experience is that Postgres is generally not run on such filesystems,
>> though that is changing a bit.
>
> I'd say XFS is a pretty common choice, for example. And it's one of the
> filesystems that work great with pg_combinebackup.

XFS has certainly advanced more than I was aware.

> However, who says this has to be the filesystem the Postgres instance
> runs on? Who in their right mind puts backups on the same volume as the
> instance anyway? At which point it can be a different filesystem, even
> if it's not ideal for running the database.

My experience is these days backups are generally placed in object stores. Sure, people are still using NFS but admins rarely have much control over those volumes. They may or may not be COW filesystems.

> FWIW I think it's fine to tell users that to minimize the disk space
> requirements, they should use a CoW filesystem and --copy-file-range.
> The docs don't say that currently, that's true.

That would probably be a good addition to the docs.

> All of this also depends on how people do the restore. With the CoW
> stuff they can do a quick (and small) copy on the backup server, and
> then copy the result to the actual instance. Or they can do the restore
> on the target directly (e.g. by mounting a r/o volume with backups), in
> which case the CoW won't really help.

And again, this all requires a significant amount of setup and tooling. Obviously I believe good backup requires effort but doing this right gets very complicated due to the limitations of the tool.

> But yeah, having to keep the backups as expanded directories is not
> great, I'd love to have .tar. Not necessarily because of the disk space
> (in my experience the compression in filesystems works quite well for
> this purpose), but mostly because it's more compact and allows working
> with backups as a single piece of data (e.g. it's much clearer what the
> checksum of a single .tar is, compared to a directory).

But again, object stores are commonly used for backup these days and billing is based on data stored rather than any compression that can be done on the data. Of course, you'd want to store the compressed tars in the object store, but that does mean storing an expanded copy somewhere to do pg_combinebackup.

But if the argument is that all this can/will be fixed in the future, I guess the smart thing for users to do is wait a few releases for incremental backups to become a practical feature.

Regards, -David
On Fri, Apr 12, 2024 at 12:14 AM David Steele <david@pgmasters.net> wrote:
On 4/11/24 20:51, Tomas Vondra wrote:
> On 4/11/24 02:01, David Steele wrote:
>>
>> I have a hard time seeing this feature as being very useful, especially
>> for large databases, until pg_combinebackup works on tar (and compressed
>> tar). Right now restoring an incremental requires at least twice the
>> space of the original cluster, which is going to take a lot of users by
>> surprise.
>
> I do agree it'd be nice if pg_combinebackup worked with .tar directly,
> without having to extract the directories first. No argument there, but
> as I said in the other thread, I believe that's something we can add
> later. That's simply how incremental development works.
OK, sure, but if the plan is to make it practical later doesn't that
make the feature something to be avoided now?
That could be said for any feature. When we shipped streaming replication, the plan was to support synchronous in the future. Should we not have shipped it, or told people to avoid it?
Sure, the current state limits its use in some cases. But it still leaves a bunch of other cases where it works just fine.
>> I know you have made some improvements here for COW filesystems, but my
>> experience is that Postgres is generally not run on such filesystems,
>> though that is changing a bit.
>
> I'd say XFS is a pretty common choice, for example. And it's one of the
> filesystems that work great with pg_combinebackup.
XFS has certainly advanced more than I was aware.
And it happens to be the default on at least one of our most common platforms.
> However, who says this has to be the filesystem the Postgres instance
> runs on? Who in their right mind puts backups on the same volume
> instance anyway? At which point it can be a different filesystem, even
> if it's not ideal for running the database.
My experience is these days backups are generally placed in object
stores. Sure, people are still using NFS but admins rarely have much
control over those volumes. They may or may not be COW filesystems.
If it's mounted through NFS I assume pg_combinebackup won't actually be able to use the COW features? Or does that actually work through NFS?
Mounted LUNs on a SAN are what I find more common today, though, and there it would do a fine job.
> FWIW I think it's fine to tell users that to minimize the disk space
> requirements, they should use a CoW filesystem and --copy-file-range.
> The docs don't say that currently, that's true.
That would probably be a good addition to the docs.
+1, that would be a good improvement.
> All of this also depends on how people do the restore. With the CoW
> stuff they can do a quick (and small) copy on the backup server, and
> then copy the result to the actual instance. Or they can do restore on
> the target directly (e.g. by mounting a r/o volume with backups), in
> which case the CoW won't really help.
And again, this all requires a significant amount of setup and tooling.
Obviously I believe good backup requires effort but doing this right
gets very complicated due to the limitations of the tool.
It clearly needs to be documented that there are space needs. But temporarily getting space for something like that is not very complicated in most environments. But you do have to be aware of it.
Generally speaking it's already the case that the "restore experience" with pg_basebackup is far from great. We don't have a "pg_baserestore". You still have to deal with archive_command and restore_command, which we all know can be easy to get wrong. I don't see how this is fundamentally worse than that.
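For context on the archive_command/restore_command pair being easy to get wrong, a deliberately minimal setup in the style of the examples in the PostgreSQL docs looks like this (the archive directory path is an assumption for the example; production setups need more care around durability and error handling):

```
# postgresql.conf -- illustrative minimal WAL archiving setup
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

# consulted during recovery
restore_command = 'cp /mnt/server/archivedir/%f %p'
```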
Personally, I tend to recommend that "if you want PITR and thus need to mess with archive_command etc, you should use a backup tool like pgBackRest. If you're fine with just daily backups or whatnot, use pg_basebackup". The incremental backup story fits somewhere in between, but I'd still say this is (today) primarily a tool directed at those that don't need full PITR.
> But yeah, having to keep the backups as expanded directories is not
> great, I'd love to have .tar. Not necessarily because of the disk space
> (in my experience the compression in filesystems works quite well for
> this purpose), but mostly because it's more compact and allows working
> with backups as a single piece of data (e.g. it's much clearer what the
> checksum of a single .tar is, compared to a directory).
But again, object stores are commonly used for backup these days and
billing is based on data stored rather than any compression that can be
done on the data. Of course, you'd want to store the compressed tars in
the object store, but that does mean storing an expanded copy somewhere
to do pg_combinebackup.
Object stores are definitely getting more common. I wish they were getting a lot more common than they actually are, because they simplify a lot. But they're in my experience still very far from being a majority.
But if the argument is that all this can/will be fixed in the future, I
guess the smart thing for users to do is wait a few releases for
incremental backups to become a practical feature.
There's always going to be another set of goalposts further ahead. I think it can still be practical for quite a few people.
I'm more worried about the issue you raised in the other thread about missing files, for example...
On Tue, Apr 9, 2024 at 11:46 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
On 4/9/24 09:59, Martín Marqués wrote:
> Hello,
>
> While doing some work/research on the new incremental backup feature
> some limitations were not listed in the docs. Mainly the fact that
> pg_combinebackup works with plain format and not tar.
>
Right. The docs mostly imply this by talking about output directory and
backup directories, but making it more explicit would not hurt.
FWIW it'd be great if we could make incremental backups work with tar
format in the future too. People probably don't want to keep around the
expanded data directory, and having to extract everything before
combining the backups is not very convenient. Reading and writing the
tar would make this simpler.
> Around the same time, Tomas Vondra tested incremental backups with a
> cluster where he enabled checksums after taking the previous full
> backup. After combining the backups the synthetic backup had pages
> with checksums and other pages without checksums which ended in
> checksum errors.
>
I'm not sure just documenting this limitation is sufficient. We can't
make the incremental backups work in this case (it's as if someone
messes with the cluster without writing stuff into WAL), but I think we
should do better than silently producing (seemingly) corrupted backups.
+1. I think that should be an open item that needs to get sorted.
I say seemingly, because the backup is actually fine, the only problem
is it has checksums enabled in the control file, but the pages from the
full backup (and the early incremental backups) have no checksums.
What we could do is detect this in pg_combinebackup, and either just
disable checksums with a warning and hint to maybe enable them again. Or
maybe just print that the user needs to disable them.
I don't think either of these should be done automatically. Something like pg_combinebackup simply failing and requiring you to say "--disable-checksums" to have it do that would be the way to go, IMO. (once we can reliably detect it of course)
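A purely hypothetical sketch of the fail-unless-flag behavior Magnus describes; neither the --disable-checksums switch nor this check exists in pg_combinebackup today, and the function and its inputs are invented for illustration:

```python
# Hypothetical sketch: refuse to combine backups whose checksum states
# disagree unless the user explicitly opts in. Nothing here exists in
# pg_combinebackup today.

def combine_checksum_state(states, disable_checksums=False):
    """states: the data-checksum flag read from each backup's pg_control,
    full backup first. Returns the state to record for the combined
    backup, or raises when the chain is inconsistent."""
    if all(states) or not any(states):
        return states[0]  # consistent chain, nothing special to do
    if disable_checksums:
        return False      # write the combined backup with checksums off
    raise SystemExit(
        "checksum state differs between backups; "
        "rerun with --disable-checksums or take a new full backup")

print(combine_checksum_state([True, True]))                            # -> True
print(combine_checksum_state([False, True], disable_checksums=True))   # -> False
```

The point of the explicit flag is exactly what Magnus argues: the tool should never silently pick an outcome the user has to discover at recovery time.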
I was thinking maybe we could detect this while taking the backups, and
force taking a full backup if checksums got enabled since the last
backup. But we can't do that because we only have the manifest from the
last backup, and the manifest does not include info about checksums.
Can we forcibly read and parse it out of pg_control?
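To illustrate the gap Tomas points out: a v1 backup manifest describes files and WAL ranges, but nothing in it records whether data checksums were enabled. A trimmed-down sample in the shape of a backup_manifest (all values made up) makes that visible:

```python
import json

# Trimmed, made-up sample in the shape of a v1 backup_manifest.
sample = """
{
  "PostgreSQL-Backup-Manifest-Version": 1,
  "Files": [
    {"Path": "global/pg_control", "Size": 8192,
     "Last-Modified": "2024-04-09 09:00:00 GMT"}
  ],
  "WAL-Ranges": [
    {"Timeline": 1, "Start-LSN": "0/2000028", "End-LSN": "0/2000100"}
  ],
  "Manifest-Checksum": "0000"
}
"""

manifest = json.loads(sample)
print(sorted(manifest.keys()))
# No field records whether data checksums were on in the cluster, which
# is why pg_basebackup cannot detect the change from the previous
# backup's manifest alone.
print("Data-Checksums" in manifest)  # -> False
```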
It's a bit unfortunate we don't have a way to enable checksums online.
That'd not have this problem IIRC, because it writes proper WAL. Maybe
it's time to revive that idea ... I recall there were some concerns
about tracking progress to allow resuming stuff, but maybe not having
anything because in some (rare?) cases it'd need to do more work does
not seem like a great trade off.
On 4/12/24 19:09, Magnus Hagander wrote:
> On Fri, Apr 12, 2024 at 12:14 AM David Steele <david@pgmasters.net> wrote:
>> OK, sure, but if the plan is to make it practical later doesn't that
>> make the feature something to be avoided now?
>
> That could be said for any feature. When we shipped streaming
> replication, the plan was to support synchronous in the future. Should
> we not have shipped it, or told people to avoid it?

This doesn't seem like a great example. Synchronous rep is by far the more used mode in my experience. I actively dissuade people from using sync rep because of the downsides. More people think they need it than actually need it.

>>> However, who says this has to be the filesystem the Postgres instance
>>> runs on? Who in their right mind puts backups on the same volume as the
>>> instance anyway? At which point it can be a different filesystem, even
>>> if it's not ideal for running the database.
>>
>> My experience is these days backups are generally placed in object
>> stores. Sure, people are still using NFS but admins rarely have much
>> control over those volumes. They may or may not be COW filesystems.
>
> If it's mounted through NFS I assume pg_combinebackup won't actually be
> able to use the COW features? Or does that actually work through NFS?

Pretty sure it won't work via NFS, but I was wrong about XFS, so...

> Mounted LUNs on a SAN I find more common today though, and there it
> would do a fine job.

Huh, interesting. This is a case I almost never see anymore.

>>> All of this also depends on how people do the restore. With the CoW
>>> stuff they can do a quick (and small) copy on the backup server, and
>>> then copy the result to the actual instance. Or they can do restore on
>>> the target directly (e.g. by mounting a r/o volume with backups), in
>>> which case the CoW won't really help.
>>
>> And again, this all requires a significant amount of setup and tooling.
>> Obviously I believe good backup requires effort but doing this right
>> gets very complicated due to the limitations of the tool.
>
> It clearly needs to be documented that there are space needs. But
> temporarily getting space for something like that is not very
> complicated in most environments. But you do have to be aware of it.

We find many environments ridiculously tight on space. There is a constant fight with customers/users to even get the data/WAL volumes sized correctly. For small databases it is probably not an issue, but this feature really shines with very large databases.

> Generally speaking it's already the case that the "restore experience"
> with pg_basebackup is far from great. We don't have a "pg_baserestore".
> You still have to deal with archive_command and restore_command, which
> we all know can be easy to get wrong. I don't see how this is
> fundamentally worse than that.

I pretty much agree with this statement. pg_basebackup is already hard to use effectively. Now it is just optionally harder.

> Personally, I tend to recommend that "if you want PITR and thus need to
> mess with archive_command etc, you should use a backup tool like
> pgBackRest. If you're fine with just daily backups or whatnot, use
> pg_basebackup". The incremental backup story fits somewhere in between,
> but I'd still say this is (today) primarily a tool directed at those
> that don't need full PITR.

Yeah, there are certainly cases where PITR is not required, but they still seem to be in the minority. PITR cannot be disabled for the most recent backup in pgBackRest and we've had few complaints about that overall.

>>> But yeah, having to keep the backups as expanded directories is not
>>> great, I'd love to have .tar. Not necessarily because of the disk space
>>> (in my experience the compression in filesystems works quite well for
>>> this purpose), but mostly because it's more compact and allows working
>>> with backups as a single piece of data (e.g. it's much clearer what the
>>> checksum of a single .tar is, compared to a directory).
>>
>> But again, object stores are commonly used for backup these days and
>> billing is based on data stored rather than any compression that can be
>> done on the data. Of course, you'd want to store the compressed tars in
>> the object store, but that does mean storing an expanded copy somewhere
>> to do pg_combinebackup.
>
> Object stores are definitely getting more common. I wish they were
> getting a lot more common than they actually are, because they simplify
> a lot. But they're in my experience still very far from being a majority.

I see it the other way, especially the last few years. The majority seem to be object stores followed up closely by NFS. Directly mounted storage on the backup host appears to be rarer.

>> But if the argument is that all this can/will be fixed in the future, I
>> guess the smart thing for users to do is wait a few releases for
>> incremental backups to become a practical feature.
>
> There's always going to be another set of goalposts further ahead. I
> think it can still be practical for quite a few people.

Since barman uses pg_basebackup in certain cases I imagine that will end up being the way most users access this feature.

> I'm more worried about the issue you raised in the other thread about
> missing files, for example...

Me, too.

Regards, -David
On 4/12/24 11:50, David Steele wrote:
> On 4/12/24 19:09, Magnus Hagander wrote:
>> On Fri, Apr 12, 2024 at 12:14 AM David Steele <david@pgmasters.net
>>
>> ...
>>
>>>> But yeah, having to keep the backups as expanded directories is not
>>>> great, I'd love to have .tar. Not necessarily because of the disk space
>>>> (in my experience the compression in filesystems works quite well for
>>>> this purpose), but mostly because it's more compact and allows working
>>>> with backups as a single piece of data (e.g. it's much clearer what the
>>>> checksum of a single .tar is, compared to a directory).
>>>
>>> But again, object stores are commonly used for backup these days and
>>> billing is based on data stored rather than any compression that can be
>>> done on the data. Of course, you'd want to store the compressed tars in
>>> the object store, but that does mean storing an expanded copy somewhere
>>> to do pg_combinebackup.
>>
>> Object stores are definitely getting more common. I wish they were
>> getting a lot more common than they actually are, because they simplify
>> a lot. But they're in my experience still very far from being a majority.
>
> I see it the other way, especially the last few years. The majority seem
> to be object stores followed up closely by NFS. Directly mounted storage
> on the backup host appears to be rarer.

One thing I'd mention is that not having built-in support for .tar and .tgz backups does not mean it's impossible to use pg_combinebackup with archives. You can mount them using e.g. "ratarmount" and then use that as the source directories for pg_combinebackup.

It's not entirely friction-less, because AFAICS it's necessary to do the backup in plain format and then create the .tar to get the expected "flat" directory structure (and not manifest + 2x tar). But other than that it seems to work fine (based on my limited testing).

FWIW "archivemount" performs terribly, so adding this capability into pg_combinebackup is clearly far from trivial.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/12/24 11:12, Magnus Hagander wrote:
> On Tue, Apr 9, 2024 at 11:46 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 4/9/24 09:59, Martín Marqués wrote:
>>> Hello,
>>>
>>> While doing some work/research on the new incremental backup feature
>>> some limitations were not listed in the docs. Mainly the fact that
>>> pg_combienbackup works with plain format and not tar.
>>
>> Right. The docs mostly imply this by talking about output directory and
>> backup directories, but making it more explicit would not hurt.
>>
>> FWIW it'd be great if we could make incremental backups work with tar
>> format in the future too. People probably don't want to keep around the
>> expanded data directory or extract everything before combining the
>> backups is not very convenient. Reading and writing the tar would make
>> this simpler.
>>
>>> Around the same time, Tomas Vondra tested incremental backups with a
>>> cluster where he enabled checksums after taking the previous full
>>> backup. After combining the backups the synthetic backup had pages
>>> with checksums and other pages without checksums which ended in
>>> checksum errors.
>>
>> I'm not sure just documenting this limitation is sufficient. We can't
>> make the incremental backups work in this case (it's as if someone
>> messes with cluster without writing stuff into WAL), but I think we
>> should do better than silently producing (seemingly) corrupted backups
>
> +1. I think that should be an open item that needs to get sorted.
>
>> I say seemingly, because the backup is actually fine, the only problem
>> is it has checksums enabled in the controlfile, but the pages from the
>> full backup (and the early incremental backups) have no checksums.
>>
>> What we could do is detect this in pg_combinebackup, and either just
>> disable checksums with a warning and hint to maybe enable them again. Or
>> maybe just print that the user needs to disable them.
>
> I don't think either of these should be done automatically. Something like
> pg_combinebackup simply failing and requiring you to say
> "--disable-checksums" to have it do that would be the way to go, IMO.
> (once we can reliably detect it of course)

You mean pg_combinebackup would have a "--disable-checksums" switch?
Yeah, that'd work, I think. It's probably better than producing a backup
that would seem broken when the user tries to start the instance.

Also, I realized the user probably can't disable the checksums without
starting the instance to finish recovery. And if there are invalid
checksums, I'm not sure that would actually work.

>> I was thinking maybe we could detect this while taking the backups, and
>> force taking a full backup if checksums got enabled since the last
>> backup. But we can't do that because we only have the manifest from the
>> last backup, and the manifest does not include info about checksums.
>
> Can we forcibly read and parse it out of pg_control?

You mean when taking the backup, or during pg_combinebackup?

During backup, it depends. For the instance we should be able to just
get that from the instance, no need to get it from pg_control. But for
the backup (used as a baseline for the increment) we can't read the
pg_control - the only thing we have is the manifest.

During pg_combinebackup we obviously can read pg_control for all the
backups to combine, but at that point it feels a bit too late - it does
not seem great to do backups, and then at recovery to tell the user the
backups are actually not valid. I think it'd be better to detect this
while taking the basebackup.

>> It's a bit unfortunate we don't have a way to enable checksums online.
>> That'd not have this problem IIRC, because it writes proper WAL. Maybe
>> it's time to revive that idea ... I recall there were some concerns
>> about tracking progress to allow resuming stuff, but maybe not having
>> anything because in some (rare?) cases it'd need to do more work does
>> not seem like a great trade off.
>
> For that one I still think it would be perfectly acceptable to have no
> resume at all, but that's a whole different topic :)

I very much agree. It seems a bit strange that given three options to
enable checksums

1) offline
2) online without resume
3) online with resume

the initial argument was that we need to allow resuming the process
because on large systems it might take a lot of time, and we'd lose all
the work if the system restarts. But then we concluded that it's too
complex and it's better if the large systems have to do an extended
outage to enable checksums ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
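The failure mode under discussion can be illustrated with a toy model. The checksum function below is a deliberate stand-in, not PostgreSQL's actual page checksum algorithm; the point is only that pages inherited from a checksums-off backup carry no checksum while the combined cluster's control file claims they should.

```python
# Toy model: pages copied from a backup taken with checksums disabled
# carry no checksum (0), but the combined cluster's control file says
# checksums are enabled, so verification fails for exactly those pages.

def toy_checksum(page_body: bytes) -> int:
    # Stand-in checksum; 0 is reserved to mean "no checksum stored".
    return sum(page_body) % 65535 + 1

def make_page(body: bytes, checksums_enabled: bool) -> dict:
    return {"body": body,
            "checksum": toy_checksum(body) if checksums_enabled else 0}

def verify(page: dict) -> bool:
    return page["checksum"] == toy_checksum(page["body"])

# Full backup taken before checksums were enabled ...
full = [make_page(b"old page %d" % i, checksums_enabled=False)
        for i in range(4)]
# ... incremental taken after, containing only the one modified page.
incremental = {2: make_page(b"new page 2", checksums_enabled=True)}

# pg_combinebackup-style reconstruction: changed pages come from the
# incremental, everything else from the full backup.
combined = [incremental.get(i, full[i]) for i in range(4)]

failures = [i for i, p in enumerate(combined) if not verify(p)]
print(failures)  # every page still sourced from the old full backup
```

Every page that still comes from the old full backup fails verification, even though its contents are perfectly fine, which is exactly why the backup is only "seemingly" corrupted.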
On Fri, Apr 12, 2024 at 3:01 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 4/12/24 11:12, Magnus Hagander wrote:
>> On Tue, Apr 9, 2024 at 11:46 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
>> wrote:
>>>
>>> On 4/9/24 09:59, Martín Marqués wrote:
>>>> Hello,
>>>>
>>>> While doing some work/research on the new incremental backup feature
>>>> some limitations were not listed in the docs. Mainly the fact that
>>>> pg_combienbackup works with plain format and not tar.
>>>
>>> Right. The docs mostly imply this by talking about output directory and
>>> backup directories, but making it more explicit would not hurt.
>>>
>>> FWIW it'd be great if we could make incremental backups work with tar
>>> format in the future too. People probably don't want to keep around the
>>> expanded data directory or extract everything before combining the
>>> backups is not very convenient. Reading and writing the tar would make
>>> this simpler.
>>>
>>>> Around the same time, Tomas Vondra tested incremental backups with a
>>>> cluster where he enabled checksums after taking the previous full
>>>> backup. After combining the backups the synthetic backup had pages
>>>> with checksums and other pages without checksums which ended in
>>>> checksum errors.
>>>
>>> I'm not sure just documenting this limitation is sufficient. We can't
>>> make the incremental backups work in this case (it's as if someone
>>> messes with cluster without writing stuff into WAL), but I think we
>>> should do better than silently producing (seemingly) corrupted backups
>>
>> +1. I think that should be an open item that needs to get sorted.
>>
>>> I say seemingly, because the backup is actually fine, the only problem
>>> is it has checksums enabled in the controlfile, but the pages from the
>>> full backup (and the early incremental backups) have no checksums.
>>>
>>> What we could do is detect this in pg_combinebackup, and either just
>>> disable checksums with a warning and hint to maybe enable them again. Or
>>> maybe just print that the user needs to disable them.
>>
>> I don't think either of these should be done automatically. Something like
>> pg_combinebackup simply failing and requiring you to say
>> "--disable-checksums" to have it do that would be the way to go, IMO.
>> (once we can reliably detect it of course)
>
> You mean pg_combinebackup would have a "--disable-checksums" switch? Yeah,
> that'd work, I think. It's probably better than producing a backup that
> would seem broken when the user tries to start the instance.
>
> Also, I realized the user probably can't disable the checksums without
> starting the instance to finish recovery. And if there are invalid
> checksums, I'm not sure that would actually work.
>
>>> I was thinking maybe we could detect this while taking the backups, and
>>> force taking a full backup if checksums got enabled since the last
>>> backup. But we can't do that because we only have the manifest from the
>>> last backup, and the manifest does not include info about checksums.
>>
>> Can we forcibly read and parse it out of pg_control?
>
> You mean when taking the backup, or during pg_combinebackup?

Yes. That way combining the backups into something that doesn't have
proper checksums (either by actually turning them off or as today just
breaking them and forcing you to turn it off yourself) can only happen
intentionally. And if you weren't aware of the problem, it turns into a
hard error, so you will notice before it's too late.

> During backup, it depends. For the instance we should be able to just
> get that from the instance, no need to get it from pg_control. But for
> the backup (used as a baseline for the increment) we can't read the
> pg_control - the only thing we have is the manifest.
>
> During pg_combinebackup we obviously can read pg_control for all the
> backups to combine, but at that point it feels a bit too late - it does
> not seem great to do backups, and then at recovery to tell the user the
> backups are actually not valid.
>
> I think it'd be better to detect this while taking the basebackup.

Agreed. In the end, we might want to do *both*, but the earlier the
better.

But to do that what we'd need is to add a flag to the initial manifest
that says "this cluster is supposed to have checksum = <on/off>" and
then refuse to take an inc if it changes? It doesn't seem like the end
of the world to add that to it?

>>> It's a bit unfortunate we don't have a way to enable checksums online.
>>> That'd not have this problem IIRC, because it writes proper WAL. Maybe
>>> it's time to revive that idea ... I recall there were some concerns
>>> about tracking progress to allow resuming stuff, but maybe not having
>>> anything because in some (rare?) cases it'd need to do more work does
>>> not seem like a great trade off.
>>
>> For that one I still think it would be perfectly acceptable to have no
>> resume at all, but that's a whole different topic :)
>
> I very much agree.
>
> It seems a bit strange that given three options to enable checksums
>
> 1) offline
> 2) online without resume
> 3) online with resume
>
> the initial argument was that we need to allow resuming the process
> because on large systems it might take a lot of time, and we'd lose all
> the work if the system restarts. But then we concluded that it's too
> complex and it's better if the large systems have to do an extended
> outage to enable checksums ...

Indeed. Or the workaround that still scares the crap out of me where you
use a switchover-and-switchback to a replica to "do the offline thing
almost online". To me that seems a lot scarier than the original option
as well.
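The manifest-flag idea Magnus floats above could be sketched as follows. Note that, as stated in the thread, the real backup_manifest has no checksum field; "Data-Checksum-Version" below is an invented key used purely for illustration of how the server-side check might decide.

```python
import json

# Hypothetical server-side check: if the previous backup's manifest
# recorded the cluster's data checksum state, refuse an incremental
# backup when that state has changed since. "Data-Checksum-Version"
# is an assumed, not-yet-existing manifest key.

def incremental_allowed(prev_manifest_json: str,
                        current_checksum_version: int) -> bool:
    manifest = json.loads(prev_manifest_json)
    recorded = manifest.get("Data-Checksum-Version")
    if recorded is None:
        # Manifest predates the hypothetical field: cannot tell, allow.
        return True
    return recorded == current_checksum_version

old = json.dumps({"PostgreSQL-Backup-Manifest-Version": 1,
                  "Data-Checksum-Version": 0})  # checksums were off

print(incremental_allowed(old, 1))  # False: checksums enabled since
print(incremental_allowed(old, 0))  # True: state unchanged
```

With something like this in place, the problematic incremental could be rejected at backup time with a hint to take a full backup instead.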
On 4/12/24 22:40, Tomas Vondra wrote:
> On 4/12/24 11:50, David Steele wrote:
>> On 4/12/24 19:09, Magnus Hagander wrote:
>>> On Fri, Apr 12, 2024 at 12:14 AM David Steele <david@pgmasters.net> wrote:
>>>
>>> ...
>>>
>>>> But yeah, having to keep the backups as expanded directories is not
>>>> great, I'd love to have .tar. Not necessarily because of the disk space
>>>> (in my experience the compression in filesystems works quite well for
>>>> this purpose), but mostly because it's more compact and allows working
>>>> with backups as a single piece of data (e.g. it's much cleared what the
>>>> checksum of a single .tar is, compared to a directory).
>>>
>>> But again, object stores are commonly used for backup these days and
>>> billing is based on data stored rather than any compression that can be
>>> done on the data. Of course, you'd want to store the compressed tars in
>>> the object store, but that does mean storing an expanded copy somewhere
>>> to do pg_combinebackup.
>>>
>>> Object stores are definitely getting more common. I wish they were
>>> getting a lot more common than they actually are, because they simplify
>>> a lot. But they're in my experience still very far from being a
>>> majority.
>>
>> I see it the other way, especially the last few years. The majority seem
>> to be object stores followed up closely by NFS. Directly mounted storage
>> on the backup host appears to be rarer.
>
> One thing I'd mention is that not having built-in support for .tar and
> .tgz backups does not mean it's impossible to use pg_combinebackup with
> archives. You can mount them using e.g. "ratarmount" and then use that
> as source directories for pg_combinebackup.
>
> It's not entirely friction-less because AFAICS it's necessary to do the
> backup in plain format and then do the .tar to have the expected "flat"
> directory structure (and not manifest + 2x tar). But other than that it
> seems to work fine (based on my limited testing).

Well, that's certainly convoluted and doesn't really help a lot in terms
of space consumption, it just shifts the additional space required to
the backup side. I doubt this is something we'd be willing to add to our
documentation so it would be up to the user to figure out and script.

> FWIW the "archivemount" performs terribly, so adding this capability
> into pg_combinebackup is clearly far from trivial.

I imagine this would perform pretty badly. And yes, doing it efficiently
is not trivial but certainly doable. Scanning the tar file and matching
to entries in the manifest is one way, but I would prefer to store the
offsets into the tar file in the manifest then assemble an ordered list
of work to do on each tar file. But of course the latter requires a
manifest-centric approach, which is not what we have right now.

Regards,
-David
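For the record, the workaround Tomas describes would look roughly like this. This is an untested sketch: the paths and archive names are invented, and it assumes ratarmount is installed and the backups were taken in plain format and then tarred "flat" (data directory contents at the top level of each archive), as Tomas notes is necessary.

```shell
# Archive each plain-format backup yourself, flat.
tar -cf full.tar -C /backups/full .
tar -cf inc1.tar -C /backups/inc1 .

# Mount the archives; ratarmount exposes each tar as a read-only tree.
mkdir -p /mnt/full /mnt/inc1
ratarmount full.tar /mnt/full
ratarmount inc1.tar /mnt/inc1

# Combine as usual, using the mountpoints as the backup directories.
pg_combinebackup -o /restore/datadir /mnt/full /mnt/inc1
```

As David says below, this shifts the extra space to the backup side rather than eliminating it, and it's left to the user to script.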
On Tue, Apr 9, 2024 at 4:00 AM Martín Marqués <martin.marques@gmail.com> wrote:
> I've attached two patches, the first one is just neat-picking things I
> found when I first read the docs.

I pushed a commit to remove the spurious "the" that you found, but I
don't really agree with the other changes. They all seem grammatically
correct, but I think they're grammatically correct now, too, and I find
it more readable the way it is.

I'll write a separate email about the checksum issue.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Apr 9, 2024 at 5:46 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> What we could do is detect this in pg_combinebackup, and either just
> disable checksums with a warning and hint to maybe enable them again. Or
> maybe just print that the user needs to disable them.
>
> I was thinking maybe we could detect this while taking the backups, and
> force taking a full backup if checksums got enabled since the last
> backup. But we can't do that because we only have the manifest from the
> last backup, and the manifest does not include info about checksums.

So, as I see it, we have at least five options here:

1. Document that this doesn't work. By "this doesn't work" I think what
we mean is: (A) If any of the backups you'd need to combine were taken
with checksums off but the last one was taken with checksums on, then
you might get checksum failures on the cluster written by
pg_combinebackup. (B) Therefore, after enabling checksums, you should
take a new full backup and make future incrementals dependent thereupon.
(C) But if you don't, then you can (I presume) run recovery with
ignore_checksum_failure=true to bring the cluster to a consistent state,
stop the database, shut checksums off, optionally also turn them on
again, and restart the database. Then presumably everything should be
OK, except that you'll have wiped out any real checksum failures that
weren't artifacts of the reconstruction process along with the fake
ones.

2. As (1), but make check_control_files() emit a warning message when
the problem case is detected.

3. As (2), but also add a command-line option to pg_combinebackup to
flip the checksum flag to false in the control file. Then, if you have
the problem case, instead of following the procedure described above,
you can just use this option, and enable checksums afterward if you
want. It still has the same disadvantage as the procedure described
above: any "real" checksum failures will be suppressed, too. From a
design perspective, this feels like kind of an ugly wart to me: hey, in
this one scenario, you have to add --do-something-random or it doesn't
work! But I see it's got some votes already, so maybe it's the right
answer.

4. Add the checksum state to the backup manifest. Then, if someone tries
to take an incremental backup with checksums on and the precursor backup
had checksums off, we could fail. A strength of this proposal is that it
actually stops the problem from happening at backup time, which in
general is a whole lot nicer than not noticing a problem until restore
time. A possible weakness is that it stops you from doing something that
is ... actually sort of OK. I mean, in the strict sense, the incremental
backup isn't valid, because it's going to cause checksum failures after
reconstruction, but it's valid apart from the checksums, and those are
fixable. I wonder whether users who encounter this error message will
say "oh, I'm glad PostgreSQL prevented me from doing that" or "oh, I'm
annoyed that PostgreSQL prevented me from doing that."

5. At reconstruction time, notice which backups have checksums enabled.
If the final backup in the chain has them enabled, then whenever we take
a block from an earlier backup with checksums disabled, re-checksum the
block. As opposed to any of the previous options, this creates a fully
correct result, so there's no real need to document any restrictions on
what you're allowed to do. We might need to document the performance
consequences, though: fast file copy methods will have to be disabled,
and we'll have to go block by block while paying the cost of checksum
calculation for each one. You might be sad to find out that your
reconstruction is a lot slower than you were expecting.

While I'm probably willing to implement any of these, I have some
reservations about attempting (4) or especially (5) after feature
freeze. I think there's a pretty decent chance that those fixes will
turn out to have issues of their own which we'll then need to fix in
turn. We could perhaps consider doing (2) for now and (5) for a future
release, or something like that.

--
Robert Haas
EDB: http://www.enterprisedb.com
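Option (5) above can be sketched abstractly. The checksum function is a toy stand-in for PostgreSQL's real page checksum, and the structure is simplified to a single decision per block; the point is the control flow Robert describes, where blocks inherited from a checksums-off backup get a fresh checksum stamped during reconstruction, at the cost of touching every such block.

```python
# Sketch of option (5): when the final backup in the chain has
# checksums enabled, re-checksum any block taken from an older backup
# that was made with checksums disabled. Toy checksum, not PostgreSQL's.

def toy_checksum(body: bytes) -> int:
    return (sum(body) * 31) % 65535 + 1

def reconstruct_block(body: bytes, source_had_checksums: bool,
                      target_has_checksums: bool) -> dict:
    if target_has_checksums and not source_had_checksums:
        # Expensive path: fast file copy is off the table, every block
        # must be visited so a fresh checksum can be stamped on it.
        return {"body": body, "checksum": toy_checksum(body)}
    # Fast path: the block is carried through unchanged.
    return {"body": body,
            "checksum": toy_checksum(body) if source_had_checksums else 0}

blk = reconstruct_block(b"heap page", source_had_checksums=False,
                        target_has_checksums=True)
print(blk["checksum"] == toy_checksum(b"heap page"))  # True: verifies
```

This is what makes (5) fully correct but potentially much slower than the other options.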
> On 18 Apr 2024, at 20:11, Robert Haas <robertmhaas@gmail.com> wrote:
>
> 2. As (1), but make check_control_files() emit a warning message when
> the problem case is detected.

Being in the post-freeze part of the cycle, this seems like the best
option.

> 3. As (2), but also add a command-line option to pg_combinebackup to
> flip the checksum flag to false in the control file.

I don't think this is the way to go, such an option will be plastered on
to helpful tutorials which users will copy/paste from even when they
don't need the option at all, only to disable checksums when there was
no reason for doing so.

> We could perhaps consider doing (2) for now and (5) for a future
> release, or something like that.

+1

--
Daniel Gustafsson
On Thu, Apr 18, 2024 at 3:25 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> I don't think this is the way to go, such an option will be plastered on to
> helpful tutorials which users will copy/paste from even when they don't need
> the option at all, only to disable checksums when there was no reason for
> doing so.

That's actually a really good point. I mean, it's such an obscure
scenario, maybe it wouldn't make it into the tutorials, but if it does,
it'll never get taken out.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Apr 18, 2024 at 3:25 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 18 Apr 2024, at 20:11, Robert Haas <robertmhaas@gmail.com> wrote:
>> 2. As (1), but make check_control_files() emit a warning message when
>> the problem case is detected.
>
> Being in the post-freeze part of the cycle, this seems like the best option.

Here is a patch for that.

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments
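In outline, the warning approach of option (2) amounts to comparing the data checksum setting across all the input backups and warning on a mismatch. The actual patch is C code in pg_combinebackup that reads this from each backup's pg_control; the Python sketch below only mirrors the idea, with the values passed in directly for illustration.

```python
import warnings

# Sketch of the check_control_files()-style warning: versions[i] is
# the data checksum setting taken from backup i's control file, with
# the last entry belonging to the final (most recent) backup.

def check_checksum_versions(versions: list) -> None:
    if len(set(versions)) > 1:
        warnings.warn("only some backups have checksums enabled; "
                      "the combined backup may fail checksum verification")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_checksum_versions([0, 0, 1])  # checksums enabled mid-chain
    print(len(caught))  # 1: the mismatch is reported
```

Unlike options (4) and (5), this changes no on-disk behavior, which is why it was considered the safest choice post-freeze.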
On Mon, Apr 22, 2024 at 2:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 18, 2024 at 3:25 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> On 18 Apr 2024, at 20:11, Robert Haas <robertmhaas@gmail.com> wrote:
>>> 2. As (1), but make check_control_files() emit a warning message when
>>> the problem case is detected.
>>
>> Being in the post-freeze part of the cycle, this seems like the best option.
>
> Here is a patch for that.

Is someone willing to review this patch? cfbot seems happy enough with
it, and I'd like to get it committed since it's an open item, but I'd be
happy to have some review before I do that. If nobody reviews or
indicates an intent to do so in the next couple of days, I'll go ahead
and commit this on my own.

--
Robert Haas
EDB: http://www.enterprisedb.com
> On 22 Apr 2024, at 20:52, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Apr 18, 2024 at 3:25 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>> On 18 Apr 2024, at 20:11, Robert Haas <robertmhaas@gmail.com> wrote:
>>> 2. As (1), but make check_control_files() emit a warning message when
>>> the problem case is detected.
>>
>> Being in the post-freeze part of the cycle, this seems like the best option.
>
> Here is a patch for that.

LGTM; only one small comment which you can ignore if you feel it's not
worth the extra words.

+     <literal>pg_combinebackup</literal> when the checksum status of the
+     cluster has been changed; see

I would have preferred that this sentence included the problematic
period for the change, perhaps "..has been changed after the initial
backup." or ideally something even better. In other words, clarifying
that if checksums were enabled before any backups were taken this
limitation is not in play. It's not critical as the link aptly documents
this, it just seems like the sentence is cut short.

--
Daniel Gustafsson
On Wed, Apr 24, 2024 at 3:08 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> LGTM; only one small comment which you can ignore if you feel it's not worth
> the extra words.
>
> +     <literal>pg_combinebackup</literal> when the checksum status of the
> +     cluster has been changed; see
>
> I would have preferred that this sentence included the problematic period for
> the change, perhaps "..has been changed after the initial backup." or ideally
> something even better. In other words, clarifying that if checksums were
> enabled before any backups were taken this limitation is not in play. It's not
> critical as the link aptly documents this, it just seems like the sentence is
> cut short.

This was somewhat deliberate. The phraseology that you propose doesn't
exactly seem incorrect to me. However, consider the scenario where
someone takes a full backup A, an incremental backup B based on A, and
another incremental backup C based on B. Then, they combine A with B to
produce X, remove A and B, and later combine X with C. When we talk
about the "initial backup", are we talking about A or X? It doesn't
quite matter, in the end, because if X has a problem it must be because
A had a similar problem, and it's also sort of meaningless to talk about
when X was taken, because it wasn't ever taken from the origin server;
it was reconstructed. And it doesn't matter what was happening on the
origin server at the time it was reconstructed, but rather what was
happening on the origin server at the time its inputs were taken. Or, in
the case of its first input, that could also be a reconstruction, in
which case the time of that reconstruction doesn't matter either; there
can be any number of levels here.

I feel that all of this makes it a little murky to talk about what
happened after the "initial backup". The actual original full backup
need not even exist any more at the time of reconstruction, as in the
example above. Now, I think the user will probably still get the point,
but I also think they'll probably get the point without the extra
verbiage. I think that it will be natural for people to imagine that
what matters is not whether the checksum status has ever changed, but
whether it has changed within the relevant time period, whatever that is
exactly. If they have a more complicated situation where it's hard to
reason about what the relevant time period is, my hope is that they'll
click on the link and that the longer text they see on the other end
will help them think the situation through.

Again, this is not to say that what you're proposing is necessarily
wrong; I'm just explaining my own thinking.

--
Robert Haas
EDB: http://www.enterprisedb.com
> On 25 Apr 2024, at 18:16, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Apr 24, 2024 at 3:08 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>> LGTM; only one small comment which you can ignore if you feel it's not worth
>> the extra words.
>>
>> +     <literal>pg_combinebackup</literal> when the checksum status of the
>> +     cluster has been changed; see
>>
>> I would have preferred that this sentence included the problematic period for
>> the change, perhaps "..has been changed after the initial backup." or ideally
>> something even better. In other words, clarifying that if checksums were
>> enabled before any backups were taken this limitation is not in play. It's not
>> critical as the link aptly documents this, it just seems like the sentence is
>> cut short.
>
> This was somewhat deliberate.

Given your reasoning below, +1 on the patch you proposed.

> I think the user will probably still get the
> point, but I also think they'll probably get the point without the
> extra verbiage. I think that it will be natural for people to imagine
> that what matters is not whether the checksum status has ever changed,
> but whether it has changed within the relevant time period, whatever
> that is exactly.

Fair enough.

> Again, this is not to say that what you're proposing is necessarily
> wrong; I'm just explaining my own thinking.

Gotcha, much appreciated.

--
Daniel Gustafsson