Discussion: pg_combinebackup fails on file named INCREMENTAL.*
Hackers,

Since incremental backup is using INCREMENTAL as a keyword (rather than storing incremental info in the manifest) it is vulnerable to any file in PGDATA with the pattern INCREMENTAL.*. For example:

$ pg_basebackup -c fast -D test/backup/full -F plain
$ touch test/data/INCREMENTAL.CONFIG
$ /home/dev/test/pg/bin/pg_basebackup -c fast -D test/backup/incr1 -F plain -i /home/dev/test/backup/full/backup_manifest
$ /home/dev/test/pg/bin/pg_combinebackup test/backup/full test/backup/incr1 -o test/backup/combine
pg_combinebackup: error: could not read file "test/backup/incr1/INCREMENTAL.CONFIG": read only 0 of 4 bytes
pg_combinebackup: removing output directory "test/backup/combine"

This works anywhere in PGDATA as far as I can see, e.g.

$ touch test/data/base/1/INCREMENTAL.1

Or just by dropping a new file into the incremental backup:

$ touch test/backup/incr1/INCREMENTAL.x
$ /home/dev/test/pg/bin/pg_combinebackup test/backup/full test/backup/incr1 -o test/backup/combine
pg_combinebackup: error: could not read file "test/backup/incr1/INCREMENTAL.x": read only 0 of 4 bytes
pg_combinebackup: removing output directory "test/backup/combine"

We could fix the issue by forbidding this file pattern in PGDATA, i.e. error when it is detected during pg_basebackup, but in my view it would be better (at least eventually) to add incremental info to the manifest. That would also allow us to skip storing zero-length files and incremental stubs (with no changes) as files.

Regards,
-David
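The collision above arises because pg_combinebackup classifies files purely by name prefix, so a genuine stub and a user-created file are indistinguishable. A minimal sketch of that classification (hypothetical helper name; the real check lives in pg_combinebackup's C code):

```python
# Sketch of prefix-based stub detection, the root of the collision.
# The function name is hypothetical; the actual logic is C code in
# pg_combinebackup, shown here only to illustrate the failure mode.

def looks_like_incremental_stub(filename: str) -> bool:
    # Any file matching INCREMENTAL.* is treated as an incremental stub,
    # regardless of whether the backup machinery actually wrote it.
    return filename.startswith("INCREMENTAL.")

# A genuine stub and a stray user file classify identically:
looks_like_incremental_stub("INCREMENTAL.16384")   # genuine stub
looks_like_incremental_stub("INCREMENTAL.CONFIG")  # user file, misclassified
```

Because the stub format implies a small header, reading a zero-length user file then fails with "read only 0 of 4 bytes", as shown in the report.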
On Sun, Apr 14, 2024 at 11:53 PM David Steele <david@pgmasters.net> wrote: > Since incremental backup is using INCREMENTAL as a keyword (rather than > storing incremental info in the manifest) it is vulnerable to any file > in PGDATA with the pattern INCREMENTAL.*. Yeah, that's true. I'm not greatly concerned about this, because I think anyone who creates a file called INCREMENTAL.CONFIG is just being perverse. However, we could ignore those files just in subtrees where they're not expected to occur i.e. only pay attention to them under base, global, and pg_tblspc. I didn't do this because I thought someone might eventually want to do something like incremental backup of SLRU files, and then it would be annoying if there were a bunch of random pathname restrictions. But if we're really concerned about this, I think it would be a reasonable fix; you're not really supposed to decide to store your configuration files in directories that exist for the purpose of storing relation data files. > We could fix the issue by forbidding this file pattern in PGDATA, i.e. > error when it is detected during pg_basebackup, but in my view it would > be better (at least eventually) to add incremental info to the manifest. > That would also allow us to skip storing zero-length files and > incremental stubs (with no changes) as files. I did think about this, and I do like the idea of being able to elide incremental stubs. If you have a tremendous number of relations and very few of them have changed at all, the incremental stubs might take up a significant percentage of the space consumed by the incremental backup, which kind of sucks, even if the incremental backup is still quite small compared to the original database. And, like you, I had the idea of trying to use the backup_manifest to do it. But ... I didn't really end up feeling very comfortable with it. Right now, the backup manifest is something we only use to verify the integrity of the backup. 
If we were to do this, it would become a critical part of the backup. I don't think I like the idea that removing the backup_manifest should be allowed to, in effect, corrupt the backup. But I think I dislike even more the idea that the data that is used to verify the backup gets mushed together with the backup data itself. Maybe in practice it's fine, but it doesn't seem very conceptually clean. There are also some practical considerations that made me not want to go in this direction: we'd need to make sure that the relevant information from the backup manifest was available to the reconstruction process at the right part of the code. When iterating over a directory, it would need to consider all of the files actually present in that directory plus any "hallucinated" incremental stubs from the manifest. I didn't feel confident of my ability to implement that in the available time without messing anything up. I think we should consider other possible designs here. For example, instead of including this in the manifest, we could ship one INCREMENTAL.STUBS file per directory that contains a list of all of the incremental stubs that should be created in that directory. My guess is that something like this will turn out to be way simpler to code. -- Robert Haas EDB: http://www.enterprisedb.com
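The per-directory INCREMENTAL.STUBS idea could make stub detection explicit rather than name-based. A minimal sketch, assuming a one-stub-per-line file format (both the format and the helper below are illustrative; a real implementation would be C code inside pg_combinebackup):

```python
STUB_LIST_NAME = "INCREMENTAL.STUBS"  # hypothetical per-directory stub list

def files_to_reconstruct(dir_entries, stub_list_text):
    """Split one backup directory into real files and incremental stubs.

    dir_entries: names actually present in the directory listing.
    stub_list_text: contents of that directory's stub list, one name per line.
    """
    real_files = [name for name in dir_entries if name != STUB_LIST_NAME]
    stubs = [line.strip() for line in stub_list_text.splitlines() if line.strip()]
    return real_files, stubs

# With an explicit list, a stray user file such as INCREMENTAL.CONFIG is
# treated as an ordinary file unless it is actually named as a stub.
```

Under this scheme the reconstruction loop only ever needs the stub list for the directory it is currently processing, which is what makes the design easy to slot into directory-by-directory iteration.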
On 4/16/24 06:33, Robert Haas wrote: > On Sun, Apr 14, 2024 at 11:53 PM David Steele <david@pgmasters.net> wrote: >> Since incremental backup is using INCREMENTAL as a keyword (rather than >> storing incremental info in the manifest) it is vulnerable to any file >> in PGDATA with the pattern INCREMENTAL.*. > > Yeah, that's true. I'm not greatly concerned about this, because I > think anyone who creates a file called INCREMENTAL.CONFIG is just > being perverse. Well, it's INCREMENTAL.* and you might be surprised (though I doubt it) at what users will do. We've been caught out by wacky file names (and database names) a few times. > However, we could ignore those files just in subtrees > where they're not expected to occur i.e. only pay attention to them > under base, global, and pg_tblspc. I didn't do this because I thought > someone might eventually want to do something like incremental backup > of SLRU files, and then it would be annoying if there were a bunch of > random pathname restrictions. But if we're really concerned about > this, I think it would be a reasonable fix; you're not really supposed > to decide to store your configuration files in directories that exist > for the purpose of storing relation data files. I think it would be reasonable to restrict what can be put in base/ and global/ but users generally feel free to create whatever they want in the root of PGDATA, despite being strongly encouraged not to. Anyway, I think it should be fixed or documented as a caveat since it causes a hard failure on restore. >> We could fix the issue by forbidding this file pattern in PGDATA, i.e. >> error when it is detected during pg_basebackup, but in my view it would >> be better (at least eventually) to add incremental info to the manifest. >> That would also allow us to skip storing zero-length files and >> incremental stubs (with no changes) as files. > > I did think about this, and I do like the idea of being able to elide > incremental stubs. 
If you have a tremendous number of relations and > very few of them have changed at all, the incremental stubs might take > up a significant percentage of the space consumed by the incremental > backup, which kind of sucks, even if the incremental backup is still > quite small compared to the original database. And, like you, I had > the idea of trying to use the backup_manifest to do it. > > But ... I didn't really end up feeling very comfortable with it. Right > now, the backup manifest is something we only use to verify the > integrity of the backup. If we were to do this, it would become a > critical part of the backup. For my 2c the manifest is absolutely a critical part of the backup. I'm having a hard time even seeing why we have the --no-manifest option for pg_combinebackup at all. If the manifest is missing who knows what else might be missing as well? If present, why wouldn't we use it? I know Tomas added some optimizations that work best with --no-manifest but if we can eventually read compressed tars (which I expect to be the general case) then those optimizations are not very useful. > I don't think I like the idea that > removing the backup_manifest should be allowed to, in effect, corrupt > the backup. But I think I dislike even more the idea that the data > that is used to verify the backup gets mushed together with the backup > data itself. Maybe in practice it's fine, but it doesn't seem very > conceptually clean. I don't think this is a problem. The manifest can do more than store verification info, IMO. > There are also some practical considerations that made me not want to > go in this direction: we'd need to make sure that the relevant > information from the backup manifest was available to the > reconstruction process at the right part of the code. When iterating > over a directory, it would need to consider all of the files actually > present in that directory plus any "hallucinated" incremental stubs > from the manifest. 
I didn't feel confident of my ability to implement > that in the available time without messing anything up. I think it would be better to iterate over the manifest than files in a directory. In any case we still need to fix [1], which presents more or less the same problem. > I think we should consider other possible designs here. For example, > instead of including this in the manifest, we could ship one > INCREMENTAL.STUBS file per directory that contains a list of all of > the incremental stubs that should be created in that directory. My > guess is that something like this will turn out to be way simpler to > code. So we'd store a mini manifest per directory rather than just put the info in the main manifest? Sounds like unnecessary complexity to me. Regards, -David [1] https://www.postgresql.org/message-id/flat/9badd24d-5bd9-4c35-ba85-4c38a2feb73e%40pgmasters.net
Hi,

> I think it would be reasonable to restrict what can be put in base/ and
> global/ but users generally feel free to create whatever they want in
> the root of PGDATA, despite being strongly encouraged not to.
>
> Anyway, I think it should be fixed or documented as a caveat since it
> causes a hard failure on restore.

+1. IMHO, no matter how you'd further decide to reference the incremental stubs, the earlier we can mention the failure to the user, the better.

Tbh, I'm not very comfortable seeing pg_combinebackup fail when the user would need to restore (whatever the reason). I mean, failing to take the backup is less of a problem (compared to failing to restore) because then the user might have the time to investigate and fix the issue, not being in the hurry of a down production to restore...

> > But ... I didn't really end up feeling very comfortable with it. Right
> > now, the backup manifest is something we only use to verify the
> > integrity of the backup. If we were to do this, it would become a
> > critical part of the backup.

Isn't it already the case? I mean, you need the manifest of the previous backup to take an incremental one, right? And shouldn't we encourage verifying the backup sets before (at least) trying to combine them? It's not because a file was only used for one specific purpose until now that we can't improve it later. Splitting the meaningful information across multiple places would be more error-prone (for both devs and users) imo.

> > I don't think I like the idea that
> > removing the backup_manifest should be allowed to, in effect, corrupt
> > the backup. But I think I dislike even more the idea that the data
> > that is used to verify the backup gets mushed together with the backup
> > data itself. Maybe in practice it's fine, but it doesn't seem very
> > conceptually clean.

Removing pretty much any file in the backup set would probably corrupt it.
I mean, why would people remove the backup manifest when they already can't remove backup_label? A doc stating "don't touch anything" is IMHO easier than "this part is critical, not that one".

> I don't think this is a problem. The manifest can do more than store
> verification info, IMO.

+1. Adding more info to the backup manifest can be handy for other use cases too (e.g. avoiding storing empty files, or storing the checksums state of the cluster).

Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)
On Tue, Apr 16, 2024 at 3:10 AM Stefan Fercot <stefan.fercot@protonmail.com> wrote: > > > But ... I didn't really end up feeling very comfortable with it. Right > > > now, the backup manifest is something we only use to verify the > > > integrity of the backup. If we were to do this, it would become a > > > critical part of the backup. > > Isn't it already the case? I mean, you need the manifest of the previous backup to take an incremental one, right? > And shouldn't we encourage to verify the backup sets before (at least) trying to combine them? > It's not because a file was only use for one specific purpose until now that we can't improve it later. > Splitting the meaningful information across multiple places would be more error-prone (for both devs and users) imo. Well, right now, if you just take a full backup, and you throw away the backup manifest because you don't care, you have a working full backup. Furthermore, if you took any incremental backups based on that full backup before discarding the manifest, you can still restore them. Now, it is possible that nobody in the world cares about those properties other than me; I have been known to (a) care about weird stuff and (b) be a pedant. However, we've already shipped a couple of releases where backup manifests were thoroughly non-critical: you needed them to run pg_verifybackup, and for nothing else. I think it's quite likely that there are users out there who are used to things working in that way, and I'm not sure that those users will adjust their expectations when a new feature comes out. I also feel that if I were a user, I would think of something called a "manifest" as just a table of contents for whatever the thing was. I still remember downloading tar files from the Internet in the 1990s and there'd be a file in the tarball sometimes called MANIFEST which was, you know, a list of what was in the tarball. 
You didn't need that file for anything functional; it was just so you could check if anything was missing. What I fear is that this will turn into another situation like we had with pg_xlog, where people saw "log" in the name and just blew it away. Matter of fact, I recently encountered one of my few recent examples of someone doing that thing since the pg_wal renaming happened. Some users don't take much convincing to remove anything that looks inessential. And what I'm particularly worried about with this feature is tar-format backups. If you have a directory format backup and you do an "ls", you're going to see a whole bunch of files in there of which backup_manifest will be one. How you treat that file is just going to depend on what you know about its purpose. But if you have a tar-format backup, possibly compressed, the backup_manifest file stands out a lot more. You may have something like this: backup_manifest root.tar.gz 16384.tar.gz Well, at this point, it becomes much more likely that you're going to think that there are special rules for the backup_manifest file. The kicker for me is that I can't see any reason to do any of this stuff. Including the information that we need to elide incremental stubs in some other way, say with one stub-list per directory, will be easier to implement and probably perform better. Like, I'm not saying we can't find a way to jam this into the manifest. But I'm fairly sure it's just making life difficult for ourselves. I may ultimately lose this argument, as I did the one about whether the backup_manifest should be JSON or some bespoke format. And that's fine. I respect your opinion, and David's. But I also reserve the right to feel differently, and I do. And I would also just gently point out that my level of motivation to work on a particular feature can depend quite a bit on whether I'm going to be forced to implement it in a way that I disagree with. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Apr 15, 2024 at 10:12 PM David Steele <david@pgmasters.net> wrote: > Anyway, I think it should be fixed or documented as a caveat since it > causes a hard failure on restore. Alright, I'll look into this. > I know Tomas added some optimizations that work best with --no-manifest > but if we can eventually read compressed tars (which I expect to be the > general case) then those optimizations are not very useful. My belief is that those optimizations work fine with or without manifests; you only start to lose the benefit in cases where you use different checksum types for different backups that you then try to combine. Which should hopefully be a rare case. -- Robert Haas EDB: http://www.enterprisedb.com
On Tuesday, April 16th, 2024 at 3:22 PM, Robert Haas wrote:

> What I fear is that this will turn into another situation like we had
> with pg_xlog, where people saw "log" in the name and just blew it
> away. Matter of fact, I recently encountered one of my few recent
> examples of someone doing that thing since the pg_wal renaming
> happened. Some users don't take much convincing to remove anything
> that looks inessential. And what I'm particularly worried about with
> this feature is tar-format backups. If you have a directory format
> backup and you do an "ls", you're going to see a whole bunch of files
> in there of which backup_manifest will be one. How you treat that file
> is just going to depend on what you know about its purpose. But if you
> have a tar-format backup, possibly compressed, the backup_manifest
> file stands out a lot more. You may have something like this:
>
> backup_manifest root.tar.gz 16384.tar.gz

Sure, I can see your point here and how people could be tempted to throw away that backup_manifest if they don't know how important it is to keep it. Probably in this case we'd need the list to be inside the tar, just like backup_label and tablespace_map then.

> The kicker for me is that I can't see any reason to do any of this
> stuff. Including the information that we need to elide incremental
> stubs in some other way, say with one stub-list per directory, will be
> easier to implement and probably perform better. Like, I'm not saying
> we can't find a way to jam this into the manifest. But I'm fairly sure
> it's just making life difficult for ourselves.
>
> I may ultimately lose this argument, as I did the one about whether
> the backup_manifest should be JSON or some bespoke format. And that's
> fine. I respect your opinion, and David's. But I also reserve the
> right to feel differently, and I do.

Do you mean 1 stub-list per pgdata + 1 per tablespaces?

Sure, it is important to respect and value each other's feelings, I never said otherwise.
I don't really see how it would be faster to recursively go through each sub-directory of the pgdata and tablespaces to gather all the pieces together compared to reading 1 main file. But I guess, choosing one option or the other, we will only find out how well it works once people use it in the field and possibly give some feedback.

As you mentioned in [1], we're not going to start rewriting the implementation a week after feature freeze nor probably already start building big new things now anyway. So maybe let's start with documenting the possible gotchas/corner cases to make our support life easier in the future.

Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

[1] https://www.postgresql.org/message-id/CA%2BTgmoaVxr_o3mrDBrBcXm3gowr9Qc4ABW-c73NR_201KkDavw%40mail.gmail.com
On Tue, Apr 16, 2024 at 12:06 PM Stefan Fercot <stefan.fercot@protonmail.com> wrote:
> Sure, I can see your point here and how people could be tempted to throw away that backup_manifest if they don't know how important it is to keep it.
> Probably in this case we'd need the list to be inside the tar, just like backup_label and tablespace_map then.

Yeah, I think anywhere inside the tar is better than anywhere outside the tar, by a mile. I'm happy to leave the specific question of where inside the tar as something TBD at time of implementation by fiat of the person doing the work. But that said ...

> Do you mean 1 stub-list per pgdata + 1 per tablespaces?
>
> I don't really see how it would be faster to recursively go through each sub-directory of the pgdata and tablespaces to gather all the pieces together compared to reading 1 main file.
> But I guess, choosing one option or the other, we will only find out how well it works once people use it in the field and possibly give some feedback.

The reason why I was suggesting one stub-list per directory is that we recurse over the directory tree. We reach each directory in turn, process it, and then move on to the next one. What I imagine that we want to do is first iterate over all of the files actually present in a directory, then iterate over the list of stubs for that directory and do whatever we would have done if there had been a stub file present for each of those. So, we don't really want a list of every stub in the whole backup, or even every stub in the whole tablespace. What we want is to be able to easily get a list of stubs for a single directory. Which is very easily done if each directory contains its own stub-list file. If we instead have a centralized stub-list for the whole tablespace, or the whole backup, it's still quite possible to make it work.
We just read that centralized stub list and we build an in-memory data structure that is indexed by containing directory, like a hash table where the key is the directory name and the value is a list of filenames within that directory. But, a slight disadvantage of this model is that you have to keep that whole data structure in memory for the whole time you're reconstructing, and you have to pass around a pointer to it everywhere so that the code that handles individual directories can access it. I'm sure this isn't the end of the world. It's probably unlikely that someone has so many stub files that the memory used for such a data structure is painfully high, and even if they did, it's unlikely that they are spread out across multiple databases and/or tablespaces in such a way that only needing the data for one directory at a time would save you. But, it's not impossible that such a scenario could exist. Somebody might say - well, don't go directory by directory. Just handle all of the stubs at the end. But I don't think that really fixes anything. I want to be able to verify that none of the stubs listed in the stub-list are also present in the backup as real files, for sanity checking purposes. It's quite easy to see how to do that in the design I proposed above: keep a list of the files for each directory as you read it, and then when you read the stub-list for that directory, check those lists against each other for duplicates. Doing this on the level of a whole tablespace or the whole backup is clearly also possible, but once again it potentially uses more memory, and there's no functional gain. Plus, this kind of approach would also make the reconstruction process "jump around" more. It might pull a bunch of mostly-unchanged files from the full backup while handling the non-stub files, and then come back to that directory a second time, much later, when it's processing the stub-list. 
Perhaps that would lead to a less-optimal I/O pattern, or perhaps it would make it harder for the user to understand how much progress reconstruction had made. Or perhaps it would make no difference at all; I don't know. Maybe there's even some advantage in a two-pass approach like this. I don't see one. But it might prove otherwise on closer examination. -- Robert Haas EDB: http://www.enterprisedb.com
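The trade-off Robert describes for a centralized stub list — a backup-wide in-memory index keyed by directory, plus the stub-vs-real-file sanity check — can be sketched as follows (illustrative Python with assumed names; the real code would live in pg_combinebackup's C sources):

```python
from collections import defaultdict

def index_central_stub_list(stub_paths):
    """Group a backup-wide stub list by containing directory.

    This is the in-memory structure a centralized design would have to
    build, keep alive, and pass around for the whole reconstruction.
    """
    stubs_by_dir = defaultdict(list)
    for path in stub_paths:
        dirname, _, filename = path.rpartition("/")
        stubs_by_dir[dirname].append(filename)
    return stubs_by_dir

def check_no_overlap(real_files, stubs):
    """Sanity check: a name must not be both a stub and a real file."""
    dupes = set(real_files) & set(stubs)
    if dupes:
        raise ValueError(f"present both as stub and as real file: {sorted(dupes)}")
```

With per-directory stub lists, neither the backup-wide index nor the pointer passed around to every directory handler is needed; each directory's real-file listing and stub list can be checked against each other and then discarded.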
On Tue, Apr 16, 2024 at 9:25 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Apr 15, 2024 at 10:12 PM David Steele <david@pgmasters.net> wrote: > > Anyway, I think it should be fixed or documented as a caveat since it > > causes a hard failure on restore. > > Alright, I'll look into this. Here's a patch. -- Robert Haas EDB: http://www.enterprisedb.com
Attachments
On 4/18/24 00:14, Robert Haas wrote: > On Tue, Apr 16, 2024 at 9:25 AM Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Apr 15, 2024 at 10:12 PM David Steele <david@pgmasters.net> wrote: >>> Anyway, I think it should be fixed or documented as a caveat since it >>> causes a hard failure on restore. >> >> Alright, I'll look into this. > > Here's a patch. Thanks! I've tested this and it works as advertised. Ideally I'd want an error on backup if there is a similar file in any data directories that would cause an error on combine, but I admit that it is vanishingly rare for users to put files anywhere but the root of PGDATA, so this approach seems OK to me. Regards, -David
On Wed, Apr 17, 2024 at 7:56 PM David Steele <david@pgmasters.net> wrote: > Thanks! I've tested this and it works as advertised. > > Ideally I'd want an error on backup if there is a similar file in any > data directories that would cause an error on combine, but I admit that > it is vanishingly rare for users to put files anywhere but the root of > PGDATA, so this approach seems OK to me. OK, committed. Thanks for the review. -- Robert Haas EDB: http://www.enterprisedb.com
On 4/19/24 01:10, Robert Haas wrote: > On Wed, Apr 17, 2024 at 7:56 PM David Steele <david@pgmasters.net> wrote: >> Thanks! I've tested this and it works as advertised. >> >> Ideally I'd want an error on backup if there is a similar file in any >> data directories that would cause an error on combine, but I admit that >> it is vanishingly rare for users to put files anywhere but the root of >> PGDATA, so this approach seems OK to me. > > OK, committed. Thanks for the review. Excellent, thank you! Regards, -David