Discussion: Proposal: More flexible backup/restore via pg_dump
I am considering making some changes to pg_dump, and would appreciate any suggestions/ideas from the list. The outline is as follows:

- Add a flag to modify the output of pg_dump so that it is more amenable to retrieving parts of the extracted data. This may involve writing the data to a tar file/dbm file/custom-format file, rather than just a simple SQL script file (see below).

- Add flags to allow selective import of information stored in the custom dump file: eg. load the schema (no data), load only one table, define all indexes or triggers for a given table, etc. This would eventually allow for overriding of tablespace settings.

- Add options to dump selected information in a readable format (ie. probably SQL).

The broad approach would be to modify the existing pg_dump as little as possible; I am inclined to write the data as SQL (as currently done), and append an 'index' to the output, specifying the offset in the file at which each piece of extractable data can be found. The 'restore' option would just go to the relevant section(s), and pipe the data to psql.

I am also considering the possibility of writing the data to separate files in a tar archive, since this may be a lot cleaner in the long run, although the text file with the index at the end has the advantage that the index could be written as a series of SQL comments, effectively making the dump file compatible with existing dump file formats.

Any comments/suggestions would be appreciated.

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |     ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/
comments prefixed with PM...

--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council

-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Tuesday, June 27, 2000 10:07 AM
To: Giles Lean
Cc: Zeugswetter Andreas SB; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal: More flexible backup/restore via pg_dump

At 07:00 27/06/00 +1000, Giles Lean wrote:
>
> Are you also assuming that a backup fits in a single file,
> i.e. that anyone with >2GB of backup has some sort of large file
> support?

That's up to the format used to save the database; in the case of the 'custom' format, yes. But that is the size after compression. This is not substantially different to pg_dump's behaviour, except that pg_dump can be piped to a tape drive...

PM: So can most other Unix based formats. On the intranet server here, I pg_dump into /tmp, then include them in a tar piped to the tape drive.

The objective of the API components is to (a) make it very easy to add new metadata to dump (eg. tablespaces), and (b) make it easy to add new output formats (eg. tar archives). Basically, the metadata dumping side makes one call to register the thing to be saved, passing an optional function pointer to dump data (eg. table contents) - this *could* even be used to implement dumping of BLOBs.

PM: The problem with blobs hasn't been with dumping them (I have some Java code that does it into a compressed zip file), but restoring them - you can't create a blob with a specific OID, so any references in existing tables will break. I currently get round it by updating the tables after the restore - but it's ugly and easy to break :(

The 'archiver' format provider must have some basic IO routines: Read/WriteBuf and Read/WriteByte, and has a number of hook functions which it can use to output the data. It needs to provide at least one function that actually writes data somewhere.
It also has to provide the associated function to read the data.

PM: Having a set of APIs (either accessible directly into the backend, and/or via some fastpath call) would be useful indeed.

>
> As someone else answered: no. You can't portably assume random access
> to tape blocks.

This is probably an issue. One of the motivations for this utility is to allow partial restores (eg. table data for one table only), and arbitrarily ordered restores. But I may have a solution:

PM: That would be useful. I don't know about CPIO, but tar stores the TOC at the start of each file (so you can actually join two tar files together and still read all the files). In this way, you could put the table name as the "filename" in the header, so partial restores could be done.

Write the schema and TOC out at the start of the file/tape, then compressed data with headers for each indicating which TOC item they correspond to. This metadata can be loaded into /tmp, so fseek is possible. The actual data restoration (assuming constraints are not defined [THIS IS A PROBLEM]) can be done by scanning the rest of the tape in its own order, since RI will not be an issue. I think I'm happy with this.

PM: How about IOCTLs? I know that ArcServe on both NT & Unixware can seek through the tape, so there must be a way of doing it.

[snip]

> Using either tar or cpio format as defined for POSIX would allow a lot
> of us to understand your on-tape format with a very low burden on you
> for documentation. (If you do go this route you might want to think
> about cpio format; it is less restrictive about filename length than
> tar.)

Tom Lane was also very favorably disposed to tar format. As I said above, the archive interfaces should be pretty amenable to adding tar support - it's just that I'd like to get a version working with custom and directory based formats first, to ensure the flexibility is there. As I see it, the 'backup to directory' format should be easy to use as a basis for the 'backup to tar' code.
PM: I don't see a problem there. The Java stuff I have uses the standard java.util.zip package, which has a simple API for zip files. Tar or most other formats could be implemented in a similarly compatible fashion.

The problem I have with tar is that it does not support random access to the associated data. For reordering large backups, or (ultimately) single BLOB extraction, this is a performance problem.

PM: Tar can do this sequentially, which I've had to do many times over the years - restoring just one file from a tape, sequential access is probably the only way.

If you have a tar spec (or suitably licensed code), please mail it to me, and I'll be able to make more informed comments.

PM: The tar spec should be around somewhere - just be careful, the obvious source I was thinking of would be GPL'd, and we don't want to be polluted :-)

[snip]
-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Tuesday, June 27, 2000 11:31 AM
To: Peter Mount; Giles Lean
Cc: Zeugswetter Andreas SB; pgsql-hackers@postgresql.org
Subject: RE: [HACKERS] Proposal: More flexible backup/restore via pg_dump

At 10:55 27/06/00 +0100, Peter Mount wrote:
> comments prefixed with PM...
>
> PM: So can most other Unix based formats. On the intranet server here, I
> pg_dump into /tmp, then include them in a tar piped to the tape drive.

Since you are using an intermediate file, there would be no change. fseek is only an issue on tape drives and pipes, not files on disk. I suspect that most people can afford the overhead of restoring the backup file to disk before restoring the database, but it'd be nice to start out as flexible as possible. In the back of my mind is the fact that when the WAL and storage manager are going, raw data backups should be possible (and fast) - but then again, maybe it's a pipe dream.

PM: Actually, with MS-SQL here, I use its "Backup Device" (M$'s name for a file ;-) ). It then has a scheduled job that backs up into that file (a full backup empties the file first, then the incrementals append to it). Another server then just backs up this single file.

PM: Anyhow, when I need to do a restore, I'm presented with a list of what backups are stored on that "device", and can restore from there.

> PM: The problem with blobs hasn't been with dumping them (I have some Java
> code that does it into a compressed zip file), but restoring them - you
> can't create a blob with a specific OID, so any references in existing
> tables will break. I currently get round it by updating the tables after
> the restore - but it's ugly and easy to break :(

I assumed this would have to happen - which is why it will not be in the first version.
With all the TOAST stuff coming, and talk about the storage manager, I still live in hope of a better BLOB system...

PM: I do as well. I've got a project coming up soon where the database will be storing thumbnails of photographs (with details of where the full images are stored, what film/negative# the original is from (for 35mm), notes, etc.). Anyhow, the current BLOB system may not cope with it, and the kludges I've used in the past probably won't help.

> PM: Having a set of APIs (either accessible directly into the backend,
> and/or via some fastpath call) would be useful indeed.

By this I assume you mean APIs to get to database data, not backup data.

PM: I was thinking of the backup data (in a similar way to what you can do with SQL 7). A user could (if they have permissions) trigger a backup or a restore. eg, my log database here is backed up using:

BACKUP DATABASE eventlog TO eventdb
  WITH INIT, NAME=N'Eventlog Backup',
  DESCRIPTION=N'The NT & Exchange Event Logs'

Here, eventlog is the database name, eventdb the "device" (ie. the file it writes to), INIT erases the "device", and NAME/DESCRIPTION are written to the device to help locate the backup when restoring. Anyhow, this is possible future stuff ;-)

>
> This is probably an issue. One of the motivations for this utility is to
> allow partial restores (eg. table data for one table only), and
> arbitrarily ordered restores. But I may have a solution:
>
> PM: That would be useful. I don't know about CPIO, but tar stores the TOC
> at the start of each file (so you can actually join two tar files together
> and still read all the files). In this way, you could put the table name
> as the "filename" in the header, so partial restores could be done.

Well, the way my plans work, I'll use either a munged OID, or an arbitrary unique ID as the file name. All meaningful access has to go via the TOC. But that's just a detail; the basic idea is what I'm implementing.
It's very tempting to say tape restores are only possible in the order in which the backup file was written ('pg_restore --stream' ?), and that reordering/selection is only possible if you put the file on disk.

PM: It's an idea for those with large (>=2GB) databases. Most filesystems don't like an individual file to be larger than (or equal to) 2GB. You may want to keep this in mind when writing to a file (ie. break at 1GB as we do with tables).

>
> PM: How about IOCTLs? I know that ArcServe on both NT & Unixware can seek
> through the tape, so there must be a way of doing it.

Maybe; I know BackupExec also does some kind of seek to update the TOC at the end of a backup (which is what I need to do). Then again, maybe that's just a rewind. I don't want to get into custom tape formats...

PM: Some tape drives support a "mark", but I've not seen/used one. When watching ArcServe, it seems to rewind to the beginning, then "seek" to the beginning of the backup. But once it's done that, it knows where it is, and seeks forward from there. Do we have any tape experts out there?

> PM: Tar can do this sequentially, which I've had to do many times over the
> years - restoring just one file from a tape, sequential access is probably
> the only way.

It's just nasty when the user reorders the restoration of tables and metadata. In the worst case it might be hundreds of scans of the tape. I'd hate to have my name associated with something so unfriendly (even if it is the operator's fault).

PM: That's true, but sometimes (and it happens here a lot), it's unavoidable.

> PM: The tar spec should be around somewhere - just be careful, the obvious
> source I was thinking of would be GPL'd, and we don't want to be polluted
> :-)

That was my problem. I've got some references now, and I'll look at them. At least everything I've written so far can be put in PG.