Thread: Proposal: More flexible backup/restore via pg_dump

Proposal: More flexible backup/restore via pg_dump

From: Philip Warner
Date:
I am considering making some changes to pg_dump, and would appreciate any
suggestions/ideas from the list.

The outline is as follows:

- Add a flag to modify the output of pg_dump so that it is more amenable to
retrieving parts of the extracted data. This may involve writing the data
to a tar file/dbm file/custom-format file, rather than just a simple SQL
script file (see below).

- Add flags to allow selective import of information stored in the custom
dump file: eg. load the schema (no data), load only one table, define all
indexes or triggers for a given table etc. This would eventually allow for
overriding of tablespace settings.

- Add options to dump selected information in a readable format (ie.
probably SQL).

The broad approach would be to modify the existing pg_dump as little as
possible; I am inclined to write the data as SQL (as currently done), and
append an 'index' to the output, specifying the offset in the file at which
each piece of extractable data can be found. The 'restore' option would
just go to the relevant section(s), and pipe the data to psql.
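
As a concrete illustration, an index entry and the seek-and-pipe restore
could look something like this (a minimal C sketch; all names are
illustrative, not a committed design):

    #include <stdio.h>

    /* Hypothetical index ("TOC") entry for the dump file. */
    typedef struct TocEntry
    {
        int     id;        /* unique ID for this dumpable object  */
        char   *name;      /* eg. table or index name             */
        char   *desc;      /* "TABLE", "INDEX", "TRIGGER", ...    */
        long    offset;    /* byte offset of the SQL in the file  */
        long    length;    /* length of that section in bytes     */
    } TocEntry;

    /* Restore one entry: seek to its section and pipe it to psql. */
    static void restore_entry(FILE *fp, TocEntry *te, FILE *psql_pipe)
    {
        char   buf[4096];
        long   remaining = te->length;
        size_t n;

        fseek(fp, te->offset, SEEK_SET);
        while (remaining > 0)
        {
            n = fread(buf, 1,
                      remaining > (long) sizeof(buf)
                          ? sizeof(buf) : (size_t) remaining, fp);
            if (n == 0)
                break;
            fwrite(buf, 1, n, psql_pipe);
            remaining -= (long) n;
        }
    }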

I am also considering the possibility of writing the data to separate files
in a tar archive, since this may be a lot cleaner in the long run, although
the text file with the index at the end has the advantage that the index
could be written as a series of SQL comments, effectively making the dump
file compatible with existing dump file formats.

Any comments/suggestions would be appreciated.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


RE: Proposal: More flexible backup/restore via pg_dump

From: Peter Mount
Date:
comments prefixed with PM...

--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council


-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Tuesday, June 27, 2000 10:07 AM
To: Giles Lean
Cc: Zeugswetter Andreas SB; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal: More flexible backup/restore via pg_dump


At 07:00 27/06/00 +1000, Giles Lean wrote:
>
>Are you are also assuming that a backup fits in a single file,
>i.e. that anyone with >2GB of backup has some sort of large file
>support?

That's up to the format used to save the database; in the case of the
'custom' format, yes. But that is the size after compression. This is not
substantially different to pg_dump's behaviour, except that pg_dump can be
piped to a tape drive...

PM: So can most other Unix based formats. On the intranet server here, I
pg_dump into /tmp, then include them in a tar piped to the tape drive.

The objectives of the API components are to (a) make it very easy to add new
metadata to dump (eg. tablespaces), and (b) make it easy to add new output
formats (eg. tar archives). Basically the metadata dumping side makes one
call to register the thing to be saved, passing an optional function
pointer to dump data (eg. table contents) - this *could* even be used to
implement dumping of BLOBs.
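
As a rough illustration, the registration call might have a shape like the
following (a sketch only; the names and signature are assumptions, not a
settled interface):

    /* One call per object to be dumped.  The data-dumper callback is
     * optional; for a table it would write the row data, and the same
     * hook could later serve BLOBs.
     */
    typedef struct Archive Archive;        /* opaque archive handle */
    typedef int (*DataDumperPtr)(Archive *AH, void *userData);

    extern void ArchiveEntry(Archive *AH,
                             const char *name,     /* object name           */
                             const char *desc,     /* "TABLE", "INDEX", ... */
                             const char *defn,     /* SQL to (re)create it  */
                             DataDumperPtr dumper, /* NULL if no data       */
                             void *dumperArg);     /* passed to the dumper  */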

PM: The problem with blobs hasn't been with dumping them (I have some Java
code that does it into a compressed zip file), but restoring them - you
can't create a blob with a specific OID, so any references in existing
tables will break. I currently get round it by updating the tables after the
restore - but it's ugly and easy to break :(
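
For reference, that post-restore fix-up boils down to something like the
following libpq sketch: import the blob (the server assigns a fresh OID),
then rewrite the references. The table and column names are made up, and
error checking is omitted:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Hypothetical fix-up for one blob referenced by photos.img. */
    static void restore_blob(PGconn *conn, const char *path, Oid old_oid)
    {
        char sql[256];
        Oid  new_oid;

        PQclear(PQexec(conn, "BEGIN"));   /* LO ops need a transaction */
        new_oid = lo_import(conn, path);  /* new, server-chosen OID    */

        snprintf(sql, sizeof(sql),
                 "UPDATE photos SET img = %u WHERE img = %u",
                 (unsigned) new_oid, (unsigned) old_oid);
        PQclear(PQexec(conn, sql));
        PQclear(PQexec(conn, "COMMIT"));
    }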

The 'archiver' format provider must have some basic IO routines
(Read/WriteBuf and Read/WriteByte) and a number of hook functions which
it can use to output the data. It needs to provide at least one function
that actually writes data somewhere. It also has to provide the associated
function to read the data.
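
In other words, each format plugs a set of function pointers into the
archiver; very roughly (names illustrative only):

    /* Hooks a format provider ("custom", "directory", "tar", ...)
     * would supply.  A sketch, not a settled interface.
     */
    typedef struct ArchiveHandle ArchiveHandle;

    typedef struct ArchiveFormat
    {
        size_t (*WriteBuf)(ArchiveHandle *AH, const void *buf, size_t len);
        size_t (*ReadBuf)(ArchiveHandle *AH, void *buf, size_t len);
        int    (*WriteByte)(ArchiveHandle *AH, int b);
        int    (*ReadByte)(ArchiveHandle *AH);
        /* hooks called at well-defined points in dump/restore */
        void   (*StartData)(ArchiveHandle *AH, void *tocEntry);
        void   (*EndData)(ArchiveHandle *AH, void *tocEntry);
    } ArchiveFormat;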

PM: Having a set of APIs (either accessible directly in the backend,
and/or via some fastpath call) would be useful indeed.

>
>As someone else answered: no.  You can't portably assume random access
>to tape blocks.

This is probably an issue. One of the motivations for this utility is to
allow partial restores (eg. table data for one table only), and
arbitrarily ordered restores. But I may have a solution:

PM: That would be useful. I don't know about CPIO, but tar stores a header
at the start of each file (so you can actually join two tar files together
and still read all the files). In this way, you could put the table name as
the "filename" in the header, so partial restores could be done.

Write the schema and TOC out at the start of the file/tape, then compressed
data with headers for each indicating which TOC item they correspond to.
This metadata can be loaded into /tmp, so fseek is possible. The actual
data restoration (assuming constraints are not defined [THIS IS A PROBLEM])
can be done by scanning the rest of the tape in its own order, since RI
will not be an issue. I think I'm happy with this.
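
The sequential pass would look roughly like this (the block-header helpers
are hypothetical; the point is that unwanted blocks are simply skipped):

    /* Sketch: one sequential pass over the data section of the tape.
     * Each data block is assumed to begin with the ID of the TOC item
     * it belongs to and its length.
     */
    static void restore_pass(FILE *tape, const int *wanted, int nwanted)
    {
        int  id;
        long len;

        while (read_block_header(tape, &id, &len))  /* hypothetical */
        {
            if (id_is_wanted(id, wanted, nwanted))  /* hypothetical */
                replay_block(tape, len);  /* decompress, feed to psql */
            else
                skip_block(tape, len);    /* read and discard         */
        }
    }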

PM: How about ioctls? I know that ArcServe on both NT & Unixware can seek
through the tape, so there must be a way of doing it.

[snip]

>Using either tar or cpio format as defined for POSIX would allow a lot
>of us to understand your on-tape format with a very low burden on you
>for documentation.  (If you do go this route you might want to think
>about cpio format; it is less restrictive about filename length than
>tar.)

Tom Lane was also very favorably disposed to tar format. As I said above,
the archive interfaces should be pretty amenable to adding tar support -
it's just that I'd like to get a version working with custom and directory-based
formats to ensure the flexibility is there. As I see it, the 'backup to
directory' format should be easy to use as a basis for the 'backup to tar'
code.

PM: I don't see a problem there. The Java stuff I have uses the standard
java.util.zip package which has a simple API for zip files. Tar or most other
formats could be implemented in a similar compatible fashion.

The problem I have with tar is that it does not support random access to
the associated data. For reordering large backups, or (ultimately) single
BLOB extraction, this is a performance problem.

PM: Tar can do this sequentially, which I've had to do many times over the
years; when restoring just one file from a tape, sequential access is
probably the only way.

If you have a tar spec (or suitably licensed code), please mail it to me,
and I'll be able to make more informed comments.

PM: The tar spec should be around somewhere - just be careful, the obvious
source I was thinking of would be GPL'd, and we don't want to be polluted :-)

[snip]




RE: Proposal: More flexible backup/restore via pg_dump

From: Peter Mount
Date:

--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council


-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Tuesday, June 27, 2000 11:31 AM
To: Peter Mount; Giles Lean
Cc: Zeugswetter Andreas SB; pgsql-hackers@postgresql.org
Subject: RE: [HACKERS] Proposal: More flexible backup/restore via pg_dump


At 10:55 27/06/00 +0100, Peter Mount wrote:
>comments prefixed with PM...
>
>PM: So can most other Unix based formats. On the intranet server here, I
>pg_dump into /tmp, then include them in a tar piped to the tape drive.

Since you are using an intermediate file, there would be no change. fseek
is only an issue on tape drives and pipes, not files on disk. I suspect
that most people can afford the overhead of restoring the backup file to
disk before restoring the database, but it'd be nice to start out as
flexible as possible. In the back of my mind is the fact that when the WAL
and storage manager are going, raw data backups should be possible (and
fast) - but then again, maybe it's a pipe dream.

PM: Actually, with MS-SQL here, I use its "Backup Device" (M$'s name for a
file ;-) ). It then has a scheduled job that backs up into that file (a full
backup empties the file first, then the incrementals append to it). Another
server then just backs up this single file.

PM: Anyhow, when I need to do a restore, I'm presented with a list of what
backups are stored on that "device", and can restore from there.

>PM: The problem with blobs hasn't been with dumping them (I have some Java
>code that does it into a compressed zip file), but restoring them - you
>can't create a blob with a specific OID, so any references in existing
>tables will break. I currently get round it by updating the tables after the
>restore - but it's ugly and easy to break :(

I assumed this would have to happen - which is why it will not be in the first
version. With all the TOAST stuff coming, and talk about the storage
manager, I still live in hope of a better BLOB system...

PM: I do as well. I've got a project coming up soon where the database will
be storing thumbnails of photographs (with details of where the full images
are stored, what film/negative# the original is from (for 35mm), notes, etc.).
Anyhow, the current BLOB system may not cope with it, and the kludges I've
used in the past probably won't help.

>PM: Having a set of APIs (either accessible directly in the backend,
>and/or via some fastpath call) would be useful indeed.

By this I assume you mean APIs to get to database data, not backup data.

PM: I was thinking of the backup data (in a similar way you can with SQL7).
A user could (if they have permissions) trigger a backup or a restore.

eg, my log database here is backed up using:

    BACKUP DATABASE eventlog TO eventdb WITH INIT,
        NAME=N'Eventlog Backup',
        DESCRIPTION=N'The NT & Exchange Event Logs'

Here, eventlog is the database name, eventdb the "device", ie the file it
writes to, INIT erases the "device", and NAME/DESCRIPTION are written to the
device to help locate the backup when restoring.

Anyhow, this is possible future stuff ;-)

>
>This is probably an issue. One of the motivations for this utility is to
>allow partial restores (eg. table data for one table only), and
>arbitrarily ordered restores. But I may have a solution:
>
>PM: That would be useful. I don't know about CPIO, but tar stores a header
>at the start of each file (so you can actually join two tar files together
>and still read all the files). In this way, you could put the table name as
>the "filename" in the header, so partial restores could be done.

Well, the way my plans work, I'll use either a munged OID, or an arbitrary
unique ID as the file name. All meaningful access has to go via the TOC.
But that's just a detail; the basic idea is what I'm implementing. 

It's very tempting to say tape restores are only possible in the order in
which the backup file was written ('pg_restore --stream' ?), and that
reordering/selection is only possible if you put the file on disk.

PM: It's an idea for those with large (>=2Gb) databases. Most filesystems
don't like an individual file to be larger than (or equal to) 2Gb. You may
want to keep this in mind when writing to a file (ie, break at 1Gb as we do
with tables).
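
A sketch of how the writer could roll over to a new segment at 1Gb (the
limit and the naming scheme are illustrative):

    #include <stdio.h>

    #define SEG_SIZE (1024L * 1024L * 1024L)  /* illustrative 1Gb cap */

    /* Switch to base.1, base.2, ... as each segment fills, mirroring
     * how the backend splits large tables.  Error checking omitted.
     */
    typedef struct SegWriter
    {
        FILE *fp;         /* current segment, already open       */
        long  written;    /* bytes written to current segment    */
        int   segno;      /* current segment number              */
        char  base[256];  /* base file name, eg. "backup"        */
    } SegWriter;

    static size_t seg_write(SegWriter *w, const void *buf, size_t len)
    {
        char path[300];

        if (w->written + (long) len > SEG_SIZE)
        {
            fclose(w->fp);
            snprintf(path, sizeof(path), "%s.%d", w->base, ++w->segno);
            w->fp = fopen(path, "wb");
            w->written = 0;
        }
        w->written += (long) len;
        return fwrite(buf, 1, len, w->fp);
    }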

>
>PM: How about ioctls? I know that ArcServe on both NT & Unixware can seek
>through the tape, so there must be a way of doing it.

Maybe; I know BackupExec also does some kind of seek to update the TOC at
the end of a backup (which is what I need to do). Then again, maybe that's
just a rewind. I don't want to get into custom tape formats...

PM: Some tape drives support a "mark", but I've not seen/used one. When
watching ArcServe, it seems to rewind to the beginning, then "seek" to the
beginning of the backup. But once it's done that, it knows where it is, and
seeks forward from there.
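
For the record, on platforms with an mt-style tape interface, that kind of
positioning looks roughly like this (Linux-style sys/mtio.h; availability
and behaviour vary a lot by platform):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mtio.h>

    /* Rewind, then space forward over n filemarks so the next read
     * starts at the wanted backup.  Not portable; error checking
     * omitted.
     */
    static int tape_seek_to_mark(const char *dev, int n)
    {
        struct mtop op;
        int fd = open(dev, O_RDONLY);

        if (fd < 0)
            return -1;

        op.mt_op = MTREW;     /* rewind to beginning of tape */
        op.mt_count = 1;
        ioctl(fd, MTIOCTOP, &op);

        op.mt_op = MTFSF;     /* forward space n filemarks   */
        op.mt_count = n;
        ioctl(fd, MTIOCTOP, &op);

        return fd;            /* positioned after mark n     */
    }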

Do we have any tape experts out there?


>PM: Tar can do this sequentially, which I've had to do many times over the
>years; when restoring just one file from a tape, sequential access is
>probably the only way.

It's just nasty when the user reorders the restoration of tables and
metadata. In the worst case it might be hundreds of scans of the tape. I'd
hate to have my name associated with something so unfriendly (even if it is
the operator's fault).

PM: That's true, but sometimes (and it happens here a lot), it's
unavoidable.

>PM: The tar spec should be around somewhere - just be careful, the obvious
>source I was thinking of would be GPL'd, and we don't want to be polluted :-)

That was my problem. I've got some references now, and I'll look at them.
At least everything I've written so far can be put in PG.

