Re: pg_dump additional options for performance
From | Tom Dunstan
---|---
Subject | Re: pg_dump additional options for performance
Date |
Msg-id | ca33c0a30802260449r45c19725iea4b03c0f02c8b37@mail.gmail.com
In reply to | Re: pg_dump additional options for performance (Simon Riggs <simon@2ndquadrant.com>)
Responses | Re: pg_dump additional options for performance (Simon Riggs <simon@2ndquadrant.com>); Re: pg_dump additional options for performance (Dimitri Fontaine <dfontaine@hi-media.com>)
List | pgsql-hackers
On Tue, Feb 26, 2008 at 5:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2008-02-26 at 12:46 +0100, Dimitri Fontaine wrote:
> > As a user I'd really prefer all of this to be much more transparent, and
> > could well imagine the -Fc format to be some kind of TOC + zip of table
> > data + post-load instructions (organized per table), or something like
> > this. In fact just what you described, all embedded in a single file.
>
> If it's in a single file then it won't perform as well as if it's separate
> files. We can put separate files on separate drives. We can begin
> reloading one table while another is still unloading. The OS will
> perform readahead for us on separate files, whereas on one file it will
> look like random I/O. Etc.

Yeah, writing multiple unknown-length streams to a single file in parallel is going to be all kinds of painful, and this use case seems to be the biggest complaint against a zip-file kind of approach.

I didn't know about the custom file format when I suggested the zip file one yesterday*, but a zip or equivalent has the major benefit of allowing the user to do manual inspection/tweaking of the dump, because the file format is one that can be manipulated by standard tools. And zip wins over tar because it's indexed: if you want to extract just the schema and hack on it, you don't need to touch your multiple GBs of data.

Perhaps a compromise: we specify a filesystem layout for table data files, pre/post scripts and whatever other metadata we want to make available to pg_restore. By default it all gets dumped into a zip file (or whatever), but a user who wants parallel unloads can pass a flag that tells pg_dump to write it into a directory instead, with exactly the same file layout. Or how about this: if the filename given to pg_dump is a directory, spit out the files in there; otherwise create/overwrite a single file.

While it's a bit fiddly, putting data on separate drives would then involve something like symlinking the table's file inside the dump dir off to an appropriate mount point, but that's probably not much worse than running n different pg_dump commands specifying different files. Heck, if you've got lots of data and want very particular behavior, you've got to specify it somehow. :)

Cheers

Tom

* The custom file format does not seem well advertised. None of the examples on the pg_dump page use it, and I've never come across it in my travels on the vast interwebs. Heck, I've even hacked on pg_dump and I didn't know about it :). I won't suggest advertising it more while this discussion is going on, though, since it may be obsoleted by whatever the final outcome here is.
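To make the directory/zip compromise above concrete, here is one possible shape for such a shared layout. This is purely a sketch: neither the file names nor the structure are anything pg_dump actually produces; it is just one way the TOC, per-table data and pre/post scripts could be arranged identically inside a zip archive or a plain directory.

```
mydb.dump/                 -- or mydb.zip, with identical entries
    toc.dat                -- TOC/metadata consumed by pg_restore
    pre-data.sql           -- schema and other pre-load statements
    data/
        orders.dat         -- one data file per table
        lineitems.dat
    post-data.sql          -- indexes, constraints, post-load statements
```

Because a zip file is indexed, pulling out just pre-data.sql from such an archive would not require reading the data entries at all, which is the "hack on the schema without touching multi-GB data" benefit mentioned above.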
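And a rough sketch of the symlink trick under that proposal. The directory-sensing pg_dump behaviour shown is hypothetical (it is exactly the suggestion above, not an existing feature), and the paths and table names are invented; only mkdir, ln, and pg_dump's existing -f option are real.

```sh
# Hypothetical workflow; assumes the proposed "directory target" mode.
mkdir -p /backup/mydb/data

# Point the heavy tables' data files at other spindles before dumping,
# so pg_dump would write through the symlinks onto separate drives.
ln -s /mnt/disk2/orders.dat    /backup/mydb/data/orders.dat
ln -s /mnt/disk3/lineitems.dat /backup/mydb/data/lineitems.dat

# Under the proposal, a directory argument would make pg_dump emit the
# per-table layout into it instead of a single archive file.
pg_dump -f /backup/mydb mydb
```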