Discussion: Deprecating, and scheduling removal of, pg_dump's tar format.
Hi, Is there any real reason to retain it? I realize it's not been a huge issue to maintain the format, but it has significantly fewer features than the other formats, without corresponding upsides (sql is clearer to read, directory format has parallelism, custom is a single file while allowing to select sub-sets of the data). There's also the fact that it creates temp files, which other formats don't. Given the apparent lack of upside, it doesn't seem good to offer a sub-par choice to our users. If we were to remove it, it'd obviously be a drawn out affair, given that we likely couldn't drop it from the restore side immediately... Greetings, Andres Freund
On Thu, Jul 26, 2018 at 06:53:06PM -0700, Andres Freund wrote: > If we were to remove it, it'd obviously be a drawn out affair, given > that we likely couldn't drop it from the restore side immediately... The maintenance load is not high as well, so I see no real point in removing it, and that it would likely make people using it unhappy. -- Michael
On Thu, Jul 26, 2018 at 7:33 PM, Michael Paquier <michael@paquier.xyz> wrote: > The maintenance load is not high as well, so I see no real point in > removing it, and that it would likely make people using it unhappy. Why, specifically, would it make them unhappy? -- Peter Geoghegan
On July 26, 2018 7:33:30 PM PDT, Michael Paquier <michael@paquier.xyz> wrote: >On Thu, Jul 26, 2018 at 06:53:06PM -0700, Andres Freund wrote: >> If we were to remove it, it'd obviously be a drawn out affair, given >> that we likely couldn't drop it from the restore side immediately... > >The maintenance load is not high as well Yea, I mentioned that. Worthwhile to note that it's not pretty code. >, so I see no real point in >removing it, and that it would likely make people using it unhappy. Because others have to figure out what the format is when looking at pg_dump. And might choose wrongly. That's a cost as well. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
> On Jul 26, 2018, at 19:35, Peter Geoghegan <pg@bowt.ie> wrote: > > Why, specifically, would it make them unhappy? Forensic and archive backups in .tar format (which I know of users doing) would require a two-step restore process on newer versions. -- -- Christophe Pettus xof@thebuild.com
On Thu, Jul 26, 2018 at 07:35:56PM -0700, Peter Geoghegan wrote: > On Thu, Jul 26, 2018 at 7:33 PM, Michael Paquier <michael@paquier.xyz> wrote: >> The maintenance load is not high as well, so I see no real point in >> removing it, and that it would likely make people using it unhappy. > > Why, specifically, would it make them unhappy? When upgrading PostgreSQL in an application framework which does migration upgrades, those users would need to complicate their code so as they need to handle multiple dump formats, by either detecting the format in use or assuming what to do based on the origin version. Believe me, that's a pain to dig into such issues. -- Michael
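To make the format-detection burden described above concrete, here is a minimal Python sketch of what such a framework might end up doing. It is an illustration under assumptions, not code from any real framework: the function name `guess_dump_format` is hypothetical, the `PGDMP` check assumes that custom-format archives begin with those bytes, and the `toc.dat` check assumes that directory-format dumps contain a file of that name. Anything that matches none of these is simply assumed to be a plain-text dump, which a real tool would want to verify more carefully.

```python
import io
import os
import tarfile
import tempfile


def guess_dump_format(path):
    """Best-effort guess at which pg_dump format produced `path` (hypothetical helper)."""
    if os.path.isdir(path):
        # Assumption: a directory-format dump keeps its table of contents in toc.dat.
        if os.path.exists(os.path.join(path, "toc.dat")):
            return "directory"
        return "unknown"
    with open(path, "rb") as f:
        magic = f.read(5)
    # Assumption: custom-format archives start with the bytes "PGDMP".
    if magic == b"PGDMP":
        return "custom"
    # Generic tar detection from the standard library; nothing PostgreSQL-specific.
    if tarfile.is_tarfile(path):
        return "tar"
    # Fallback only; this does not prove the file really contains SQL.
    return "plain"


def demo():
    """Create one placeholder file per format and classify each of them."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "dump.sql")
        with open(plain, "w") as f:
            f.write("CREATE TABLE t (id int);\n")

        custom = os.path.join(tmp, "dump.custom")
        with open(custom, "wb") as f:
            f.write(b"PGDMP" + b"\x00" * 16)  # magic bytes plus filler

        tar_path = os.path.join(tmp, "dump.tar")
        with tarfile.open(tar_path, "w") as tar:
            data = b"-- placeholder\n"
            info = tarfile.TarInfo("restore.sql")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        directory = os.path.join(tmp, "dump.dir")
        os.mkdir(directory)
        open(os.path.join(directory, "toc.dat"), "wb").close()

        paths = {"plain": plain, "custom": custom, "tar": tar_path, "directory": directory}
        return {label: guess_dump_format(p) for label, p in paths.items()}


if __name__ == "__main__":
    print(demo())
```

Every format that stays supported on the restore side is one more branch in a function like this, which is the cost being pointed at.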
On 07/26/2018 07:41 PM, Christophe Pettus wrote: >> On Jul 26, 2018, at 19:35, Peter Geoghegan <pg@bowt.ie> wrote: >> >> Why, specifically, would it make them unhappy? > Forensic and archive backups in .tar format (which I know of users doing) would require a two-step restore process on newer versions. I am a +1 for removing the format, though I would suggest we leave it as a restore option (pg_restore) for at least two more major releases. It is nice to get rid of cruft. JD > > -- > -- Christophe Pettus > xof@thebuild.com > > -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://postgresconf.org ***** Unless otherwise stated, opinions are my own. *****
Greetings, * Christophe Pettus (xof@thebuild.com) wrote: > > On Jul 26, 2018, at 19:35, Peter Geoghegan <pg@bowt.ie> wrote: > > > > Why, specifically, would it make them unhappy? > > Forensic and archive backups in .tar format (which I know of users doing) would require a two-step restore process on newer versions. Do you, perhaps, have any insight into why those users are currently using the .tar format? The use of temp files strikes me as a particularly good reason to do away with that format as it can cause odd failure cases. The other downside is that it's just more testing to be done to make sure that we didn't break it, testing which every developer waits for whenever they run the test suite. Further, any changes in pg_dump that could possibly impact the tar format also need to have tests written for them to make sure that this particular format, that has no real advantages or redeeming qualities over the other formats, continues to work. Thanks! Stephen
> On Jul 26, 2018, at 20:09, Stephen Frost <sfrost@snowman.net> wrote: > > Do you, perhaps, have any insight into why those users are currently > using the .tar format? Inertia, in most cases; some of those procedures have been around since 8.1 days. Custom format (or a tar'd parallel dump) would be an undoubtedly superior choice. A long deprecation window would cover a lot of those situations. -- -- Christophe Pettus xof@thebuild.com
On July 26, 2018 8:14:57 PM PDT, Christophe Pettus <xof@thebuild.com> wrote: > >> On Jul 26, 2018, at 20:09, Stephen Frost <sfrost@snowman.net> wrote: >> >> Do you, perhaps, have any insight into why those users are currently >> using the .tar format? > >Inertia, in most cases; some of those procedures have been around since >8.1 days. Custom format (or a tar'd parallel dump) would be an >undoubtedly superior choice. A long deprecation window would cover a >lot of those situations. Yea, that obviously would be called for (see also the subject of my email). I'd assume we'd do something like deprecating the dump side in 12, removing it in 13. The restore side would then be removed in 12+5. Or similar. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > Is there any real reason to retain it? As I recall, the principal argument for having it to begin with was that it's a "non proprietary" format that could be read without any PG-specific tools. Perhaps the directory format could be said to serve that purpose too, but if you were to try to collapse a directory dump into one file for transportation, you'd have ... a tar dump. I think a more significant question is what we'd get by removing it? If you want to look around for features that are slightly less used than other arguably-equivalent things, we must have hundreds of those. Doesn't mean that those features have no user constituency. regards, tom lane
On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: >> Is there any real reason to retain it? > > As I recall, the principal argument for having it to begin with was > that it's a "non proprietary" format that could be read without any > PG-specific tools. Perhaps the directory format could be said to > serve that purpose too, but if you were to try to collapse a directory > dump into one file for transportation, you'd have ... a tar dump. > > I think a more significant question is what we'd get by removing it? > If you want to look around for features that are slightly less used > than other arguably-equivalent things, we must have hundreds of those. > Doesn't mean that those features have no user constituency. Yeah. I don't mind removing really marginal features to ease maintenance, but I'm not sure that this one is all that marginal or that we'd save that much maintenance by eliminating it. I used text-format dumps for years primarily because I figured that no matter what happened, I'd always be able to find some way of getting my data out of a text file. Ideally the PostgreSQL tools will always work, but if they don't work and you have a text file, you have alternatives. If they don't work and you have a format in some PostgreSQL-specific format, then what? I probably wouldn't be as nervous about this now as I was then, seeing how careful we've been about this stuff. But I can certainly understand somebody wanting a standard format rather than a PostgreSQL-specific one. Why did we invent "custom" format dumps instead of using a standard container-file format like tar/cpio/zip/whatever? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2018-07-27 12:51:17 -0400, Robert Haas wrote: > On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > >> Is there any real reason to retain it? > > > > As I recall, the principal argument for having it to begin with was > > that it's a "non proprietary" format that could be read without any > > PG-specific tools. Perhaps the directory format could be said to > > serve that purpose too, but if you were to try to collapse a directory > > dump into one file for transportation, you'd have ... a tar dump. > > > > I think a more significant question is what we'd get by removing it? > > If you want to look around for features that are slightly less used > > than other arguably-equivalent things, we must have hundreds of those. > > Doesn't mean that those features have no user constituency. > > Yeah. I don't mind removing really marginal features to ease > maintenance, but I'm not sure that this one is all that marginal or > that we'd save that much maintenance by eliminating it. My point is more that it forces users to make choices whenever they use pg_dump. And the tar format has plenty of downsides that aren't immediately apparent. By keeping something with only a small upside around, we force users to waste time. > Why did we invent "custom" format dumps instead of using a standard > container-file format like tar/cpio/zip/whatever? Because they're either not all that simple, or don't allow random read access inside. But that's just a guess, not fact. Greetings, Andres Freund
On 07/27/2018 10:05 AM, Andres Freund wrote: > >> Yeah. I don't mind removing really marginal features to ease >> maintenance, but I'm not sure that this one is all that marginal or >> that we'd save that much maintenance by eliminating it. > My point is more that it forces users to make choices whenever they use > pg_dump. And the tar format has plenty downsides that aren't immediately > apparent. By keeping something with only a small upside around, we > force users to waste time. Correct. Sometimes it is best to limit choices. Someone may choose tar because it is a command they have used, without fully understanding what that means within the context of PostgreSQL. Then something goes wrong, they ask for help either on the lists or from a consulting firm, and the first thing they will hear is either "Don't use the tar format" or at least "You should be using one of the other formats". Why invite the overhead? JD -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc *** A fault and talent of mine is to tell it exactly how it is. *** PostgreSQL centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://postgresconf.org ***** Unless otherwise stated, opinions are my own. *****
On Fri, Jul 27, 2018 at 12:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> Is there any real reason to retain it?
>>
>> As I recall, the principal argument for having it to begin with was
>> that it's a "non proprietary" format that could be read without any
>> PG-specific tools. Perhaps the directory format could be said to
>> serve that purpose too, but if you were to try to collapse a directory
>> dump into one file for transportation, you'd have ... a tar dump.
>>
>> I think a more significant question is what we'd get by removing it?
>> If you want to look around for features that are slightly less used
>> than other arguably-equivalent things, we must have hundreds of those.
>> Doesn't mean that those features have no user constituency.
> Yeah. I don't mind removing really marginal features to ease
> maintenance, but I'm not sure that this one is all that marginal or
> that we'd save that much maintenance by eliminating it. I used
> text-format dumps for years primarily because I figured that no matter
> what happened, I'd always be able to find some way of getting my data
> out of a text file. Ideally the PostgreSQL tools will always work,
> but if they don't work and you have a text file, you have
> alternatives. If they don't work and you have a format in some
> PostgreSQL-specific format, then what?
But he isn't proposing getting rid of -Fp, just -Ft. Isn't -Ft just as PostgreSQL-specific
as -Fd is?
Cheers,
Jeff
On Fri, Jul 27, 2018 at 1:05 PM, Andres Freund <andres@anarazel.de> wrote: > My point is more that it forces users to make choices whenever they use > pg_dump. And the tar format has plenty of downsides that aren't immediately > apparent. By keeping something with only a small upside around, we > force users to waste time. Yeah, I admit that's a valid argument. >> Why did we invent "custom" format dumps instead of using a standard >> container-file format like tar/cpio/zip/whatever? > > Because they're either not all that simple, or don't allow random read access > inside. But that's just a guess, not fact. Mmm. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Jeff Janes <jeff.janes@gmail.com> writes: > But he isn't proposing getting rid of -Fp, just -Ft. Isn't -Ft just as > PostgreSQL-specific > as -Fd is? No. The point about -Ft format is that you can extract files that contain SQL text and COPY data, using nothing but standard Unix tools (i.e. tar). So just as with a plain-text dump, you'd have some work to do to get your data into some other RDBMS, but it'd be mostly about SQL-compatibility problems, not "what the heck is this binary file format". I was thinking before that -Fd had basically the same payload files as an -Ft archive, but it doesn't: we don't emit anything corresponding to the "restore.sql" member of an -Ft archive. This means that -Fd still leaves you needing PG-specific tools to interpret the toc.dat file, so it's not a plausible answer if you would like to have something that's more structured than a plain-text dump but will still be of use if your PG tools are not available. The -Ft format certainly has got its problems, and I wouldn't complain if we decided to, say, extend -Fd format so that you could also get info out of it without using pg_restore. But I do not think we should just drop -Ft as long as it's our only nonproprietary structured dump format. regards, tom lane
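The point about standard tools can be illustrated with Python's generic `tarfile` module, which knows nothing about PostgreSQL. The sketch below builds a small in-memory tar file as a stand-in for an -Ft archive and then reads the text members back. The member names `toc.dat` and `restore.sql` are taken from the message above; `1234.dat`, the helper names, and all file contents are made-up placeholders, not real pg_dump output.

```python
import io
import tarfile

# Placeholder members; the contents are invented for illustration only.
STAND_IN_MEMBERS = {
    "toc.dat": b"\x00\x01 placeholder for the binary table of contents",
    "1234.dat": b"1\talice\n2\tbob\n",
    "restore.sql": b"-- placeholder SQL script\nCREATE TABLE users (id int, name text);\n",
}


def build_stand_in_archive():
    """Return the bytes of a tar file laid out loosely like an -Ft dump."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in STAND_IN_MEMBERS.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_text_members(archive_bytes, skip=("toc.dat",)):
    """Read every regular member except the binary TOC, using only generic tar code."""
    texts = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name not in skip:
                texts[member.name] = tar.extractfile(member).read().decode("utf-8")
    return texts


if __name__ == "__main__":
    for name, text in read_text_members(build_stand_in_archive()).items():
        print(f"== {name} ==")
        print(text, end="")
```

The reader never has to interpret `toc.dat`: the SQL and data members are ordinary text once extracted, which is the property being defended here.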
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: >> Why did we invent "custom" format dumps instead of using a standard >> container-file format like tar/cpio/zip/whatever? Andres> Because they're either not all that simple, or don't allow random Andres> read access inside. But that's just a guess, not fact. A more significant factor is that tar (like most file archive formats) doesn't allow streamed _write_ access - you need to know the size of each archive member in advance, which is why -Ft has to copy each table to a temp file and then copy that into the archive. -- Andrew.
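The size-in-advance constraint is easy to see with Python's `tarfile` module, where each member's header carries its size and is written before the data. The sketch below is a hedged illustration of the general technique (spool a stream of unknown length to a temporary file, measure it, then add it to the archive); it is not pg_dump's actual code, and `fake_table_rows`, `add_streamed_member`, `roundtrip` and the member name `1234.dat` are hypothetical.

```python
import io
import tarfile
import tempfile


def fake_table_rows():
    """Stand-in for table data whose total length is not known up front."""
    for i in range(3):
        yield f"{i}\trow-{i}\n".encode("ascii")


def add_streamed_member(tar, name, chunks):
    """Spool `chunks` to a temp file so the size is known, then add it to `tar`."""
    with tempfile.TemporaryFile() as spool:
        for chunk in chunks:
            spool.write(chunk)
        size = spool.tell()  # only now do we know the member's size
        spool.seek(0)
        info = tarfile.TarInfo(name)
        info.size = size  # the header records the size before any data follows
        tar.addfile(info, spool)
    return size


def roundtrip():
    """Write one spooled member into an in-memory archive and read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        written = add_streamed_member(tar, "1234.dat", fake_table_rows())
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.getmember("1234.dat")
        data = tar.extractfile(member).read()
    return written, member.size, data


if __name__ == "__main__":
    print(roundtrip())
```

The extra copy through the spool file is the price of the format: the data is written once to the temp file and a second time into the archive.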