Discussion: Deprecating, and scheduling removal of, pg_dump's tar format.

Deprecating, and scheduling removal of, pg_dump's tar format.

From: Andres Freund
Hi,

Is there any real reason to retain it? I realize it's not been a huge
issue to maintain the format, but it has significantly fewer features
than the other formats, without corresponding upsides (sql is clearer to
read, directory format has parallelism, custom is a single file while
allowing you to select subsets of the data).  There's also the fact that it
creates temp files, which other formats don't.

Given the apparent lack of upside, it doesn't seem good to offer a
sub-par choice to our users.

If we were to remove it, it'd obviously be a drawn out affair, given
that we likely couldn't drop it from the restore side immediately...

Greetings,

Andres Freund


Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Michael Paquier
On Thu, Jul 26, 2018 at 06:53:06PM -0700, Andres Freund wrote:
> If we were to remove it, it'd obviously be a drawn out affair, given
> that we likely couldn't drop it from the restore side immediately...

The maintenance load is not high either, so I see no real point in
removing it, and doing so would likely make the people using it unhappy.
--
Michael

Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Peter Geoghegan
On Thu, Jul 26, 2018 at 7:33 PM, Michael Paquier <michael@paquier.xyz> wrote:
> The maintenance load is not high either, so I see no real point in
> removing it, and doing so would likely make the people using it unhappy.

Why, specifically, would it make them unhappy?

-- 
Peter Geoghegan


Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Andres Freund

On July 26, 2018 7:33:30 PM PDT, Michael Paquier <michael@paquier.xyz> wrote:
>On Thu, Jul 26, 2018 at 06:53:06PM -0700, Andres Freund wrote:
>> If we were to remove it, it'd obviously be a drawn out affair, given
>> that we likely couldn't drop it from the restore side immediately...
>
>The maintenance load is not high either

Yea, I mentioned that. Worth noting that it's not pretty code.

>, so I see no real point in
>removing it, and doing so would likely make the people using it unhappy.

Because others have to figure out what the format is when looking at pg_dump, and might choose wrongly. That's a cost
as well.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Christophe Pettus
> On Jul 26, 2018, at 19:35, Peter Geoghegan <pg@bowt.ie> wrote:
>
> Why, specifically, would it make them unhappy?

Forensic and archive backups in .tar format (which I know of users doing) would require a two-step restore process on
newer versions.

--
-- Christophe Pettus
   xof@thebuild.com



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Michael Paquier
On Thu, Jul 26, 2018 at 07:35:56PM -0700, Peter Geoghegan wrote:
> On Thu, Jul 26, 2018 at 7:33 PM, Michael Paquier <michael@paquier.xyz> wrote:
>> The maintenance load is not high either, so I see no real point in
>> removing it, and doing so would likely make the people using it unhappy.
>
> Why, specifically, would it make them unhappy?

When upgrading PostgreSQL in an application framework which does
migration upgrades, those users would need to complicate their code so
that it can handle multiple dump formats, either by detecting the
format in use or by assuming what to do based on the origin version.
Believe me, digging into such issues is a pain.
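For illustration, here is a minimal Python sketch of the "detect the format in use" step. It approximates the checks pg_restore makes when auto-detecting an archive:

- custom-format files start with the bytes `PGDMP`;
- a directory dump contains `toc.dat`;
- a tar dump parses as a tar archive;
- anything else is treated as a plain SQL script for psql.

The function name `detect_dump_format` is hypothetical. This is a simplification: it ignores compressed plain-text dumps and any unusual directory layouts.

```python
import os
import tarfile


def detect_dump_format(path: str) -> str:
    """Guess which pg_dump output format `path` holds (a sketch, not pg_restore's code).

    Returns "directory" (-Fd), "custom" (-Fc), "tar" (-Ft) or "plain" (-Fp).
    """
    if os.path.isdir(path):
        # a directory dump keeps its table of contents in toc.dat
        if os.path.isfile(os.path.join(path, "toc.dat")):
            return "directory"
        raise ValueError(f"{path}: directory without toc.dat")
    with open(path, "rb") as f:
        magic = f.read(5)
    if magic == b"PGDMP":
        # custom-format archives begin with this 5-byte signature
        return "custom"
    if tarfile.is_tarfile(path):
        return "tar"
    # anything else is assumed to be an SQL script meant for psql
    return "plain"
```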
--
Michael

Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: "Joshua D. Drake"
On 07/26/2018 07:41 PM, Christophe Pettus wrote:
>> On Jul 26, 2018, at 19:35, Peter Geoghegan <pg@bowt.ie> wrote:
>>
>> Why, specifically, would it make them unhappy?
> Forensic and archive backups in .tar format (which I know of users doing) would require a two-step restore process on
> newer versions.
 

I am +1 for removing the format, though I would suggest we leave it as
a restore option (pg_restore) for at least two more major releases. It 
is nice to get rid of cruft.

JD


-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Stephen Frost
Greetings,

* Christophe Pettus (xof@thebuild.com) wrote:
> > On Jul 26, 2018, at 19:35, Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > Why, specifically, would it make them unhappy?
>
> Forensic and archive backups in .tar format (which I know of users doing) would require a two-step restore process on
> newer versions.

Do you, perhaps, have any insight into why those users are currently
using the .tar format?

The use of temp files strikes me as a particularly good reason to do
away with that format as it can cause odd failure cases.

The other downside is that it's just more testing to be done to make
sure that we didn't break it, testing which every developer waits for
whenever they run the test suite.  Further, any changes in pg_dump that
could possibly impact the tar format also need to have tests written for
them to make sure that this particular format, that has no real
advantages or redeeming qualities over the other formats, continues to
work.

Thanks!

Stephen

Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Christophe Pettus
> On Jul 26, 2018, at 20:09, Stephen Frost <sfrost@snowman.net> wrote:
>
> Do you, perhaps, have any insight into why those users are currently
> using the .tar format?

Inertia, in most cases; some of those procedures have been around since 8.1 days.  Custom format (or a tar'd parallel
dump) would be an undoubtedly superior choice.  A long deprecation window would cover a lot of those situations.




Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Andres Freund

On July 26, 2018 8:14:57 PM PDT, Christophe Pettus <xof@thebuild.com> wrote:
>
>> On Jul 26, 2018, at 20:09, Stephen Frost <sfrost@snowman.net> wrote:
>>
>> Do you, perhaps, have any insight into why those users are currently
>> using the .tar format?
>
>Inertia, in most cases; some of those procedures have been around since
>8.1 days.  Custom format (or a tar'd parallel dump) would be an
>undoubtedly superior choice.  A long deprecation window would cover a
>lot of those situations.

Yea, that obviously would be called for (see also the subject of my email). I'd assume we'd do something like
deprecating the dump side in 12, removing it in 13. The restore side would then be removed in 12+5. Or similar.

Andres


Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> Is there any real reason to retain it?

As I recall, the principal argument for having it to begin with was
that it's a "non proprietary" format that could be read without any
PG-specific tools.  Perhaps the directory format could be said to
serve that purpose too, but if you were to try to collapse a directory
dump into one file for transportation, you'd have ... a tar dump.
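Collapsing a directory dump into one file, as Tom describes, takes only a few lines with Python's standard `tarfile` module. The helper name `bundle_directory_dump` is hypothetical. The result is an ordinary tarball of the -Fd directory, not an -Ft archive. This sketch does not assume pg_restore can read it directly; unpack it again (e.g. with `tar -xf`) and restore from the directory.

```python
import os
import tarfile


def bundle_directory_dump(dump_dir: str, out_path: str) -> list[str]:
    """Pack a pg_dump -Fd output directory into one uncompressed tar file.

    Returns the sorted member names so the caller can check what was packed.
    """
    # arcname keeps paths relative: the bundle unpacks into a single directory
    # named after dump_dir instead of reproducing its absolute path
    top = os.path.basename(os.path.normpath(dump_dir))
    with tarfile.open(out_path, "w") as tf:
        tf.add(dump_dir, arcname=top)  # recurses into toc.dat and the data files
    with tarfile.open(out_path, "r") as tf:
        return sorted(tf.getnames())
```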

I think a more significant question is what we'd get by removing it?
If you want to look around for features that are slightly less used
than other arguably-equivalent things, we must have hundreds of those.
Doesn't mean that those features have no user constituency.

            regards, tom lane


Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Robert Haas
On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Is there any real reason to retain it?
>
> As I recall, the principal argument for having it to begin with was
> that it's a "non proprietary" format that could be read without any
> PG-specific tools.  Perhaps the directory format could be said to
> serve that purpose too, but if you were to try to collapse a directory
> dump into one file for transportation, you'd have ... a tar dump.
>
> I think a more significant question is what we'd get by removing it?
> If you want to look around for features that are slightly less used
> than other arguably-equivalent things, we must have hundreds of those.
> Doesn't mean that those features have no user constituency.

Yeah.  I don't mind removing really marginal features to ease
maintenance, but I'm not sure that this one is all that marginal or
that we'd save that much maintenance by eliminating it.  I used
text-format dumps for years primarily because I figured that no matter
what happened, I'd always be able to find some way of getting my data
out of a text file.  Ideally the PostgreSQL tools will always work,
but if they don't work and you have a text file, you have
alternatives.  If they don't work and you have a dump in some
PostgreSQL-specific format, then what?

I probably wouldn't be as nervous about this now as I was then, seeing
how careful we've been about this stuff.  But I can certainly
understand somebody wanting a standard format rather than a
PostgreSQL-specific one.  Why did we invent "custom" format dumps
instead of using a standard container-file format like
tar/cpio/zip/whatever?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Andres Freund
Hi,

On 2018-07-27 12:51:17 -0400, Robert Haas wrote:
> On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> Is there any real reason to retain it?
> >
> > As I recall, the principal argument for having it to begin with was
> > that it's a "non proprietary" format that could be read without any
> > PG-specific tools.  Perhaps the directory format could be said to
> > serve that purpose too, but if you were to try to collapse a directory
> > dump into one file for transportation, you'd have ... a tar dump.
> >
> > I think a more significant question is what we'd get by removing it?
> > If you want to look around for features that are slightly less used
> > than other arguably-equivalent things, we must have hundreds of those.
> > Doesn't mean that those features have no user constituency.
> 
> Yeah.  I don't mind removing really marginal features to ease
> maintenance, but I'm not sure that this one is all that marginal or
> that we'd save that much maintenance by eliminating it.

My point is more that it forces users to make choices whenever they use
pg_dump. And the tar format has plenty of downsides that aren't immediately
apparent.  By keeping something with only a small upside around, we
force users to waste time.


> Why did we invent "custom" format dumps instead of using a standard
> container-file format like tar/cpio/zip/whatever?

Because they're either not all that simple, or don't allow random read access
inside. But that's just a guess, not fact.
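Andres's guess about random access can be illustrated. A tar archive has no central index, only a 512-byte header in front of each member. Finding one member therefore means walking every header before it. The hypothetical Python helper `find_member_offset` below does exactly that walk. It handles only plain ustar entries, with no long-name or pax extension headers, and it is not how pg_dump or pg_restore read archives.

```python
BLOCK = 512  # tar headers and member data both occupy whole 512-byte blocks


def find_member_offset(path: str, wanted: str) -> int:
    """Return the byte offset at which member `wanted`'s data begins.

    There is no index to consult, so every header before the target has to be
    read and the data behind it skipped. Only plain ustar entries are handled
    (no GNU long names, no pax extended headers).
    """
    with open(path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            header = f.read(BLOCK)
            # a short read or an all-zero block marks the end of the archive
            if len(header) < BLOCK or header == bytes(BLOCK):
                raise KeyError(wanted)
            name = header[0:100].rstrip(b"\0").decode("utf-8")
            # size field: 12 bytes of octal ASCII, NUL- or space-terminated
            size = int(header[124:136].rstrip(b" \0") or b"0", 8)
            data_start = offset + BLOCK
            if name == wanted:
                return data_start
            # skip this member's data, rounded up to a whole number of blocks
            offset = data_start + (size + BLOCK - 1) // BLOCK * BLOCK
```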

Greetings,

Andres Freund


Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: "Joshua D. Drake"
On 07/27/2018 10:05 AM, Andres Freund wrote:
>
>> Yeah.  I don't mind removing really marginal features to ease
>> maintenance, but I'm not sure that this one is all that marginal or
>> that we'd save that much maintenance by eliminating it.
> My point is more that it forces users to make choices whenever they use
> pg_dump. And the tar format has plenty of downsides that aren't immediately
> apparent.  By keeping something with only a small upside around, we
> force users to waste time.

Correct. Sometimes it is best to limit choices: someone may choose tar
because it is a command they have used, without fully understanding what
that means within the context of PostgreSQL. Then something will go wrong,
they will ask for help either on the lists or from a consulting firm, and
the first thing they will be told is either "Don't use the tar format" or
at least "You should be using one of the other formats".

Why invite the overhead?

JD




Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Jeff Janes
On Fri, Jul 27, 2018 at 12:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jul 26, 2018 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> Is there any real reason to retain it?
>>
>> As I recall, the principal argument for having it to begin with was
>> that it's a "non proprietary" format that could be read without any
>> PG-specific tools.  Perhaps the directory format could be said to
>> serve that purpose too, but if you were to try to collapse a directory
>> dump into one file for transportation, you'd have ... a tar dump.
>>
>> I think a more significant question is what we'd get by removing it?
>> If you want to look around for features that are slightly less used
>> than other arguably-equivalent things, we must have hundreds of those.
>> Doesn't mean that those features have no user constituency.
>
> Yeah.  I don't mind removing really marginal features to ease
> maintenance, but I'm not sure that this one is all that marginal or
> that we'd save that much maintenance by eliminating it.  I used
> text-format dumps for years primarily because I figured that no matter
> what happened, I'd always be able to find some way of getting my data
> out of a text file.  Ideally the PostgreSQL tools will always work,
> but if they don't work and you have a text file, you have
> alternatives.  If they don't work and you have a dump in some
> PostgreSQL-specific format, then what?

But he isn't proposing getting rid of -Fp, just -Ft.  Isn't -Ft just as PostgreSQL-specific
as -Fd is?

Cheers,

Jeff

Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Robert Haas
On Fri, Jul 27, 2018 at 1:05 PM, Andres Freund <andres@anarazel.de> wrote:
> My point is more that it forces users to make choices whenever they use
> pg_dump. And the tar format has plenty of downsides that aren't immediately
> apparent.  By keeping something with only a small upside around, we
> force users to waste time.

Yeah, I admit that's a valid argument.

>> Why did we invent "custom" format dumps instead of using a standard
>> container-file format like tar/cpio/zip/whatever?
>
> Because they're either not all that simple, or don't allow random read access
> inside. But that's just a guess, not fact.

Mmm.



Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Tom Lane
Jeff Janes <jeff.janes@gmail.com> writes:
> But he isn't proposing getting rid of -Fp, just -Ft.  Isn't -Ft just as
> PostgreSQL-specific as -Fd is?

No.  The point about -Ft format is that you can extract files that contain
SQL text and COPY data, using nothing but standard Unix tools (i.e. tar).
So just as with a plain-text dump, you'd have some work to do to get your
data into some other RDBMS, but it'd be mostly about SQL-compatibility
problems, not "what the heck is this binary file format".
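Tom's point can be made concrete. With nothing but the tar tool, `tar -xf dump.tar` unpacks such an archive. The sketch below does the same in Python using only the standard `tarfile` module. It assumes the member layout an -Ft archive is generally understood to have:

- a binary `toc.dat`;
- a `restore.sql` script;
- numbered `.dat` files holding COPY data.

The helper name `read_tar_dump` is hypothetical. Decoding `restore.sql` as UTF-8 is an assumption about the dump's encoding.

```python
import tarfile


def read_tar_dump(path: str) -> tuple[str, dict[str, bytes]]:
    """Pull the SQL script and the COPY data files out of a pg_dump -Ft archive
    using only Python's standard tarfile module, no PostgreSQL tools.

    Assumed layout: "restore.sql" plus numbered "*.dat" files; "toc.dat" is
    skipped because it is the PostgreSQL-specific binary table of contents.
    """
    sql = ""
    data_files: dict[str, bytes] = {}
    with tarfile.open(path, "r") as tf:
        for member in tf.getmembers():
            if not member.isfile():
                continue
            payload = tf.extractfile(member).read()
            if member.name == "restore.sql":
                sql = payload.decode("utf-8")  # assumes a UTF-8 dump
            elif member.name.endswith(".dat") and member.name != "toc.dat":
                data_files[member.name] = payload
    return sql, data_files
```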

I was thinking before that -Fd had basically the same payload files as
an -Ft archive, but it doesn't: we don't emit anything corresponding to
the "restore.sql" member of an -Ft archive.  This means that -Fd still
leaves you needing PG-specific tools to interpret the toc.dat file,
so it's not a plausible answer if you would like to have something that's
more structured than a plain-text dump but will still be of use if your
PG tools are not available.

The -Ft format certainly has got its problems, and I wouldn't complain
if we decided to, say, extend -Fd format so that you could also get
info out of it without using pg_restore.  But I do not think we should
just drop -Ft as long as it's our only nonproprietary structured
dump format.

            regards, tom lane


Re: Deprecating, and scheduling removal of, pg_dump's tar format.

From: Andrew Gierth
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 >> Why did we invent "custom" format dumps instead of using a standard
 >> container-file format like tar/cpio/zip/whatever?

 Andres> Because they're either not all that simple, or don't allow random
 Andres> read access inside. But that's just a guess, not fact.

A more significant factor is that tar (like most file archive formats)
doesn't allow streamed _write_ access - you need to know the size of
each archive member in advance, hence why -Ft has to copy each table to
a temp file and then copy that into the archive.
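Here is a minimal Python sketch of the constraint Andrew describes. It uses the standard `tarfile` module rather than pg_dump's own C code. Each member's header records its size and precedes its data, so a stream of unknown length has to be spooled to a temporary file before it can be appended. The helper `add_streamed_member` and the sample rows are hypothetical.

```python
import tarfile
import tempfile
from collections.abc import Iterable


def add_streamed_member(tf: tarfile.TarFile, name: str, chunks: Iterable[bytes]) -> int:
    """Append a member whose length is unknown until the producer finishes.

    The tar header carries the member's size and must be written before the
    data, so the stream is first spooled to a temporary file; only once it is
    complete can the header be emitted, followed by the spooled bytes.
    """
    with tempfile.TemporaryFile() as spool:
        for chunk in chunks:      # e.g. rows arriving from a COPY-like producer
            spool.write(chunk)
        size = spool.tell()       # the size is known only at this point
        spool.seek(0)
        info = tarfile.TarInfo(name)
        info.size = size
        tf.addfile(info, spool)   # header (with size) first, then the data
    return size
```

The spool is the source of the temp files objected to earlier in the thread.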

-- 
Andrew.