Обсуждение: pg_dump --with-* options

Поиск
Список
Период
Сортировка

pg_dump --with-* options

От
Peter Eisentraut
Дата:
I'm looking at the new in PG18 pg_dump --with-* options, and I'm having 
trouble understanding them.  (I did not look into the source code or the 
git or mailing list history for this, to try to understand it as a user.)

We have

   -a, --data-only      dump only the data, not the schema or statistics
   --no-data            do not dump data
   --with-data          dump the data  # this one is new

(and there is also --section=data), and then three analogous options for 
"schema" and "statistics".

What is the purpose of the --with-data option?  Dumping the data is the 
default.  Is this to override an earlier --no-data option?

The man page is only minimally more verbose: "Dump data. This is the 
default."  But why do you then need this option?

I think we should add some more documenting detail for these, but right 
now I don't know what it would be.




Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
> We have
> 
>   -a, --data-only      dump only the data, not the schema or statistics
>   --no-data            do not dump data
>   --with-data          dump the data  # this one is new
> 
> (and there is also --section=data), and then three analogous options for
> "schema" and "statistics".
> 
> What is the purpose of the --with-data option?  Dumping the data is the
> default.  Is this to override an earlier --no-data option?

I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.

-- 
nathan



Re: pg_dump --with-* options

От
Peter Eisentraut
Дата:
On 06.06.25 17:39, Nathan Bossart wrote:
> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
>> We have
>>
>>    -a, --data-only      dump only the data, not the schema or statistics
>>    --no-data            do not dump data
>>    --with-data          dump the data  # this one is new
>>
>> (and there is also --section=data), and then three analogous options for
>> "schema" and "statistics".
>>
>> What is the purpose of the --with-data option?  Dumping the data is the
>> default.  Is this to override an earlier --no-data option?
> 
> I believe the idea is that these will allow folks to be explicit about what
> they want instead of needing to understand the defaults for every
> component.

I get that idea, but we really need some more documentation for this, I 
think.  So far I could only guess how this is supposed to be used, and I 
also happened to guess wrong.

My initial guess was that --with-data can override --no-data.  That 
would have been pretty standard "last option wins" behavior.  But 
pg_dump rejects that.  Personally, I think that is kind of wrong.

But you can use --with-data to override, say, --schema-only.  That also 
seems kind of wrong to me, but anyway.  Then, it seems that 
--schema-only plus --with-data is kind of the same as everything plus 
--no-statistics?

I'm also wondering, if you consider the aforementioned "last option 
wins" behavior, should there be a difference between

--schema-only --with-data, and
--with-data --schema-only

or for that matter

--schema-only --data-only

But in any case, if you want that level of precision, wouldn't it make 
more sense to use the --section option?

This is all a nice logic puzzle, but some more documentation would be nice.




Re: pg_dump --with-* options

От
Robert Haas
Дата:
On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
> > We have
> >
> >   -a, --data-only      dump only the data, not the schema or statistics
> >   --no-data            do not dump data
> >   --with-data          dump the data  # this one is new
> >
> > (and there is also --section=data), and then three analogous options for
> > "schema" and "statistics".
> >
> > What is the purpose of the --with-data option?  Dumping the data is the
> > default.  Is this to override an earlier --no-data option?
>
> I believe the idea is that these will allow folks to be explicit about what
> they want instead of needing to understand the defaults for every
> component.

Am I too late to propose ripping this out?

I mean, if I look at pg_dump --help and there are options for
--with-broccoli and --without-mushrooms, I know that the defaults are
no brocooli, yes mushrooms, and I know which options I need to specify
to get the behavior that I want, whatever that happens to be. If all
options exist in both forms, it's a lot more confusing. Maybe there's
some issue of cross-version compatibility here that justifies this
complexity, but I don't see what it would be. I would think
--with-data has always been the default and always will be, so we just
don't need --with-data for anything. But maybe I'm confused.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_dump --with-* options

От
Fujii Masao
Дата:

On 2025/06/12 22:47, Peter Eisentraut wrote:
> On 06.06.25 17:39, Nathan Bossart wrote:
>> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
>>> We have
>>>
>>>    -a, --data-only      dump only the data, not the schema or statistics
>>>    --no-data            do not dump data
>>>    --with-data          dump the data  # this one is new
>>>
>>> (and there is also --section=data), and then three analogous options for
>>> "schema" and "statistics".
>>>
>>> What is the purpose of the --with-data option?  Dumping the data is the
>>> default.  Is this to override an earlier --no-data option?
>>
>> I believe the idea is that these will allow folks to be explicit about what
>> they want instead of needing to understand the defaults for every
>> component.
>
> I get that idea, but we really need some more documentation for this, I think.  So far I could only guess how this is
supposedto be used, and I also happened to guess wrong. 
>
> My initial guess was that --with-data can override --no-data.  That would have been pretty standard "last option
wins"behavior.  But pg_dump rejects that.  Personally, I think that is kind of wrong. 
>
> But you can use --with-data to override, say, --schema-only.  That also seems kind of wrong to me, but anyway.

While testing pg_dump --with-* in relation to bug #18952 [1],
I also ran into this behavior. It was surprising,
as I expected pg_dump to reject that combination of options.
The current behavior seems confusing.

Regards,

[1] https://postgr.es/m/18952-be40a620f8b1e755@postgresql.org

--
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Thu, Jun 12, 2025 at 10:18:56AM -0400, Robert Haas wrote:
> On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
>> > What is the purpose of the --with-data option?  Dumping the data is the
>> > default.  Is this to override an earlier --no-data option?
>>
>> I believe the idea is that these will allow folks to be explicit about what
>> they want instead of needing to understand the defaults for every
>> component.
> 
> Am I too late to propose ripping this out?
> 
> I mean, if I look at pg_dump --help and there are options for
> --with-broccoli and --without-mushrooms, I know that the defaults are
> no brocooli, yes mushrooms, and I know which options I need to specify
> to get the behavior that I want, whatever that happens to be. If all
> options exist in both forms, it's a lot more confusing. Maybe there's
> some issue of cross-version compatibility here that justifies this
> complexity, but I don't see what it would be. I would think
> --with-data has always been the default and always will be, so we just
> don't need --with-data for anything. But maybe I'm confused.

If the idea is to remove all options for default behavior, we'd be removing
--no-statistics, --with-data, and --with-schema at this point.  Maybe we
could go a step further and even rip out --statistics-only (in favor of
--no-schema --no-data --with-statistics).  In general, I do think the list
of pg_dump options is pretty unwieldy at this point.

-- 
nathan



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote:
> My initial guess was that --with-data can override --no-data.  That
> would have been pretty standard "last option wins" behavior.  But
> pg_dump rejects that.  Personally, I think that is kind of wrong.

Do we have other options that are order-sensitive?

> But in any case, if you want that level of precision, wouldn't it
> make
> more sense to use the --section option?

That's not possible with statistics, because some appear in
SECTION_DATA and some in SECTION_POST_DATA (e.g. stats on indexes,
which are in SECTION_POST_DATA).


Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Thu, 2025-06-12 at 10:18 -0400, Robert Haas wrote:
> Am I too late to propose ripping this out?

As long as we keep the functionality, I'm fine changing the
options/names around at this point.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> If the idea is to remove all options for default behavior, we'd be
> removing
> --no-statistics, --with-data, and --with-schema at this point.

That's OK with me.

>   Maybe we
> could go a step further and even rip out --statistics-only (in favor
> of
> --no-schema --no-data --with-statistics).

I'd probably keep --statistics-only.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Thu, Jun 12, 2025 at 08:58:15AM -0700, Jeff Davis wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
>> If the idea is to remove all options for default behavior, we'd be
>> removing
>> --no-statistics, --with-data, and --with-schema at this point.
> 
> That's OK with me.
> 
>>   Maybe we
>> could go a step further and even rip out --statistics-only (in favor
>> of
>> --no-schema --no-data --with-statistics).
> 
> I'd probably keep --statistics-only.

WFM

-- 
nathan



Re: pg_dump --with-* options

От
Robert Haas
Дата:
On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > If the idea is to remove all options for default behavior, we'd be
> > removing
> > --no-statistics, --with-data, and --with-schema at this point.
>
> That's OK with me.

Same.

> >   Maybe we
> > could go a step further and even rip out --statistics-only (in favor
> > of
> > --no-schema --no-data --with-statistics).
>
> I'd probably keep --statistics-only.

I'm going to vote for removing it. pg_dump has a lot of options, and
it doesn't seem like a good bet to me to have options that are
equivalent to various combinations of other options. I don't see any
particular reason to believe that --statistics-only is even a
particularly likely combination of options for someone to want. I'd
rather keep it simple.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_dump --with-* options

От
Peter Eisentraut
Дата:
On 12.06.25 17:14, Jeff Davis wrote:
> On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote:
>> My initial guess was that --with-data can override --no-data.  That
>> would have been pretty standard "last option wins" behavior.  But
>> pg_dump rejects that.  Personally, I think that is kind of wrong.
> 
> Do we have other options that are order-sensitive?

I think most of them are.  For example:

psql -p 5432 -p 5433
initdb --data-checksums --no-data-checksums
postgres --shared-buffers=1GB --shared-buffers=2GB




Re: pg_dump --with-* options

От
Laurenz Albe
Дата:
On Thu, 2025-06-12 at 13:36 -0400, Robert Haas wrote:
> On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > > If the idea is to remove all options for default behavior, we'd be
> > > removing
> > > --no-statistics, --with-data, and --with-schema at this point.
> >
> > That's OK with me.
>
> Same.

I must be missing something, but I think --no-statistics is sorely needed.
How else can I get the effect of

  pg_dump --no-statistics mydb

Yours,
Laurenz Albe



Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Thu, Jun 12, 2025 at 10:07:05PM +0200, Laurenz Albe wrote:
> I must be missing something, but I think --no-statistics is sorely needed.
> How else can I get the effect of
> 
>   pg_dump --no-statistics mydb

This was recently changed to be the default behavior (see commit 34eb2a8).

-- 
nathan



Re: pg_dump --with-* options

От
Corey Huinker
Дата:

On Thu, Jun 12, 2025 at 1:36 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > If the idea is to remove all options for default behavior, we'd be
> > removing
> > --no-statistics, --with-data, and --with-schema at this point.
>
> That's OK with me.

Same.

> >   Maybe we
> > could go a step further and even rip out --statistics-only (in favor
> > of
> > --no-schema --no-data --with-statistics).
>
> I'd probably keep --statistics-only.

I'm going to vote for removing it. pg_dump has a lot of options, and
it doesn't seem like a good bet to me to have options that are
equivalent to various combinations of other options. I don't see any
particular reason to believe that --statistics-only is even a
particularly likely combination of options for someone to want. I'd
rather keep it simple.

The use case for --statistics-only is to extract the existing statistics for the tables and indexes that are involved in a given query that is giving you problems, allowing you to apply those statistics to an existing QA/dev database and tweak them without further impacting operations on the production database. I think this will prove to be very useful, and having a --statistics-only flag conveys the clear intent of "I want the stats, and only the stats", 

 

--
Robert Haas
EDB: http://www.enterprisedb.com


If we're hot to remove options, how about we remove the sections flags? Their utility is reliant upon the user understanding exactly which things go in which section, and further assumes that everything deterministically goes in exactly one section, which is no longer the case as Jeff pointed out recently. They have outlived their usefulness.

If we have the full complement of -no-something flags, and the three -only flags, we wouldn't need the --with-something flags. That would mean making statistics export the default on dumps, which I think it should be anyway, because there's nothing else that we don't dump by default, and while it might seem strange to have them by default now, NOT having them by default will feel very strange a few years down the road.
 

Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote:
> The use case for --statistics-only is to extract the existing statistics
> for the tables and indexes that are involved in a given query that is
> giving you problems, allowing you to apply those statistics to an existing
> QA/dev database and tweak them without further impacting operations on the
> production database. I think this will prove to be very useful, and having
> a --statistics-only flag conveys the clear intent of "I want the stats, and
> only the stats",

I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data.  It seems like we're trying
to avoid having multiple ways to do the same thing.

> If we're hot to remove options, how about we remove the sections flags?
> Their utility is reliant upon the user understanding exactly which things
> go in which section, and further assumes that everything deterministically
> goes in exactly one section, which is no longer the case as Jeff
> pointed out recently. They have outlived their usefulness.

I almost brought this up earlier as something else we could potentially
trim.  That's v19 material at this point, though.

-- 
nathan



Re: pg_dump --with-* options

От
Corey Huinker
Дата:
On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote:
> The use case for --statistics-only is to extract the existing statistics
> for the tables and indexes that are involved in a given query that is
> giving you problems, allowing you to apply those statistics to an existing
> QA/dev database and tweak them without further impacting operations on the
> production database. I think this will prove to be very useful, and having
> a --statistics-only flag conveys the clear intent of "I want the stats, and
> only the stats",

I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data.  It seems like we're trying
to avoid having multiple ways to do the same thing.

By that same argument, we should remove --schema-only and --data-only as well. I think we shouldn't because those two options have proved very convenient for users and they convey clear intent to the person reading the script, and I believe that --statistics-only will prove the same over time.
 
> If we're hot to remove options, how about we remove the sections flags?
> Their utility is reliant upon the user understanding exactly which things
> go in which section, and further assumes that everything deterministically
> goes in exactly one section, which is no longer the case as Jeff
> pointed out recently. They have outlived their usefulness.

I almost brought this up earlier as something else we could potentially
trim.  That's v19 material at this point, though.

+1 for 19.

Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Thu, Jun 12, 2025 at 04:39:00PM -0400, Corey Huinker wrote:
> On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> I do think this is useful functionality, I only suggested removing it
>> because AFAICT it is redundant, i.e., you can accomplish the same thing
>> with --with-statistics --no-schema --no-data.  It seems like we're trying
>> to avoid having multiple ways to do the same thing.
> 
> By that same argument, we should remove --schema-only and --data-only as
> well. I think we shouldn't because those two options have proved very
> convenient for users and they convey clear intent to the person reading the
> script, and I believe that --statistics-only will prove the same over time.

Those predate v18, so while might be able to mark them deprecated, I doubt
we'd remove them anytime soon.  

FWIW I don't have a tremendously strong opinion about --statistics-only.
I'd probably vote to remove it because 1) it's redundant, 2) once you add
an option, it's hard to remove it, and 3) pg_dump already has so many
options.  But I won't cry too hard if we keep it around.

-- 
nathan



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote:
> FWIW I don't have a tremendously strong opinion about --statistics-
> only.

Same here. I won't cast a vote on this particular issue, as long as the
functionality is available.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Thu, 2025-06-12 at 21:16 +0200, Peter Eisentraut wrote:
> > Do we have other options that are order-sensitive?
>
> I think most of them are.  For example:
>
> psql -p 5432 -p 5433
> initdb --data-checksums --no-data-checksums
> postgres --shared-buffers=1GB --shared-buffers=2GB

Interesting. I don't think the "last option wins" model applies to
other pg_dump options, though. For instance, in PG17:

  pg_dump --data-only --schema-only
  pg_dump: error: options -s/--schema-only and -a/--data-only cannot be
used together

I don't think it's simple to start using "last option wins" behavior
now. There are probably some combinations of options where it's not
clear whether a later option is an extra constraint or will override a
previous option.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Fujii Masao
Дата:

On 2025/06/13 6:12, Jeff Davis wrote:
> On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote:
>> FWIW I don't have a tremendously strong opinion about --statistics-
>> only.
> 
> Same here. I won't cast a vote on this particular issue, as long as the
> functionality is available.

I prefer keeping it as a shorthand for --with-statistics --no-data --no-schema.
It's redundant, but it's intuitive and convenient. That said, if others feel
strongly about removing it, I'm fine with that too.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

От
Fujii Masao
Дата:

On 2025/06/12 23:52, Nathan Bossart wrote:
> On Thu, Jun 12, 2025 at 10:18:56AM -0400, Robert Haas wrote:
>> On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>>> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
>>>> What is the purpose of the --with-data option?  Dumping the data is the
>>>> default.  Is this to override an earlier --no-data option?
>>>
>>> I believe the idea is that these will allow folks to be explicit about what
>>> they want instead of needing to understand the defaults for every
>>> component.
>>
>> Am I too late to propose ripping this out?
>>
>> I mean, if I look at pg_dump --help and there are options for
>> --with-broccoli and --without-mushrooms, I know that the defaults are
>> no brocooli, yes mushrooms, and I know which options I need to specify
>> to get the behavior that I want, whatever that happens to be. If all
>> options exist in both forms, it's a lot more confusing. Maybe there's
>> some issue of cross-version compatibility here that justifies this
>> complexity, but I don't see what it would be. I would think
>> --with-data has always been the default and always will be, so we just
>> don't need --with-data for anything. But maybe I'm confused.
> 
> If the idea is to remove all options for default behavior, we'd be removing
> --no-statistics, --with-data, and --with-schema at this point.

WFM.

Regarding pg_restore, since --with-statistics is already the default,
we should remove it from pg_restore.

By the way, if we keep --with-statistics in pg_dump, are we planning to
continue using the --with-xxx naming pattern for new options that
specify extra data to dump? I just wondered because pg_dump already has
other naming styles like --sequence-data, --include-foreign-data,
and --large-objects.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

От
Greg Sabino Mullane
Дата:
On Thu, Jun 12, 2025 at 4:12 PM Corey Huinker <corey.huinker@gmail.com> wrote:
(peacefully skimming thread...)
...
If we're hot to remove options, how about we remove the sections flags? Their utility is reliant upon the user understanding exactly which things go in which section, and further assumes that everything deterministically goes in exactly one section, which is no longer the case as Jeff pointed out recently. They have outlived their usefulness.

[record scratch sound]
Wait, what?! Big -1. Guess we can debate this more for v19.

For this current discussion (--statistics-only), I'm a +1 for keeping things as they are now, in the name of intuitiveness.

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: pg_dump --with-* options

От
Peter Eisentraut
Дата:
On 12.06.25 23:20, Jeff Davis wrote:
> On Thu, 2025-06-12 at 21:16 +0200, Peter Eisentraut wrote:
>>> Do we have other options that are order-sensitive?
>>
>> I think most of them are.  For example:
>>
>> psql -p 5432 -p 5433
>> initdb --data-checksums --no-data-checksums
>> postgres --shared-buffers=1GB --shared-buffers=2GB
> 
> Interesting. I don't think the "last option wins" model applies to
> other pg_dump options, though. For instance, in PG17:
> 
>    pg_dump --data-only --schema-only
>    pg_dump: error: options -s/--schema-only and -a/--data-only cannot be
> used together
> 
> I don't think it's simple to start using "last option wins" behavior
> now. There are probably some combinations of options where it's not
> clear whether a later option is an extra constraint or will override a
> previous option.

It makes sense to raise an error if the specified options cannot be 
consolidated in an obvious way.  I'd expect

pg_recvlogical --create-slot --drop-slot

to fail, but I'd expect

pg_recvlogical --create-slot --slot=foo --slot=bar

to work.

One of the challenges in the current case is that it is not obvious how 
--with-data, --no-data, --data-only etc. are connected.  If that were 
clearer, then the way these options should combine or conflict would 
hopefully follow somewhat naturally.




Re: pg_dump --with-* options

От
Daniel Gustafsson
Дата:
> On 13 Jun 2025, at 02:39, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

> By the way, if we keep --with-statistics in pg_dump, are we planning to
> continue using the --with-xxx naming pattern for new options that
> specify extra data to dump? I just wondered because pg_dump already has
> other naming styles like --sequence-data, --include-foreign-data,
> and --large-objects.

There are quite a number of different conventions in use for pg_dump
parameters, I'm not convinced that we are doing users a favour by adding yet
another one here with --with-XXX.  AFAICT we have:

* --XXX which includes all of XXX, for example --large-objects

* --with-XXX which includes all of XXX, for example --with-data

* --XXX=YYY which includes XXX matching pattern YYY, for example --table

* --include-XXX=YYY which includes all of XXX matching pattern YYY, like
  --include-foreign-data

* --exclude-XXX=YYY which excludes XXX matching pattern YYY, for example
  --exclude-table

* --no-XXX which excludes all of XXX, for example --no-comments

* --XXX-only which excludes objects that aren't XXX

For some things we also have short options which corresponds to --XXX and
--exclude-XXX like -t and -T, but -e and -E break that again.

Most object types can be included and excluded, except --include-foreign-data
where there is no exclusion parameter (and the pattern also applies to the
servername and not data).

Then there are features affecting the dump output which either use --YYY or
--use-YYY , like --inserts and --use-set-session-authorization.  Other features
which affect what can be be be dumped instead use --enable-XXX like
--enable-row-security.  Then --snapshot deviates from both since it has neither
--use nor --enable even though it's a similar category of options.

On top of that, we have the --filter=FILE which can include and exclude objects
(limited to objects where there is a corresponding commandline parameter).

Changing existing parameters is of course hard since it will break scripts and
integrations, but should we really add --with-XXX and increase the
proliferation of concepts rather than trying to make it fit/shoehorn into one
of the ones we already have?

--
Daniel Gustafsson




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Fri, 2025-06-13 at 07:22 +0200, Peter Eisentraut wrote:
> > I don't think it's simple to start using "last option wins"
> > behavior
> > now ...
> It makes sense to raise an error if the specified options cannot be
> consolidated in an obvious way.

To me, "last option wins" means that you don't raise an error; the
latter option simply overrides the earlier one.

Given that the pg_dump options are not order-sensitive now (unless I'm
missing something), I'm worried about the consequences of trying to
make them so now.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Fri, 2025-06-13 at 09:39 +0900, Fujii Masao wrote:

> By the way, if we keep --with-statistics in pg_dump, are we planning
> to
> continue using the --with-xxx naming pattern for new options that
> specify extra data to dump?

Good point. Now that we are getting rid of some of the other options,
we don't need to worry about consistency with them, and I think we
should just use "--statistics".

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Fri, Jun 13, 2025 at 08:58:04AM -0700, Jeff Davis wrote:
> On Fri, 2025-06-13 at 09:39 +0900, Fujii Masao wrote:
>> By the way, if we keep --with-statistics in pg_dump, are we planning
>> to
>> continue using the --with-xxx naming pattern for new options that
>> specify extra data to dump?
> 
> Good point. Now that we are getting rid of some of the other options,
> we don't need to worry about consistency with them, and I think we
> should just use "--statistics".

+1

-- 
nathan



Re: pg_dump --with-* options

От
Corey Huinker
Дата:
One of the challenges in the current case is that it is not obvious how
--with-data, --no-data, --data-only etc. are connected.  If that were
clearer, then the way these options should combine or conflict would
hopefully follow somewhat naturally.

They all should be mutually exclusive, and usage of any two of them should raise an error, hence order not mattering.

Re: pg_dump --with-* options

От
Corey Huinker
Дата:

Good point. Now that we are getting rid of some of the other options,
we don't need to worry about consistency with them, and I think we
should just use "--statistics".

The point of the --with flags was to future proof commands to preserve behavior in case the defaults ever changed. 

This would all be a lot simpler, and the --with switches would all be unnecessary, if we didn't decide to make exactly one type of dumpable object (statistics) off by default but only in one of the three commands (pg_dump).

I think we're creating lasting confusion for the sake of short-term convenience.
 

Re: pg_dump --with-* options

От
Fujii Masao
Дата:

On 2025/06/14 5:32, Nathan Bossart wrote:
> On Fri, Jun 13, 2025 at 08:58:04AM -0700, Jeff Davis wrote:
>> On Fri, 2025-06-13 at 09:39 +0900, Fujii Masao wrote:
>>> By the way, if we keep --with-statistics in pg_dump, are we planning
>>> to
>>> continue using the --with-xxx naming pattern for new options that
>>> specify extra data to dump?
>>
>> Good point. Now that we are getting rid of some of the other options,
>> we don't need to worry about consistency with them, and I think we
>> should just use "--statistics".
> 
> +1

+1

I noticed that --statistics (i.e., the current --with-statistics) causes
statistics to be dumped even when used with --data-only or --schema-only.
So, as far as I understand, here are the possible combinations of dump
targets and options:

  schema, data, stats:       --statistics
  schema, data:              (default)
  schema, stats:             --schema-only --statistics
  data, stats:               --data-only --statistics
  schema only:               --schema-only
  data only:                 --data-only
  stats only:                --statistics-only

This makes me wonder if --no-data and --no-schema are still necessary.
They were also introduced in v18, but might now be redundant. If so,
should we consider removing them?

If we do keep them, we could also use --no-schema --statistics to
dump data and statistics, but I find --data-only --statistics more intuitive.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

От
Corey Huinker
Дата:
I noticed that --statistics (i.e., the current --with-statistics) causes
statistics to be dumped even when used with --data-only or --schema-only.
So, as far as I understand, here are the possible combinations of dump
targets and options:

Those should also be mutually exclusive, and I'll write up a patch to add them to the checks.
 

  schema, data, stats:       --statistics
  schema, data:              (default)
  schema, stats:             --schema-only --statistics
  data, stats:               --data-only --statistics
  schema only:               --schema-only
  data only:                 --data-only
  stats only:                --statistics-only

This makes me wonder if --no-data and --no-schema are still necessary.
They were also introduced in v18, but might now be redundant. If so,
should we consider removing them?

If we do keep them, we could also use --no-schema --statistics to
dump data and statistics, but I find --data-only --statistics more intuitive.

I think this is the exact sort of confusion caused by having two of the three types default to on in all circumstances, and one default to off in one special circumstance.

Let's keep this simple, and have all three types default to on in all circumstances.

Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Mon, Jun 16, 2025 at 03:35:48PM -0400, Corey Huinker wrote:
> I think this is the exact sort of confusion caused by having two of the
> three types default to on in all circumstances, and one default to off in
> one special circumstance.

I revisited the main thread to see how folks voted.  There are a lot of
messages over a long period of time, and folks may have changed their mind
since, but this is what I saw:

    off-by-default: Mullane, Haas, Davis, Bossart
    on-by-default: Huinker, Lane, Hagander, Frost

In fact, there seems to have been general agreement in 2024 that stats
_should_ be on by default [0].  So perhaps there's not as strong of a
consensus as we thought.  Maybe we should ask for any new/updated votes.

> Let's keep this simple, and have all three types default to on in all
> circumstances.

Assuming we did turn on stats by default, what is the minimum set of new
flags in v18 you'd like to see?

[0] https://postgr.es/m/e16cd9caf4f5229a152d318d70b4d323a03e3539.camel%40j-davis.com

-- 
nathan



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Mon, 2025-06-16 at 16:09 -0500, Nathan Bossart wrote:
> So perhaps there's not as strong of a
> consensus as we thought.  Maybe we should ask for any new/updated
> votes.

Does it make any sense to be off by default in 18 and on in some later
release?

Regards
    Jeff Davis




Re: pg_dump --with-* options

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> Does it make any sense to be off by default in 18 and on in some later
> release?

Probably not, especially if part of the argument for on-by-default
is to allow simplification of the switch set.  We don't get that
benefit if we ship with off-by-default, and we won't be able to
get it later.

I find myself increasingly persuaded by Corey's point of view ...

            regards, tom lane



Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Mon, Jun 16, 2025 at 07:09:17PM -0400, Tom Lane wrote:
> I find myself increasingly persuaded by Corey's point of view ...

+1

-- 
nathan



Re: pg_dump --with-* options

От
Fujii Masao
Дата:

On 2025/06/17 4:35, Corey Huinker wrote:
>     I noticed that --statistics (i.e., the current --with-statistics) causes
>     statistics to be dumped even when used with --data-only or --schema-only.
>     So, as far as I understand, here are the possible combinations of dump
>     targets and options:
> 
> 
> Those should also be mutually exclusive, and I'll write up a patch to add them to the checks.

--sequence-data behaves similarly, i.e., it still dumps sequence data
even when used with --schema-only. So I've been thinking of both
--statistics and --sequence-data as options that include additional data,
regardless of whether --*-only is specified.

It seems better to keep their behavior consistent to avoid confusing users.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

От
Fujii Masao
Дата:

On 2025/06/17 9:58, Nathan Bossart wrote:
> On Mon, Jun 16, 2025 at 07:09:17PM -0400, Tom Lane wrote:
>> I find myself increasingly persuaded by Corey's point of view ...
> 
> +1

Can you clarify how using on-by-default would simplify things?
I'm not sure it actually makes the options simpler.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Mon, 2025-06-16 at 15:35 -0400, Corey Huinker wrote:
>
> I think this is the exact sort of confusion caused by having two of
> the three types default to on in all circumstances, and one default
> to off in one special circumstance.

That's certainly a part of the confusion, but the "--x-only" options
also put us in a tough spot.

If --data-only had always been spelled "--no-schema" (or "--without-
data" or whatever), and --schema-only had always been spelled "--no-
data", then I think it would be a lot easier to add statistics into the
mix.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Thu, 2025-06-12 at 08:58 -0700, Jeff Davis wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > If the idea is to remove all options for default behavior, we'd be
> > removing
> > --no-statistics, --with-data, and --with-schema at this point.
>
> That's OK with me.

Actually, I take that back, we can't just remove --no-statistics.
Remember that statistics currently default to "on" for pg_restore even
though they default "off" for pg_dump.

So pg_restore still needs a way to turn stats off.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Wed, Jun 18, 2025 at 08:29:16AM -0700, Jeff Davis wrote:
> Actually, I take that back, we can't just remove --no-statistics.
> Remember that statistics currently default to "on" for pg_restore even
> though they default "off" for pg_dump.
> 
> So pg_restore still needs a way to turn stats off.

IIUC the current proposal is to:

* Dump/restore stats by default.
* Keep the --no-statistics, --no-schema, and --no-data options.
* Keep the --statistics-only, --schema-only, and --data-only options.
* Remove the --with-statistics, --with-schema, and --with-data options.

How does that sound?

-- 
nathan



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
> IIUC the current proposal is to:
>
> * Dump/restore stats by default.

IIUC some people still object to this. Turning stats off by default was
on the Open Items list. At this point I think we need a pretty strong
consensus to override that and I'm not sure we have one right now.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Wed, Jun 18, 2025 at 09:53:01AM -0700, Jeff Davis wrote:
> On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
>> IIUC the current proposal is to:
>> 
>> * Dump/restore stats by default.
> 
> IIUC some people still object to this. Turning stats off by default was
> on the Open Items list. At this point I think we need a pretty strong
> consensus to override that and I'm not sure we have one right now.

Okay, so I see two main choices on the table:

1) Turn on stats by default in pg_dump.  Keep --no-* flags and --*-only
flags, and remove the --with-* flags.

2) Keep stats off by default in pg_dump.  Keep --no-{schema,data} flags and
--*-only flags, remove --no-statistics and --with-{schema,data}, and rename
--with-statistics to --statistics.

Is that an accurate summary?

-- 
nathan



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
> IIUC the current proposal is to:
>
> * Dump/restore stats by default.
> * Keep the --no-statistics, --no-schema, and --no-data options.
> * Keep the --statistics-only, --schema-only, and --data-only options.
> * Remove the --with-statistics, --with-schema, and --with-data
> options.
>
> How does that sound?

For those that want to keep stats off by default, another proposal
might be:

  * keep stats defaults as they are now (pg_dump defaults to "off",
pg_restore defaults to "on")
  * delete the --with-data and --with-schema options as unnecessary
  * (maybe?) rename --with-statistics to --statistics
  * keep --no-statistics, --no-data, --no-schema (or use "without"
instead?)
  * (maybe?) keep --statistics-only
  * reject the combination of an "only" option and a "with" option

That removes 2 or three options, and rejects some ambiguous
combinations. That would seem to address some of the immediate concerns
raised in this thread, without needing to relitigate the default.

It also leaves the door open to potentially change the default for
stats to "on" in the future, because we will have both --with-
statistics and --no-statistics.

The only downside of this approach is that we'd be stuck with both --
with-statistics and --no-statistics forever. That's a bit inconsistent
with the other options, and it doesn't satisfy Robert's concern about
the --help output. But Robert also wants stats off by default for
pg_dump and on by default for pg_restore, which I think means we need
both --with-statistics and --no-statistics anyway. Robert, comments?

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Robert Haas
Дата:
On Wed, Jun 18, 2025 at 1:21 PM Jeff Davis <pgsql@j-davis.com> wrote:
> The only downside of this approach is that we'd be stuck with both --
> with-statistics and --no-statistics forever. That's a bit inconsistent
> with the other options, and it doesn't satisfy Robert's concern about
> the --help output. But Robert also wants stats off by default for
> pg_dump and on by default for pg_restore, which I think means we need
> both --with-statistics and --no-statistics anyway. Robert, comments?

Sorry, I've been largely away from email for the last week due to work
commitments.

I had thought we had a consensus that pg_upgrade should preserve stats
but regularly pg_dump shouldn't include them; perhaps I misunderstood
or that changed.

What confuses me about what you've written here specifically is that
pg_dump and pg_restore are different programs with different option
sets. So when you say we need both --with-statistics and
--no-statistics, I guess that's true, but we're not talking about the
same executable in both cases. It seems to me that pg_restore should
restore everything that was dumped, but that there should be (as there
are) various --no-whatever switches to skip unwanted items. But
pg_dump should have dump a reasonable set of things by default, and
the user should be able to add to that or subtract from it.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote:
> I had thought we had a consensus that pg_upgrade should preserve stats
> but regularly pg_dump shouldn't include them; perhaps I misunderstood
> or that changed.

I think it's a bit of both.  I skimmed through the past discussions and
found that not only was there a rough consensus in 2024 that stats _should_
be on by default [0], but also that an updated vote tally didn't show much
of a consensus at all [1].  Like you, I thought we had pretty much closed
that door, but the aforementioned analysis along with further discussion
has me convinced that we might want to reconsider [2].

[0] https://postgr.es/m/e16cd9caf4f5229a152d318d70b4d323a03e3539.camel@j-davis.com
[1] https://postgr.es/m/aFCIB1AwXuNzxHXX%40nathan
[2] https://postgr.es/m/aFC9rWSeFz7c07uI%40nathan

-- 
nathan



Re: pg_dump --with-* options

От
Robert Haas
Дата:
On Tue, Jun 24, 2025 at 12:48 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote:
> > I had thought we had a consensus that pg_upgrade should preserve stats
> > but regularly pg_dump shouldn't include them; perhaps I misunderstood
> > or that changed.
>
> I think it's a bit of both.  I skimmed through the past discussions and
> found that not only was there a rough consensus in 2024 that stats _should_
> be on by default [0], but also that an updated vote tally didn't show much
> of a consensus at all [1].  Like you, I thought we had pretty much closed
> that door, but the aforementioned analysis along with further discussion
> has me convinced that we might want to reconsider [2].

Well, I don't know: I still think that's the right answer, so I don't
really want to reconsider, but I understand that I'm not in charge
here.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_dump --with-* options

От
Greg Sabino Mullane
Дата:
On Wed, Jun 18, 2025 at 11:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
IIUC the current proposal is to:

* Dump/restore stats by default.
* Keep the --no-statistics, --no-schema, and --no-data options.
* Keep the --statistics-only, --schema-only, and --data-only options.
* Remove the --with-statistics, --with-schema, and --with-data options.

This is so close to ideal. It's just that the first bullet should be "off by default" :)

I think pg_restore and pg_upgrade are solved problems at this point. I'm still not convinced why stats should be on by default, as they are metadata - neither schema nor data, but something special that should be explicitly requested. Also, +1 to the idea of --statistics-only as a QA / debug tool as someone mentioned upthread.

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: pg_dump --with-* options

От
Fujii Masao
Дата:

On 2025/06/25 5:07, Robert Haas wrote:
> On Tue, Jun 24, 2025 at 12:48 PM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
>> On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote:
>>> I had thought we had a consensus that pg_upgrade should preserve stats
>>> but regularly pg_dump shouldn't include them; perhaps I misunderstood
>>> or that changed.
>>
>> I think it's a bit of both.  I skimmed through the past discussions and
>> found that not only was there a rough consensus in 2024 that stats _should_
>> be on by default [0], but also that an updated vote tally didn't show much
>> of a consensus at all [1].  Like you, I thought we had pretty much closed
>> that door, but the aforementioned analysis along with further discussion
>> has me convinced that we might want to reconsider [2].
> 
> Well, I don't know: I still think that's the right answer, so I don't
> really want to reconsider, but I understand that I'm not in charge
> here.

For the record, my vote is: default "off" for pg_dump and pg_dumpall,
and "on" for pg_restore.

For pg_dump and pg_dumpall, I agree with Jeff's idea in [1],
but if the statistics is skipped by default, I don't think
we need a --no-statistics option. So, here's how I think
the options should work:

     * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, -and -statistics
     * Remove: --no-statistics, --with-schema, and --with-data
     * Combinations:
         Schema + Data + Stats     : --statistics
         Schema + Data             : (default)
         Schema + Stats            : --no-data --statistics
         Data + Stats              : --no-schema --statistics
         Schema only               : --schema-only   (or --no-data)
         Data only                 : --data-only     (or --no-schema)
         Stats only                : --statistics-only (or --no-schema --no-data --statistics)

As I mentioned in [2], if we treat --statistics in the similar way to
--sequence-data, i.e., allow --statistics to be used with --schema-only
or --data-only, we could simplify further:

     * Keep: --schema-only, --data-only, --statistics-only, and --statistics
     * Remove: --no-schema, --no-data, --no-statistics, --with-schema, and --with-data
     * Combinations:
         Schema + Data + Stats     : --statistics
         Schema + Data             : (default)
         Schema + Stats            : --schema-only --statistics
         Data + Stats              : --data-only --statistics
         Schema only               : --schema-only
         Data only                 : --data-only
         Stats only                : --statistics-only

Some may find this confusing due to mixing --statistics with --schema-only
or --data-only, so I understand if there's hesitation.

For pg_restore, I believe there's agreement to restore statistics
by default if they exist in the archive. So:

     * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, and --no-statistics
     * Remove: --with-schema, --with-data, and --statistics
     * Combinations:
         Schema + Data + Stats     : (default)
         Schema + Data             : --no-statistics
         Schema + Stats            : --no-data
         Data + Stats              : --no-schema
         Schema only               : --schema-only   (or --no-data --no-statistics)
         Data only                 : --data-only     (or --no-schema --no-statistics)
         Stats only                : --statistics-only (or --no-schema --no-data)

Thought?

Regards,

[1] https://postgr.es/m/031558c60e84362898922caa6a90587e7fdf2a57.camel@j-davis.com
[2] https://postgr.es/m/94f89b0a-5d83-4a67-9092-50ba3913441c@oss.nttdata.com

-- 
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Tue, Jun 24, 2025 at 06:14:55PM -0400, Greg Sabino Mullane wrote:
> On Wed, Jun 18, 2025 at 11:43 AM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> IIUC the current proposal is to:
>>
>> * Dump/restore stats by default.
>> * Keep the --no-statistics, --no-schema, and --no-data options.
>> * Keep the --statistics-only, --schema-only, and --data-only options.
>> * Remove the --with-statistics, --with-schema, and --with-data options.
> 
> This is so close to ideal. It's just that the first bullet should be "off
> by default" :)

If we did that, the only way to dump statistics would be --statistics-only,
right?  You wouldn't be able to include statistics along with other
things.

-- 
nathan



Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Wed, Jun 25, 2025 at 08:18:28AM +0900, Fujii Masao wrote:
> For pg_dump and pg_dumpall, I agree with Jeff's idea in [1],
> but if the statistics is skipped by default, I don't think
> we need a --no-statistics option. So, here's how I think
> the options should work:
> 
>     * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, -and -statistics
>     * Remove: --no-statistics, --with-schema, and --with-data
>     * Combinations:
>         Schema + Data + Stats     : --statistics
>         Schema + Data             : (default)
>         Schema + Stats            : --no-data --statistics
>         Data + Stats              : --no-schema --statistics
>         Schema only               : --schema-only   (or --no-data)
>         Data only                 : --data-only     (or --no-schema)
>         Stats only                : --statistics-only (or --no-schema --no-data --statistics)

I believe this is equivalent to the second option I proposed upthread [0].
Jeff proposed a variation of this option that keeps --no-statistics around
so that we could more easily change the default for stats down the road
[1].

> As I mentioned in [2], if we treat --statistics in the similar way to
> --sequence-data, i.e., allow --statistics to be used with --schema-only
> or --data-only, we could simplify further:
> 
>     * Keep: --schema-only, --data-only, --statistics-only, and --statistics
>     * Remove: --no-schema, --no-data, --no-statistics, --with-schema, and --with-data
>     * Combinations:
>         Schema + Data + Stats     : --statistics
>         Schema + Data             : (default)
>         Schema + Stats            : --schema-only --statistics
>         Data + Stats              : --data-only --statistics
>         Schema only               : --schema-only
>         Data only                 : --data-only
>         Stats only                : --statistics-only
> 
> Some may find this confusing due to mixing --statistics with --schema-only
> or --data-only, so I understand if there's hesitation.

Hm.  I didn't really intend for --sequence-data to set a precedent here.
That's mostly intended as a submode for --binary-upgrade.  Perhaps we
should consider removing it as a documented option and instead convert it
to --binary-upgrade=sequence-data or something.  In any case, allowing
"only" options to be used in conjunction with --statistics seems a little
confusing to me.  But I'm not strongly opposed to the idea.

> For pg_restore, I believe there's agreement to restore statistics
> by default if they exist in the archive. So:
> 
>     * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, and --no-statistics
>     * Remove: --with-schema, --with-data, and --statistics
>     * Combinations:
>         Schema + Data + Stats     : (default)
>         Schema + Data             : --no-statistics
>         Schema + Stats            : --no-data
>         Data + Stats              : --no-schema
>         Schema only               : --schema-only   (or --no-data --no-statistics)
>         Data only                 : --data-only     (or --no-schema --no-statistics)
>         Stats only                : --statistics-only (or --no-schema --no-data)

+1

[0] https://postgr.es/m/aFLxvrh71VWqdL9A%40nathan
[1] https://postgr.es/m/031558c60e84362898922caa6a90587e7fdf2a57.camel%40j-davis.com

-- 
nathan



Re: pg_dump --with-* options

От
Greg Sabino Mullane
Дата:
On Wed, Jun 25, 2025 at 10:36 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> This is so close to ideal. It's just that the first bullet should be "off by default" :)

If we did that, the only way to dump statistics would be --statistics-only, right?  You wouldn't be able to include statistics along with other
things.

Oh, right, would also need a --statistics
 
Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Mon, 2025-06-23 at 13:38 -0400, Robert Haas wrote:


> What confuses me about what you've written here specifically is that
> pg_dump and pg_restore are different programs with different option
> sets. So when you say we need both --with-statistics and
> --no-statistics, I guess that's true, but we're not talking about the
> same executable in both cases. It seems to me that pg_restore should
> restore everything that was dumped, but that there should be (as
> there
> are) various --no-whatever switches to skip unwanted items. But
> pg_dump should have dump a reasonable set of things by default, and
> the user should be able to add to that or subtract from it.

True, we could have different options for pg_dump and pg_restore, but
to me that seems a little strange because so many of the other options
overlap. I figured that would be confusing, but maybe it's fine.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote:
>   * reject the combination of an "only" option and a "with" option

There seems to be a rough consensus on this point. Should we move ahead
with this small change and see if we can get consensus to go further?

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Wed, 2025-06-25 at 08:18 +0900, Fujii Masao wrote:
> For pg_dump and pg_dumpall, I agree with Jeff's idea in [1],
> but if the statistics is skipped by default, I don't think
> we need a --no-statistics option. So, here's how I think
> the options should work:
>
>      * Keep: --schema-only, --data-only, --statistics-only, --no-
> schema, --no-data, -and -statistics
>      * Remove: --no-statistics, --with-schema, and --with-data

...

> For pg_restore, I believe there's agreement to restore statistics
> by default if they exist in the archive. So:
>
>      * Keep: --schema-only, --data-only, --statistics-only, --no-
> schema, --no-data, and --no-statistics
>      * Remove: --with-schema, --with-data, and --statistics

That means pg_dump will accept --statistics and reject --no-statistics;
and pg_restore will accept --no-statistics and reject --statistics.
Other options are mostly the same between them, so I'm not sure it's a
good idea for them to diverge.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Fujii Masao
Дата:

On 2025/07/11 2:57, Jeff Davis wrote:
> On Wed, 2025-06-25 at 08:18 +0900, Fujii Masao wrote:
>> For pg_dump and pg_dumpall, I agree with Jeff's idea in [1],
>> but if the statistics is skipped by default, I don't think
>> we need a --no-statistics option. So, here's how I think
>> the options should work:
>>
>>       * Keep: --schema-only, --data-only, --statistics-only, --no-
>> schema, --no-data, -and -statistics
>>       * Remove: --no-statistics, --with-schema, and --with-data
>
> ...
>
>> For pg_restore, I believe there's agreement to restore statistics
>> by default if they exist in the archive. So:
>>
>>       * Keep: --schema-only, --data-only, --statistics-only, --no-
>> schema, --no-data, and --no-statistics
>>       * Remove: --with-schema, --with-data, and --statistics
>
> That means pg_dump will accept --statistics and reject --no-statistics;
> and pg_restore will accept --no-statistics and reject --statistics.
> Other options are mostly the same between them, so I'm not sure it's a
> good idea for them to diverge.

I agree it would be better to have the same options in both pg_dump
and pg_restore, if possible.

But to do that, we'd either need to make pg_dump dump statistics
by default, or allow redundant options like --statistics in pg_restore,
even though it already restores statistics by default.

As I understand it, the rough consensus so far is that we'd prefer to
avoid both of these approaches. I know some want to change the default
behavior about statistics in pg_dump, though.

But, are you suggesting we go with one of them?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Fri, 2025-07-11 at 09:12 +0900, Fujii Masao wrote:
> But to do that, we'd either need to make pg_dump dump statistics
> by default, or allow redundant options like --statistics in
> pg_restore,
> even though it already restores statistics by default.

Redundant options might be annoying, but I don't see them as a major
problem.

> As I understand it, the rough consensus so far is that we'd prefer to
> avoid both of these approaches.

I'm not clear what the consensus approach is, then. Can you elaborate?

> I know some want to change the default
> behavior about statistics in pg_dump, though.

I don't see a consensus to make stats the default.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Thu, 2025-07-10 at 10:42 -0700, Jeff Davis wrote:
> On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote:
> >   * reject the combination of an "only" option and a "with" option
>
> There seems to be a rough consensus on this point.

Patch attached.

Regards,
    Jeff Davis


Вложения

Re: pg_dump --with-* options

От
Álvaro Herrera
Дата:
On 2025-Jul-29, Jeff Davis wrote:

> +    /* reject conflicting "only-" and "with-" options */
> +    if (data_only && with_schema)
> +        pg_fatal("options -a/--data-only and --with-schema cannot be used together");
> +    if (data_only && with_statistics)
> +        pg_fatal("options -a/--data-only and --with-statistics cannot be used together");
> +    if (schema_only && with_data)
> +        pg_fatal("options -s/--schema-only and --with-data cannot be used together");
> +    if (schema_only && with_statistics)
> +        pg_fatal("options -s/--schema-only and --with-statistics cannot be used together");
> +    if (statistics_only && with_data)
> +        pg_fatal("options --statistics-only and --with-data cannot be used together");
> +    if (statistics_only && with_schema)
> +        pg_fatal("options --statistics-only and --with-schema cannot be used together");

Please move the switches themselves out of the translatable message,
otherwise there are too many of them.  For instance,

    pg_fatal("options %s and %s cannot be used together",
             "-s/--schema-only", "--with-statistics");

Thanks

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote:
> On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
> > IIUC the current proposal is to:
> >
> > * Dump/restore stats by default.

We don't have a consensus for that, so unless a few people make an
abrupt turnaround, this will remain off for v18.

> > * Keep the --no-statistics, --no-schema, and --no-data options.
> > * Keep the --statistics-only, --schema-only, and --data-only
> > options.
> > * Remove the --with-statistics, --with-schema, and --with-data
> > options.

The other parts of the proposal are very similar with mine below:

>   * delete the --with-data and --with-schema options as unnecessary
>   * (maybe?) rename --with-statistics to --statistics
>   * keep --no-statistics, --no-data, --no-schema (or use "without"
> instead?)
>   * (maybe?) keep --statistics-only
>   * reject the combination of an "only" option and a "with" option

The main difference is whether we:

  (a) keep both --statistics and --no-statistics in pg_dump and
pg_restore; or

  (b) for pg_dump have --statistics but not --no-statistics and for
pg_restore have --no-statistics but not --statistics.

Option (a) creates redundancies whereas (b) creates a divergence. I
suppose I prefer (a), but I don't have a terribly strong opinion and if
more people prefer (b) then I'll go with that.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Tue, 2025-07-29 at 20:22 +0200, Álvaro Herrera wrote:
> Please move the switches themselves out of the translatable message,
> otherwise there are too many of them.  For instance,

Thank you for looking, v2 attached.

Regards,
    Jeff Davis


Вложения

Re: pg_dump --with-* options

От
Álvaro Herrera
Дата:
On 2025-Jun-25, Fujii Masao wrote:

> For the record, my vote is: default "off" for pg_dump and pg_dumpall,
> and "on" for pg_restore.

I don't know if this horse is already dead, so bear with me while I beat
it a little more.

> [...] we could simplify further:
> 
>     * Keep: --schema-only, --data-only, --statistics-only, and --statistics
>     * Remove: --no-schema, --no-data, --no-statistics, --with-schema, and --with-data
>     * Combinations:
>         Schema + Data + Stats     : --statistics
>         Schema + Data             : (default)
>         Schema + Stats            : --schema-only --statistics
>         Data + Stats              : --data-only --statistics
>         Schema only               : --schema-only
>         Data only                 : --data-only
>         Stats only                : --statistics-only

I think combinatorial explosions of options is not a great situation to
be in, particularly if we have to make endless decisions on which
combinations are valid or sensible.  Maybe we should invent a new
switch, something like
  --include=[schema,data,statistics]
with which users can give one or more comma-separated types to be
included in the dump.  

Then we state that --data-only is synonym for --include=data and
--schema-only is synonym for --include=schema, and we don't need any
other switches.  Then it is obvious what happens, how to combine
object types in the dumps and restores, and there's no need to reject
invalid combinations because there aren't any.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Wed, 2025-07-30 at 10:23 +0200, Álvaro Herrera wrote:
> Maybe we should invent a new
> switch, something like
>   --include=[schema,data,statistics]
> with which users can give one or more comma-separated types to be
> included in the dump. 

Robert Treat brought up a similar idea before:

https://www.postgresql.org/message-id/CABV9wwO5v8Nu8q%2BxWexMdL3Z%2B2xS%3DfFJMQetBSHy3tR64wNHOA%40mail.gmail.com

I'm certainly open to (re-)considering it.

> Then we state that --data-only is synonym for --include=data and
> --schema-only is synonym for --include=schema, and we don't need any
> other switches.  Then it is obvious what happens, how to combine
> object types in the dumps and restores, and there's no need to reject
> invalid combinations because there aren't any.

I assume that should be read as something like "include only", because
--include=data would also be excluding the schema and the stats. And if
that's the case, it does seem strange to do something like "--
include=data --schema-only".

Another question: could you have multiple --include options, like "--
include=data --include=schema"? Because you currently can't do "--data-
only --schema-only". So that would make it not quite an alias.

If we go in this direction, it might be easier to just say that --
include conflicts with --schema-only and --data-only.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Álvaro Herrera
Дата:
On 2025-Jul-30, Jeff Davis wrote:

> On Wed, 2025-07-30 at 10:23 +0200, Álvaro Herrera wrote:
> > Maybe we should invent a new
> > switch, something like
> >   --include=[schema,data,statistics]
> > with which users can give one or more comma-separated types to be
> > included in the dump.  
> 
> Robert Treat brought up a similar idea before:
> 
> https://www.postgresql.org/message-id/CABV9wwO5v8Nu8q%2BxWexMdL3Z%2B2xS%3DfFJMQetBSHy3tR64wNHOA%40mail.gmail.com

Oh, I hadn't seen it, but now that I do, it seems identical to mine.  He
even used some of the same terms I did.

> > Then we state that --data-only is synonym for --include=data and
> > --schema-only is synonym for --include=schema, and we don't need any
> > other switches.  Then it is obvious what happens, how to combine
> > object types in the dumps and restores, and there's no need to reject
> > invalid combinations because there aren't any.
> 
> I assume that should be read as something like "include only", because
> --include=data would also be excluding the schema and the stats.

Of course.

> And if that's the case, it does seem strange to do something like "--
> include=data --schema-only".

> Another question: could you have multiple --include options, like "--
> include=data --include=schema"? Because you currently can't do "--data-
> only --schema-only". So that would make it not quite an alias.

IMO we should only allow one --include.

> If we go in this direction, it might be easier to just say that --
> include conflicts with --schema-only and --data-only.

Yep.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Small aircraft do not crash frequently ... usually only once!"
                                  (ponder, http://thedailywtf.com/)



Re: pg_dump --with-* options

От
Corey Huinker
Дата:
 
> I assume that should be read as something like "include only", because
> --include=data would also be excluding the schema and the stats.

Of course.

In general, I like the idea of --include, but it would need to be consistent in behavior across pg_dump/pg_restore/pg_upgrade(if applicable).

Under the current defaults, if a person wanted a dump to include stats on pg_dump they would need "--include=data,schema,statistics" (checking first whether it was "statistics" or "stats"), but they could use the defaults on pg_restore. Using the defaults on both would allow for stats to be restored, but none would have been generated. That is confusing to me, and I imagine it will be even more confusing to somebody who hasn't been reading this thread.

I don't see anything wrong with a full complement of --something-only, --no-something flags. The combinatorics aren't that hard, only tedious. If we *must* make the defaults for pg_dump different from pg_restore and pg_upgrade, then adding the --with-something flags becomes necessary, and we let redundant non-conflicting combinations slide.

All of these problems are solved if we include statistics by default across all three programs. Anything else involves complicated explanations that will confuse the users who read them, and surprise the users who don't.



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Tue, 2025-07-29 at 11:24 -0700, Jeff Davis wrote:
> On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote:
> > On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
> > > IIUC the current proposal is to:
> > >
> > > * Dump/restore stats by default.
>
> We don't have a consensus for that, so unless a few people make an
> abrupt turnaround, this will remain off for v18.
>
> > > * Keep the --no-statistics, --no-schema, and --no-data options.
> > > * Keep the --statistics-only, --schema-only, and --data-only
> > > options.
> > > * Remove the --with-statistics, --with-schema, and --with-data
> > > options.
>
> The other parts of the proposal are very similar with mine below:
>
> >   * delete the --with-data and --with-schema options as unnecessary
> >   * (maybe?) rename --with-statistics to --statistics
> >   * keep --no-statistics, --no-data, --no-schema (or use "without"
> > instead?)
> >   * (maybe?) keep --statistics-only
> >   * reject the combination of an "only" option and a "with" option

Patch attached.

  * removes --with-data and --with-schema (redundant)
  * renames --with-statistics to just --statistics

I kept --statistics and --no-statistics for both pg_dump and
pg_restore, because: (a) I think it's good to have consistent options
between those two programs; and (b) it allows us to potentially change
the default to include statistics in the future. That leaves some
redundancy of the options, which some have expressed annoyance over,
but it doesn't seem like a major point of objection.

Regards,
    Jeff Davis


Вложения

Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Thu, 2025-07-31 at 16:28 -0400, Corey Huinker wrote:
>
> In general, I like the idea of --include, but it would need to be
> consistent in behavior across pg_dump/pg_restore/pg_upgrade(if
> applicable).

How should you exclude stats when doing pg_restore? Presumably, --
include=data,schema. But it's a bit strange if "--include" is the only
way to exclude something.

There are enough nuances and details here that I think the next step is
for someone to turn the idea for --include into a reviewable patch, so
that we can compare it to what we have now and see if people generally
think it's an improvement over what we have now.

Otherwise, we should just proceed with:

https://www.postgresql.org/message-id/40cedfc22da152928a74d472708aaadb8855d8d9.camel@j-davis.com

and close the open item.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Corey Huinker
Дата:
On Fri, Aug 1, 2025 at 4:02 PM Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2025-07-31 at 16:28 -0400, Corey Huinker wrote:
>
> In general, I like the idea of --include, but it would need to be
> consistent in behavior across pg_dump/pg_restore/pg_upgrade(if
> applicable).

How should you exclude stats when doing pg_restore? Presumably, --
include=data,schema. But it's a bit strange if "--include" is the only
way to exclude something.

Yes, that's how you'd do it, if we go with the request for one --include option (or series of options) and no --exclude option (or series of options). I was under the impression that was the stated feature of --include.
 
There are enough nuances and details here that I think the next step is
for someone to turn the idea for --include into a reviewable patch, so
that we can compare it to what we have now and see if people generally
think it's an improvement over what we have now.

If the defaults aren't changing, then --include is a big step backwards, requiring --include=data,schema,statistics to actually get statistics in a dump. I think that's cumbersome and weird.

Re: pg_dump --with-* options

От
Nathan Bossart
Дата:
On Fri, Aug 01, 2025 at 12:42:16PM -0700, Jeff Davis wrote:
> -      <term><option>--with-statistics</option></term>
> +      <term><option>--statistics</option></term>

> -      <term><option>--with-statistics</option></term>
> +      <term><option>--statistics</option></term>

> -      <term><option>--with-statistics</option></term>
> +      <term><option>--statistics</option></term>

nitpick: --statistics should be moved to maintain alphabetical ordering.

-- 
nathan



Re: pg_dump --with-* options

От
Fujii Masao
Дата:
On Sat, Aug 2, 2025 at 4:42 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Patch attached.
>
>   * removes --with-data and --with-schema (redundant)
>   * renames --with-statistics to just --statistics
>
> I kept --statistics and --no-statistics for both pg_dump and
> pg_restore, because: (a) I think it's good to have consistent options
> between those two programs; and (b) it allows us to potentially change
> the default to include statistics in the future. That leaves some
> redundancy of the options, which some have expressed annoyance over,
> but it doesn't seem like a major point of objection.

I'm OK with this approach. Thanks for the patch! It looks good to me.

While not directly related to your patch, I feel inclined to simplify option
handling in pg_dump, similar to what we've already done in pg_restore.c
and pg_dumpall.c. For example, we could change how statistics_only is handled
like this:

---------------------------------------
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -448,7 +448,7 @@ main(int argc, char **argv)
        DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
        bool            data_only = false;
        bool            schema_only = false;
-       bool            statistics_only = false;
+       static int      statistics_only = 0;
        bool            with_statistics = false;
        bool            no_data = false;
        bool            no_schema = false;
@@ -513,7 +513,7 @@ main(int argc, char **argv)
                {"serializable-deferrable", no_argument,
&dopt.serializable_deferrable, 1},
                {"snapshot", required_argument, NULL, 6},
                {"statistics", no_argument, NULL, 22},
-               {"statistics-only", no_argument, NULL, 18},
+               {"statistics-only", no_argument, &statistics_only, 1},
                {"strict-names", no_argument, &strict_names, 1},
                {"use-set-session-authorization", no_argument,
&dopt.use_setsessauth, 1},
                {"no-comments", no_argument, &dopt.no_comments, 1},
@@ -777,10 +777,6 @@ main(int argc, char **argv)

           optarg);
                                break;

-                       case 18:
-                               statistics_only = true;
-                               break;
-
                        case 19:
                                no_data = true;
                                break;
---------------------------------------

We could apply the same pattern to others like no_data, no_schema,
no_statistics, and with_statistics to make the code cleaner and more
consistent across tools.

Regards,

--
Fujii Masao



Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Fri, 2025-08-01 at 17:46 -0500, Nathan Bossart wrote:
> On Fri, Aug 01, 2025 at 12:42:16PM -0700, Jeff Davis wrote:
> > -      <term><option>--with-statistics</option></term>
> > +      <term><option>--statistics</option></term>
>
> > -      <term><option>--with-statistics</option></term>
> > +      <term><option>--statistics</option></term>
>
> > -      <term><option>--with-statistics</option></term>
> > +      <term><option>--statistics</option></term>
>
> nitpick: --statistics should be moved to maintain alphabetical
> ordering.
>

Thank you. I committed this and closed the Open Item.

I did not address the following issues:

  * Did not change default to on: there was no consensus on this point,
and the default-off opinions were stronger. We may be able to revisit
for PG19.
  * Did not add --include syntax: no patch and no consensus on the
details.
  * Did not remove redundant --no-statistics option for pg_dump and
redundant --statistics option for pg_restore: kept for consistency
between pg_dump and pg_restore, and to allow us to potentially change
to default-on in the future.

If someone feels one of these needs to be addressed in PG18, let me
know.

Regards,
    Jeff Davis




Re: pg_dump --with-* options

От
Jeff Davis
Дата:
On Sat, 2025-08-02 at 22:58 +0900, Fujii Masao wrote:
> I'm OK with this approach. Thanks for the patch! It looks good to me.

Thank you for the review.

> -       bool            statistics_only = false;
> +       static int      statistics_only = 0;

Agreed. That can just be done for 19 though.

Regards,
    Jeff Davis