Обсуждение: pg_dump --with-* options
I'm looking at the new in PG18 pg_dump --with-* options, and I'm having trouble understanding them. (I did not look into the source code or the git or mailing list history for this, to try to understand it as a user.) We have -a, --data-only dump only the data, not the schema or statistics --no-data do not dump data --with-data dump the data # this one is new (and there is also --section=data), and then three analogous options for "schema" and "statistics". What is the purpose of the --with-data option? Dumping the data is the default. Is this to override an earlier --no-data option? The man page is only minimally more verbose: "Dump data. This is the default." But why do you then need this option? I think we should add some more documenting detail for these, but right now I don't know what it would be.
On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote: > We have > > -a, --data-only dump only the data, not the schema or statistics > --no-data do not dump data > --with-data dump the data # this one is new > > (and there is also --section=data), and then three analogous options for > "schema" and "statistics". > > What is the purpose of the --with-data option? Dumping the data is the > default. Is this to override an earlier --no-data option? I believe the idea is that these will allow folks to be explicit about what they want instead of needing to understand the defaults for every component. -- nathan
On 06.06.25 17:39, Nathan Bossart wrote: > On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote: >> We have >> >> -a, --data-only dump only the data, not the schema or statistics >> --no-data do not dump data >> --with-data dump the data # this one is new >> >> (and there is also --section=data), and then three analogous options for >> "schema" and "statistics". >> >> What is the purpose of the --with-data option? Dumping the data is the >> default. Is this to override an earlier --no-data option? > > I believe the idea is that these will allow folks to be explicit about what > they want instead of needing to understand the defaults for every > component. I get that idea, but we really need some more documentation for this, I think. So far I could only guess how this is supposed to be used, and I also happened to guess wrong. My initial guess was that --with-data can override --no-data. That would have been pretty standard "last option wins" behavior. But pg_dump rejects that. Personally, I think that is kind of wrong. But you can use --with-data to override, say, --schema-only. That also seems kind of wrong to me, but anyway. Then, it seems that --schema-only plus --with-data is kind of the same as everything plus --no-statistics? I'm also wondering, if you consider the aforementioned "last option wins" behavior, should there be a difference between --schema-only --with-data, and --with-data --schema-only or for that matter --schema-only --data-only But in any case, if you want that level of precision, wouldn't it make more sense to use the --section option? This is all a nice logic puzzle, but some more documentation would be nice.
On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote: > > We have > > > > -a, --data-only dump only the data, not the schema or statistics > > --no-data do not dump data > > --with-data dump the data # this one is new > > > > (and there is also --section=data), and then three analogous options for > > "schema" and "statistics". > > > > What is the purpose of the --with-data option? Dumping the data is the > > default. Is this to override an earlier --no-data option? > > I believe the idea is that these will allow folks to be explicit about what > they want instead of needing to understand the defaults for every > component. Am I too late to propose ripping this out? I mean, if I look at pg_dump --help and there are options for --with-broccoli and --without-mushrooms, I know that the defaults are no brocooli, yes mushrooms, and I know which options I need to specify to get the behavior that I want, whatever that happens to be. If all options exist in both forms, it's a lot more confusing. Maybe there's some issue of cross-version compatibility here that justifies this complexity, but I don't see what it would be. I would think --with-data has always been the default and always will be, so we just don't need --with-data for anything. But maybe I'm confused. -- Robert Haas EDB: http://www.enterprisedb.com
On 2025/06/12 22:47, Peter Eisentraut wrote: > On 06.06.25 17:39, Nathan Bossart wrote: >> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote: >>> We have >>> >>> -a, --data-only dump only the data, not the schema or statistics >>> --no-data do not dump data >>> --with-data dump the data # this one is new >>> >>> (and there is also --section=data), and then three analogous options for >>> "schema" and "statistics". >>> >>> What is the purpose of the --with-data option? Dumping the data is the >>> default. Is this to override an earlier --no-data option? >> >> I believe the idea is that these will allow folks to be explicit about what >> they want instead of needing to understand the defaults for every >> component. > > I get that idea, but we really need some more documentation for this, I think. So far I could only guess how this is supposedto be used, and I also happened to guess wrong. > > My initial guess was that --with-data can override --no-data. That would have been pretty standard "last option wins"behavior. But pg_dump rejects that. Personally, I think that is kind of wrong. > > But you can use --with-data to override, say, --schema-only. That also seems kind of wrong to me, but anyway. While testing pg_dump --with-* in relation to bug #18952 [1], I also ran into this behavior. It was surprising, as I expected pg_dump to reject that combination of options. The current behavior seems confusing. Regards, [1] https://postgr.es/m/18952-be40a620f8b1e755@postgresql.org -- Fujii Masao NTT DATA Japan Corporation
On Thu, Jun 12, 2025 at 10:18:56AM -0400, Robert Haas wrote: > On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote: >> > What is the purpose of the --with-data option? Dumping the data is the >> > default. Is this to override an earlier --no-data option? >> >> I believe the idea is that these will allow folks to be explicit about what >> they want instead of needing to understand the defaults for every >> component. > > Am I too late to propose ripping this out? > > I mean, if I look at pg_dump --help and there are options for > --with-broccoli and --without-mushrooms, I know that the defaults are > no brocooli, yes mushrooms, and I know which options I need to specify > to get the behavior that I want, whatever that happens to be. If all > options exist in both forms, it's a lot more confusing. Maybe there's > some issue of cross-version compatibility here that justifies this > complexity, but I don't see what it would be. I would think > --with-data has always been the default and always will be, so we just > don't need --with-data for anything. But maybe I'm confused. If the idea is to remove all options for default behavior, we'd be removing --no-statistics, --with-data, and --with-schema at this point. Maybe we could go a step further and even rip out --statistics-only (in favor of --no-schema --no-data --with-statistics). In general, I do think the list of pg_dump options is pretty unwieldy at this point. -- nathan
On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote: > My initial guess was that --with-data can override --no-data. That > would have been pretty standard "last option wins" behavior. But > pg_dump rejects that. Personally, I think that is kind of wrong. Do we have other options that are order-sensitive? > But in any case, if you want that level of precision, wouldn't it > make > more sense to use the --section option? That's not possible with statistics, because some appear in SECTION_DATA and some in SECTION_POST_DATA (e.g. stats on indexes, which are in SECTION_POST_DATA). Regards, Jeff Davis
On Thu, 2025-06-12 at 10:18 -0400, Robert Haas wrote: > Am I too late to propose ripping this out? As long as we keep the functionality, I'm fine changing the options/names around at this point. Regards, Jeff Davis
On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote: > If the idea is to remove all options for default behavior, we'd be > removing > --no-statistics, --with-data, and --with-schema at this point. That's OK with me. > Maybe we > could go a step further and even rip out --statistics-only (in favor > of > --no-schema --no-data --with-statistics). I'd probably keep --statistics-only. Regards, Jeff Davis
On Thu, Jun 12, 2025 at 08:58:15AM -0700, Jeff Davis wrote: > On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote: >> If the idea is to remove all options for default behavior, we'd be >> removing >> --no-statistics, --with-data, and --with-schema at this point. > > That's OK with me. > >> Maybe we >> could go a step further and even rip out --statistics-only (in favor >> of >> --no-schema --no-data --with-statistics). > > I'd probably keep --statistics-only. WFM -- nathan
On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote: > > If the idea is to remove all options for default behavior, we'd be > > removing > > --no-statistics, --with-data, and --with-schema at this point. > > That's OK with me. Same. > > Maybe we > > could go a step further and even rip out --statistics-only (in favor > > of > > --no-schema --no-data --with-statistics). > > I'd probably keep --statistics-only. I'm going to vote for removing it. pg_dump has a lot of options, and it doesn't seem like a good bet to me to have options that are equivalent to various combinations of other options. I don't see any particular reason to believe that --statistics-only is even a particularly likely combination of options for someone to want. I'd rather keep it simple. -- Robert Haas EDB: http://www.enterprisedb.com
On 12.06.25 17:14, Jeff Davis wrote: > On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote: >> My initial guess was that --with-data can override --no-data. That >> would have been pretty standard "last option wins" behavior. But >> pg_dump rejects that. Personally, I think that is kind of wrong. > > Do we have other options that are order-sensitive? I think most of them are. For example: psql -p 5432 -p 5433 initdb --data-checksums --no-data-checksums postgres --shared-buffers=1GB --shared-buffers=2GB
On Thu, 2025-06-12 at 13:36 -0400, Robert Haas wrote: > On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote: > > On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote: > > > If the idea is to remove all options for default behavior, we'd be > > > removing > > > --no-statistics, --with-data, and --with-schema at this point. > > > > That's OK with me. > > Same. I must be missing something, but I think --no-statistics is sorely needed. How else can I get the effect of pg_dump --no-statistics mydb Yours, Laurenz Albe
On Thu, Jun 12, 2025 at 10:07:05PM +0200, Laurenz Albe wrote: > I must be missing something, but I think --no-statistics is sorely needed. > How else can I get the effect of > > pg_dump --no-statistics mydb This was recently changed to be the default behavior (see commit 34eb2a8). -- nathan
On Thu, Jun 12, 2025 at 1:36 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > If the idea is to remove all options for default behavior, we'd be
> > removing
> > --no-statistics, --with-data, and --with-schema at this point.
>
> That's OK with me.
Same.
> > Maybe we
> > could go a step further and even rip out --statistics-only (in favor
> > of
> > --no-schema --no-data --with-statistics).
>
> I'd probably keep --statistics-only.
I'm going to vote for removing it. pg_dump has a lot of options, and
it doesn't seem like a good bet to me to have options that are
equivalent to various combinations of other options. I don't see any
particular reason to believe that --statistics-only is even a
particularly likely combination of options for someone to want. I'd
rather keep it simple.
The use case for --statistics-only is to extract the existing statistics for the tables and indexes that are involved in a given query that is giving you problems, allowing you to apply those statistics to an existing QA/dev database and tweak them without further impacting operations on the production database. I think this will prove to be very useful, and having a --statistics-only flag conveys the clear intent of "I want the stats, and only the stats",
--
Robert Haas
EDB: http://www.enterprisedb.com
If we're hot to remove options, how about we remove the sections flags? Their utility is reliant upon the user understanding exactly which things go in which section, and further assumes that everything deterministically goes in exactly one section, which is no longer the case as Jeff pointed out recently. They have outlived their usefulness.
If we have the full complement of -no-something flags, and the three -only flags, we wouldn't need the --with-something flags. That would mean making statistics export the default on dumps, which I think it should be anyway, because there's nothing else that we don't dump by default, and while it might seem strange to have them by default now, NOT having them by default will feel very strange a few years down the road.
If we have the full complement of -no-something flags, and the three -only flags, we wouldn't need the --with-something flags. That would mean making statistics export the default on dumps, which I think it should be anyway, because there's nothing else that we don't dump by default, and while it might seem strange to have them by default now, NOT having them by default will feel very strange a few years down the road.
On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote: > The use case for --statistics-only is to extract the existing statistics > for the tables and indexes that are involved in a given query that is > giving you problems, allowing you to apply those statistics to an existing > QA/dev database and tweak them without further impacting operations on the > production database. I think this will prove to be very useful, and having > a --statistics-only flag conveys the clear intent of "I want the stats, and > only the stats", I do think this is useful functionality, I only suggested removing it because AFAICT it is redundant, i.e., you can accomplish the same thing with --with-statistics --no-schema --no-data. It seems like we're trying to avoid having multiple ways to do the same thing. > If we're hot to remove options, how about we remove the sections flags? > Their utility is reliant upon the user understanding exactly which things > go in which section, and further assumes that everything deterministically > goes in exactly one section, which is no longer the case as Jeff > pointed out recently. They have outlived their usefulness. I almost brought this up earlier as something else we could potentially trim. That's v19 material at this point, though. -- nathan
On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote:
> The use case for --statistics-only is to extract the existing statistics
> for the tables and indexes that are involved in a given query that is
> giving you problems, allowing you to apply those statistics to an existing
> QA/dev database and tweak them without further impacting operations on the
> production database. I think this will prove to be very useful, and having
> a --statistics-only flag conveys the clear intent of "I want the stats, and
> only the stats",
I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data. It seems like we're trying
to avoid having multiple ways to do the same thing.
By that same argument, we should remove --schema-only and --data-only as well. I think we shouldn't because those two options have proved very convenient for users and they convey clear intent to the person reading the script, and I believe that --statistics-only will prove the same over time.
> If we're hot to remove options, how about we remove the sections flags?
> Their utility is reliant upon the user understanding exactly which things
> go in which section, and further assumes that everything deterministically
> goes in exactly one section, which is no longer the case as Jeff
> pointed out recently. They have outlived their usefulness.
I almost brought this up earlier as something else we could potentially
trim. That's v19 material at this point, though.
On Thu, Jun 12, 2025 at 04:39:00PM -0400, Corey Huinker wrote: > On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com> > wrote: >> I do think this is useful functionality, I only suggested removing it >> because AFAICT it is redundant, i.e., you can accomplish the same thing >> with --with-statistics --no-schema --no-data. It seems like we're trying >> to avoid having multiple ways to do the same thing. > > By that same argument, we should remove --schema-only and --data-only as > well. I think we shouldn't because those two options have proved very > convenient for users and they convey clear intent to the person reading the > script, and I believe that --statistics-only will prove the same over time. Those predate v18, so while might be able to mark them deprecated, I doubt we'd remove them anytime soon. FWIW I don't have a tremendously strong opinion about --statistics-only. I'd probably vote to remove it because 1) it's redundant, 2) once you add an option, it's hard to remove it, and 3) pg_dump already has so many options. But I won't cry too hard if we keep it around. -- nathan
On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote: > FWIW I don't have a tremendously strong opinion about --statistics- > only. Same here. I won't cast a vote on this particular issue, as long as the functionality is available. Regards, Jeff Davis
On Thu, 2025-06-12 at 21:16 +0200, Peter Eisentraut wrote: > > Do we have other options that are order-sensitive? > > I think most of them are. For example: > > psql -p 5432 -p 5433 > initdb --data-checksums --no-data-checksums > postgres --shared-buffers=1GB --shared-buffers=2GB Interesting. I don't think the "last option wins" model applies to other pg_dump options, though. For instance, in PG17: pg_dump --data-only --schema-only pg_dump: error: options -s/--schema-only and -a/--data-only cannot be used together I don't think it's simple to start using "last option wins" behavior now. There are probably some combinations of options where it's not clear whether a later option is an extra constraint or will override a previous option. Regards, Jeff Davis
On 2025/06/13 6:12, Jeff Davis wrote: > On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote: >> FWIW I don't have a tremendously strong opinion about --statistics- >> only. > > Same here. I won't cast a vote on this particular issue, as long as the > functionality is available. I prefer keeping it as a shorthand for --with-statistics --no-data --no-schema. It's redundant, but it's intuitive and convenient. That said, if others feel strongly about removing it, I'm fine with that too. Regards, -- Fujii Masao NTT DATA Japan Corporation
On 2025/06/12 23:52, Nathan Bossart wrote: > On Thu, Jun 12, 2025 at 10:18:56AM -0400, Robert Haas wrote: >> On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >>> On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote: >>>> What is the purpose of the --with-data option? Dumping the data is the >>>> default. Is this to override an earlier --no-data option? >>> >>> I believe the idea is that these will allow folks to be explicit about what >>> they want instead of needing to understand the defaults for every >>> component. >> >> Am I too late to propose ripping this out? >> >> I mean, if I look at pg_dump --help and there are options for >> --with-broccoli and --without-mushrooms, I know that the defaults are >> no brocooli, yes mushrooms, and I know which options I need to specify >> to get the behavior that I want, whatever that happens to be. If all >> options exist in both forms, it's a lot more confusing. Maybe there's >> some issue of cross-version compatibility here that justifies this >> complexity, but I don't see what it would be. I would think >> --with-data has always been the default and always will be, so we just >> don't need --with-data for anything. But maybe I'm confused. > > If the idea is to remove all options for default behavior, we'd be removing > --no-statistics, --with-data, and --with-schema at this point. WFM. Regarding pg_restore, since --with-statistics is already the default, we should remove it from pg_restore. By the way, if we keep --with-statistics in pg_dump, are we planning to continue using the --with-xxx naming pattern for new options that specify extra data to dump? I just wondered because pg_dump already has other naming styles like --sequence-data, --include-foreign-data, and --large-objects. Regards, -- Fujii Masao NTT DATA Japan Corporation
On Thu, Jun 12, 2025 at 4:12 PM Corey Huinker <corey.huinker@gmail.com> wrote:
(peacefully skimming thread...)
...
If we're hot to remove options, how about we remove the sections flags? Their utility is reliant upon the user understanding exactly which things go in which section, and further assumes that everything deterministically goes in exactly one section, which is no longer the case as Jeff pointed out recently. They have outlived their usefulness.
[record scratch sound]
Wait, what?! Big -1. Guess we can debate this more for v19.
For this current discussion (--statistics-only), I'm a +1 for keeping things as they are now, in the name of intuitiveness.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
On 12.06.25 23:20, Jeff Davis wrote: > On Thu, 2025-06-12 at 21:16 +0200, Peter Eisentraut wrote: >>> Do we have other options that are order-sensitive? >> >> I think most of them are. For example: >> >> psql -p 5432 -p 5433 >> initdb --data-checksums --no-data-checksums >> postgres --shared-buffers=1GB --shared-buffers=2GB > > Interesting. I don't think the "last option wins" model applies to > other pg_dump options, though. For instance, in PG17: > > pg_dump --data-only --schema-only > pg_dump: error: options -s/--schema-only and -a/--data-only cannot be > used together > > I don't think it's simple to start using "last option wins" behavior > now. There are probably some combinations of options where it's not > clear whether a later option is an extra constraint or will override a > previous option. It makes sense to raise an error if the specified options cannot be consolidated in an obvious way. I'd expect pg_recvlogical --create-slot --drop-slot to fail, but I'd expect pg_recvlogical --create-slot --slot=foo --slot=bar to work. One of the challenges in the current case is that it is not obvious how --with-data, --no-data, --data-only etc. are connected. If that were clearer, then the way these options should combine or conflict would hopefully follow somewhat naturally.
> On 13 Jun 2025, at 02:39, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > By the way, if we keep --with-statistics in pg_dump, are we planning to > continue using the --with-xxx naming pattern for new options that > specify extra data to dump? I just wondered because pg_dump already has > other naming styles like --sequence-data, --include-foreign-data, > and --large-objects. There are quite a number of different conventions in use for pg_dump parameters, I'm not convinced that we are doing users a favour by adding yet another one here with --with-XXX. AFAICT we have: * --XXX which includes all of XXX, for example --large-objects * --with-XXX which includes all of XXX, for example --with-data * --XXX=YYY which includes XXX matching pattern YYY, for example --table * --include-XXX=YYY which includes all of XXX matching pattern YYY, like --include-foreign-data * --exclude-XXX=YYY which excludes XXX matching pattern YYY, for example --exclude-table * --no-XXX which excludes all of XXX, for example --no-comments * --XXX-only which excludes objects that aren't XXX For some things we also have short options which corresponds to --XXX and --exclude-XXX like -t and -T, but -e and -E break that again. Most object types can be included and excluded, except --include-foreign-data where there is no exclusion parameter (and the pattern also applies to the servername and not data). Then there are features affecting the dump output which either use --YYY or --use-YYY , like --inserts and --use-set-session-authorization. Other features which affect what can be be be dumped instead use --enable-XXX like --enable-row-security. Then --snapshot deviates from both since it has neither --use nor --enable even though it's a similar category of options. On top of that, we have the --filter=FILE which can include and exclude objects (limited to objects where there is a corresponding commandline parameter). Changing existing parameters is of course hard since it will break scripts and integrations, but should we really add --with-XXX and increase the proliferation of concepts rather than trying to make it fit/shoehorn into one of the ones we already have? -- Daniel Gustafsson
On Fri, 2025-06-13 at 07:22 +0200, Peter Eisentraut wrote: > > I don't think it's simple to start using "last option wins" > > behavior > > now ... > It makes sense to raise an error if the specified options cannot be > consolidated in an obvious way. To me, "last option wins" means that you don't raise an error; the latter option simply overrides the earlier one. Given that the pg_dump options are not order-sensitive now (unless I'm missing something), I'm worried about the consequences of trying to make them so now. Regards, Jeff Davis
On Fri, 2025-06-13 at 09:39 +0900, Fujii Masao wrote: > By the way, if we keep --with-statistics in pg_dump, are we planning > to > continue using the --with-xxx naming pattern for new options that > specify extra data to dump? Good point. Now that we are getting rid of some of the other options, we don't need to worry about consistency with them, and I think we should just use "--statistics". Regards, Jeff Davis
On Fri, Jun 13, 2025 at 08:58:04AM -0700, Jeff Davis wrote: > On Fri, 2025-06-13 at 09:39 +0900, Fujii Masao wrote: >> By the way, if we keep --with-statistics in pg_dump, are we planning >> to >> continue using the --with-xxx naming pattern for new options that >> specify extra data to dump? > > Good point. Now that we are getting rid of some of the other options, > we don't need to worry about consistency with them, and I think we > should just use "--statistics". +1 -- nathan
One of the challenges in the current case is that it is not obvious how
--with-data, --no-data, --data-only etc. are connected. If that were
clearer, then the way these options should combine or conflict would
hopefully follow somewhat naturally.
They all should be mutually exclusive, and usage of any two of them should raise an error, hence order not mattering.
Good point. Now that we are getting rid of some of the other options,
we don't need to worry about consistency with them, and I think we
should just use "--statistics".
The point of the --with flags was to future proof commands to preserve behavior in case the defaults ever changed.
This would all be a lot simpler, and the --with switches would all be unnecessary, if we didn't decide to make exactly one type of dumpable object (statistics) off by default but only in one of the three commands (pg_dump).
I think we're creating lasting confusion for the sake of short-term convenience.
On 2025/06/14 5:32, Nathan Bossart wrote: > On Fri, Jun 13, 2025 at 08:58:04AM -0700, Jeff Davis wrote: >> On Fri, 2025-06-13 at 09:39 +0900, Fujii Masao wrote: >>> By the way, if we keep --with-statistics in pg_dump, are we planning >>> to >>> continue using the --with-xxx naming pattern for new options that >>> specify extra data to dump? >> >> Good point. Now that we are getting rid of some of the other options, >> we don't need to worry about consistency with them, and I think we >> should just use "--statistics". > > +1 +1 I noticed that --statistics (i.e., the current --with-statistics) causes statistics to be dumped even when used with --data-only or --schema-only. So, as far as I understand, here are the possible combinations of dump targets and options: schema, data, stats: --statistics schema, data: (default) schema, stats: --schema-only --statistics data, stats: --data-only --statistics schema only: --schema-only data only: --data-only stats only: --statistics-only This makes me wonder if --no-data and --no-schema are still necessary. They were also introduced in v18, but might now be redundant. If so, should we consider removing them? If we do keep them, we could also use --no-schema --statistics to dump data and statistics, but I find --data-only --statistics more intuitive. Regards, -- Fujii Masao NTT DATA Japan Corporation
I noticed that --statistics (i.e., the current --with-statistics) causes
statistics to be dumped even when used with --data-only or --schema-only.
So, as far as I understand, here are the possible combinations of dump
targets and options:
Those should also be mutually exclusive, and I'll write up a patch to add them to the checks.
schema, data, stats: --statistics
schema, data: (default)
schema, stats: --schema-only --statistics
data, stats: --data-only --statistics
schema only: --schema-only
data only: --data-only
stats only: --statistics-only
This makes me wonder if --no-data and --no-schema are still necessary.
They were also introduced in v18, but might now be redundant. If so,
should we consider removing them?
If we do keep them, we could also use --no-schema --statistics to
dump data and statistics, but I find --data-only --statistics more intuitive.
Let's keep this simple, and have all three types default to on in all circumstances.
On Mon, Jun 16, 2025 at 03:35:48PM -0400, Corey Huinker wrote: > I think this is the exact sort of confusion caused by having two of the > three types default to on in all circumstances, and one default to off in > one special circumstance. I revisited the main thread to see how folks voted. There are a lot of messages over a long period of time, and folks may have changed their mind since, but this is what I saw: off-by-default: Mullane, Haas, Davis, Bossart on-by-default: Huinker, Lane, Hagander, Frost In fact, there seems to have been general agreement in 2024 that stats _should_ be on by default [0]. So perhaps there's not as strong of a consensus as we thought. Maybe we should ask for any new/updated votes. > Let's keep this simple, and have all three types default to on in all > circumstances. Assuming we did turn on stats by default, what is the minimum set of new flags in v18 you'd like to see? [0] https://postgr.es/m/e16cd9caf4f5229a152d318d70b4d323a03e3539.camel%40j-davis.com -- nathan
On Mon, 2025-06-16 at 16:09 -0500, Nathan Bossart wrote: > So perhaps there's not as strong of a > consensus as we thought. Maybe we should ask for any new/updated > votes. Does it make any sense to be off by default in 18 and on in some later release? Regards Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Does it make any sense to be off by default in 18 and on in some later > release? Probably not, especially if part of the argument for on-by-default is to allow simplification of the switch set. We don't get that benefit if we ship with off-by-default, and we won't be able to get it later. I find myself increasingly persuaded by Corey's point of view ... regards, tom lane
On Mon, Jun 16, 2025 at 07:09:17PM -0400, Tom Lane wrote: > I find myself increasingly persuaded by Corey's point of view ... +1 -- nathan
On 2025/06/17 4:35, Corey Huinker wrote: > I noticed that --statistics (i.e., the current --with-statistics) causes > statistics to be dumped even when used with --data-only or --schema-only. > So, as far as I understand, here are the possible combinations of dump > targets and options: > > > Those should also be mutually exclusive, and I'll write up a patch to add them to the checks. --sequence-data behaves similarly, i.e., it still dumps sequence data even when used with --schema-only. So I've been thinking of both --statistics and --sequence-data as options that include additional data, regardless of whether --*-only is specified. It seems better to keep their behavior consistent to avoid confusing users. Regards, -- Fujii Masao NTT DATA Japan Corporation
On 2025/06/17 9:58, Nathan Bossart wrote: > On Mon, Jun 16, 2025 at 07:09:17PM -0400, Tom Lane wrote: >> I find myself increasingly persuaded by Corey's point of view ... > > +1 Can you clarify how using on-by-default would simplify things? I'm not sure it actually makes the options simpler. Regards, -- Fujii Masao NTT DATA Japan Corporation
On Mon, 2025-06-16 at 15:35 -0400, Corey Huinker wrote: > > I think this is the exact sort of confusion caused by having two of > the three types default to on in all circumstances, and one default > to off in one special circumstance. That's certainly a part of the confusion, but the "--x-only" options also put us in a tough spot. If --data-only had always been spelled "--no-schema" (or "--without- data" or whatever), and --schema-only had always been spelled "--no- data", then I think it would be a lot easier to add statistics into the mix. Regards, Jeff Davis
On Thu, 2025-06-12 at 08:58 -0700, Jeff Davis wrote: > On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote: > > If the idea is to remove all options for default behavior, we'd be > > removing > > --no-statistics, --with-data, and --with-schema at this point. > > That's OK with me. Actually, I take that back, we can't just remove --no-statistics. Remember that statistics currently default to "on" for pg_restore even though they default "off" for pg_dump. So pg_restore still needs a way to turn stats off. Regards, Jeff Davis
On Wed, Jun 18, 2025 at 08:29:16AM -0700, Jeff Davis wrote: > Actually, I take that back, we can't just remove --no-statistics. > Remember that statistics currently default to "on" for pg_restore even > though they default "off" for pg_dump. > > So pg_restore still needs a way to turn stats off. IIUC the current proposal is to: * Dump/restore stats by default. * Keep the --no-statistics, --no-schema, and --no-data options. * Keep the --statistics-only, --schema-only, and --data-only options. * Remove the --with-statistics, --with-schema, and --with-data options. How does that sound? -- nathan
On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote: > IIUC the current proposal is to: > > * Dump/restore stats by default. IIUC some people still object to this. Turning stats off by default was on the Open Items list. At this point I think we need a pretty strong consensus to override that and I'm not sure we have one right now. Regards, Jeff Davis
On Wed, Jun 18, 2025 at 09:53:01AM -0700, Jeff Davis wrote: > On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote: >> IIUC the current proposal is to: >> >> * Dump/restore stats by default. > > IIUC some people still object to this. Turning stats off by default was > on the Open Items list. At this point I think we need a pretty strong > consensus to override that and I'm not sure we have one right now. Okay, so I see two main choices on the table: 1) Turn on stats by default in pg_dump. Keep --no-* flags and --*-only flags, and remove the --with-* flags. 2) Keep stats off by default in pg_dump. Keep --no-{schema,data} flags and --*-only flags, remove --no-statistics and --with-{schema,data}, and rename --with-statistics to --statistics. Is that an accurate summary? -- nathan
On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote: > IIUC the current proposal is to: > > * Dump/restore stats by default. > * Keep the --no-statistics, --no-schema, and --no-data options. > * Keep the --statistics-only, --schema-only, and --data-only options. > * Remove the --with-statistics, --with-schema, and --with-data > options. > > How does that sound? For those that want to keep stats off by default, another proposal might be: * keep stats defaults as they are now (pg_dump defaults to "off", pg_restore defaults to "on") * delete the --with-data and --with-schema options as unnecessary * (maybe?) rename --with-statistics to --statistics * keep --no-statistics, --no-data, --no-schema (or use "without" instead?) * (maybe?) keep --statistics-only * reject the combination of an "only" option and a "with" option That removes 2 or three options, and rejects some ambiguous combinations. That would seem to address some of the immediate concerns raised in this thread, without needing to relitigate the default. It also leaves the door open to potentially change the default for stats to "on" in the future, because we will have both --with- statistics and --no-statistics. The only downside of this approach is that we'd be stuck with both -- with-statistics and --no-statistics forever. That's a bit inconsistent with the other options, and it doesn't satisfy Robert's concern about the --help output. But Robert also wants stats off by default for pg_dump and on by default for pg_restore, which I think means we need both --with-statistics and --no-statistics anyway. Robert, comments? Regards, Jeff Davis
On Wed, Jun 18, 2025 at 1:21 PM Jeff Davis <pgsql@j-davis.com> wrote: > The only downside of this approach is that we'd be stuck with both -- > with-statistics and --no-statistics forever. That's a bit inconsistent > with the other options, and it doesn't satisfy Robert's concern about > the --help output. But Robert also wants stats off by default for > pg_dump and on by default for pg_restore, which I think means we need > both --with-statistics and --no-statistics anyway. Robert, comments? Sorry, I've been largely away from email for the last week due to work commitments. I had thought we had a consensus that pg_upgrade should preserve stats but regularly pg_dump shouldn't include them; perhaps I misunderstood or that changed. What confuses me about what you've written here specifically is that pg_dump and pg_restore are different programs with different option sets. So when you say we need both --with-statistics and --no-statistics, I guess that's true, but we're not talking about the same executable in both cases. It seems to me that pg_restore should restore everything that was dumped, but that there should be (as there are) various --no-whatever switches to skip unwanted items. But pg_dump should have dump a reasonable set of things by default, and the user should be able to add to that or subtract from it. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote: > I had thought we had a consensus that pg_upgrade should preserve stats > but regularly pg_dump shouldn't include them; perhaps I misunderstood > or that changed. I think it's a bit of both. I skimmed through the past discussions and found that not only was there a rough consensus in 2024 that stats _should_ be on by default [0], but also that an updated vote tally didn't show much of a consensus at all [1]. Like you, I thought we had pretty much closed that door, but the aforementioned analysis along with further discussion has me convinced that we might want to reconsider [2]. [0] https://postgr.es/m/e16cd9caf4f5229a152d318d70b4d323a03e3539.camel@j-davis.com [1] https://postgr.es/m/aFCIB1AwXuNzxHXX%40nathan [2] https://postgr.es/m/aFC9rWSeFz7c07uI%40nathan -- nathan
On Tue, Jun 24, 2025 at 12:48 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote: > > I had thought we had a consensus that pg_upgrade should preserve stats > > but regularly pg_dump shouldn't include them; perhaps I misunderstood > > or that changed. > > I think it's a bit of both. I skimmed through the past discussions and > found that not only was there a rough consensus in 2024 that stats _should_ > be on by default [0], but also that an updated vote tally didn't show much > of a consensus at all [1]. Like you, I thought we had pretty much closed > that door, but the aforementioned analysis along with further discussion > has me convinced that we might want to reconsider [2]. Well, I don't know: I still think that's the right answer, so I don't really want to reconsider, but I understand that I'm not in charge here. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Jun 18, 2025 at 11:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
IIUC the current proposal is to:
* Dump/restore stats by default.
* Keep the --no-statistics, --no-schema, and --no-data options.
* Keep the --statistics-only, --schema-only, and --data-only options.
* Remove the --with-statistics, --with-schema, and --with-data options.
This is so close to ideal. It's just that the first bullet should be "off by default" :)
I think pg_restore and pg_upgrade are solved problems at this point. I'm still not convinced why stats should be on by default, as they are metadata - neither schema nor data, but something special that should be explicitly requested. Also, +1 to the idea of --statistics-only as a QA / debug tool as someone mentioned upthread.
Cheers,
Greg
On 2025/06/25 5:07, Robert Haas wrote: > On Tue, Jun 24, 2025 at 12:48 PM Nathan Bossart > <nathandbossart@gmail.com> wrote: >> On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote: >>> I had thought we had a consensus that pg_upgrade should preserve stats >>> but regularly pg_dump shouldn't include them; perhaps I misunderstood >>> or that changed. >> >> I think it's a bit of both. I skimmed through the past discussions and >> found that not only was there a rough consensus in 2024 that stats _should_ >> be on by default [0], but also that an updated vote tally didn't show much >> of a consensus at all [1]. Like you, I thought we had pretty much closed >> that door, but the aforementioned analysis along with further discussion >> has me convinced that we might want to reconsider [2]. > > Well, I don't know: I still think that's the right answer, so I don't > really want to reconsider, but I understand that I'm not in charge > here. For the record, my vote is: default "off" for pg_dump and pg_dumpall, and "on" for pg_restore. For pg_dump and pg_dumpall, I agree with Jeff's idea in [1], but if the statistics is skipped by default, I don't think we need a --no-statistics option. So, here's how I think the options should work: * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, -and -statistics * Remove: --no-statistics, --with-schema, and --with-data * Combinations: Schema + Data + Stats : --statistics Schema + Data : (default) Schema + Stats : --no-data --statistics Data + Stats : --no-schema --statistics Schema only : --schema-only (or --no-data) Data only : --data-only (or --no-schema) Stats only : --statistics-only (or --no-schema --no-data --statistics) As I mentioned in [2], if we treat --statistics in the similar way to --sequence-data, i.e., allow --statistics to be used with --schema-only or --data-only, we could simplify further: * Keep: --schema-only, --data-only, --statistics-only, and --statistics * Remove: --no-schema, --no-data, --no-statistics, --with-schema, and --with-data * Combinations: Schema + Data + Stats : --statistics Schema + Data : (default) Schema + Stats : --schema-only --statistics Data + Stats : --data-only --statistics Schema only : --schema-only Data only : --data-only Stats only : --statistics-only Some may find this confusing due to mixing --statistics with --schema-only or --data-only, so I understand if there's hesitation. For pg_restore, I believe there's agreement to restore statistics by default if they exist in the archive. So: * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, and --no-statistics * Remove: --with-schema, --with-data, and --statistics * Combinations: Schema + Data + Stats : (default) Schema + Data : --no-statistics Schema + Stats : --no-data Data + Stats : --no-schema Schema only : --schema-only (or --no-data --no-statistics) Data only : --data-only (or --no-schema --no-statistics) Stats only : --statistics-only (or --no-schema --no-data) Thought? Regards, [1] https://postgr.es/m/031558c60e84362898922caa6a90587e7fdf2a57.camel@j-davis.com [2] https://postgr.es/m/94f89b0a-5d83-4a67-9092-50ba3913441c@oss.nttdata.com -- Fujii Masao NTT DATA Japan Corporation
On Tue, Jun 24, 2025 at 06:14:55PM -0400, Greg Sabino Mullane wrote: > On Wed, Jun 18, 2025 at 11:43 AM Nathan Bossart <nathandbossart@gmail.com> > wrote: >> IIUC the current proposal is to: >> >> * Dump/restore stats by default. >> * Keep the --no-statistics, --no-schema, and --no-data options. >> * Keep the --statistics-only, --schema-only, and --data-only options. >> * Remove the --with-statistics, --with-schema, and --with-data options. > > This is so close to ideal. It's just that the first bullet should be "off > by default" :) If we did that, the only way to dump statistics would be --statistics-only, right? You wouldn't be able to include statistics along with other things. -- nathan
On Wed, Jun 25, 2025 at 08:18:28AM +0900, Fujii Masao wrote: > For pg_dump and pg_dumpall, I agree with Jeff's idea in [1], > but if the statistics is skipped by default, I don't think > we need a --no-statistics option. So, here's how I think > the options should work: > > * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, -and -statistics > * Remove: --no-statistics, --with-schema, and --with-data > * Combinations: > Schema + Data + Stats : --statistics > Schema + Data : (default) > Schema + Stats : --no-data --statistics > Data + Stats : --no-schema --statistics > Schema only : --schema-only (or --no-data) > Data only : --data-only (or --no-schema) > Stats only : --statistics-only (or --no-schema --no-data --statistics) I believe this is equivalent to the second option I proposed upthread [0]. Jeff proposed a variation of this option that keeps --no-statistics around so that we could more easily change the default for stats down the road [1]. > As I mentioned in [2], if we treat --statistics in the similar way to > --sequence-data, i.e., allow --statistics to be used with --schema-only > or --data-only, we could simplify further: > > * Keep: --schema-only, --data-only, --statistics-only, and --statistics > * Remove: --no-schema, --no-data, --no-statistics, --with-schema, and --with-data > * Combinations: > Schema + Data + Stats : --statistics > Schema + Data : (default) > Schema + Stats : --schema-only --statistics > Data + Stats : --data-only --statistics > Schema only : --schema-only > Data only : --data-only > Stats only : --statistics-only > > Some may find this confusing due to mixing --statistics with --schema-only > or --data-only, so I understand if there's hesitation. Hm. I didn't really intend for --sequence-data to set a precedent here. That's mostly intended as a submode for --binary-upgrade. Perhaps we should consider removing it as a documented option and instead convert it to --binary-upgrade=sequence-data or something. In any case, allowing "only" options to be used in conjunction with --statistics seems a little confusing to me. But I'm not strongly opposed to the idea. > For pg_restore, I believe there's agreement to restore statistics > by default if they exist in the archive. So: > > * Keep: --schema-only, --data-only, --statistics-only, --no-schema, --no-data, and --no-statistics > * Remove: --with-schema, --with-data, and --statistics > * Combinations: > Schema + Data + Stats : (default) > Schema + Data : --no-statistics > Schema + Stats : --no-data > Data + Stats : --no-schema > Schema only : --schema-only (or --no-data --no-statistics) > Data only : --data-only (or --no-schema --no-statistics) > Stats only : --statistics-only (or --no-schema --no-data) +1 [0] https://postgr.es/m/aFLxvrh71VWqdL9A%40nathan [1] https://postgr.es/m/031558c60e84362898922caa6a90587e7fdf2a57.camel%40j-davis.com -- nathan
On Wed, Jun 25, 2025 at 10:36 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> This is so close to ideal. It's just that the first bullet should be "off by default" :)
If we did that, the only way to dump statistics would be --statistics-only, right? You wouldn't be able to include statistics along with other
things.
Oh, right, would also need a --statistics
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
On Mon, 2025-06-23 at 13:38 -0400, Robert Haas wrote: > What confuses me about what you've written here specifically is that > pg_dump and pg_restore are different programs with different option > sets. So when you say we need both --with-statistics and > --no-statistics, I guess that's true, but we're not talking about the > same executable in both cases. It seems to me that pg_restore should > restore everything that was dumped, but that there should be (as > there > are) various --no-whatever switches to skip unwanted items. But > pg_dump should have dump a reasonable set of things by default, and > the user should be able to add to that or subtract from it. True, we could have different options for pg_dump and pg_restore, but to me that seems a little strange because so many of the other options overlap. I figured that would be confusing, but maybe it's fine. Regards, Jeff Davis
On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote: > * reject the combination of an "only" option and a "with" option There seems to be a rough consensus on this point. Should we move ahead with this small change and see if we can get consensus to go further? Regards, Jeff Davis
On Wed, 2025-06-25 at 08:18 +0900, Fujii Masao wrote: > For pg_dump and pg_dumpall, I agree with Jeff's idea in [1], > but if the statistics is skipped by default, I don't think > we need a --no-statistics option. So, here's how I think > the options should work: > > * Keep: --schema-only, --data-only, --statistics-only, --no- > schema, --no-data, -and -statistics > * Remove: --no-statistics, --with-schema, and --with-data ... > For pg_restore, I believe there's agreement to restore statistics > by default if they exist in the archive. So: > > * Keep: --schema-only, --data-only, --statistics-only, --no- > schema, --no-data, and --no-statistics > * Remove: --with-schema, --with-data, and --statistics That means pg_dump will accept --statistics and reject --no-statistics; and pg_restore will accept --no-statistics and reject --statistics. Other options are mostly the same between them, so I'm not sure it's a good idea for them to diverge. Regards, Jeff Davis
On 2025/07/11 2:57, Jeff Davis wrote: > On Wed, 2025-06-25 at 08:18 +0900, Fujii Masao wrote: >> For pg_dump and pg_dumpall, I agree with Jeff's idea in [1], >> but if the statistics is skipped by default, I don't think >> we need a --no-statistics option. So, here's how I think >> the options should work: >> >> * Keep: --schema-only, --data-only, --statistics-only, --no- >> schema, --no-data, -and -statistics >> * Remove: --no-statistics, --with-schema, and --with-data > > ... > >> For pg_restore, I believe there's agreement to restore statistics >> by default if they exist in the archive. So: >> >> * Keep: --schema-only, --data-only, --statistics-only, --no- >> schema, --no-data, and --no-statistics >> * Remove: --with-schema, --with-data, and --statistics > > That means pg_dump will accept --statistics and reject --no-statistics; > and pg_restore will accept --no-statistics and reject --statistics. > Other options are mostly the same between them, so I'm not sure it's a > good idea for them to diverge. I agree it would be better to have the same options in both pg_dump and pg_restore, if possible. But to do that, we'd either need to make pg_dump dump statistics by default, or allow redundant options like --statistics in pg_restore, even though it already restores statistics by default. As I understand it, the rough consensus so far is that we'd prefer to avoid both of these approaches. I know some want to change the default behavior about statistics in pg_dump, though. But, are you suggesting we go with one of them? Regards, -- Fujii Masao NTT DATA Japan Corporation
On Fri, 2025-07-11 at 09:12 +0900, Fujii Masao wrote: > But to do that, we'd either need to make pg_dump dump statistics > by default, or allow redundant options like --statistics in > pg_restore, > even though it already restores statistics by default. Redundant options might be annoying, but I don't see them as a major problem. > As I understand it, the rough consensus so far is that we'd prefer to > avoid both of these approaches. I'm not clear what the consensus approach is, then. Can you elaborate? > I know some want to change the default > behavior about statistics in pg_dump, though. I don't see a consensus to make stats the default. Regards, Jeff Davis
On Thu, 2025-07-10 at 10:42 -0700, Jeff Davis wrote: > On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote: > > * reject the combination of an "only" option and a "with" option > > There seems to be a rough consensus on this point. Patch attached. Regards, Jeff Davis
Вложения
On 2025-Jul-29, Jeff Davis wrote: > + /* reject conflicting "only-" and "with-" options */ > + if (data_only && with_schema) > + pg_fatal("options -a/--data-only and --with-schema cannot be used together"); > + if (data_only && with_statistics) > + pg_fatal("options -a/--data-only and --with-statistics cannot be used together"); > + if (schema_only && with_data) > + pg_fatal("options -s/--schema-only and --with-data cannot be used together"); > + if (schema_only && with_statistics) > + pg_fatal("options -s/--schema-only and --with-statistics cannot be used together"); > + if (statistics_only && with_data) > + pg_fatal("options --statistics-only and --with-data cannot be used together"); > + if (statistics_only && with_schema) > + pg_fatal("options --statistics-only and --with-schema cannot be used together"); Please move the switches themselves out of the translatable message, otherwise there are too many of them. For instance, pg_fatal("options %s and %s cannot be used together", "-s/--schema-only", "--with-statistics"); Thanks -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote: > On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote: > > IIUC the current proposal is to: > > > > * Dump/restore stats by default. We don't have a consensus for that, so unless a few people make an abrupt turnaround, this will remain off for v18. > > * Keep the --no-statistics, --no-schema, and --no-data options. > > * Keep the --statistics-only, --schema-only, and --data-only > > options. > > * Remove the --with-statistics, --with-schema, and --with-data > > options. The other parts of the proposal are very similar with mine below: > * delete the --with-data and --with-schema options as unnecessary > * (maybe?) rename --with-statistics to --statistics > * keep --no-statistics, --no-data, --no-schema (or use "without" > instead?) > * (maybe?) keep --statistics-only > * reject the combination of an "only" option and a "with" option The main difference is whether we: (a) keep both --statistics and --no-statistics in pg_dump and pg_restore; or (b) for pg_dump have --statistics but not --no-statistics and for pg_restore have --no-statistics but not --statistics. Option (a) creates redundancies whereas (b) creates a divergence. I suppose I prefer (a), but I don't have a terribly strong opinion and if more people prefer (b) then I'll go with that. Regards, Jeff Davis
On Tue, 2025-07-29 at 20:22 +0200, Álvaro Herrera wrote: > Please move the switches themselves out of the translatable message, > otherwise there are too many of them. For instance, Thank you for looking, v2 attached. Regards, Jeff Davis
Вложения
On 2025-Jun-25, Fujii Masao wrote: > For the record, my vote is: default "off" for pg_dump and pg_dumpall, > and "on" for pg_restore. I don't know if this horse is already dead, so bear with me while I beat it a little more. > [...] we could simplify further: > > * Keep: --schema-only, --data-only, --statistics-only, and --statistics > * Remove: --no-schema, --no-data, --no-statistics, --with-schema, and --with-data > * Combinations: > Schema + Data + Stats : --statistics > Schema + Data : (default) > Schema + Stats : --schema-only --statistics > Data + Stats : --data-only --statistics > Schema only : --schema-only > Data only : --data-only > Stats only : --statistics-only I think combinatorial explosions of options is not a great situation to be in, particularly if we have to make endless decisions on which combinations are valid or sensible. Maybe we should invent a new switch, something like --include=[schema,data,statistics] with which users can give one or more comma-separated types to be included in the dump. Then we state that --data-only is synonym for --include=data and --schema-only is synonym for --include=schema, and we don't need any other switches. Then it is obvious what happens, how to combine object types in the dumps and restores, and there's no need to reject invalid combinations because there aren't any. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Investigación es lo que hago cuando no sé lo que estoy haciendo" (Wernher von Braun)
On Wed, 2025-07-30 at 10:23 +0200, Álvaro Herrera wrote: > Maybe we should invent a new > switch, something like > --include=[schema,data,statistics] > with which users can give one or more comma-separated types to be > included in the dump. Robert Treat brought up a similar idea before: https://www.postgresql.org/message-id/CABV9wwO5v8Nu8q%2BxWexMdL3Z%2B2xS%3DfFJMQetBSHy3tR64wNHOA%40mail.gmail.com I'm certainly open to (re-)considering it. > Then we state that --data-only is synonym for --include=data and > --schema-only is synonym for --include=schema, and we don't need any > other switches. Then it is obvious what happens, how to combine > object types in the dumps and restores, and there's no need to reject > invalid combinations because there aren't any. I assume that should be read as something like "include only", because --include=data would also be excluding the schema and the stats. And if that's the case, it does seem strange to do something like "-- include=data --schema-only". Another question: could you have multiple --include options, like "-- include=data --include=schema"? Because you currently can't do "--data- only --schema-only". So that would make it not quite an alias. If we go in this direction, it might be easier to just say that -- include conflicts with --schema-only and --data-only. Regards, Jeff Davis
On 2025-Jul-30, Jeff Davis wrote: > On Wed, 2025-07-30 at 10:23 +0200, Álvaro Herrera wrote: > > Maybe we should invent a new > > switch, something like > > --include=[schema,data,statistics] > > with which users can give one or more comma-separated types to be > > included in the dump. > > Robert Treat brought up a similar idea before: > > https://www.postgresql.org/message-id/CABV9wwO5v8Nu8q%2BxWexMdL3Z%2B2xS%3DfFJMQetBSHy3tR64wNHOA%40mail.gmail.com Oh, I hadn't seen it, but now that I do, it seems identical to mine. He even used some of the same terms I did. > > Then we state that --data-only is synonym for --include=data and > > --schema-only is synonym for --include=schema, and we don't need any > > other switches. Then it is obvious what happens, how to combine > > object types in the dumps and restores, and there's no need to reject > > invalid combinations because there aren't any. > > I assume that should be read as something like "include only", because > --include=data would also be excluding the schema and the stats. Of course. > And if that's the case, it does seem strange to do something like "-- > include=data --schema-only". > Another question: could you have multiple --include options, like "-- > include=data --include=schema"? Because you currently can't do "--data- > only --schema-only". So that would make it not quite an alias. IMO we should only allow one --include. > If we go in this direction, it might be easier to just say that -- > include conflicts with --schema-only and --data-only. Yep. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Small aircraft do not crash frequently ... usually only once!" (ponder, http://thedailywtf.com/)
> I assume that should be read as something like "include only", because
> --include=data would also be excluding the schema and the stats.
Of course.
In general, I like the idea of --include, but it would need to be consistent in behavior across pg_dump/pg_restore/pg_upgrade(if applicable).
Under the current defaults, if a person wanted a dump to include stats on pg_dump they would need "--include=data,schema,statistics" (checking first whether it was "statistics" or "stats"), but they could use the defaults on pg_restore. Using the defaults on both would allow for stats to be restored, but none would have been generated. That is confusing to me, and I imagine it will be even more confusing to somebody who hasn't been reading this thread.
Under the current defaults, if a person wanted a dump to include stats on pg_dump they would need "--include=data,schema,statistics" (checking first whether it was "statistics" or "stats"), but they could use the defaults on pg_restore. Using the defaults on both would allow for stats to be restored, but none would have been generated. That is confusing to me, and I imagine it will be even more confusing to somebody who hasn't been reading this thread.
I don't see anything wrong with a full complement of --something-only, --no-something flags. The combinatorics aren't that hard, only tedious. If we *must* make the defaults for pg_dump different from pg_restore and pg_upgrade, then adding the --with-something flags becomes necessary, and we let redundant non-conflicting combinations slide.
All of these problems are solved if we include statistics by default across all three programs. Anything else involves complicated explanations that will confuse the users who read them, and surprise the users who don't.
On Tue, 2025-07-29 at 11:24 -0700, Jeff Davis wrote: > On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote: > > On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote: > > > IIUC the current proposal is to: > > > > > > * Dump/restore stats by default. > > We don't have a consensus for that, so unless a few people make an > abrupt turnaround, this will remain off for v18. > > > > * Keep the --no-statistics, --no-schema, and --no-data options. > > > * Keep the --statistics-only, --schema-only, and --data-only > > > options. > > > * Remove the --with-statistics, --with-schema, and --with-data > > > options. > > The other parts of the proposal are very similar with mine below: > > > * delete the --with-data and --with-schema options as unnecessary > > * (maybe?) rename --with-statistics to --statistics > > * keep --no-statistics, --no-data, --no-schema (or use "without" > > instead?) > > * (maybe?) keep --statistics-only > > * reject the combination of an "only" option and a "with" option Patch attached. * removes --with-data and --with-schema (redundant) * renames --with-statistics to just --statistics I kept --statistics and --no-statistics for both pg_dump and pg_restore, because: (a) I think it's good to have consistent options between those two programs; and (b) it allows us to potentially change the default to include statistics in the future. That leaves some redundancy of the options, which some have expressed annoyance over, but it doesn't seem like a major point of objection. Regards, Jeff Davis
Вложения
On Thu, 2025-07-31 at 16:28 -0400, Corey Huinker wrote: > > In general, I like the idea of --include, but it would need to be > consistent in behavior across pg_dump/pg_restore/pg_upgrade(if > applicable). How should you exclude stats when doing pg_restore? Presumably, -- include=data,schema. But it's a bit strange if "--include" is the only way to exclude something. There are enough nuances and details here that I think the next step is for someone to turn the idea for --include into a reviewable patch, so that we can compare it to what we have now and see if people generally think it's an improvement over what we have now. Otherwise, we should just proceed with: https://www.postgresql.org/message-id/40cedfc22da152928a74d472708aaadb8855d8d9.camel@j-davis.com and close the open item. Regards, Jeff Davis
On Fri, Aug 1, 2025 at 4:02 PM Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2025-07-31 at 16:28 -0400, Corey Huinker wrote:
>
> In general, I like the idea of --include, but it would need to be
> consistent in behavior across pg_dump/pg_restore/pg_upgrade(if
> applicable).
How should you exclude stats when doing pg_restore? Presumably, --
include=data,schema. But it's a bit strange if "--include" is the only
way to exclude something.
Yes, that's how you'd do it, if we go with the request for one --include option (or series of options) and no --exclude option (or series of options). I was under the impression that was the stated feature of --include.
There are enough nuances and details here that I think the next step is
for someone to turn the idea for --include into a reviewable patch, so
that we can compare it to what we have now and see if people generally
think it's an improvement over what we have now.
On Fri, Aug 01, 2025 at 12:42:16PM -0700, Jeff Davis wrote: > - <term><option>--with-statistics</option></term> > + <term><option>--statistics</option></term> > - <term><option>--with-statistics</option></term> > + <term><option>--statistics</option></term> > - <term><option>--with-statistics</option></term> > + <term><option>--statistics</option></term> nitpick: --statistics should be moved to maintain alphabetical ordering. -- nathan
On Sat, Aug 2, 2025 at 4:42 AM Jeff Davis <pgsql@j-davis.com> wrote: > Patch attached. > > * removes --with-data and --with-schema (redundant) > * renames --with-statistics to just --statistics > > I kept --statistics and --no-statistics for both pg_dump and > pg_restore, because: (a) I think it's good to have consistent options > between those two programs; and (b) it allows us to potentially change > the default to include statistics in the future. That leaves some > redundancy of the options, which some have expressed annoyance over, > but it doesn't seem like a major point of objection. I'm OK with this approach. Thanks for the patch! It looks good to me. While not directly related to your patch, I feel inclined to simplify option handling in pg_dump, similar to what we've already done in pg_restore.c and pg_dumpall.c. For example, we could change how statistics_only is handled like this: --------------------------------------- --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -448,7 +448,7 @@ main(int argc, char **argv) DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC; bool data_only = false; bool schema_only = false; - bool statistics_only = false; + static int statistics_only = 0; bool with_statistics = false; bool no_data = false; bool no_schema = false; @@ -513,7 +513,7 @@ main(int argc, char **argv) {"serializable-deferrable", no_argument, &dopt.serializable_deferrable, 1}, {"snapshot", required_argument, NULL, 6}, {"statistics", no_argument, NULL, 22}, - {"statistics-only", no_argument, NULL, 18}, + {"statistics-only", no_argument, &statistics_only, 1}, {"strict-names", no_argument, &strict_names, 1}, {"use-set-session-authorization", no_argument, &dopt.use_setsessauth, 1}, {"no-comments", no_argument, &dopt.no_comments, 1}, @@ -777,10 +777,6 @@ main(int argc, char **argv) optarg); break; - case 18: - statistics_only = true; - break; - case 19: no_data = true; break; --------------------------------------- We could apply the same pattern to others like no_data, no_schema, no_statistics, and with_statistics to make the code cleaner and more consistent across tools. Regards, -- Fujii Masao
On Fri, 2025-08-01 at 17:46 -0500, Nathan Bossart wrote: > On Fri, Aug 01, 2025 at 12:42:16PM -0700, Jeff Davis wrote: > > - <term><option>--with-statistics</option></term> > > + <term><option>--statistics</option></term> > > > - <term><option>--with-statistics</option></term> > > + <term><option>--statistics</option></term> > > > - <term><option>--with-statistics</option></term> > > + <term><option>--statistics</option></term> > > nitpick: --statistics should be moved to maintain alphabetical > ordering. > Thank you. I committed this and closed the Open Item. I did not address the following issues: * Did not change default to on: there was no consensus on this point, and the default-off opinions were stronger. We may be able to revisit for PG19. * Did not add --include syntax: no patch and no consensus on the details. * Did not remove redundant --no-statistics option for pg_dump and redundant --statistics option for pg_restore: kept for consistency between pg_dump and pg_restore, and to allow us to potentially change to default-on in the future. If someone feels one of these needs to be addressed in PG18, let me know. Regards, Jeff Davis
On Sat, 2025-08-02 at 22:58 +0900, Fujii Masao wrote: > I'm OK with this approach. Thanks for the patch! It looks good to me. Thank you for the review. > - bool statistics_only = false; > + static int statistics_only = 0; Agreed. That can just be done for 19 though. Regards, Jeff Davis