Обсуждение: Re: analyze-in-stages post upgrade questions

Поиск
Список
Период
Сортировка

Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
[moving to pgsql-hackers]
On Thu, 2025-07-10 at 17:20 +0100, Mircea Cadariu wrote:
>       I have only one suggestion for the patch. Consider adding a
> corresponding test in src/bin/scripts/t/100_vacuumdb.pl.
>
> Proposal (I used this to check the patch):
>
> $node->safe_psql('postgres',
>      "CREATE TABLE parent_table (a INT) PARTITION BY LIST (a);\n"
>        . "CREATE TABLE child_table PARTITION OF parent_table FOR VALUES
> IN (1);\n"
>        . "INSERT INTO parent_table VALUES (1);\n");
> $node->issues_sql_like(
>      [
>          'vacuumdb', '--analyze-only', 'postgres'
>      ],
>      qr/statement:\s+ANALYZE\s+public\.parent_table/s,
>      '--analyze_only updates statistics for partitioned tables');

Good idea; done in the attached version 2 of the patch.

Yours,
Laurenz Albe

Вложения

Re: analyze-in-stages post upgrade questions

От
Mircea Cadariu
Дата:
On 11/07/2025 10:51, Laurenz Albe wrote:

> Good idea; done in the attached version 2 of the patch.

Thanks! Looks good. I have set the status of the Commitfest entry to 
"Ready for Committer".


Kind regards,

Mircea Cadariu




Re: analyze-in-stages post upgrade questions

От
Fujii Masao
Дата:
On Fri, Jul 11, 2025 at 7:42 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:
>
> On 11/07/2025 10:51, Laurenz Albe wrote:
>
> > Good idea; done in the attached version 2 of the patch.
>
> Thanks! Looks good. I have set the status of the Commitfest entry to
> "Ready for Committer".

I've started reviewing the patch since it's marked as ready for committer.

Overall, I like the change. But I have one question: should this be treated as
a bug fix that we back-patch to supported branches, or is it more of
an improvement that should only go into master?


         Only calculate statistics for use by the optimizer (no vacuum).
+        If that option is specified, <command>vacuumdb</command> will also
+        process partitioned tables.  Without that option, only the partitions
+        will be considered, unless a partitioned table is explicitly specified
+        with the <option>--table</option> option.

This wording seems a bit out of place in the --analyze-only section,
since it also describes the default behavior of vacuumdb without that option.
Wouldn't it make more sense to move that explanation in the --table section?

For example, we could add something like:

------------------
If no tables are specified with the --table option, vacuumdb will
clean all regular tables and materialized views in the connected
database. If --analyze-only or --analyze-in-stages is also specified,
it will analyze all regular tables, partitioned tables, and
materialized views (but not foreign tables).
------------------


+ /*
+ * VACUUMing partitioned tables would be unreasonably expensive, since
+ * that entails processing the partitions twice (once as part of the
+ * partitioned table, once as tables in their own right) for no
+ * benefit. But if we only ANALYZE, collecting statistics for
+ * partitioned tables is worth the effort.
+ */

This is probably true. But isn't the main reason more about aligning with
the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
docs says, "There is no effective difference between vacuuming and analyzing
databases via this utility and via other methods for accessing the server.",
so its default target objects should match: VACUUM skips partitioned tables
by default, while ANALYZE includes them. If that's the case, maybe the comment
should reflect that instead.


+ qr/statement:\s+ANALYZE\s+public\.parent_table/s,
+ '--analyze_only updates statistics for partitioned tables');

A plain space might be sufficient instead of \s+.
Also, I don't think the backslash before ".parent_table" is necessary.

Regards,

--
Fujii Masao



Re: analyze-in-stages post upgrade questions

От
Mircea Cadariu
Дата:

Hi,

On 30/07/2025 12:49, Fujii Masao wrote:
I've started reviewing the patch since it's marked as ready for committer.
Thanks! 
Overall, I like the change. But I have one question: should this be treated as
a bug fix that we back-patch to supported branches, or is it more of
an improvement that should only go into master?
I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest version.
         Only calculate statistics for use by the optimizer (no vacuum).
+        If that option is specified, <command>vacuumdb</command> will also
+        process partitioned tables.  Without that option, only the partitions
+        will be considered, unless a partitioned table is explicitly specified
+        with the <option>--table</option> option.

This wording seems a bit out of place in the --analyze-only section,
since it also describes the default behavior of vacuumdb without that option.
Wouldn't it make more sense to move that explanation in the --table section?

For example, we could add something like:

------------------
If no tables are specified with the --table option, vacuumdb will
clean all regular tables and materialized views in the connected
database. If --analyze-only or --analyze-in-stages is also specified,
it will analyze all regular tables, partitioned tables, and
materialized views (but not foreign tables).
------------------
Yes, agreed. 
+ /*
+ * VACUUMing partitioned tables would be unreasonably expensive, since
+ * that entails processing the partitions twice (once as part of the
+ * partitioned table, once as tables in their own right) for no
+ * benefit. But if we only ANALYZE, collecting statistics for
+ * partitioned tables is worth the effort.
+ */

This is probably true. But isn't the main reason more about aligning with
the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
docs says, "There is no effective difference between vacuuming and analyzing
databases via this utility and via other methods for accessing the server.",
so its default target objects should match: VACUUM skips partitioned tables
by default, while ANALYZE includes them. If that's the case, maybe the comment
should reflect that instead.
I see what you mean. From that perspective, I wonder if we even need a comment there at all. 
+ qr/statement:\s+ANALYZE\s+public\.parent_table/s,
+ '--analyze_only updates statistics for partitioned tables');

A plain space might be sufficient instead of \s+.
Also, I don't think the backslash before ".parent_table" is necessary.

Good catch! Indeed let's simplify that to contain strictly only what's necessary.  

Kind regards,

Mircea Cadariu

Re: analyze-in-stages post upgrade questions

От
Fujii Masao
Дата:
On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:
> Overall, I like the change. But I have one question: should this be treated as
> a bug fix that we back-patch to supported branches, or is it more of
> an improvement that should only go into master?
>
> I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest
version.

I understand your point. But on second thought, since the patch changes
behavior, I'm leaning toward treating it as an improvement, so it should
only go to master...


> + /*
> + * VACUUMing partitioned tables would be unreasonably expensive, since
> + * that entails processing the partitions twice (once as part of the
> + * partitioned table, once as tables in their own right) for no
> + * benefit. But if we only ANALYZE, collecting statistics for
> + * partitioned tables is worth the effort.
> + */
>
> This is probably true. But isn't the main reason more about aligning with
> the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
> docs says, "There is no effective difference between vacuuming and analyzing
> databases via this utility and via other methods for accessing the server.",
> so its default target objects should match: VACUUM skips partitioned tables
> by default, while ANALYZE includes them. If that's the case, maybe the comment
> should reflect that instead.
>
> I see what you mean. From that perspective, I wonder if we even need a comment there at all.

Or, if we keep it, though, I'd like to update it to something like
the following:

--------------------
vacuumdb should generally follow the behavior of the underlying
VACUUM and ANALYZE commands. If analyze_only is true, process
regular tables, materialized views, and partitioned tables, just like
ANALYZE (with no specific target tables) does. Otherwise, process
only regular tables and materialized views, since VACUUM skips
partitioned tables when no target tables are specified.
--------------------

Regards,

--
Fujii Masao



Re: analyze-in-stages post upgrade questions

От
Nathan Bossart
Дата:
On Wed, Aug 06, 2025 at 11:25:53PM +0900, Fujii Masao wrote:
> On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:
>> Overall, I like the change. But I have one question: should this be treated as
>> a bug fix that we back-patch to supported branches, or is it more of
>> an improvement that should only go into master?
>>
>> I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest
version.
> 
> I understand your point. But on second thought, since the patch changes
> behavior, I'm leaning toward treating it as an improvement, so it should
> only go to master...

I also am leaning towards treating this as v19 material.  It's a nontrivial
behavior change, and this option is useful for major version upgrades,
which is an area that we really don't want to surprise users too much.
Furthermore, auto-analyze doesn't process partitioned tables, either, so
this introduces a bit of divergence.  (I'd love to see that project picked
up again someday.  Perhaps I will take a gander...)

-- 
nathan



Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
On Wed, 2025-08-06 at 23:25 +0900, Fujii Masao wrote:
> On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:
> > Overall, I like the change. But I have one question: should this be treated as
> > a bug fix that we back-patch to supported branches, or is it more of
> > an improvement that should only go into master?
> >
> > I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest
version.
>
> I understand your point. But on second thought, since the patch changes
> behavior, I'm leaning toward treating it as an improvement, so it should
> only go to master...

I agree that this behavior change should not be backpatched.
That is not a bugfix.

> > + /*
> > + * VACUUMing partitioned tables would be unreasonably expensive, since
> > + * that entails processing the partitions twice (once as part of the
> > + * partitioned table, once as tables in their own right) for no
> > + * benefit. But if we only ANALYZE, collecting statistics for
> > + * partitioned tables is worth the effort.
> > + */
> >
> > This is probably true. But isn't the main reason more about aligning with
> > the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
> > docs says, "There is no effective difference between vacuuming and analyzing
> > databases via this utility and via other methods for accessing the server.",
> > so its default target objects should match: VACUUM skips partitioned tables
> > by default, while ANALYZE includes them. If that's the case, maybe the comment
> > should reflect that instead.
> >
> > I see what you mean. From that perspective, I wonder if we even need a comment there at all.
>
> Or, if we keep it, though, I'd like to update it to something like
> the following:
>
> --------------------
> vacuumdb should generally follow the behavior of the underlying
> VACUUM and ANALYZE commands. If analyze_only is true, process
> regular tables, materialized views, and partitioned tables, just like
> ANALYZE (with no specific target tables) does. Otherwise, process
> only regular tables and materialized views, since VACUUM skips
> partitioned tables when no target tables are specified.
> --------------------

I am fine with that suggestion.

Alternatively, my original comment could be amended with

  Besides, ANALYZE (without an option) processes partitioned tables, and
  "vacuumdb -Z" should behave like ANALYZE.

Yours,
Laurenz Albe



Re: analyze-in-stages post upgrade questions

От
Fujii Masao
Дата:
On Thu, Aug 7, 2025 at 3:14 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I also am leaning towards treating this as v19 material.  It's a nontrivial
> behavior change, and this option is useful for major version upgrades,
> which is an area that we really don't want to surprise users too much.

+1

> Furthermore, auto-analyze doesn't process partitioned tables, either, so
> this introduces a bit of divergence.  (I'd love to see that project picked
> up again someday.  Perhaps I will take a gander...)

Sounds good!

Regards,

--
Fujii Masao



Re: analyze-in-stages post upgrade questions

От
Fujii Masao
Дата:
On Thu, Aug 7, 2025 at 5:52 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > I understand your point. But on second thought, since the patch changes
> > behavior, I'm leaning toward treating it as an improvement, so it should
> > only go to master...
>
> I agree that this behavior change should not be backpatched.
> That is not a bugfix.

+1

> > --------------------
> > vacuumdb should generally follow the behavior of the underlying
> > VACUUM and ANALYZE commands. If analyze_only is true, process
> > regular tables, materialized views, and partitioned tables, just like
> > ANALYZE (with no specific target tables) does. Otherwise, process
> > only regular tables and materialized views, since VACUUM skips
> > partitioned tables when no target tables are specified.
> > --------------------
>
> I am fine with that suggestion.

Thanks! So I've updated the patch based on my earlier comments.
Unless there are objections, I'll commit the attached version to master only.

Regards,

--
Fujii Masao

Вложения

Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
On Mon, 2025-08-18 at 11:38 +0900, Fujii Masao wrote:
> Thanks! So I've updated the patch based on my earlier comments.
> Unless there are objections, I'll commit the attached version to master only.

I am fine with your patch.

One suggestion:

> --- a/doc/src/sgml/ref/vacuumdb.sgml
> +++ b/doc/src/sgml/ref/vacuumdb.sgml
> @@ -397,6 +397,15 @@ PostgreSQL documentation
>          Multiple tables can be vacuumed by writing multiple
>          <option>-t</option> switches.
>         </para>
> +       <para>
> +        If no tables are specified with the <option>--table</option> option,
> +        <application>vacuumdb</application> will clean all regular tables
> +        and materialized views in the connected database.
> +        If <option>--analyze-only</option> or
> +        <option>--analyze-in-stages</option> is also specified,
> +        it will analyze all regular tables, partitioned tables,
> +        and materialized views (but not foreign tables).
> +       </para>

I suggest replacing "clean" with "process", since VACUUM does so much more than
clean up dead tuples.

Concerning backpatching, I voted against, but I suggest that this be backpatched
to v18.  I don't feel very strongly about it though.

Yours,
Laurenz Albe



Re: analyze-in-stages post upgrade questions

От
Fujii Masao
Дата:
On Mon, Aug 18, 2025 at 3:40 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Mon, 2025-08-18 at 11:38 +0900, Fujii Masao wrote:
> > Thanks! So I've updated the patch based on my earlier comments.
> > Unless there are objections, I'll commit the attached version to master only.
>
> I am fine with your patch.

Thanks for the review!


> One suggestion:
>
> > --- a/doc/src/sgml/ref/vacuumdb.sgml
> > +++ b/doc/src/sgml/ref/vacuumdb.sgml
> > @@ -397,6 +397,15 @@ PostgreSQL documentation
> >          Multiple tables can be vacuumed by writing multiple
> >          <option>-t</option> switches.
> >         </para>
> > +       <para>
> > +        If no tables are specified with the <option>--table</option> option,
> > +        <application>vacuumdb</application> will clean all regular tables
> > +        and materialized views in the connected database.
> > +        If <option>--analyze-only</option> or
> > +        <option>--analyze-in-stages</option> is also specified,
> > +        it will analyze all regular tables, partitioned tables,
> > +        and materialized views (but not foreign tables).
> > +       </para>
>
> I suggest replacing "clean" with "process", since VACUUM does so much more than
> clean up dead tuples.

I see your point. However, since the vacuumdb docs already use "clean"
in several places, I think it's better to keep using "clean" here
for consistency. Thought?


> Concerning backpatching, I voted against, but I suggest that this be backpatched
> to v18.  I don't feel very strongly about it though.

As for back-patching, I failed to find a strong reason to apply this change
to v18 over the many other patches that could not be committed before
the feature freeze... Of course if there's broad support for back-patching,
we can certainly revisit it. But for now I'm thinking to commit the patch
to master.

Regards,

--
Fujii Masao



Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
On Tue, 2025-08-19 at 23:40 +0900, Fujii Masao wrote:
> > > --- a/doc/src/sgml/ref/vacuumdb.sgml
> > > +++ b/doc/src/sgml/ref/vacuumdb.sgml
> > > @@ -397,6 +397,15 @@ PostgreSQL documentation
> > >           Multiple tables can be vacuumed by writing multiple
> > >           <option>-t</option> switches.
> > >          </para>
> > > +       <para>
> > > +        If no tables are specified with the <option>--table</option> option,
> > > +        <application>vacuumdb</application> will clean all regular tables
> > > +        and materialized views in the connected database.
> > > +        If <option>--analyze-only</option> or
> > > +        <option>--analyze-in-stages</option> is also specified,
> > > +        it will analyze all regular tables, partitioned tables,
> > > +        and materialized views (but not foreign tables).
> > > +       </para>
> >
> > I suggest replacing "clean" with "process", since VACUUM does so much more than
> > clean up dead tuples.
>
> I see your point. However, since the vacuumdb docs already use "clean"
> in several places, I think it's better to keep using "clean" here
> for consistency. Thought?

Works for me; I didn't consider that.

> > Concerning backpatching, I voted against, but I suggest that this be backpatched
> > to v18.  I don't feel very strongly about it though.
>
> As for back-patching, I failed to find a strong reason to apply this change
> to v18 over the many other patches that could not be committed before
> the feature freeze... Of course if there's broad support for back-patching,
> we can certainly revisit it. But for now I'm thinking to commit the patch
> to master.

I don't have a strong reason either - my reasoning was that the change is small
and unlikely to introduce a bug, and that it would be nice to get more accurate
statistics on partitioned tables after "pg_upgrade" a year earlier.

But I won't object if the patch is only in v19.

Yours,
Laurenz Albe



Re: analyze-in-stages post upgrade questions

От
Fujii Masao
Дата:
On Wed, Aug 20, 2025 at 12:16 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2025-08-19 at 23:40 +0900, Fujii Masao wrote:
> > > > --- a/doc/src/sgml/ref/vacuumdb.sgml
> > > > +++ b/doc/src/sgml/ref/vacuumdb.sgml
> > > > @@ -397,6 +397,15 @@ PostgreSQL documentation
> > > >           Multiple tables can be vacuumed by writing multiple
> > > >           <option>-t</option> switches.
> > > >          </para>
> > > > +       <para>
> > > > +        If no tables are specified with the <option>--table</option> option,
> > > > +        <application>vacuumdb</application> will clean all regular tables
> > > > +        and materialized views in the connected database.
> > > > +        If <option>--analyze-only</option> or
> > > > +        <option>--analyze-in-stages</option> is also specified,
> > > > +        it will analyze all regular tables, partitioned tables,
> > > > +        and materialized views (but not foreign tables).
> > > > +       </para>
> > >
> > > I suggest replacing "clean" with "process", since VACUUM does so much more than
> > > clean up dead tuples.
> >
> > I see your point. However, since the vacuumdb docs already use "clean"
> > in several places, I think it's better to keep using "clean" here
> > for consistency. Thought?
>
> Works for me; I didn't consider that.
>
> > > Concerning backpatching, I voted against, but I suggest that this be backpatched
> > > to v18.  I don't feel very strongly about it though.
> >
> > As for back-patching, I failed to find a strong reason to apply this change
> > to v18 over the many other patches that could not be committed before
> > the feature freeze... Of course if there's broad support for back-patching,
> > we can certainly revisit it. But for now I'm thinking to commit the patch
> > to master.
>
> I don't have a strong reason either - my reasoning was that the change is small
> and unlikely to introduce a bug, and that it would be nice to get more accurate
> statistics on partitioned tables after "pg_upgrade" a year earlier.
>
> But I won't object if the patch is only in v19.

OK, so for now I've pushed the patch to master. Thanks!

Regards,

--
Fujii Masao



Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
On Wed, 2025-08-20 at 13:31 +0900, Fujii Masao wrote:
> OK, so for now I've pushed the patch to master. Thanks!

Thank you for working on that!

Yours,
Laurenz Albe



Re: analyze-in-stages post upgrade questions

От
Justin Pryzby
Дата:
When analyzing a partitioned table, I think you should use ANALYZE ONLY,
or otherwise avoid processing the children twice.

Thanks for handling this.  I was recently suprised to learn that
vacuumdb doesn't process parents.

-- 
Justin