Обсуждение: Re: analyze-in-stages post upgrade questions
[moving to pgsql-hackers] On Thu, 2025-07-10 at 17:20 +0100, Mircea Cadariu wrote: > I have only one suggestion for the patch. Consider adding a > corresponding test in src/bin/scripts/t/100_vacuumdb.pl. > > Proposal (I used this to check the patch): > > $node->safe_psql('postgres', > "CREATE TABLE parent_table (a INT) PARTITION BY LIST (a);\n" > . "CREATE TABLE child_table PARTITION OF parent_table FOR VALUES > IN (1);\n" > . "INSERT INTO parent_table VALUES (1);\n"); > $node->issues_sql_like( > [ > 'vacuumdb', '--analyze-only', 'postgres' > ], > qr/statement:\s+ANALYZE\s+public\.parent_table/s, > '--analyze_only updates statistics for partitioned tables'); Good idea; done in the attached version 2 of the patch. Yours, Laurenz Albe
Вложения
On 11/07/2025 10:51, Laurenz Albe wrote: > Good idea; done in the attached version 2 of the patch. Thanks! Looks good. I have set the status of the Commitfest entry to "Ready for Committer". Kind regards, Mircea Cadariu
On Fri, Jul 11, 2025 at 7:42 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: > > On 11/07/2025 10:51, Laurenz Albe wrote: > > > Good idea; done in the attached version 2 of the patch. > > Thanks! Looks good. I have set the status of the Commitfest entry to > "Ready for Committer". I've started reviewing the patch since it's marked as ready for committer. Overall, I like the change. But I have one question: should this be treated as a bug fix that we back-patch to supported branches, or is it more of an improvement that should only go into master? Only calculate statistics for use by the optimizer (no vacuum). + If that option is specified, <command>vacuumdb</command> will also + process partitioned tables. Without that option, only the partitions + will be considered, unless a partitioned table is explicitly specified + with the <option>--table</option> option. This wording seems a bit out of place in the --analyze-only section, since it also describes the default behavior of vacuumdb without that option. Wouldn't it make more sense to move that explanation in the --table section? For example, we could add something like: ------------------ If no tables are specified with the --table option, vacuumdb will clean all regular tables and materialized views in the connected database. If --analyze-only or --analyze-in-stages is also specified, it will analyze all regular tables, partitioned tables, and materialized views (but not foreign tables). ------------------ + /* + * VACUUMing partitioned tables would be unreasonably expensive, since + * that entails processing the partitions twice (once as part of the + * partitioned table, once as tables in their own right) for no + * benefit. But if we only ANALYZE, collecting statistics for + * partitioned tables is worth the effort. + */ This is probably true. But isn't the main reason more about aligning with the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb docs says, "There is no effective difference between vacuuming and analyzing databases via this utility and via other methods for accessing the server.", so its default target objects should match: VACUUM skips partitioned tables by default, while ANALYZE includes them. If that's the case, maybe the comment should reflect that instead. + qr/statement:\s+ANALYZE\s+public\.parent_table/s, + '--analyze_only updates statistics for partitioned tables'); A plain space might be sufficient instead of \s+. Also, I don't think the backslash before ".parent_table" is necessary. Regards, -- Fujii Masao
Hi,
On 30/07/2025 12:49, Fujii Masao wrote:
Thanks!I've started reviewing the patch since it's marked as ready for committer.
I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest version.Overall, I like the change. But I have one question: should this be treated as a bug fix that we back-patch to supported branches, or is it more of an improvement that should only go into master?
Yes, agreed.Only calculate statistics for use by the optimizer (no vacuum). + If that option is specified, <command>vacuumdb</command> will also + process partitioned tables. Without that option, only the partitions + will be considered, unless a partitioned table is explicitly specified + with the <option>--table</option> option. This wording seems a bit out of place in the --analyze-only section, since it also describes the default behavior of vacuumdb without that option. Wouldn't it make more sense to move that explanation in the --table section? For example, we could add something like: ------------------ If no tables are specified with the --table option, vacuumdb will clean all regular tables and materialized views in the connected database. If --analyze-only or --analyze-in-stages is also specified, it will analyze all regular tables, partitioned tables, and materialized views (but not foreign tables). ------------------
I see what you mean. From that perspective, I wonder if we even need a comment there at all.+ /* + * VACUUMing partitioned tables would be unreasonably expensive, since + * that entails processing the partitions twice (once as part of the + * partitioned table, once as tables in their own right) for no + * benefit. But if we only ANALYZE, collecting statistics for + * partitioned tables is worth the effort. + */ This is probably true. But isn't the main reason more about aligning with the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb docs says, "There is no effective difference between vacuuming and analyzing databases via this utility and via other methods for accessing the server.", so its default target objects should match: VACUUM skips partitioned tables by default, while ANALYZE includes them. If that's the case, maybe the comment should reflect that instead.
+ qr/statement:\s+ANALYZE\s+public\.parent_table/s, + '--analyze_only updates statistics for partitioned tables'); A plain space might be sufficient instead of \s+. Also, I don't think the backslash before ".parent_table" is necessary.
Good catch! Indeed let's simplify that to contain strictly only what's necessary.
Kind regards,
Mircea Cadariu
On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: > Overall, I like the change. But I have one question: should this be treated as > a bug fix that we back-patch to supported branches, or is it more of > an improvement that should only go into master? > > I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest version. I understand your point. But on second thought, since the patch changes behavior, I'm leaning toward treating it as an improvement, so it should only go to master... > + /* > + * VACUUMing partitioned tables would be unreasonably expensive, since > + * that entails processing the partitions twice (once as part of the > + * partitioned table, once as tables in their own right) for no > + * benefit. But if we only ANALYZE, collecting statistics for > + * partitioned tables is worth the effort. > + */ > > This is probably true. But isn't the main reason more about aligning with > the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb > docs says, "There is no effective difference between vacuuming and analyzing > databases via this utility and via other methods for accessing the server.", > so its default target objects should match: VACUUM skips partitioned tables > by default, while ANALYZE includes them. If that's the case, maybe the comment > should reflect that instead. > > I see what you mean. From that perspective, I wonder if we even need a comment there at all. Or, if we keep it, though, I'd like to update it to something like the following: -------------------- vacuumdb should generally follow the behavior of the underlying VACUUM and ANALYZE commands. If analyze_only is true, process regular tables, materialized views, and partitioned tables, just like ANALYZE (with no specific target tables) does. Otherwise, process only regular tables and materialized views, since VACUUM skips partitioned tables when no target tables are specified. -------------------- Regards, -- Fujii Masao
On Wed, Aug 06, 2025 at 11:25:53PM +0900, Fujii Masao wrote: > On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: >> Overall, I like the change. But I have one question: should this be treated as >> a bug fix that we back-patch to supported branches, or is it more of >> an improvement that should only go into master? >> >> I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest version. > > I understand your point. But on second thought, since the patch changes > behavior, I'm leaning toward treating it as an improvement, so it should > only go to master... I also am leaning towards treating this as v19 material. It's a nontrivial behavior change, and this option is useful for major version upgrades, which is an area that we really don't want to surprise users too much. Furthermore, auto-analyze doesn't process partitioned tables, either, so this introduces a bit of divergence. (I'd love to see that project picked up again someday. Perhaps I will take a gander...) -- nathan
On Wed, 2025-08-06 at 23:25 +0900, Fujii Masao wrote: > On Wed, Aug 6, 2025 at 1:01 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: > > Overall, I like the change. But I have one question: should this be treated as > > a bug fix that we back-patch to supported branches, or is it more of > > an improvement that should only go into master? > > > > I reckon it might make sense to back-patch it to previous versions, as users might not upgrade always to the latest version. > > I understand your point. But on second thought, since the patch changes > behavior, I'm leaning toward treating it as an improvement, so it should > only go to master... I agree that this behavior change should not be backpatched. That is not a bugfix. > > + /* > > + * VACUUMing partitioned tables would be unreasonably expensive, since > > + * that entails processing the partitions twice (once as part of the > > + * partitioned table, once as tables in their own right) for no > > + * benefit. But if we only ANALYZE, collecting statistics for > > + * partitioned tables is worth the effort. > > + */ > > > > This is probably true. But isn't the main reason more about aligning with > > the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb > > docs says, "There is no effective difference between vacuuming and analyzing > > databases via this utility and via other methods for accessing the server.", > > so its default target objects should match: VACUUM skips partitioned tables > > by default, while ANALYZE includes them. If that's the case, maybe the comment > > should reflect that instead. > > > > I see what you mean. From that perspective, I wonder if we even need a comment there at all. > > Or, if we keep it, though, I'd like to update it to something like > the following: > > -------------------- > vacuumdb should generally follow the behavior of the underlying > VACUUM and ANALYZE commands. If analyze_only is true, process > regular tables, materialized views, and partitioned tables, just like > ANALYZE (with no specific target tables) does. Otherwise, process > only regular tables and materialized views, since VACUUM skips > partitioned tables when no target tables are specified. > -------------------- I am fine with that suggestion. Alternatively, my original comment could be amended with Besides, ANALYZE (without an option) processes partitioned tables, and "vacuumdb -Z" should behave like ANALYZE. Yours, Laurenz Albe
On Thu, Aug 7, 2025 at 3:14 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > I also am leaning towards treating this as v19 material. It's a nontrivial > behavior change, and this option is useful for major version upgrades, > which is an area that we really don't want to surprise users too much. +1 > Furthermore, auto-analyze doesn't process partitioned tables, either, so > this introduces a bit of divergence. (I'd love to see that project picked > up again someday. Perhaps I will take a gander...) Sounds good! Regards, -- Fujii Masao
On Thu, Aug 7, 2025 at 5:52 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > I understand your point. But on second thought, since the patch changes > > behavior, I'm leaning toward treating it as an improvement, so it should > > only go to master... > > I agree that this behavior change should not be backpatched. > That is not a bugfix. +1 > > -------------------- > > vacuumdb should generally follow the behavior of the underlying > > VACUUM and ANALYZE commands. If analyze_only is true, process > > regular tables, materialized views, and partitioned tables, just like > > ANALYZE (with no specific target tables) does. Otherwise, process > > only regular tables and materialized views, since VACUUM skips > > partitioned tables when no target tables are specified. > > -------------------- > > I am fine with that suggestion. Thanks! So I've updated the patch based on my earlier comments. Unless there are objections, I'll commit the attached version to master only. Regards, -- Fujii Masao
Вложения
On Mon, 2025-08-18 at 11:38 +0900, Fujii Masao wrote: > Thanks! So I've updated the patch based on my earlier comments. > Unless there are objections, I'll commit the attached version to master only. I am fine with your patch. One suggestion: > --- a/doc/src/sgml/ref/vacuumdb.sgml > +++ b/doc/src/sgml/ref/vacuumdb.sgml > @@ -397,6 +397,15 @@ PostgreSQL documentation > Multiple tables can be vacuumed by writing multiple > <option>-t</option> switches. > </para> > + <para> > + If no tables are specified with the <option>--table</option> option, > + <application>vacuumdb</application> will clean all regular tables > + and materialized views in the connected database. > + If <option>--analyze-only</option> or > + <option>--analyze-in-stages</option> is also specified, > + it will analyze all regular tables, partitioned tables, > + and materialized views (but not foreign tables). > + </para> I suggest replacing "clean" with "process", since VACUUM does so much more than clean up dead tuples. Concerning backpatching, I voted against, but I suggest that this be backpatched to v18. I don't feel very strongly about it though. Yours, Laurenz Albe
On Mon, Aug 18, 2025 at 3:40 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Mon, 2025-08-18 at 11:38 +0900, Fujii Masao wrote: > > Thanks! So I've updated the patch based on my earlier comments. > > Unless there are objections, I'll commit the attached version to master only. > > I am fine with your patch. Thanks for the review! > One suggestion: > > > --- a/doc/src/sgml/ref/vacuumdb.sgml > > +++ b/doc/src/sgml/ref/vacuumdb.sgml > > @@ -397,6 +397,15 @@ PostgreSQL documentation > > Multiple tables can be vacuumed by writing multiple > > <option>-t</option> switches. > > </para> > > + <para> > > + If no tables are specified with the <option>--table</option> option, > > + <application>vacuumdb</application> will clean all regular tables > > + and materialized views in the connected database. > > + If <option>--analyze-only</option> or > > + <option>--analyze-in-stages</option> is also specified, > > + it will analyze all regular tables, partitioned tables, > > + and materialized views (but not foreign tables). > > + </para> > > I suggest replacing "clean" with "process", since VACUUM does so much more than > clean up dead tuples. I see your point. However, since the vacuumdb docs already use "clean" in several places, I think it's better to keep using "clean" here for consistency. Thought? > Concerning backpatching, I voted against, but I suggest that this be backpatched > to v18. I don't feel very strongly about it though. As for back-patching, I failed to find a strong reason to apply this change to v18 over the many other patches that could not be committed before the feature freeze... Of course if there's broad support for back-patching, we can certainly revisit it. But for now I'm thinking to commit the patch to master. Regards, -- Fujii Masao
On Tue, 2025-08-19 at 23:40 +0900, Fujii Masao wrote: > > > --- a/doc/src/sgml/ref/vacuumdb.sgml > > > +++ b/doc/src/sgml/ref/vacuumdb.sgml > > > @@ -397,6 +397,15 @@ PostgreSQL documentation > > > Multiple tables can be vacuumed by writing multiple > > > <option>-t</option> switches. > > > </para> > > > + <para> > > > + If no tables are specified with the <option>--table</option> option, > > > + <application>vacuumdb</application> will clean all regular tables > > > + and materialized views in the connected database. > > > + If <option>--analyze-only</option> or > > > + <option>--analyze-in-stages</option> is also specified, > > > + it will analyze all regular tables, partitioned tables, > > > + and materialized views (but not foreign tables). > > > + </para> > > > > I suggest replacing "clean" with "process", since VACUUM does so much more than > > clean up dead tuples. > > I see your point. However, since the vacuumdb docs already use "clean" > in several places, I think it's better to keep using "clean" here > for consistency. Thought? Works for me; I didn't consider that. > > Concerning backpatching, I voted against, but I suggest that this be backpatched > > to v18. I don't feel very strongly about it though. > > As for back-patching, I failed to find a strong reason to apply this change > to v18 over the many other patches that could not be committed before > the feature freeze... Of course if there's broad support for back-patching, > we can certainly revisit it. But for now I'm thinking to commit the patch > to master. I don't have a strong reason either - my reasoning was that the change is small and unlikely to introduce a bug, and that it would be nice to get more accurate statistics on partitioned tables after "pg_upgrade" a year earlier. But I won't object if the patch is only in v19. Yours, Laurenz Albe
On Wed, Aug 20, 2025 at 12:16 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Tue, 2025-08-19 at 23:40 +0900, Fujii Masao wrote: > > > > --- a/doc/src/sgml/ref/vacuumdb.sgml > > > > +++ b/doc/src/sgml/ref/vacuumdb.sgml > > > > @@ -397,6 +397,15 @@ PostgreSQL documentation > > > > Multiple tables can be vacuumed by writing multiple > > > > <option>-t</option> switches. > > > > </para> > > > > + <para> > > > > + If no tables are specified with the <option>--table</option> option, > > > > + <application>vacuumdb</application> will clean all regular tables > > > > + and materialized views in the connected database. > > > > + If <option>--analyze-only</option> or > > > > + <option>--analyze-in-stages</option> is also specified, > > > > + it will analyze all regular tables, partitioned tables, > > > > + and materialized views (but not foreign tables). > > > > + </para> > > > > > > I suggest replacing "clean" with "process", since VACUUM does so much more than > > > clean up dead tuples. > > > > I see your point. However, since the vacuumdb docs already use "clean" > > in several places, I think it's better to keep using "clean" here > > for consistency. Thought? > > Works for me; I didn't consider that. > > > > Concerning backpatching, I voted against, but I suggest that this be backpatched > > > to v18. I don't feel very strongly about it though. > > > > As for back-patching, I failed to find a strong reason to apply this change > > to v18 over the many other patches that could not be committed before > > the feature freeze... Of course if there's broad support for back-patching, > > we can certainly revisit it. But for now I'm thinking to commit the patch > > to master. > > I don't have a strong reason either - my reasoning was that the change is small > and unlikely to introduce a bug, and that it would be nice to get more accurate > statistics on partitioned tables after "pg_upgrade" a year earlier. > > But I won't object if the patch is only in v19. OK, so for now I've pushed the patch to master. Thanks! Regards, -- Fujii Masao
On Wed, 2025-08-20 at 13:31 +0900, Fujii Masao wrote: > OK, so for now I've pushed the patch to master. Thanks! Thank you for working on that! Yours, Laurenz Albe
When analyzing a partitioned table, I think you should use ANALYZE ONLY, or otherwise avoid processing the children twice. Thanks for handling this. I was recently suprised to learn that vacuumdb doesn't process parents. -- Justin