Обсуждение: analyze-in-stages post upgrade questions

Поиск
Список
Период
Сортировка

analyze-in-stages post upgrade questions

От
"Zechman, Derek S"
Дата:

 

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and performed the analyze-in-stages post upgrade.  It has been noticed that some plans changed to use hash joins instead of nested loops.  Further investigation found it was because the parent table of partitioned tables did not have stats.  After running an ANALYZE on the parent tables we got similar plan an execution times as before.

 

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables? 

2 – What happens if we do not run analyze-in-stages post upgrade and just run an analyze?

 

Thanks,

Sean

Re: analyze-in-stages post upgrade questions

От
Ron Johnson
Дата:
On Fri, Jun 27, 2025 at 9:35 AM Zechman, Derek S <Derek.S.Zechman@snapon.com> wrote:

 

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and performed the analyze-in-stages post upgrade.  It has been noticed that some plans changed to use hash joins instead of nested loops.  Further investigation found it was because the parent table of partitioned tables did not have stats.  After running an ANALYZE on the parent tables we got similar plan an execution times as before.

 

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables? 

2 – What happens if we do not run analyze-in-stages post upgrade and just run an analyze?


It takes more time, and you don't have any statistics on a given table until the ANALYZE on that table completes.

How long did "vacuumdb --analyze-only --jobs=$mumble your_db" take?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: analyze-in-stages post upgrade questions

От
Adrian Klaver
Дата:
On 6/27/25 06:35, Zechman, Derek S wrote:
> We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and 
> performed the analyze-in-stages post upgrade.  It has been noticed that 
> some plans changed to use hash joins instead of nested loops.  Further 
> investigation found it was because the parent table of partitioned 
> tables did not have stats.  After running an ANALYZE on the parent 
> tables we got similar plan an execution times as before.
> 
> I have two questions
> 
> 1 - Why does analyze-in-stages not analyze the parent tables?
> 
> 2 – What happens if we do not run analyze-in-stages post upgrade and 
> just run an analyze?

It is spelled out in the docs:

https://www.postgresql.org/docs/current/pgupgrade.html

Emphasis added

"Using vacuumdb --all --analyze-only can efficiently generate such 
statistics, and the use of --jobs can speed it up. Option 
--analyze-in-stages can be used to generate **minimal statistics** 
quickly. If vacuum_cost_delay is set to a non-zero value, this can be 
overridden to speed up statistics generation using PGOPTIONS, e.g., 
PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."

and from here:

https://www.postgresql.org/docs/current/app-vacuumdb.html

"--analyze-in-stages

     Only calculate statistics for use by the optimizer (no vacuum), 
like --analyze-only. Run three stages of analyze; the first stage uses 
the lowest possible statistics target (see default_statistics_target) to 
produce usable statistics faster, and subsequent stages build the full 
statistics.

     This option is only useful to analyze a database that currently has 
no statistics or has wholly incorrect ones, such as if it is newly 
populated from a restored dump or by pg_upgrade. Be aware that running 
with this option in a database with existing statistics may cause the 
query optimizer choices to become transiently worse due to the low 
statistics targets of the early stages.
"

> 
> Thanks,
> 
> Sean
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
On Fri, 2025-06-27 at 08:31 -0700, Adrian Klaver wrote:
> On 6/27/25 06:35, Zechman, Derek S wrote:
> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
> > performed the analyze-in-stages post upgrade.  It has been noticed that
> > some plans changed to use hash joins instead of nested loops.  Further
> > investigation found it was because the parent table of partitioned
> > tables did not have stats.  After running an ANALYZE on the parent
> > tables we got similar plan an execution times as before.
> >
> > I have two questions
> >
> > 1 - Why does analyze-in-stages not analyze the parent tables?
> >
> > 2 – What happens if we do not run analyze-in-stages post upgrade and
> > just run an analyze?
>
> It is spelled out in the docs:
>
> https://www.postgresql.org/docs/current/pgupgrade.html
>
> Emphasis added
>
> "Using vacuumdb --all --analyze-only can efficiently generate such
> statistics, and the use of --jobs can speed it up. Option
> --analyze-in-stages can be used to generate **minimal statistics**
> quickly. If vacuum_cost_delay is set to a non-zero value, this can be
> overridden to speed up statistics generation using PGOPTIONS, e.g.,
> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
>
> and from here:
>
> https://www.postgresql.org/docs/current/app-vacuumdb.html
>
> "--analyze-in-stages
>
>      Only calculate statistics for use by the optimizer (no vacuum),
> like --analyze-only. Run three stages of analyze; the first stage uses
> the lowest possible statistics target (see default_statistics_target) to
> produce usable statistics faster, and subsequent stages build the full
> statistics.
>
>      This option is only useful to analyze a database that currently has
> no statistics or has wholly incorrect ones, such as if it is newly
> populated from a restored dump or by pg_upgrade. Be aware that running
> with this option in a database with existing statistics may cause the
> query optimizer choices to become transiently worse due to the low
> statistics targets of the early stages.

Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.

Derek, can cou share the pg_stats entries for the partitioned table?

Yours,
Laurenz Albe



RE: analyze-in-stages post upgrade questions

От
"Zechman, Derek S"
Дата:

 

 

 

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and performed the analyze-in-stages post upgrade.  It has been noticed that some plans changed to use hash joins instead of nested loops.  Further investigation found it was because the parent table of partitioned tables did not have stats.  After running an ANALYZE on the parent tables we got similar plan an execution times as before.

 

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables? 

2 – What happens if we do not run analyze-in-stages post upgrade and just run an analyze?

 

“It takes more time, and you don't have any statistics on a given table until the ANALYZE on that table completes.

 

How long did "vacuumdb --analyze-only --jobs=$mumble your_db" take?”


Thanks – that makes sense.  I understand what analyze in stages does just wish it would include parent tables.

 

"vacuumdb --all --analyze-only --jobs=7" took about 75 minutes where the analyze-in-stages after upgrade took 115 minutes.  Neither of these activities analyzed the parent tables. 

Reading more and it seems vacuumdb doesn’t analyze parent tables and a manual analyze on those is needed if we want better planner statistics.

RE: analyze-in-stages post upgrade questions

От
"Zechman, Derek S"
Дата:
> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and 
> > performed the analyze-in-stages post upgrade.  It has been noticed that 
> > some plans changed to use hash joins instead of nested loops.  Further 
> > investigation found it was because the parent table of partitioned 
> > tables did not have stats.  After running an ANALYZE on the parent 
> > tables we got similar plan an execution times as before.
> > 
> > I have two questions
> > 
> > 1 - Why does analyze-in-stages not analyze the parent tables?
> > 
> > 2 – What happens if we do not run analyze-in-stages post upgrade and 
> > just run an analyze?
> 
> It is spelled out in the docs:
> 
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/pgupgrade.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvnFFR6Ws$
> 
> Emphasis added
> 
> "Using vacuumdb --all --analyze-only can efficiently generate such 
> statistics, and the use of --jobs can speed it up. Option 
> --analyze-in-stages can be used to generate **minimal statistics** 
> quickly. If vacuum_cost_delay is set to a non-zero value, this can be 
> overridden to speed up statistics generation using PGOPTIONS, e.g., 
> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
> 
> and from here:
> 
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/app-vacuumdb.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvvC7gfd0$
> 
> "--analyze-in-stages
> 
>      Only calculate statistics for use by the optimizer (no vacuum), 
> like --analyze-only. Run three stages of analyze; the first stage uses 
> the lowest possible statistics target (see default_statistics_target) to 
> produce usable statistics faster, and subsequent stages build the full 
> statistics.
> 
>      This option is only useful to analyze a database that currently has 
> no statistics or has wholly incorrect ones, such as if it is newly 
> populated from a restored dump or by pg_upgrade. Be aware that running 
> with this option in a database with existing statistics may cause the 
> query optimizer choices to become transiently worse due to the low 
> statistics targets of the early stages.
 
Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.
 
Derek, can cou share the pg_stats entries for the partitioned table?
 
Yours,
Laurenz Albe
 
 
There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below
 
=> select relname, reltuples, relkind from pg_class where relname ~ '^chapter_[0-9]+$' or relname='chapter' order by 1;
   relname   | reltuples | relkind
-------------+-----------+---------
 chapter     |        -1 | p
 chapter_1   |         4 | r
 chapter_10  |         4 | r
 chapter_100 |        30 | r
 chapter_101 |        15 | r
 chapter_102 |        15 | r
=> select count(*) from pg_stats where tablename='chapter';
 count
-------
     0
(1 row)
 
=> analyze chapter;
ANALYZE
=> select relname, reltuples, relkind from pg_class where relkind ='p' and relname='chapter';
 relname | reltuples | relkind
---------+-----------+---------
 chapter |      7589 | p
(1 row)
 
=> select count(*) from pg_stats where tablename='chapter';
 count
-------
    49
(1 row)
 
toy_epc_stg_1_db=>

Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
On Sat, 2025-06-28 at 01:23 +0000, Zechman, Derek S wrote:
> > Well, that wouldn't explain why it doesn't work on partitioned tables.
> > I am under the impression that it should.
> >
> > Derek, can cou share the pg_stats entries for the partitioned table?
>
> There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below

You are right.  I looked at the code, and "vacuumdb" does not process
partitiond tables, even if --analyze-only is specified.  I find that
surprising, aince the SQL command ANALYZE (without a table name) will
also collect statistics for partitioned tables.

I think that it would be a good idea to change that behavior.
In particular, it makes a lot of sense to collect statistics for
partitioned tables after a "pg_upgrade".

Attached is a patch to make "vacuumdb --analyze-only" consider
partitioned tables as well.

Yours,
Laurenz Albe

Вложения

RE: analyze-in-stages post upgrade questions

От
"Zechman, Derek S"
Дата:
> > Well, that wouldn't explain why it doesn't work on partitioned tables.
> > I am under the impression that it should.
> >
> > Derek, can cou share the pg_stats entries for the partitioned table?
> 
> There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below
 
You are right.  I looked at the code, and "vacuumdb" does not process
partitiond tables, even if --analyze-only is specified.  I find that
surprising, aince the SQL command ANALYZE (without a table name) will
also collect statistics for partitioned tables.
 
I think that it would be a good idea to change that behavior.
In particular, it makes a lot of sense to collect statistics for
partitioned tables after a "pg_upgrade".
 
Attached is a patch to make "vacuumdb --analyze-only" consider
partitioned tables as well.
 
Yours,
Laurenz Albe
 
Is there a plan to include this patch in future releases/patches of postgres? 
 
Thanks,
(Derek) Sean

Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
On Wed, 2025-07-09 at 11:30 +0000, Zechman, Derek S wrote:
> > > There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below
> >
> > You are right.  I looked at the code, and "vacuumdb" does not process
> > partitiond tables, even if --analyze-only is specified.  I find that
> > surprising, aince the SQL command ANALYZE (without a table name) will
> > also collect statistics for partitioned tables.
> >
> > I think that it would be a good idea to change that behavior.
> > In particular, it makes a lot of sense to collect statistics for
> > partitioned tables after a "pg_upgrade".
> >
> > Attached is a patch to make "vacuumdb --analyze-only" consider
> > partitioned tables as well.
>
> Is there a plan to include this patch in future releases/patches of postgres?

I have added the patch to the current commitfest:
https://commitfest.postgresql.org/patch/5871/

So far, it has not got any peer review.  So yes, I'd like to include
the patch, but I cannot make it happen by myself.
Essentially, patches get applied if
a) they get peer review and
b) a committer applies them

If you want this to happen, the best thing you could do would be
to review the patch and see if it works for you, if it does what you
need and so on:
https://wiki.postgresql.org/wiki/Reviewing_a_Patch

Yours,
Laurenz Albe



Re: analyze-in-stages post upgrade questions

От
Mircea Cadariu
Дата:

Hi Laurenz,

On 09/07/2025 16:26, Laurenz Albe wrote:
I have added the patch to the current commitfest:
https://commitfest.postgresql.org/patch/5871/

Just to let you know that I have added a review through the commitfest app. 

You can see it here: https://www.postgresql.org/message-id/flat/175189219162.2200286.3306593311375985296.pgcf@coridan.postgresql.org

Kind regards,

Mircea Cadariu

Re: analyze-in-stages post upgrade questions

От
Laurenz Albe
Дата:
On Wed, 2025-07-09 at 17:37 +0100, Mircea Cadariu wrote:
> Just to let you know that I have added a review through the commitfest app. 

Thanks!

The patch is still in state "needs review".
If there is something that I should change, you should set it to
"waiting on author".  If you think that the patch is ready to go
as it is, please set it to "ready for committer".

Yours,
Laurenz Albe



Re: analyze-in-stages post upgrade questions

От
Mircea Cadariu
Дата:
Hi Laurenz,


Got it. I have only one suggestion for the patch. Consider adding a 
corresponding test in src/bin/scripts/t/100_vacuumdb.pl.

Proposal (I used this to check the patch):

$node->safe_psql('postgres',
     "CREATE TABLE parent_table (a INT) PARTITION BY LIST (a);\n"
       . "CREATE TABLE child_table PARTITION OF parent_table FOR VALUES 
IN (1);\n"
       . "INSERT INTO parent_table VALUES (1);\n");
$node->issues_sql_like(
     [
         'vacuumdb', '--analyze-only', 'postgres'
     ],
     qr/statement:\s+ANALYZE\s+public\.parent_table/s,
     '--analyze_only updates statistics for partitioned tables');


Kind regards,

Mircea Cadariu