Обсуждение: progress report for ANALYZE

Поиск

Список

Период

Сортировка

progress report for ANALYZE

От

Alvaro Herrera

Дата:

21 июня 2019 г., 18:52:07

Hello

Here's a patch that implements progress reporting for ANALYZE.


Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

v1-0001-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Julien Rouhaud

Дата:

02 июля 2019 г., 13:22:42

On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Here's a patch that implements progress reporting for ANALYZE.

Patch applies, code and doc and compiles cleanly.  I have few comments:

@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
    if (numrows > 0)
    {
        MemoryContext col_context,
-                   old_context;
+                     old_context;
+       const int   index[] = {
+           PROGRESS_ANALYZE_PHASE,
+           PROGRESS_ANALYZE_TOTAL_BLOCKS,
+           PROGRESS_ANALYZE_BLOCKS_DONE
+       };
+       const int64 val[] = {
+           PROGRESS_ANALYZE_PHASE_ANALYSIS,
+           0, 0
+       };
+
+       pgstat_progress_update_multi_param(3, index, val);
[...]
    }
+   pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                PROGRESS_ANALYZE_PHASE_COMPLETE);
+
If there wasn't any row but multiple blocks were scanned, the
PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about
the blocks that were scanned.  I'm not sure if we should stay
consistent here.

diff --git a/src/backend/utils/adt/pgstatfuncs.c
b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..98b01e54fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
    /* Translate command name into command type code. */
    if (pg_strcasecmp(cmd, "VACUUM") == 0)
        cmdtype = PROGRESS_COMMAND_VACUUM;
+   if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+       cmdtype = PROGRESS_COMMAND_ANALYZE;
    else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
        cmdtype = PROGRESS_COMMAND_CLUSTER;
    else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)

it should be an "else if" here.

Everything else LGTM.

Re: progress report for ANALYZE

От

Anthony Nowocien

Дата:

02 июля 2019 г., 14:43:20

Hi,
In monitoring.sgml, "a" is missing in "row for ech backend that is currently running that command[...]".
Anthony

On Tuesday, July 2, 2019, Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>
>> Here's a patch that implements progress reporting for ANALYZE.
>
> Patch applies, code and doc and compiles cleanly. I have few comments:
>
> @@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
> if (numrows > 0)
> {
> MemoryContext col_context,
> - old_context;
> + old_context;
> + const int index[] = {
> + PROGRESS_ANALYZE_PHASE,
> + PROGRESS_ANALYZE_TOTAL_BLOCKS,
> + PROGRESS_ANALYZE_BLOCKS_DONE
> + };
> + const int64 val[] = {
> + PROGRESS_ANALYZE_PHASE_ANALYSIS,
> + 0, 0
> + };
> +
> + pgstat_progress_update_multi_param(3, index, val);
> [...]
> }
> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
> + PROGRESS_ANALYZE_PHASE_COMPLETE);
> +
> If there wasn't any row but multiple blocks were scanned, the
> PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about
> the blocks that were scanned. I'm not sure if we should stay
> consistent here.
>
> diff --git a/src/backend/utils/adt/pgstatfuncs.c
> b/src/backend/utils/adt/pgstatfuncs.c
> index 05240bfd14..98b01e54fa 100644
> --- a/src/backend/utils/adt/pgstatfuncs.c
> +++ b/src/backend/utils/adt/pgstatfuncs.c
> @@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
> /* Translate command name into command type code. */
> if (pg_strcasecmp(cmd, "VACUUM") == 0)
> cmdtype = PROGRESS_COMMAND_VACUUM;
> + if (pg_strcasecmp(cmd, "ANALYZE") == 0)
> + cmdtype = PROGRESS_COMMAND_ANALYZE;
> else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
> cmdtype = PROGRESS_COMMAND_CLUSTER;
> else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
>
> it should be an "else if" here.
>
> Everything else LGTM.
>
>
>

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

03 июля 2019 г., 04:05:24

Hi Alvaro!

On 2019/06/22 3:52, Alvaro Herrera wrote:
> Hello
> 
> Here's a patch that implements progress reporting for ANALYZE.

Sorry for the late reply.
My email address was changed to tatsuro.yamada.tf@nttcom.co.jp.

I have a question about your patch.
My ex-colleague Vinayak created same patch in 2017 [1], and he
couldn't get commit because there are some reasons such as the
patch couldn't handle analyzing Foreign table. Therefore, I wonder
whether your patch is able to do that or not.

However, actually, I think it's okay because the feature is useful
for DBAs, even if your patch can't handle Foreign table.

I'll review your patch in this week. :)

[1] ANALYZE command progress checker

https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006

Regards,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

08 июля 2019 г., 09:28:52

Hi Alvaro,

> I'll review your patch in this week. :)

I tested your patch on 6b854896.
Here is a result. See below:

---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;

[Session #2]
\a
\t
select * from pg_stat_progress_analyze; \watch 0.001

17520|13599|postgres|16387|f|16387|scanning table|4425|14
17520|13599|postgres|16387|f|16387|scanning table|4425|64
17520|13599|postgres|16387|f|16387|scanning table|4425|111
...
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??
---------------------------------------------------------

I have a question of the last line of the result.
I'm not sure it is able or not, but I guess it would be better
to keep the phase name (analyzing sample) on the view until the
end of this command. :)

Regards,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Robert Haas

Дата:

08 июля 2019 г., 18:10:50

On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
> 17520|13599|postgres|16387|f|16387|scanning table|4425|4425
> 17520|13599|postgres|16387|f|16387|analyzing sample|0|0
> 17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??

Why do we zero out the block numbers when we switch phases?  The
CREATE INDEX progress reporting patch does that kind of thing too, and
it seems like poor behavior to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

08 июля 2019 г., 18:18:45

On 2019-Jul-08, Robert Haas wrote:

> On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
> <tatsuro.yamada.tf@nttcom.co.jp> wrote:
> > 17520|13599|postgres|16387|f|16387|scanning table|4425|4425
> > 17520|13599|postgres|16387|f|16387|analyzing sample|0|0
> > 17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??
> 
> Why do we zero out the block numbers when we switch phases?  The
> CREATE INDEX progress reporting patch does that kind of thing too, and
> it seems like poor behavior to me.

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either.  We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Robert Haas

Дата:

08 июля 2019 г., 18:44:09

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Yeah, I got the impression that that was determined to be the desirable
> behavior, so I made it do that, but I'm not really happy about it
> either.  We're not too late to change the CREATE INDEX behavior, but
> let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

От

Julien Rouhaud

Дата:

08 июля 2019 г., 18:47:01

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Yeah, I got the impression that that was determined to be the desirable
> > behavior, so I made it do that, but I'm not really happy about it
> > either.  We're not too late to change the CREATE INDEX behavior, but
> > let's discuss what is it that we want.
>
> I don't think I intended to make any such determination -- which
> commit do you think established this as the canonical behavior?
>
> I propose that once a field is set, we should leave it set until the end.

+1

Note that this patch is already behaving like that if the table only
contains dead rows.

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

09 июля 2019 г., 08:38:44

Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:
> On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Yeah, I got the impression that that was determined to be the desirable
>>> behavior, so I made it do that, but I'm not really happy about it
>>> either.  We're not too late to change the CREATE INDEX behavior, but
>>> let's discuss what is it that we want.
>>
>> I don't think I intended to make any such determination -- which
>> commit do you think established this as the canonical behavior?
>>
>> I propose that once a field is set, we should leave it set until the end.
> 
> +1
> 
> Note that this patch is already behaving like that if the table only
> contains dead rows.


I fixed the patch including:

   - Replace "if" to "else if". (Suggested by Julien)
   - Fix typo s/ech/each/. (Suggested by Anthony)
   - Add Phase "analyzing complete" in the pgstat view. (Suggested by Julien, Robert and me)
     It was overlooked to add it in system_views.sql.

I share my re-test result, see below:

---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;

[Session #2]
\a \t
select * from pg_stat_progress_analyze; \watch 0.001

3785|13599|postgres|16384|f|16384|scanning table|4425|6
3785|13599|postgres|16384|f|16384|scanning table|4425|31
3785|13599|postgres|16384|f|16384|scanning table|4425|70
3785|13599|postgres|16384|f|16384|scanning table|4425|109
...
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|analyzing sample|0|0
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and fixed. :)
---------------------------------------------------------

Thanks,
Tatsuro Yamada

Вложения

v2-0001-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

09 июля 2019 г., 22:12:17

On 2019-Jul-08, Robert Haas wrote:

> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Yeah, I got the impression that that was determined to be the desirable
> > behavior, so I made it do that, but I'm not really happy about it
> > either.  We're not too late to change the CREATE INDEX behavior, but
> > let's discuss what is it that we want.
> 
> I don't think I intended to make any such determination -- which
> commit do you think established this as the canonical behavior?

No commit, just discussion in the CREATE INDEX thread.

> I propose that once a field is set, we should leave it set until the end.

Hmm, ok.  In CREATE INDEX, we use the block counters multiple times. We
can leave them set until the next time we need them, I suppose.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Robert Haas

Дата:

10 июля 2019 г., 13:23:19

On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.

Why do we do that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

10 июля 2019 г., 13:26:28

On 2019-Jul-10, Robert Haas wrote:

> On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.
> 
> Why do we do that?

Because we scan the table first, then the index, then the table again
(last two for the validation phase of CIC).  We count "block numbers"
separately for each of those, and keep those counters in the same pair
of columns.  I think we also do that for tuple counters in one case.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Robert Haas

Дата:

11 июля 2019 г., 02:23:55

On Wed, Jul 10, 2019 at 9:26 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Jul-10, Robert Haas wrote:
> > On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.
> >
> > Why do we do that?
>
> Because we scan the table first, then the index, then the table again
> (last two for the validation phase of CIC).  We count "block numbers"
> separately for each of those, and keep those counters in the same pair
> of columns.  I think we also do that for tuple counters in one case.

Hmm.  I think I would have been inclined to use different counter
numbers for table blocks and index blocks, but perhaps we don't have
room. Anyway, leaving them set until we need them again seems like the
best we can do as things stand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

От

Kyotaro Horiguchi

Дата:

11 июля 2019 г., 10:56:10

Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in
<244cb241-168b-d6a9-c45f-a80c34cdc6ad@nttcom.co.jp>
> Hi Alvaro, Anthony, Julien and Robert,
> 
> On 2019/07/09 3:47, Julien Rouhaud wrote:
> > On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com>
> > wrote:
> >>
> >> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
> >> <alvherre@2ndquadrant.com> wrote:
> >>> Yeah, I got the impression that that was determined to be the
> >>> desirable
> >>> behavior, so I made it do that, but I'm not really happy about it
> >>> either.  We're not too late to change the CREATE INDEX behavior, but
> >>> let's discuss what is it that we want.
> >>
> >> I don't think I intended to make any such determination -- which
> >> commit do you think established this as the canonical behavior?
> >>
> >> I propose that once a field is set, we should leave it set until the
> >> end.
> > +1
> > Note that this patch is already behaving like that if the table only
> > contains dead rows.

+1 from me.

> I fixed the patch including:
> 
>   - Replace "if" to "else if". (Suggested by Julien)
>   - Fix typo s/ech/each/. (Suggested by Anthony)
>   - Add Phase "analyzing complete" in the pgstat view. (Suggested by
>   - Julien, Robert and me)
>     It was overlooked to add it in system_views.sql.
> 
> I share my re-test result, see below:
> 
> ---------------------------------------------------------
> [Session #1]
> create table hoge as select * from generate_series(1, 1000000) a;
> analyze verbose hoge;
> 
> [Session #2]
> \a \t
> select * from pg_stat_progress_analyze; \watch 0.001
> 
> 3785|13599|postgres|16384|f|16384|scanning table|4425|6
> 3785|13599|postgres|16384|f|16384|scanning table|4425|31
> 3785|13599|postgres|16384|f|16384|scanning table|4425|70
> 3785|13599|postgres|16384|f|16384|scanning table|4425|109
> ...
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|analyzing sample|0|0
> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> fixed. :)
> ---------------------------------------------------------

I have some comments.

+        const int   index[] = {
+            PROGRESS_ANALYZE_PHASE,
+            PROGRESS_ANALYZE_TOTAL_BLOCKS,
+            PROGRESS_ANALYZE_BLOCKS_DONE
+        };
+        const int64 val[] = {
+            PROGRESS_ANALYZE_PHASE_ANALYSIS,
+            0, 0

Do you oppose to leaving the total/done blocks alone here:p?

+  BlockNumber  nblocks;
+  double    blksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:

+    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                   ++blksdone);

It is converted into int64.

+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'

I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:

+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'

+                      WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:

+                      WHEN 4 THEN 'completing analyze'
+                      WHEN 4 THEN 'finalizing analyze'

+     <entry>Process ID of backend.</entry>

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+      <entry>OID of the database to which this backend is connected.</entry>
+      <entry>Name of the database to which this backend is connected.</entry>

"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)

+      <entry>Whether the current scan includes legacy inheritance children.</entry>

This apparently excludes partition tables but actually it is
included.

  "Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..

+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).

Is <literal> needed? And I think the parentheses are not needed.

  OID of the table currently being scanned. Can differ from relid
  when analyzing tables that have child tables.

+       Total number of heap blocks to scan in the current table.

   Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".

+       The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.

+       <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.

Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.

regards.

- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

22 июля 2019 г., 06:02:16

Hi Horiguchi-san!


On 2019/07/11 19:56, Kyotaro Horiguchi wrote:
> Hello.
> 
> At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in
<244cb241-168b-d6a9-c45f-a80c34cdc6ad@nttcom.co.jp>
>> Hi Alvaro, Anthony, Julien and Robert,
>>
>> On 2019/07/09 3:47, Julien Rouhaud wrote:
>>> On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com>
>>> wrote:
>>>>
>>>> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
>>>> <alvherre@2ndquadrant.com> wrote:
>>>>> Yeah, I got the impression that that was determined to be the
>>>>> desirable
>>>>> behavior, so I made it do that, but I'm not really happy about it
>>>>> either.  We're not too late to change the CREATE INDEX behavior, but
>>>>> let's discuss what is it that we want.
>>>>
>>>> I don't think I intended to make any such determination -- which
>>>> commit do you think established this as the canonical behavior?
>>>>
>>>> I propose that once a field is set, we should leave it set until the
>>>> end.
>>> +1
>>> Note that this patch is already behaving like that if the table only
>>> contains dead rows.
> 
> +1 from me.
> 
>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>> fixed. :)
>> ---------------------------------------------------------
>
> I have some comments.
> 
> +        const int   index[] = {
> +            PROGRESS_ANALYZE_PHASE,
> +            PROGRESS_ANALYZE_TOTAL_BLOCKS,
> +            PROGRESS_ANALYZE_BLOCKS_DONE
> +        };
> +        const int64 val[] = {
> +            PROGRESS_ANALYZE_PHASE_ANALYSIS,
> +            0, 0
> 
> Do you oppose to leaving the total/done blocks alone here:p?


Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)


Ah, I revised it to remove "0, 0".



> +  BlockNumber  nblocks;
> +  double    blksdone = 0;
> 
> Why is it a double even though blksdone is of the same domain
> with nblocks? And finally:
> 
> +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> +                   ++blksdone);
> 
> It is converted into int64.


Fixed.
But is it suitable to use BlockNumber instead int64?


  
> +                      WHEN 2 THEN 'analyzing sample'
> +                      WHEN 3 THEN 'analyzing sample (extended stats)'
> 
> I think we should avoid parenthesized phrases as far as
> not-necessary. That makes the column unnecessarily long. The
> phase is internally called as "compute stats" so *I* would prefer
> something like the followings:
> 
> +                      WHEN 2 THEN 'computing stats'
> +                      WHEN 3 THEN 'computing extended stats'
> 
> 
> 
> +                      WHEN 4 THEN 'analyzing complete'
> 
> And if you are intending by this that (as mentioned in the
> documentation) "working to complete this analyze", this would be:
> 
> +                      WHEN 4 THEN 'completing analyze'
> +                      WHEN 4 THEN 'finalizing analyze'


I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

   WHEN 2 THEN 'computing stats'
   WHEN 3 THEN 'computing extended stats'
   WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.



> +     <entry>Process ID of backend.</entry>
> 
> of "the" backend. ? And period is not attached on all
> descriptions consisting of a single sentence.
>
> +      <entry>OID of the database to which this backend is connected.</entry>
> +      <entry>Name of the database to which this backend is connected.</entry>
> 
> "database to which .. is connected" is phrased as "database this
> backend is connected to" in pg_stat_acttivity. (Just for consistency)


I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)



> +      <entry>Whether the current scan includes legacy inheritance children.</entry>
> 
> This apparently excludes partition tables but actually it is
> included.
>
>    "Whether scanning through child tables" ?
> 
> I'm not sure "child tables" is established as the word to mean
> both "partition tables and inheritance children"..


Hmm... I'm also not sure but I fixed as you suggested.



> +       The table being scanned (differs from <literal>relid</literal>
> +       only when processing partitions or inheritance children).
> 
> Is <literal> needed? And I think the parentheses are not needed.
> 
>    OID of the table currently being scanned. Can differ from relid
>    when analyzing tables that have child tables.


How about:
   OID of the table currently being scanned.
   It might be different from relid when analyzing tables that have child tables.



> +       Total number of heap blocks to scan in the current table.
> 
>     Number of heap blocks on scanning_table to scan?
> 
> It might be better that this description describes that this
> and the next column is meaningful only while the phase
> "scanning table".


How about:
   Total number of heap blocks in the scanning_table.




> +       The command is currently scanning the table.
> 
> "sample(s)" comes somewhat abruptly in the next item. Something
> like "The command is currently scanning the table
> <structfield>scanning_table</structfield> to obtain samples"
> might be better.


Fixed.

  
> +       <command>ANALYZE</command> is currently extracting statistical data
> +       from the sample obtained.
> 
> Something like "The command is computing stats from the samples
> obtained in the previous phase" might be better.


Fixed.


Please find attached patch. :)

Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;

9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================


Thanks,
Tatsuro Yamada

Вложения

v3-0001-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Kyotaro Horiguchi

Дата:

22 июля 2019 г., 08:30:39

Hello.

# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..

At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in
<0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp>
> >> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> >> fixed. :)
> >> ---------------------------------------------------------
> >
> > I have some comments.
> > +        const int   index[] = {
> > +            PROGRESS_ANALYZE_PHASE,
> > +            PROGRESS_ANALYZE_TOTAL_BLOCKS,
> > +            PROGRESS_ANALYZE_BLOCKS_DONE
> > +        };
> > +        const int64 val[] = {
> > +            PROGRESS_ANALYZE_PHASE_ANALYSIS,
> > +            0, 0
> > Do you oppose to leaving the total/done blocks alone here:p?
> 
> 
> Thanks for your comments!
> I created a new patch based on your comments because Alvaro allowed me
> to work on revising the patch. :)
> 
> 
> Ah, I revised it to remove "0, 0".

Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

> > +  BlockNumber  nblocks;
> > +  double    blksdone = 0;
> > Why is it a double even though blksdone is of the same domain
> > with nblocks? And finally:
> > +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> > +                   ++blksdone);
> > It is converted into int64.
> 
> 
> Fixed.
> But is it suitable to use BlockNumber instead int64?

Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.

> > +                      WHEN 2 THEN 'analyzing sample'
> > +                      WHEN 3 THEN 'analyzing sample (extended stats)'
> > I think we should avoid parenthesized phrases as far as
> > not-necessary. That makes the column unnecessarily long. The
> > phase is internally called as "compute stats" so *I* would prefer
> > something like the followings:
> > +                      WHEN 2 THEN 'computing stats'
> > +                      WHEN 3 THEN 'computing extended stats'
> > +                      WHEN 4 THEN 'analyzing complete'
> > And if you are intending by this that (as mentioned in the
> > documentation) "working to complete this analyze", this would be:
> > +                      WHEN 4 THEN 'completing analyze'
> > +                      WHEN 4 THEN 'finalizing analyze'
> 
> 
> I have no strong opinion, so I changed the phase-names based on
> your suggestions like following:
> 
>   WHEN 2 THEN 'computing stats'
>   WHEN 3 THEN 'computing extended stats'
>   WHEN 4 THEN 'finalizing analyze'
> 
> However, I'd like to get any comments from hackers to get a consensus
> about the names.

Agreed. Especially such word choosing is not suitable for me..

> > +     <entry>Process ID of backend.</entry>
> > of "the" backend. ? And period is not attached on all
> > descriptions consisting of a single sentence.
> >
> > + <entry>OID of the database to which this backend is
> > connected.</entry>
> > + <entry>Name of the database to which this backend is
> > connected.</entry>
> > "database to which .. is connected" is phrased as "database this
> > backend is connected to" in pg_stat_acttivity. (Just for consistency)
> 
> 
> I checked the sentences on other views of progress monitor (VACUUM,
> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
> I'd like to keep it as is. :)

Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.

> > + <entry>Whether the current scan includes legacy inheritance
> > children.</entry>
> > This apparently excludes partition tables but actually it is
> > included.
> >
> >    "Whether scanning through child tables" ?
> > I'm not sure "child tables" is established as the word to mean
> > both "partition tables and inheritance children"..
> 
> 
> Hmm... I'm also not sure but I fixed as you suggested.

Yeah, I also am not sure the suggestion is good enough as is..


> > +       The table being scanned (differs from <literal>relid</literal>
> > +       only when processing partitions or inheritance children).
> > Is <literal> needed? And I think the parentheses are not needed.
> >    OID of the table currently being scanned. Can differ from relid
> >    when analyzing tables that have child tables.
> 
> 
> How about:
>   OID of the table currently being scanned.
>   It might be different from relid when analyzing tables that have child
>   tables.
> 
> 
> 
> > +       Total number of heap blocks to scan in the current table.
> >     Number of heap blocks on scanning_table to scan?
> > It might be better that this description describes that this
> > and the next column is meaningful only while the phase
> > "scanning table".
> 
> 
> How about:
>   Total number of heap blocks in the scanning_table.

(For me, ) that seems like it shows blocks including all
descendents for inheritance parent. But I'm not sure..a

> > +       The command is currently scanning the table.
> > "sample(s)" comes somewhat abruptly in the next item. Something
> > like "The command is currently scanning the table
> > <structfield>scanning_table</structfield> to obtain samples"
> > might be better.
> 
> 
> Fixed.
> 
>  
> > + <command>ANALYZE</command> is currently extracting statistical data
> > +       from the sample obtained.
> > Something like "The command is computing stats from the samples
> > obtained in the previous phase" might be better.
> 
> 
> Fixed.
> 
> 
> Please find attached patch. :)
> 
> Here is a test result of the patch.
> ==========================================================
> # select * from pg_stat_progress_analyze ; \watch 0.0001;
> 
> 9067|13599|postgres|16387|f|16387|scanning table|443|14
> 9067|13599|postgres|16387|f|16387|scanning table|443|44
> 9067|13599|postgres|16387|f|16387|scanning table|443|76
> 9067|13599|postgres|16387|f|16387|scanning table|443|100
> ...
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|scanning table|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|computing stats|443|443
> 9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
> ==========================================================

Looks fine!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

23 июля 2019 г., 04:51:04

Hi Horiguchi-san, Alvaro, Anthony, Julien and Robert,


On 2019/07/22 17:30, Kyotaro Horiguchi wrote:
> Hello.
> 
> # It's very good timing, as you came in while I have a time after
> # finishing a quite nerve-wrackig task..
> 
> At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in
<0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp>
>>>> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
>>>> fixed. :)
>>>> ---------------------------------------------------------
>>>
>>> I have some comments.
>>> +        const int   index[] = {
>>> +            PROGRESS_ANALYZE_PHASE,
>>> +            PROGRESS_ANALYZE_TOTAL_BLOCKS,
>>> +            PROGRESS_ANALYZE_BLOCKS_DONE
>>> +        };
>>> +        const int64 val[] = {
>>> +            PROGRESS_ANALYZE_PHASE_ANALYSIS,
>>> +            0, 0
>>> Do you oppose to leaving the total/done blocks alone here:p?
>>
>>
>> Thanks for your comments!
>> I created a new patch based on your comments because Alvaro allowed me
>> to work on revising the patch. :)
>>
>>
>> Ah, I revised it to remove "0, 0".
> 
> Thanks. For a second I thought that
> PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
> renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.


"PROGRESS_ANALYZE_PHASE_ANALYSIS" was replaced with
"PROGRESS_ANALYZE_PHASE_COMPUTING" because I changed
the phase-name on v3.patch like this:

./src/include/commands/progress.h

+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE              1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING                       2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE                        4

Is it Okay?

  
>>> +  BlockNumber  nblocks;
>>> +  double    blksdone = 0;
>>> Why is it a double even though blksdone is of the same domain
>>> with nblocks? And finally:
>>> +    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
>>> +                   ++blksdone);
>>> It is converted into int64.
>>
>>
>> Fixed.
>> But is it suitable to use BlockNumber instead int64?
> 
> Yeah, I didn't meant that we should use int64 there. Sorry for
> the confusing comment. I meant that blksdone should be of
> BlockNumber.


Fixed. Thanks for the clarification. :)
Attached v4 patch file only includes this fix.
  

>>> +                      WHEN 2 THEN 'analyzing sample'
>>> +                      WHEN 3 THEN 'analyzing sample (extended stats)'
>>> I think we should avoid parenthesized phrases as far as
>>> not-necessary. That makes the column unnecessarily long. The
>>> phase is internally called as "compute stats" so *I* would prefer
>>> something like the followings:
>>> +                      WHEN 2 THEN 'computing stats'
>>> +                      WHEN 3 THEN 'computing extended stats'
>>> +                      WHEN 4 THEN 'analyzing complete'
>>> And if you are intending by this that (as mentioned in the
>>> documentation) "working to complete this analyze", this would be:
>>> +                      WHEN 4 THEN 'completing analyze'
>>> +                      WHEN 4 THEN 'finalizing analyze'
>>
>>
>> I have no strong opinion, so I changed the phase-names based on
>> your suggestions like following:
>>
>>    WHEN 2 THEN 'computing stats'
>>    WHEN 3 THEN 'computing extended stats'
>>    WHEN 4 THEN 'finalizing analyze'
>>
>> However, I'd like to get any comments from hackers to get a consensus
>> about the names.
> 
> Agreed. Especially such word choosing is not suitable for me..


To Alvaro, Julien, Anthony and Robert,
Do you have any ideas? :)


  
>>> +     <entry>Process ID of backend.</entry>
>>> of "the" backend. ? And period is not attached on all
>>> descriptions consisting of a single sentence.
>>>
>>> + <entry>OID of the database to which this backend is
>>> connected.</entry>
>>> + <entry>Name of the database to which this backend is
>>> connected.</entry>
>>> "database to which .. is connected" is phrased as "database this
>>> backend is connected to" in pg_stat_acttivity. (Just for consistency)
>>
>>
>> I checked the sentences on other views of progress monitor (VACUUM,
>> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
>> I'd like to keep it as is. :)
> 
> Oh, I see from where the wordings came. But no periods seen after
> sentense when it is the only one in a description in other system
> views tables. I think the progress views tables should be
> corrected following convention.


Sounds reasonable. However, I'd like to create another patch after
this feature was committed since that document fix influence other
progress monitor's description on the document.

  
>>> + <entry>Whether the current scan includes legacy inheritance
>>> children.</entry>
>>> This apparently excludes partition tables but actually it is
>>> included.
>>>
>>>     "Whether scanning through child tables" ?
>>> I'm not sure "child tables" is established as the word to mean
>>> both "partition tables and inheritance children"..
>>
>>
>> Hmm... I'm also not sure but I fixed as you suggested.
> 
> Yeah, I also am not sure the suggestion is good enough as is..
>
>>> +       Total number of heap blocks to scan in the current table.
>>>      Number of heap blocks on scanning_table to scan?
>>> It might be better that this description describes that this
>>> and the next column is meaningful only while the phase
>>> "scanning table".
>>
>>
>> How about:
>>    Total number of heap blocks in the scanning_table.
> 
> (For me, ) that seems like it shows blocks including all
> descendents for inheritance parent. But I'm not sure..a


In the case of scanning_table is parent table, it doesn't
show the number. However, child tables shows the number.
I tested it using Declarative partitioning table, See the bottom
of this email. :)


>> Please find attached patch. :)
>>
>> Here is a test result of the patch.
>> ==========================================================
>> # select * from pg_stat_progress_analyze ; \watch 0.0001;
>>
>> 9067|13599|postgres|16387|f|16387|scanning table|443|14
>> ...
>> 9067|13599|postgres|16387|f|16387|scanning table|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|computing stats|443|443
>> 9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
>> ==========================================================
> 
> Looks fine!


I shared a test result using Declarative partitioning table.

==========================================================
## Create partitioning table
create table hoge as select a from generate_series(0, 40000) a;

create table hoge2(
   a integer
) partition by range(a);

create table hoge2_10000 partition of hoge2
for values from (0) to (10000);

create table hoge2_20000 partition of hoge2
for values from (10000) to (20000);

create table hoge2_30000 partition of hoge2
for values from (20000) to (30000);

create table hoge2_default partition of hoge2 default;


## Test
select oid,relname,relpages,reltuples from pg_class where relname like 'hoge%';

   oid  |    relname    | relpages | reltuples
-------+---------------+----------+-----------
  16538 | hoge          |      177 |     40001
  16541 | hoge2         |        0 |         0
  16544 | hoge2_10000   |       45 |     10000
  16547 | hoge2_20000   |       45 |     10000
  16550 | hoge2_30000   |       45 |     10000
  16553 | hoge2_default |       45 |     10001
(6 rows)

select * from pg_stat_progress_analyze ; \watch 0.00001;

27579|13599|postgres|16541|t|16544|scanning table|45|17
27579|13599|postgres|16541|t|16544|scanning table|45|38
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45

27579|13599|postgres|16541|t|16547|scanning table|45|17
27579|13599|postgres|16541|t|16547|scanning table|45|37
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45

27579|13599|postgres|16541|t|16550|scanning table|45|9
27579|13599|postgres|16541|t|16550|scanning table|45|30
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45

27579|13599|postgres|16541|t|16553|scanning table|45|5
27579|13599|postgres|16541|t|16553|scanning table|45|26
27579|13599|postgres|16541|t|16553|scanning table|45|42
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|finalizing analyze|45|45

27579|13599|postgres|16544|f|16544|scanning table|45|1
27579|13599|postgres|16544|f|16544|scanning table|45|30
27579|13599|postgres|16544|f|16544|computing stats|45|45

27579|13599|postgres|16547|f|16547|scanning table|45|25
27579|13599|postgres|16547|f|16547|computing stats|45|45

27579|13599|postgres|16550|f|16550|scanning table|45|11
27579|13599|postgres|16550|f|16550|scanning table|45|38
27579|13599|postgres|16550|f|16550|finalizing analyze|45|45

27579|13599|postgres|16553|f|16553|scanning table|45|25
27579|13599|postgres|16553|f|16553|computing stats|45|45
==========================================================

I'll share test result using partitioning table via
Inheritance tables on next email. :)

Thanks,
Tatsuro Yamada

Вложения

v4-0001-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Thomas Munro

Дата:

01 августа 2019 г., 08:44:42

On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
> Attached v4 patch file only includes this fix.

Hello all,

I've moved this to the September CF, where it is in "Needs review" state.

Thanks,

-- 
Thomas Munro
https://enterprisedb.com

Re: progress report for ANALYZE

От

Robert Haas

Дата:

01 августа 2019 г., 17:48:06

On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
> <tatsuro.yamada.tf@nttcom.co.jp> wrote:
> > Attached v4 patch file only includes this fix.
>
> I've moved this to the September CF, where it is in "Needs review" state.

/me reviews.

+      <entry><structfield>scanning_table</structfield></entry>

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.

+       The command is computing extended stats from the samples
obtained in the previous phase.

I think you should change this (and the previous one) to say "from the
samples obtained during the table scan."

+       Total number of heap blocks in the scanning_table.

Perhaps I'm confused, but it looks to me like what you are advertising
is the number of blocks that will be sampled, not the total number of
blocks in the table.  I think that's the right thing to advertise, but
then the column should be named and documented that way.

+ {
+ const int   index[] = {
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_SCANREL
+ };
+ const int64 val[] = {
+ nblocks,
+ RelationGetRelid(onerel)
+ };
+
+ pgstat_progress_update_multi_param(2, index, val);
+ }

This block seems to be introduced just so you can declare variables; I
don't think that's good style. It's arguably unnecessary because we
now are selectively allowing variable declarations within functions,
but I think you should just move the first array to the top of the
function and the second declaration to the top of the function
dropping const, and then just do val[0] = nblocks and val[1] =
RelationGetRelid(onerel).  Maybe you can also come up with better
names than 'index' and 'val'.  Same comment applies to another place
where you have something similar.

Patch seems to need minor rebasing.

Maybe "scanning table" should be renamed "acquiring sample rows," to
match the names used in the code?

I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count.  That creates a race during which users could see the first bit
of data set and the second not set.  I don't see any reason not to set
all four fields together.

Please be sure to make the names of the constants you use match up
with the external names as far as it reasonably makes sense, e.g.

+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE       1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING            2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE         4

vs.

+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text

Not terrible, but it could be closer.

Similarly with the column names (include_children vs. INH).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

13 августа 2019 г., 05:33:34

Hi Robert and All!


On 2019/08/02 2:48, Robert Haas wrote:> On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
>> <tatsuro.yamada.tf@nttcom.co.jp> wrote:
>>> Attached v4 patch file only includes this fix.
>>
>> I've moved this to the September CF, where it is in "Needs review" state.
> 
> /me reviews.


Thanks for your comments! :)


> +      <entry><structfield>scanning_table</structfield></entry>
> 
> I think this should be retitled to something that ends in 'relid',
> like all of the corresponding cases in existing progress views.
> Perhaps 'active_relid' or 'current_relid'.


Fixed.
I changed "scanning_table" to "current_relid" for analyze in monitoring.sgml.
However, I didn't change "relid" on other places for other commands because
I'd like to create other patch for that later. :)


  > +       The command is computing extended stats from the samples
> obtained in the previous phase.
> 
> I think you should change this (and the previous one) to say "from the
> samples obtained during the table scan."


Fixed.


> +       Total number of heap blocks in the scanning_table.
> 
> Perhaps I'm confused, but it looks to me like what you are advertising
> is the number of blocks that will be sampled, not the total number of
> blocks in the table.  I think that's the right thing to advertise, but
> then the column should be named and documented that way.


Ah, you are right. Fixed.
I used the following sentence based on Vinayak's patch created two years a go.

-     <entry><structfield>heap_blks_total</structfield></entry>
-     <entry><type>bigint</type></entry>
-     <entry>
-       Total number of heap blocks in the current_relid.
-     </entry>

+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+</entry>



> + {
> + const int   index[] = {
> + PROGRESS_ANALYZE_TOTAL_BLOCKS,
> + PROGRESS_ANALYZE_SCANREL
> + };
> + const int64 val[] = {
> + nblocks,
> + RelationGetRelid(onerel)
> + };
> +
> + pgstat_progress_update_multi_param(2, index, val);
> + }
> 
> This block seems to be introduced just so you can declare variables; I
> don't think that's good style. It's arguably unnecessary because we
> now are selectively allowing variable declarations within functions,
> but I think you should just move the first array to the top of the
> function and the second declaration to the top of the function
> dropping const, and then just do val[0] = nblocks and val[1] =
> RelationGetRelid(onerel).  Maybe you can also come up with better
> names than 'index' and 'val'.  Same comment applies to another place
> where you have something similar.


I agreed and fixed.


> Patch seems to need minor rebasing.
> 
> Maybe "scanning table" should be renamed "acquiring sample rows," to
> match the names used in the code?


I fixed as following:

s/PROGRESS_ANALYZE_PHASE_SCAN_TABLE/
   PROGRESS_ANALYZE_PHASE_ACQUIRING_SAMPLE_ROWS/

s/WHEN 1 THEN 'scanning table'::text/
   WHEN 1 THEN 'acquiring sample rows'::text/


> I'm not a fan of the way you set the scan-table phase and inh flag in
> one place, and then slightly later set the relation OID and block
> count.  That creates a race during which users could see the first bit
> of data set and the second not set.  I don't see any reason not to set
> all four fields together.


Hmm... I understand but it's little difficult because if there are
child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
(See below). So, it is able to set all four fields together if inh flag
is given as a parameter of those functions, I suppose. But I'm not sure
whether it's okay to add the parameter to both functions or not.
Do you have any ideas? :)


# do_analyze_rel()
     ...
     if (inh)
         numrows = acquire_inherited_sample_rows(onerel, elevel,
                                                 rows, targrows,
                                                 &totalrows, &totaldeadrows);
     else
         numrows = (*acquirefunc) (onerel, elevel,
                                   rows, targrows,
                                   &totalrows, &totaldeadrows);


# acquire_inherited_sample_rows()
     ...
     foreach(lc, tableOIDs)
     {
     ...
        /* Check table type (MATVIEW can't happen, but might as well allow) */
         if (childrel->rd_rel->relkind == RELKIND_RELATION ||
             childrel->rd_rel->relkind == RELKIND_MATVIEW)
         {
             /* Regular table, so use the regular row acquisition function */
             acquirefunc = acquire_sample_rows;
     ...
         /* OK, we'll process this child */
         has_child = true;
         rels[nrels] = childrel;
         acquirefuncs[nrels] = acquirefunc;
     ...
     }
     ...
     for (i = 0; i < nrels; i++)
     {
     ...
         AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
     ...
             if (childtargrows > 0)
             {
     ...
                 /* Fetch a random sample of the child's rows */
                 childrows = (*acquirefunc) (childrel, elevel,
                                             rows + numrows, childtargrows,
                                             &trows, &tdrows)


> Please be sure to make the names of the constants you use match up
> with the external names as far as it reasonably makes sense, e.g.
> 
> +#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE       1
> +#define PROGRESS_ANALYZE_PHASE_COMPUTING            2
> +#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
> +#define PROGRESS_ANALYZE_PHASE_FINALIZE         4
> 
> vs.
> 
> +            WHEN 0 THEN 'initializing'::text
> +            WHEN 1 THEN 'scanning table'::text
> +            WHEN 2 THEN 'computing stats'::text
> +            WHEN 3 THEN 'computing extended stats'::text
> +            WHEN 4 THEN 'finalizing analyze'::text
> 
> Not terrible, but it could be closer.


Agreed.
How about these?:

#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS 1  <- fixed
#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS       2  <- fixed
#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS   3  <- fixed
#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE    4  <- fixed

vs.

WHEN 1 THEN 'acquiring sample rows'::text
WHEN 2 THEN 'computing stats'::text
WHEN 3 THEN 'computing extended stats'::text
WHEN 4 THEN 'finalizing analyze'::text

I revised the name of the constants, so the constants and the phase
names are closer than before. Also, I used Verb instead Gerund
because other phase names used Verb such as VACUUM. :)


> Similarly with the column names (include_children vs. INH).


Fixed.
I selected "include_children" as the column name because it's
easy to understand than "INH".

s/PROGRESS_ANALYZE_INH/
   PROGRESS_ANALYZE_INCLUDE_CHILDREN/


Please find attached file. :)


Thanks,
Tatsuro Yamada

Вложения

v5-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

13 августа 2019 г., 14:01:27

Hello,

On 2019-Jul-03, Tatsuro Yamada wrote:

> My ex-colleague Vinayak created same patch in 2017 [1], and he
> couldn't get commit because there are some reasons such as the
> patch couldn't handle analyzing Foreign table. Therefore, I wonder
> whether your patch is able to do that or not.

> [1] ANALYZE command progress checker
>
https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006

So just now I went to check the jold thread (which I should have
searched for before writing my own implementation!).  It seems clear
that many things are pretty similar in both patch, but I think for the
most part they are similar just because the underlying infrastructure
imposes a certain design already, rather than there being any actual
copying.  (To be perfectly clear: I didn't even know that this patch
existed and I didn't grab any code from there to produce my v1.)

However, you've now modified the patch from what I submitted and I'm
wondering if you've taken any inspiration from Vinayak's old patch.  If
so, it seems fair to credit him as co-author in the commit message.  It
would be good to get his input on the current patch, though.

I have not looked at the current version of the patch yet, but I intend
to do so during the upcoming commitfest.

Thanks for moving this forward!

On the subject of FDW support: I did look into supporting that before
submitting this.  I think it's not academically difficult: just have the
FDW's acquire_sample_rows callback invoke the update_param functions
once in a while.  Sadly, in practical terms it looks like postgres_fdw
is quite stupid about ANALYZE (it scans the whole table??) so doing
something that's actually useful may not be so easy.  At least, we know
the total relation size and maybe we can add the ctid column to the
cursor in postgresAcquireSampleRowsFunc so that we have a current block
number to report (becing careful about synchronized seqscans).  I think
this should *not* be part of the main ANALYZE-progress commit, though,
because getting that properly sorted out is going to take some more
time.

I do wonder why doesn't postgres_fdw use TABLESAMPLE.

I did not look at other FDWs at all, mind.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Etsuro Fujita

Дата:

14 августа 2019 г., 07:28:32

Hi,

On Tue, Aug 13, 2019 at 11:01 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> On the subject of FDW support: I did look into supporting that before
> submitting this.  I think it's not academically difficult: just have the
> FDW's acquire_sample_rows callback invoke the update_param functions
> once in a while.  Sadly, in practical terms it looks like postgres_fdw
> is quite stupid about ANALYZE (it scans the whole table??) so doing
> something that's actually useful may not be so easy.  At least, we know
> the total relation size and maybe we can add the ctid column to the
> cursor in postgresAcquireSampleRowsFunc so that we have a current block
> number to report (becing careful about synchronized seqscans).

I don't follow this thread fully, so I might miss something, but I
don't think that's fully applicable, because foreign tables managed by
postgres_fdw can be eg, views on the remote side.

> I do wonder why doesn't postgres_fdw use TABLESAMPLE.

Yeah, that's really what I'm thinking for PG13; but I think we would
still need to scan the whole table in some cases (eg, when the foreign
table is a view on the remote side), because the TABLESAMLE clause can
only be applied to regular tables and materialized views.

> I did not look at other FDWs at all, mind.

IIUC, oracle_fdw already uses the SAMPLE BLOCK clause for that.  Right?

Best regards,
Etsuro Fujita

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

14 августа 2019 г., 21:42:36

On 2019-Aug-14, Etsuro Fujita wrote:

> On Tue, Aug 13, 2019 at 11:01 PM Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > On the subject of FDW support: I did look into supporting that before
> > submitting this.  I think it's not academically difficult: just have the
> > FDW's acquire_sample_rows callback invoke the update_param functions
> > once in a while.  Sadly, in practical terms it looks like postgres_fdw
> > is quite stupid about ANALYZE (it scans the whole table??) so doing
> > something that's actually useful may not be so easy.  At least, we know
> > the total relation size and maybe we can add the ctid column to the
> > cursor in postgresAcquireSampleRowsFunc so that we have a current block
> > number to report (becing careful about synchronized seqscans).
> 
> I don't follow this thread fully, so I might miss something, but I
> don't think that's fully applicable, because foreign tables managed by
> postgres_fdw can be eg, views on the remote side.

Oh, hmm, well I guess that covers the tables and matviews then ... I'm
not sure there's a good way to cover foreign tables that are views or
other stuff.  Maybe that'll have to do.  But at least it covers more
cases than none.

> > I do wonder why doesn't postgres_fdw use TABLESAMPLE.
> 
> Yeah, that's really what I'm thinking for PG13; but I think we would
> still need to scan the whole table in some cases (eg, when the foreign
> table is a view on the remote side), because the TABLESAMLE clause can
> only be applied to regular tables and materialized views.

Sure.

> > I did not look at other FDWs at all, mind.
> 
> IIUC, oracle_fdw already uses the SAMPLE BLOCK clause for that.  Right?

Yeah, it does that, I checked precisely oracle_fdw this morning.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

15 августа 2019 г., 01:45:15

Hi Alvaro,

On 2019/08/13 23:01, Alvaro Herrera wrote:
> Hello,
> 
> On 2019-Jul-03, Tatsuro Yamada wrote:
> 
>> My ex-colleague Vinayak created same patch in 2017 [1], and he
>> couldn't get commit because there are some reasons such as the
>> patch couldn't handle analyzing Foreign table. Therefore, I wonder
>> whether your patch is able to do that or not.
> 
>> [1] ANALYZE command progress checker
>>
https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006
> 
> So just now I went to check the jold thread (which I should have
> searched for before writing my own implementation!).  It seems clear
> that many things are pretty similar in both patch, but I think for the
> most part they are similar just because the underlying infrastructure
> imposes a certain design already, rather than there being any actual
> copying.  (To be perfectly clear: I didn't even know that this patch
> existed and I didn't grab any code from there to produce my v1.)


I know your patch was not based on Vinayak's old patch because
coding style is different between him and you.

  
> However, you've now modified the patch from what I submitted and I'm
> wondering if you've taken any inspiration from Vinayak's old patch.  If
> so, it seems fair to credit him as co-author in the commit message.  It
> would be good to get his input on the current patch, though.


Yeah, I'm happy if you added his name as co-authors because I checked
the document including his old patch as a reference. :)

  
> I have not looked at the current version of the patch yet, but I intend
> to do so during the upcoming commitfest.
> 
> Thanks for moving this forward!


Thanks!
Committing the patch on PG13 makes me happy because Progress reporting
features are important for DBA. :)


> On the subject of FDW support: I did look into supporting that before
> submitting this.  I think it's not academically difficult: just have the
> FDW's acquire_sample_rows callback invoke the update_param functions
> once in a while.  Sadly, in practical terms it looks like postgres_fdw


Actually, I've changed my mind.
Even if there is no FDW support, Progress report for ANALYZE is still useful. Therefore, FDW support would be
preferablebut not required for committing
 
the patch, I believe. :)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

04 сентября 2019 г., 21:01:03

There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

v6-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

06 сентября 2019 г., 02:21:33

Hi Alvaro,

> There were some minor problems in v5 -- bogus Docbook as well as
> outdated rules.out, small "git diff --check" complaint about whitespace.
> This v6 (on today's master) fixes those, no other changes.

  
Thanks for fixing that. :)
I'll test it later.


I think we have to address the following comment from Robert. Right?
Do you have any ideas?


>> I'm not a fan of the way you set the scan-table phase and inh flag in
>> one place, and then slightly later set the relation OID and block
>> count.  That creates a race during which users could see the first bit
>> of data set and the second not set.  I don't see any reason not to set
>> all four fields together.
> 
> 
> Hmm... I understand but it's little difficult because if there are
> child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
> (See below). So, it is able to set all four fields together if inh flag
> is given as a parameter of those functions, I suppose. But I'm not sure
> whether it's okay to add the parameter to both functions or not.
> Do you have any ideas?
> 
> 
> # do_analyze_rel()
>     ...
>     if (inh)
>         numrows = acquire_inherited_sample_rows(onerel, elevel,
>                                                 rows, targrows,
>                                                 &totalrows, &totaldeadrows);
>     else
>         numrows = (*acquirefunc) (onerel, elevel,
>                                   rows, targrows,
>                                   &totalrows, &totaldeadrows);
> 
> 
> # acquire_inherited_sample_rows()
>     ...
>     foreach(lc, tableOIDs)
>     {
>     ...
>        /* Check table type (MATVIEW can't happen, but might as well allow) */
>         if (childrel->rd_rel->relkind == RELKIND_RELATION ||
>             childrel->rd_rel->relkind == RELKIND_MATVIEW)
>         {
>             /* Regular table, so use the regular row acquisition function */
>             acquirefunc = acquire_sample_rows;
>     ...
>         /* OK, we'll process this child */
>         has_child = true;
>         rels[nrels] = childrel;
>         acquirefuncs[nrels] = acquirefunc;
>     ...
>     }
>     ...
>     for (i = 0; i < nrels; i++)
>     {
>     ...
>         AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
>     ...
>             if (childtargrows > 0)
>             {
>     ...
>                 /* Fetch a random sample of the child's rows */
>                 childrows = (*acquirefunc) (childrel, elevel,
>                                             rows + numrows, childtargrows,
>                                             &trows, &tdrows)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

vignesh C

Дата:

17 сентября 2019 г., 11:51:37

On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> There were some minor problems in v5 -- bogus Docbook as well as
> outdated rules.out, small "git diff --check" complaint about whitespace.
> This v6 (on today's master) fixes those, no other changes.
>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
In the above after "." there is an extra space, is this intentional. I
had noticed that in lot of places there is couple of spaces and
sometimes single space across this file.

Like in below, there is single space after ".":
     <entry>Time when this process' current transaction was started, or null
      if no transaction is active. If the current
      query is the first of its transaction, this column is equal to the
      <structfield>query_start</structfield> column.
     </entry>

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

01 ноября 2019 г., 06:22:17

Hi vignesh!


On 2019/09/17 20:51, vignesh C wrote:
> On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>
>> There were some minor problems in v5 -- bogus Docbook as well as
>> outdated rules.out, small "git diff --check" complaint about whitespace.
>> This v6 (on today's master) fixes those, no other changes.
>>
> +     <entry>
> +       The command is preparing to begin scanning the heap.  This phase is
> +       expected to be very brief.
> +     </entry>
> In the above after "." there is an extra space, is this intentional. I
> had noticed that in lot of places there is couple of spaces and
> sometimes single space across this file.
> 
> Like in below, there is single space after ".":
>       <entry>Time when this process' current transaction was started, or null
>        if no transaction is active. If the current
>        query is the first of its transaction, this column is equal to the
>        <structfield>query_start</structfield> column.
>       </entry>


Sorry for the late reply.

Probably, it is intentional because there are many extra space
in other documents. See below:

# Results of grep
=============
$ grep '\.  ' doc/src/sgml/monitoring.sgml | wc -l
114
$ grep '\.  ' doc/src/sgml/information_schema.sgml | wc -l
184
$ grep '\.  ' doc/src/sgml/func.sgml | wc -l
577
=============

Therefore, I'm going to leave it as is. :)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

05 ноября 2019 г., 12:07:07

Hi Alvaro, vignesh,

I rebased the patch on 2a4d96eb, and added new column
"ext_compute_count" in pg_stat_progress_analyze vie to
report a number of computing extended stats.
It is like a "index_vacuum_count" in vacuum progress reporter or
"index_rebuild_count" in cluster progress reporter. :)

Please find attached file: v7.

And the following is a test result:
==============
[Session1]
\! pgbench -i
create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;

[Session2]
# \a \t
# select * from pg_stat_progress_analyze ; \watch 0.0001

27064|13583|postgres|16405|initializing|f|0|0|0|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|0|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|23|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|64|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|1640|0
27064|13583|postgres|16405|computing stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|1
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|2
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|3
27064|13583|postgres|16405|finalizing analyze|f|16405|1640|1640|3

Note:
  The result on Session2 was shortened for readability.
  If you'd like to check the whole result, you can see attached file: "hoge.txt".
==============

Thanks,
Tatsuro Yamada

Вложения

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

05 ноября 2019 г., 13:38:50

On 2019-Nov-05, Tatsuro Yamada wrote:

> ==============
> [Session1]
> \! pgbench -i
> create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
> create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
> create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;

Wow, it takes a long time to compute these ...

Hmm, you normally wouldn't define stats that way; you'd do this instead:

create statistics pg_ext1 (dependencies, mcv,ndistinct) ON aid, bid from pgbench_accounts;

I'm not sure if this has an important impact in practice.  What I'm
saying is that I'm not sure that "number of ext stats" is necessarily a
useful number as shown.  I wonder if it's possible to count the number
of items that have been computed for each stats object.  So if you do
this

create statistics pg_ext1 (dependencies, mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (ndistinct,histogram) ON aid, bid from pgbench_accounts;

then the counter goes to 4.  But I also wonder if we need to publish
_which_ type of ext stats is currently being built, in a separate
column.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

06 ноября 2019 г., 05:49:49

Hi Alvaro!

On 2019/11/05 22:38, Alvaro Herrera wrote:
> On 2019-Nov-05, Tatsuro Yamada wrote:
> 
>> ==============
>> [Session1]
>> \! pgbench -i
>> create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
>> create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
>> create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;
> 
> Wow, it takes a long time to compute these ...
> 
> Hmm, you normally wouldn't define stats that way; you'd do this instead:
> 
> create statistics pg_ext1 (dependencies, mcv,ndistinct) ON aid, bid from pgbench_accounts;

I'd like to say it's a just example of test case. But I understand that
your advice. Thanks! :)

> I'm not sure if this has an important impact in practice.  What I'm
> saying is that I'm not sure that "number of ext stats" is necessarily a
> useful number as shown.  I wonder if it's possible to count the number
> of items that have been computed for each stats object.  So if you do
> this
>
> create statistics pg_ext1 (dependencies, mcv) ON aid, bid from pgbench_accounts;
> create statistics pg_ext2 (ndistinct,histogram) ON aid, bid from pgbench_accounts;
> 
> then the counter goes to 4.  But I also wonder if we need to publish
> _which_ type of ext stats is currently being built, in a separate
> column.

Hmm... I have never seen a lot of extended stats on a table (with many columns)
but I suppose it will be existence near future because extended stats is an only
solution to correct row estimation error in vanilla PostgreSQL. Therefore, it
would be better to add the counter on the view, I think.

I revised the patch as following because I realized counting the types of ext
stats is not useful for users.

  - Attached new patch counts a number of ext stats instead the types of ext stats.

So we can see the counter goes to "2", if we created above ext stats (pg_ext1 and
pg_ext2) and analyzed as you wrote. :)

Thanks,
Tatsuro Yamada

Вложения

v8-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Amit Langote

Дата:

19 ноября 2019 г., 01:57:39

Yamada-san,

Thanks for working on this.

On Wed, Nov 6, 2019 at 2:50 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
> I revised the patch as following because I realized counting the types of ext
> stats is not useful for users.
>
>   - Attached new patch counts a number of ext stats instead the types of ext stats.
>
> So we can see the counter goes to "2", if we created above ext stats (pg_ext1 and
> pg_ext2) and analyzed as you wrote. :)

I have looked at the patch and here are some comments.

I think include_children and current_relid are not enough to
understand the progress of analyzing inheritance trees, because even
with current_relid being updated, I can't tell how many more there
will be.  I think it'd be better to show the total number of children
and the number of children processed, just like
pg_stat_progress_create_index does for partitions.  So, instead of
include_children and current_relid, I think it's better to have
child_tables_total, child_tables_done, and current_child_relid, placed
last in the set of columns.

Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows".  In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not.  For partitioned tables, the progress
output will immediately switch to this phase, because partitioned
table itself is empty so there's nothing to do in the "acquiring
sample rows" phase.

That's all for now.

Thanks,
Amit

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

26 ноября 2019 г., 02:32:01

Hi Amit-san!

Thanks for your comments!


> I have looked at the patch and here are some comments.
> 
> I think include_children and current_relid are not enough to
> understand the progress of analyzing inheritance trees, because even
> with current_relid being updated, I can't tell how many more there
> will be.  I think it'd be better to show the total number of children
> and the number of children processed, just like
> pg_stat_progress_create_index does for partitions.  So, instead of
> include_children and current_relid, I think it's better to have
> child_tables_total, child_tables_done, and current_child_relid, placed
> last in the set of columns.

Ah, I understood.
I'll check pg_stat_progress_create_index does for partitions,
and will create a new patch. :)

Related to the above,
I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.
Would it be added to the total number column to these views? :)


> Also, inheritance tree stats are created *after* creating single table
> stats, so I think that it would be better to have a distinct phase
> name for that, say "acquiring inherited sample rows".  In
> do_analyze_rel(), you can select which of two phases to set based on
> whether inh is true or not.  For partitioned tables, the progress
> output will immediately switch to this phase, because partitioned
> table itself is empty so there's nothing to do in the "acquiring
> sample rows" phase.
> 
> That's all for now.


Okay! I'll also add the new phase "acquiring inherited sample rows" on
the next patch. :)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

26 ноября 2019 г., 03:56:48

Hi Amit-san,


> Related to the above,
> I wonder whether we need the total number of ext stats on
> pg_stat_progress_analyze or not. As you might know, there is the same
> counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
> For example, index_vacuum_count and index_rebuild_count.
> Would it be added to the total number column to these views? :)


Oops, I made a mistake. :(

What I'd like to say was:
Would it be better to add the total number column to these views? :)

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

26 ноября 2019 г., 12:22:51

On 2019-Nov-26, Tatsuro Yamada wrote:

> > I wonder whether we need the total number of ext stats on
> > pg_stat_progress_analyze or not. As you might know, there is the same
> > counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
> > For example, index_vacuum_count and index_rebuild_count.
> 
> Would it be better to add the total number column to these views? :)

Yeah, I think it would be good to add that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

27 ноября 2019 г., 02:01:37

Hi Alvaro!

On 2019/11/26 21:22, Alvaro Herrera wrote:
> On 2019-Nov-26, Tatsuro Yamada wrote:
> 
>>> I wonder whether we need the total number of ext stats on
>>> pg_stat_progress_analyze or not. As you might know, there is the same
>>> counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
>>> For example, index_vacuum_count and index_rebuild_count.
>>
>> Would it be better to add the total number column to these views? :)
> 
> Yeah, I think it would be good to add that.

Thanks for your comment!
Okay, I'll add the column "ext_stats_total" to
pg_stat_progress_analyze view on the next patch. :)

Regarding to other total number columns,
I'll create another patch to add these columns "index_vacuum_total" and
"index_rebuild_count" on the other views. :)

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Amit Langote

Дата:

27 ноября 2019 г., 02:34:26

On Tue, Nov 26, 2019 at 9:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Nov-26, Tatsuro Yamada wrote:
>
> > > I wonder whether we need the total number of ext stats on
> > > pg_stat_progress_analyze or not. As you might know, there is the same
> > > counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
> > > For example, index_vacuum_count and index_rebuild_count.
> >
> > Would it be better to add the total number column to these views? :)
>
> Yeah, I think it would be good to add that.

Hmm, does it take that long to calculate ext stats on one column?  The
reason it's worthwhile to have index_vacuum_count,
index_rebuild_count, etc. is because it can take very long for one
index to get vacuumed/rebuilt.

Thanks,
Amit

Re: progress report for ANALYZE

От

Amit Langote

Дата:

27 ноября 2019 г., 02:38:39

Yamada-san,

On Wed, Nov 27, 2019 at 11:04 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
> Regarding to other total number columns,
> I'll create another patch to add these columns "index_vacuum_total" and
> "index_rebuild_count" on the other views. :)

Maybe you meant "index_rebuild_total"?

Thanks,
Amit

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

27 ноября 2019 г., 03:00:14

On 2019-Nov-27, Amit Langote wrote:

> On Tue, Nov 26, 2019 at 9:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > On 2019-Nov-26, Tatsuro Yamada wrote:
> >
> > > > I wonder whether we need the total number of ext stats on
> > > > pg_stat_progress_analyze or not. As you might know, there is the same
> > > > counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
> > > > For example, index_vacuum_count and index_rebuild_count.
> > >
> > > Would it be better to add the total number column to these views? :)
> >
> > Yeah, I think it would be good to add that.
> 
> Hmm, does it take that long to calculate ext stats on one column?  The
> reason it's worthwhile to have index_vacuum_count,
> index_rebuild_count, etc. is because it can take very long for one
> index to get vacuumed/rebuilt.

Yes, it's noticeable.  It's not as long as building an index, of course,
but it's a long enough fraction of the total analyze time that it should
be reported.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

27 ноября 2019 г., 03:14:47

Hi Amit-san,

> On Wed, Nov 27, 2019 at 11:04 AM Tatsuro Yamada
> <tatsuro.yamada.tf@nttcom.co.jp> wrote:
>> Regarding to other total number columns,
>> I'll create another patch to add these columns "index_vacuum_total" and
>> "index_rebuild_count" on the other views. :)
> 
> Maybe you meant "index_rebuild_total"?

Yeah, you are right! :)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

27 ноября 2019 г., 03:45:41

Hi Amit-san!

>> I think include_children and current_relid are not enough to
>> understand the progress of analyzing inheritance trees, because even
>> with current_relid being updated, I can't tell how many more there
>> will be.  I think it'd be better to show the total number of children
>> and the number of children processed, just like
>> pg_stat_progress_create_index does for partitions.  So, instead of
>> include_children and current_relid, I think it's better to have
>> child_tables_total, child_tables_done, and current_child_relid, placed
>> last in the set of columns.
> 
> Ah, I understood.
> I'll check pg_stat_progress_create_index does for partitions,
> and will create a new patch. 

Fixed.

But I just remembered I replaced column name "*_table" with "*_relid"
based on Robert's comment three months ago, see below:

> /me reviews.
> 
> +      <entry><structfield>scanning_table</structfield></entry>
> 
> I think this should be retitled to something that ends in 'relid',
> like all of the corresponding cases in existing progress views.
> Perhaps 'active_relid' or 'current_relid'.

So, it would be better to use same rule against child_tables_total and
child_tables_done. Thus I changed these column names to others and added
to the view. I also removed include_children and current_relid.
The following columns are new version.

<New columns of the view>
  pid
  datid
  datname
  relid
  phase
  sample_blks_total
  heap_blks_scanned
  ext_stats_total     <= Added (based on Alvaro's comment)
  ext_stats_computed  <= Renamed
  child_relids_total  <= Added
  child_relids_done   <= Added
  current_child_relid <= Added


>> Also, inheritance tree stats are created *after* creating single table
>> stats, so I think that it would be better to have a distinct phase
>> name for that, say "acquiring inherited sample rows".  In
>> do_analyze_rel(), you can select which of two phases to set based on
>> whether inh is true or not.  For partitioned tables, the progress
>> output will immediately switch to this phase, because partitioned
>> table itself is empty so there's nothing to do in the "acquiring
>> sample rows" phase.
>>
>> That's all for now.
> 
> 
> Okay! I'll also add the new phase "acquiring inherited sample rows" on
> the next patch. 


Fixed.

I tried to abbreviate it to "acquiring inh sample rows" because I thought
"acquiring inherited sample rows" is a little long for the phase name.

Attached WIP patch is including these fixes:
   - Remove columns: include_children and current_relid
   - Add new columns: child_relieds_total, child_relids_done and current_child_relid
   - Add new phase "acquiring inh sample rows"

   Note: the document is not updated, I'll fix it later. :)

Attached testcase.sql is for creating base table and partitioning table.
You can check the patch by using the following procedures, easily.

Terminal #1
--------------
\a \t
select * from pg_stat_progress_analyze; \watch 0.0001
--------------

Terminal #2
--------------
\i testcase.sql
analyze hoge;
analyze hoge2;
--------------

Thanks,
Tatsuro Yamada

Вложения

Re: progress report for ANALYZE

От

Michael Paquier

Дата:

27 ноября 2019 г., 04:25:49

On Wed, Nov 27, 2019 at 12:45:41PM +0900, Tatsuro Yamada wrote:
> Fixed.

Patch was waiting on input from author, so I have switched it back to
"Needs review", and moved it to next CF while on it as you are working
on it.
--
Michael

Вложения

signature.asc

Re: progress report for ANALYZE

От

Amit Langote

Дата:

28 ноября 2019 г., 01:59:03

Yamada-san,

Thank you for updating the patch.

On Wed, Nov 27, 2019 at 12:46 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
> But I just remembered I replaced column name "*_table" with "*_relid"
> based on Robert's comment three months ago, see below:
>
> > /me reviews.
> >
> > +      <entry><structfield>scanning_table</structfield></entry>
> >
> > I think this should be retitled to something that ends in 'relid',
> > like all of the corresponding cases in existing progress views.
> > Perhaps 'active_relid' or 'current_relid'.
>
> So, it would be better to use same rule against child_tables_total and
> child_tables_done. Thus I changed these column names to others and added
> to the view.

Robert's comment seems OK for a column that actually reports an OID,
but for columns that report counts, names containing "relids" sound a
bit strange to me.  So, I prefer child_tables_total /
child_tables_done over child_relids_total / child_relids_done.  Would
like to hear more opinions.

> >> Also, inheritance tree stats are created *after* creating single table
> >> stats, so I think that it would be better to have a distinct phase
> >> name for that, say "acquiring inherited sample rows".  In
> >> do_analyze_rel(), you can select which of two phases to set based on
> >> whether inh is true or not.
> >
> > Okay! I'll also add the new phase "acquiring inherited sample rows" on
> > the next patch.
>
>
> Fixed.
>
> I tried to abbreviate it to "acquiring inh sample rows" because I thought
> "acquiring inherited sample rows" is a little long for the phase name.

I think phase names should contain full English words, because they
are supposed to be descriptive.  Users are very likely to not
understand what "inh" means without looking up the docs.

Thanks,
Amit

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

28 ноября 2019 г., 10:55:22

Hi Amit-san,

On 2019/11/28 10:59, Amit Langote wrote:
> Yamada-san,
> 
> Thank you for updating the patch.
> 
> On Wed, Nov 27, 2019 at 12:46 PM Tatsuro Yamada
> <tatsuro.yamada.tf@nttcom.co.jp> wrote:
>> But I just remembered I replaced column name "*_table" with "*_relid"
>> based on Robert's comment three months ago, see below:
>>
>>> /me reviews.
>>>
>>> +      <entry><structfield>scanning_table</structfield></entry>
>>>
>>> I think this should be retitled to something that ends in 'relid',
>>> like all of the corresponding cases in existing progress views.
>>> Perhaps 'active_relid' or 'current_relid'.
>>
>> So, it would be better to use same rule against child_tables_total and
>> child_tables_done. Thus I changed these column names to others and added
>> to the view.
> 
> Robert's comment seems OK for a column that actually reports an OID,
> but for columns that report counts, names containing "relids" sound a
> bit strange to me.  So, I prefer child_tables_total /
> child_tables_done over child_relids_total / child_relids_done.  Would
> like to hear more opinions.

Hmmm... I understand your opinion but I'd like to get more opinions too. :)
Do you prefer these column names? See below:

<Columns of the view>
  pid
  datid
  datname
  relid
  phase
  sample_blks_total
  heap_blks_scanned
  ext_stats_total
  ext_stats_computed
  child_tables_total  <= Renamed: s/relid/table/
  child_tables_done   <= Renamed: s/relid/table/
  current_child_table <= Renamed: s/relid/table/



>>>> Also, inheritance tree stats are created *after* creating single table
>>>> stats, so I think that it would be better to have a distinct phase
>>>> name for that, say "acquiring inherited sample rows".  In
>>>> do_analyze_rel(), you can select which of two phases to set based on
>>>> whether inh is true or not.
>>>
>>> Okay! I'll also add the new phase "acquiring inherited sample rows" on
>>> the next patch.
>>
>>
>> Fixed.
>>
>> I tried to abbreviate it to "acquiring inh sample rows" because I thought
>> "acquiring inherited sample rows" is a little long for the phase name.
> 
> I think phase names should contain full English words, because they
> are supposed to be descriptive.  Users are very likely to not
> understand what "inh" means without looking up the docs.


Okay, I will fix it.
    s/acquiring inh sample rows/acquiring inherited sample rows/


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

28 ноября 2019 г., 13:37:17

On 2019-Nov-28, Tatsuro Yamada wrote:

> Hmmm... I understand your opinion but I'd like to get more opinions too. :)
> Do you prefer these column names? See below:

Here's my take on it:

 <Columns of the view>
  pid
  datid
  datname
  relid
  phase
  sample_blks_total
  sample_blks_scanned
  ext_stats_total
  ext_stats_computed
  child_tables_total
  child_tables_done
  current_child_table_relid

It seems to make sense to keep using the "child table" terminology in
that last column; but since the column carries an OID then as Robert
said it should have "relid" in the name.  For the other two "child
tables" columns, not using "relid" is appropriate because what they have
is not relids.

I think there should be an obvious correspondence in columns that are
closely related, which there isn't if you use "sample" in one and "heap"
in the other, so I'd go for "sample" in both.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

29 ноября 2019 г., 00:54:48

Hi Alvaro!

>> Hmmm... I understand your opinion but I'd like to get more opinions too. :)
>> Do you prefer these column names? See below:
> 
> Here's my take on it:
> 
>   <Columns of the view>
>    pid
>    datid
>    datname
>    relid
>    phase
>    sample_blks_total
>    sample_blks_scanned
>    ext_stats_total
>    ext_stats_computed
>    child_tables_total
>    child_tables_done
>    current_child_table_relid
> 
> It seems to make sense to keep using the "child table" terminology in
> that last column; but since the column carries an OID then as Robert
> said it should have "relid" in the name.  For the other two "child
> tables" columns, not using "relid" is appropriate because what they have
> is not relids.
>
> I think there should be an obvious correspondence in columns that are
> closely related, which there isn't if you use "sample" in one and "heap"
> in the other, so I'd go for "sample" in both.


Thanks for the comment.
Oops, You are right, I overlooked they are not relids..
I agreed with you and Amit's opinion so I'll send a revised patch on the next mail. :-)

Next patch will be included:
  - New columns definition of the view (as above)
  - Renamed the phase name: s/acquiring inh sample rows/acquiring inherited sample rows/
  - Update document


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

29 ноября 2019 г., 01:27:20

Hi Michael,

On 2019/11/27 13:25, Michael Paquier wrote:
> On Wed, Nov 27, 2019 at 12:45:41PM +0900, Tatsuro Yamada wrote:
>> Fixed.
> 
> Patch was waiting on input from author, so I have switched it back to
> "Needs review", and moved it to next CF while on it as you are working
> on it.

Thanks for your CF manager work.
I will do my best to be committed at the next CF because
Progress reporting feature is useful for DBAs, as you know. :)

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

29 ноября 2019 г., 08:45:14

Hi Alvaro and Amit!

On 2019/11/29 9:54, Tatsuro Yamada wrote:
> Hi Alvaro!
> 
>>> Hmmm... I understand your opinion but I'd like to get more opinions too. :)
>>> Do you prefer these column names? See below:
>>
>> Here's my take on it:
>>
>>   <Columns of the view>
>>    pid
>>    datid
>>    datname
>>    relid
>>    phase
>>    sample_blks_total
>>    sample_blks_scanned
>>    ext_stats_total
>>    ext_stats_computed
>>    child_tables_total
>>    child_tables_done
>>    current_child_table_relid
>>
>> It seems to make sense to keep using the "child table" terminology in
>> that last column; but since the column carries an OID then as Robert
>> said it should have "relid" in the name.  For the other two "child
>> tables" columns, not using "relid" is appropriate because what they have
>> is not relids.
>>
>> I think there should be an obvious correspondence in columns that are
>> closely related, which there isn't if you use "sample" in one and "heap"
>> in the other, so I'd go for "sample" in both.
> 
> 
> Thanks for the comment.
> Oops, You are right, I overlooked they are not relids..
> I agreed with you and Amit's opinion so I'll send a revised patch on the next mail. :-)
> 
> Next patch will be included:
>   - New columns definition of the view (as above)
>   - Renamed the phase name: s/acquiring inh sample rows/acquiring inherited sample rows/
>   - Update document

Attached patch is the revised patch. :)

I wonder two things below. What do you think?

1)
For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See <Test result> bellow.

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?

<Document>
---------------------------------------------------------
      <entry><literal>finalizing analyze</literal></entry>
      <entry>
        The command is updating pg_class. When this phase is completed,
        <command>ANALYZE</command> will end.
---------------------------------------------------------


<New columns of the view>
---------------------------------------------------------
# \d pg_stat_progress_analyze
               View "pg_catalog.pg_stat_progress_analyze"
           Column           |  Type   | Collation | Nullable | Default
---------------------------+---------+-----------+----------+---------
  pid                       | integer |           |          |
  datid                     | oid     |           |          |
  datname                   | name    |           |          |
  relid                     | oid     |           |          |
  phase                     | text    |           |          |
  sample_blks_total         | bigint  |           |          |
  sample_blks_scanned       | bigint  |           |          |
  ext_stats_total           | bigint  |           |          |
  ext_stats_computed        | bigint  |           |          |
  child_tables_total        | bigint  |           |          |
  child_tables_done         | bigint  |           |          |
  current_child_table_relid | oid     |           |          |
---------------------------------------------------------



<Test result using partitioning tables>
---------------------------------------------------------
# select * from pg_stat_progress_analyze ; \watch 0.0001

19309|13583|postgres|36081|acquiring inherited sample rows|0|0|0|0|0|0|0
19309|13583|postgres|36081|acquiring inherited sample rows|45|17|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|35|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|3|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|22|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|38|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|16|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|34|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|10|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|29|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|43|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093
19309|13583|postgres|36081|computing stats|45|45|0|0|4|4|36093    <== current_*_reid should be zero?
19309|13583|postgres|36081|computing stats|45|45|0|0|4|4|36093
19309|13583|postgres|36081|finalizing analyze|45|45|0|0|4|4|36093 <== there are many finalizing phases
19309|13583|postgres|36081|finalizing analyze|45|45|0|0|4|4|36093
19309|13583|postgres|36084|acquiring sample rows|45|3|0|0|0|0|0
19309|13583|postgres|36084|acquiring sample rows|45|33|0|0|0|0|0
19309|13583|postgres|36084|computing stats|45|45|0|0|0|0|0
19309|13583|postgres|36087|acquiring sample rows|45|15|0|0|0|0|0
19309|13583|postgres|36087|computing stats|45|45|0|0|0|0|0
19309|13583|postgres|36087|finalizing analyze|45|45|0|0|0|0|0     <== same as above
19309|13583|postgres|36090|acquiring sample rows|45|11|0|0|0|0|0
19309|13583|postgres|36090|acquiring sample rows|45|41|0|0|0|0|0
19309|13583|postgres|36090|finalizing analyze|45|45|0|0|0|0|0     <== same as above
19309|13583|postgres|36093|acquiring sample rows|45|7|0|0|0|0|0
19309|13583|postgres|36093|acquiring sample rows|45|37|0|0|0|0|0
19309|13583|postgres|36093|finalizing analyze|45|45|0|0|0|0|0     <== same as above
---------------------------------------------------------

Thanks,
Tatsuro Yamada

Вложения

v10-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Amit Langote

Дата:

29 ноября 2019 г., 09:15:28

Yamada-san,

On Fri, Nov 29, 2019 at 5:45 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
> Attached patch is the revised patch. :)
>
> I wonder two things below. What do you think?
>
> 1)
> For now, I'm not sure it should be set current_child_table_relid to zero
> when the current phase is changed from "acquiring inherited sample rows" to
> "computing stats". See <Test result> bellow.

In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change.  I'm not sure his
point applies to this case too though.

> 2)
> There are many "finalizing analyze" phases based on relids in the case
> of partitioning tables. Would it better to fix the document? or it
> would be better to reduce it to one?
>
> <Document>
> ---------------------------------------------------------
>       <entry><literal>finalizing analyze</literal></entry>
>       <entry>
>         The command is updating pg_class. When this phase is completed,
>         <command>ANALYZE</command> will end.
> ---------------------------------------------------------

When a partitioned table is analyzed, its partitions are analyzed too.
So, the ANALYZE command effectively runs N + 1 times if there are N
partitions -- first analyze partitioned table to collect "inherited"
statistics by collecting row samples using
acquire_inherited_sample_rows(), then each partition to collect its
own statistics.  Note that this recursive application to ANALYZE to
partitions (child tables) only occurs for partitioned tables, not for
legacy inheritance.

Thanks,
Amit

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

04 декабря 2019 г., 08:53:42

Hi Amit-san,

Thanks for your comments!

>> Attached patch is the revised patch. :)
>>
>> I wonder two things below. What do you think?
>>
>> 1)
>> For now, I'm not sure it should be set current_child_table_relid to zero
>> when the current phase is changed from "acquiring inherited sample rows" to
>> "computing stats". See <Test result> bellow.
> 
> In the upthread discussion [1], Robert asked to *not* do such things,
> that is, resetting some values due to phase change.  I'm not sure his
> point applies to this case too though.

Yeah, I understood.
I'll check target relid of "computing stats" to re-read a code of
analyze command later. :)

  
>> 2)
>> There are many "finalizing analyze" phases based on relids in the case
>> of partitioning tables. Would it better to fix the document? or it
>> would be better to reduce it to one?
>>
>> <Document>
>> ---------------------------------------------------------
>>        <entry><literal>finalizing analyze</literal></entry>
>>        <entry>
>>          The command is updating pg_class. When this phase is completed,
>>          <command>ANALYZE</command> will end.
>> ---------------------------------------------------------
> 
> When a partitioned table is analyzed, its partitions are analyzed too.
> So, the ANALYZE command effectively runs N + 1 times if there are N
> partitions -- first analyze partitioned table to collect "inherited"
> statistics by collecting row samples using
> acquire_inherited_sample_rows(), then each partition to collect its
> own statistics.  Note that this recursive application to ANALYZE to
> partitions (child tables) only occurs for partitioned tables, not for
> legacy inheritance.

Thanks for your explanation.
I understand Analyzing Partitioned table a little.
Below is my understanding. Is it right?

==================================================
In the case of partitioned table (N = 3)

  - Partitioned table name: p (relid is 100)
  - Partitioning table names: p1, p2, p3 (relids are 201, 202 and 203)

For now, We can get the following results by executing "analyze p;".

Num Phase                         relid current_child_table_relid
  1  acquire inherited sample rows  100   201
  2  acquire inherited sample rows  100   202
  3  acquire inherited sample rows  100   203
  4  computing stats                100   0
  5  finalizing analyze             100   0

  6  acquiring sample rows          201   0
  7  computing stats                201   0
  8  finalizing analyze             201   0

  9  acquiring sample rows          202   0
10  computing stats                202   0
11  finalizing analyze             202   0

12  acquiring sample rows          203   0
13  computing stats                203   0
14  finalizing analyze             203   0
==================================================


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

06 декабря 2019 г., 06:23:58

Hi Amit-san,


>>> I wonder two things below. What do you think?
>>>
>>> 1)
>>> For now, I'm not sure it should be set current_child_table_relid to zero
>>> when the current phase is changed from "acquiring inherited sample rows" to
>>> "computing stats". See <Test result> bellow.
>>
>> In the upthread discussion [1], Robert asked to *not* do such things,
>> that is, resetting some values due to phase change.  I'm not sure his
>> point applies to this case too though.
> 
> Yeah, I understood.
> I'll check target relid of "computing stats" to re-read a code of
> analyze command later. :)


Finally, I understood after investigation of the code. :)
Call stack is the following, and analyze_rel() calls "N + 1" times
for partitioned table and each partitions.

analyze_rel start
  do_analyze_rel inh==true start
   onerel: hoge2
    acq_inh_sample_rows start
     childrel: hoge2_10000
     childrel: hoge2_20000
     childrel: hoge2_30000
     childrel: hoge2_default
    acq_inh_sample_rows end
    compute_stats start
    compute_stats end
    compute_index_stats start
    compute_index_stats end
    finalizing start
    finalizing end
  do_analyze_rel inh==true end
analyze_rel end
...


Also, I checked my test result. ("//" is my comments)


# select oid,relname,relkind from pg_class where relname like 'hoge2%';
   oid  |    relname    | relkind
-------+---------------+---------
  36081 | hoge2         | p
  36084 | hoge2_10000   | r
  36087 | hoge2_20000   | r
  36090 | hoge2_30000   | r
  36093 | hoge2_default | r
(6 rows)

# select relid,
          current_child_table_relid,
          phase,
          sample_blks_total,
          sample_blks_scanned,
          ext_stats_total,
          ext_stats_computed,
          child_tables_total,
          child_tables_done
   from pg_stat_progress_analyze; \watch 0.00001

== for partitioned table hoge2 ==
//hoge2_10000
36081|36084|acquiring inherited sample rows|45|20|0|0|4|0
36081|36084|acquiring inherited sample rows|45|42|0|0|4|0
36081|36084|acquiring inherited sample rows|45|45|0|0|4|0
36081|36084|acquiring inherited sample rows|45|45|0|0|4|0

//hoge2_20000
36081|36087|acquiring inherited sample rows|45|3|0|0|4|1
36081|36087|acquiring inherited sample rows|45|31|0|0|4|1
36081|36087|acquiring inherited sample rows|45|45|0|0|4|1
36081|36087|acquiring inherited sample rows|45|45|0|0|4|1

//hoge2_30000
36081|36090|acquiring inherited sample rows|45|12|0|0|4|2
36081|36090|acquiring inherited sample rows|45|35|0|0|4|2
36081|36090|acquiring inherited sample rows|45|45|0|0|4|2
36081|36090|acquiring inherited sample rows|45|45|0|0|4|2

//hoge2_default
36081|36093|acquiring inherited sample rows|45|18|0|0|4|3
36081|36093|acquiring inherited sample rows|45|38|0|0|4|3
36081|36093|acquiring inherited sample rows|45|45|0|0|4|3
36081|36093|acquiring inherited sample rows|45|45|0|0|4|3

//Below "computing stats" is for the partitioned table hoge,
//therefore the second column from the left side would be
//better to set Zero to easy to understand.
//I guessd that user think which relid is the target of
//"computing stats"?!
//Of course, other option is to write it on document.

36081|36093|computing stats     |45|45|0|0|4|4
36081|36093|computing stats     |45|45|0|0|4|4
36081|36093|computing stats     |45|45|0|0|4|4
36081|36093|computing stats     |45|45|0|0|4|4
36081|36093|finalizing analyze  |45|45|0|0|4|4

== for each partitions such as hoge2_10000 ... hoge2_default ==

//hoge2_10000
36084|0|acquiring sample rows   |45|25|0|0|0|0
36084|0|computing stats         |45|45|0|0|0|0
36084|0|computing extended stats|45|45|0|0|0|0
36084|0|finalizing analyze      |45|45|0|0|0|0

//hoge2_20000
36087|0|acquiring sample rows   |45|14|0|0|0|0
36087|0|computing stats         |45|45|0|0|0|0
36087|0|computing extended stats|45|45|0|0|0|0
36087|0|finalizing analyze      |45|45|0|0|0|0

//hoge2_30000
36090|0|acquiring sample rows   |45|12|0|0|0|0
36090|0|acquiring sample rows   |45|44|0|0|0|0
36090|0|computing extended stats|45|45|0|0|0|0
36090|0|finalizing analyze      |45|45|0|0|0|0

//hoge2_default
36093|0|acquiring sample rows   |45|10|0|0|0|0
36093|0|acquiring sample rows   |45|43|0|0|0|0
36093|0|computing extended stats|45|45|0|0|0|0
36093|0|finalizing analyze      |45|45|0|0|0|0



>>> 2)
>>> There are many "finalizing analyze" phases based on relids in the case
>>> of partitioning tables. Would it better to fix the document? or it
>>> would be better to reduce it to one?
>>>
>>> <Document>
>>> ---------------------------------------------------------
>>>        <entry><literal>finalizing analyze</literal></entry>
>>>        <entry>
>>>          The command is updating pg_class. When this phase is completed,
>>>          <command>ANALYZE</command> will end.
>>> ---------------------------------------------------------
>>
>> When a partitioned table is analyzed, its partitions are analyzed too.
>> So, the ANALYZE command effectively runs N + 1 times if there are N
>> partitions -- first analyze partitioned table to collect "inherited"
>> statistics by collecting row samples using
>> acquire_inherited_sample_rows(), then each partition to collect its
>> own statistics.  Note that this recursive application to ANALYZE to
>> partitions (child tables) only occurs for partitioned tables, not for
>> legacy inheritance.
> 
> Thanks for your explanation.
> I understand Analyzing Partitioned table a little.


It would be better to modify the document of "finalizing analyze" phase.

   # Before modify
    The command is updating pg_class. When this phase is completed,
    <command>ANALYZE</command> will end.

   # Modified
    The command is updating pg_class. When this phase is completed,
    <command>ANALYZE</command> will end. In the case of partitioned table,
    it might be shown on each partitions.

What do you think that? I'm going to fix it, if you agreed. :)

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Amit Langote

Дата:

09 декабря 2019 г., 08:51:24

Yamada-san,

On Fri, Dec 6, 2019 at 3:24 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
 >>> 1)
> >>> For now, I'm not sure it should be set current_child_table_relid to zero
> >>> when the current phase is changed from "acquiring inherited sample rows" to
> >>> "computing stats". See <Test result> bellow.
> >>
> >> In the upthread discussion [1], Robert asked to *not* do such things,
> >> that is, resetting some values due to phase change.  I'm not sure his
> >> point applies to this case too though.
>
> //Below "computing stats" is for the partitioned table hoge,
> //therefore the second column from the left side would be
> //better to set Zero to easy to understand.

On second thought, maybe we should not reset, because that might be
considered loss of information.  To avoid confusion, we should simply
document that current_child_table_relid is only valid during the
"acquiring inherited sample rows" phase.

> >>> 2)
> >>> There are many "finalizing analyze" phases based on relids in the case
> >>> of partitioning tables. Would it better to fix the document? or it
> >>> would be better to reduce it to one?
> >>>
> It would be better to modify the document of "finalizing analyze" phase.
>
>    # Before modify
>     The command is updating pg_class. When this phase is completed,
>     <command>ANALYZE</command> will end.
>
>    # Modified
>     The command is updating pg_class. When this phase is completed,
>     <command>ANALYZE</command> will end. In the case of partitioned table,
>     it might be shown on each partitions.
>
> What do you think that? I'm going to fix it, if you agreed. :)

*All* phases are repeated in this case, not not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes.  ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition.  So, I don't think we need to clarify that if only in
one phase's description.  Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables.  Like this:

  <note>
   <para>
    Note that when <command>ANALYZE</command> is run on a partitioned table,
    all of its partitions are also recursively analyzed as also mentioned on
    <xref linkend="sql-analyze"/>.  In that case, <command>ANALYZE</command>
    progress is reported first for the parent table, whereby its inheritance
    statistics are collected, followed by that for each partition.
   </para>
  </note>

Some more comments on the documentation:

+       Number of computed extended stats.  This counter only advances
when the phase
+       is <literal>computing extended stats</literal>.

Number of computer extended stats -> Number of extended stats computed

+       Number of analyzed child tables.  This counter only advances
when the phase
+       is <literal>computing extended stats</literal>.

Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE.  Also, the 2nd sentence is wrong -- you
meant "when the phase is <literal>acquiring inherited sample
rows</literal>.  I suggest to write this as follows:

Number of child tables scanned.  This counter only advances when the phase
is <literal>acquiring inherited sample rows</literal>.

+     <entry>OID of the child table currently being scanned.
+       It might be different from relid when analyzing tables that
have child tables.

I suggest:

OID of the child table currently being scanned.  This field is only valid when
the phase is <literal>computing extended stats</literal>.

+       The command is currently scanning the
<structfield>current_relid</structfield>
+       to obtain samples.

I suggest:

The command is currently scanning the the table given by
<structfield>current_relid</structfield> to obtain sample rows.

+       The command is currently scanning the
<structfield>current_child_table_relid</structfield>
+       to obtain samples.

I suggest (based on phase description pg_stat_progress_create_index
phase descriptions):

The command is currently scanning child tables to obtain sample rows.  Columns
<structfield>child_tables_total</structfield>,
<structfield>child_tables_done</structfield>, and
<structfield>current_child_table_relid</structfield> contain the progress
information for this phase.

+    <row>
+     <entry><literal>computing stats</literal></entry>

I think the phase name should really be "computing statistics", that
is, use the full word.

+     <entry>
+       The command is computing stats from the samples obtained
during the table scan.
+     </entry>
+    </row>

So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan

+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples
obtained in the previous phase.
+     </entry>

I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.

Thanks,
Amit

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

18 декабря 2019 г., 05:44:04

Hi Amit-san,


>   >>> 1)
>>>>> For now, I'm not sure it should be set current_child_table_relid to zero
>>>>> when the current phase is changed from "acquiring inherited sample rows" to
>>>>> "computing stats". See <Test result> bellow.
>>>>
>>>> In the upthread discussion [1], Robert asked to *not* do such things,
>>>> that is, resetting some values due to phase change.  I'm not sure his
>>>> point applies to this case too though.
>>
>> //Below "computing stats" is for the partitioned table hoge,
>> //therefore the second column from the left side would be
>> //better to set Zero to easy to understand.
> 
> On second thought, maybe we should not reset, because that might be
> considered loss of information.  To avoid confusion, we should simply
> document that current_child_table_relid is only valid during the
> "acquiring inherited sample rows" phase.


Okay, agreed. :)

  
>>>>> 2)
>>>>> There are many "finalizing analyze" phases based on relids in the case
>>>>> of partitioning tables. Would it better to fix the document? or it
>>>>> would be better to reduce it to one?
>>>>>
>> It would be better to modify the document of "finalizing analyze" phase.
>>
>>     # Before modify
>>      The command is updating pg_class. When this phase is completed,
>>      <command>ANALYZE</command> will end.
>>
>>     # Modified
>>      The command is updating pg_class. When this phase is completed,
>>      <command>ANALYZE</command> will end. In the case of partitioned table,
>>      it might be shown on each partitions.
>>
>> What do you think that? I'm going to fix it, if you agreed. :)
> 
> *All* phases are repeated in this case, not not just "finalizing
> analyze", because ANALYZE repeatedly runs for each partition after the
> parent partitioned table's ANALYZE finishes.  ANALYZE's documentation
> mentions that analyzing a partitioned table also analyzes all of its
> partitions, so users should expect to see the progress information for
> each partition.  So, I don't think we need to clarify that if only in
> one phase's description.  Maybe we can add a note after the phase
> description table which mentions this implementation detail about
> partitioned tables.  Like this:
> 
>    <note>
>     <para>
>      Note that when <command>ANALYZE</command> is run on a partitioned table,
>      all of its partitions are also recursively analyzed as also mentioned on
>      <xref linkend="sql-analyze"/>.  In that case, <command>ANALYZE</command>
>      progress is reported first for the parent table, whereby its inheritance
>      statistics are collected, followed by that for each partition.
>     </para>
>    </note>


Ah.. you are right: All phases are repeated, it shouldn't be fixed
the only one phase's description.


> Some more comments on the documentation:
> 
> +       Number of computed extended stats.  This counter only advances
> when the phase
> +       is <literal>computing extended stats</literal>.
> 
> Number of computed extended stats -> Number of extended stats computed


Will fix.

  
> +       Number of analyzed child tables.  This counter only advances
> when the phase
> +       is <literal>computing extended stats</literal>.
> 
> Regarding, "Number of analyzed child table", note that we don't
> "analyze" child tables in this phase, only scan its blocks to collect
> samples for parent's ANALYZE.  Also, the 2nd sentence is wrong -- you
> meant "when the phase is <literal>acquiring inherited sample
> rows</literal>.  I suggest to write this as follows:
> 
> Number of child tables scanned.  This counter only advances when the phase
> is <literal>acquiring inherited sample rows</literal>.


Oops, I will fix it. :)


  
> +     <entry>OID of the child table currently being scanned.
> +       It might be different from relid when analyzing tables that
> have child tables.
> 
> I suggest:
> 
> OID of the child table currently being scanned.  This field is only valid when
> the phase is <literal>computing extended stats</literal>.


Will fix.


> +       The command is currently scanning the
> <structfield>current_relid</structfield>
> +       to obtain samples.
> 
> I suggest:
> 
> The command is currently scanning the the table given by
> <structfield>current_relid</structfield> to obtain sample rows.


Will fix.

  
> +       The command is currently scanning the
> <structfield>current_child_table_relid</structfield>
> +       to obtain samples.
> 
> I suggest (based on phase description pg_stat_progress_create_index
> phase descriptions):
> 
> The command is currently scanning child tables to obtain sample rows.  Columns
> <structfield>child_tables_total</structfield>,
> <structfield>child_tables_done</structfield>, and
> <structfield>current_child_table_relid</structfield> contain the progress
> information for this phase.


Will fix.


> +    <row>
> +     <entry><literal>computing stats</literal></entry>
> 
> I think the phase name should really be "computing statistics", that
> is, use the full word.


Will fix.

  
> +     <entry>
> +       The command is computing stats from the samples obtained
> during the table scan.
> +     </entry>
> +    </row>
> 
> So I suggest:
> 
> The command is computing statistics from the sample rows obtained during
> the table scan


Will fix.

  
> +     <entry><literal>computing extended stats</literal></entry>
> +     <entry>
> +       The command is computing extended stats from the samples
> obtained in the previous phase.
> +     </entry>
> 
> I suggest:
> 
> The command is computing extended statistics from the sample rows obtained
> during the table scan.


Will fix.


Thanks for your above useful suggestions. It helps me a lot. :)


Cheers!
Tatsuro Yamada

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

19 декабря 2019 г., 12:06:50

Hi All,

>> *All* phases are repeated in this case, not not just "finalizing
>> analyze", because ANALYZE repeatedly runs for each partition after the
>> parent partitioned table's ANALYZE finishes.  ANALYZE's documentation
>> mentions that analyzing a partitioned table also analyzes all of its
>> partitions, so users should expect to see the progress information for
>> each partition.  So, I don't think we need to clarify that if only in
>> one phase's description.  Maybe we can add a note after the phase
>> description table which mentions this implementation detail about
>> partitioned tables.  Like this:
>>
>>    <note>
>>     <para>
>>      Note that when <command>ANALYZE</command> is run on a partitioned table,
>>      all of its partitions are also recursively analyzed as also mentioned on
>>      <xref linkend="sql-analyze"/>.  In that case, <command>ANALYZE</command>
>>      progress is reported first for the parent table, whereby its inheritance
>>      statistics are collected, followed by that for each partition.
>>     </para>
>>    </note>
> 
> 
> Ah.. you are right: All phases are repeated, it shouldn't be fixed
> the only one phase's description.
> 
> 
>> Some more comments on the documentation:
>>
>> +       Number of computed extended stats.  This counter only advances
>> when the phase
>> +       is <literal>computing extended stats</literal>.
>>
>> Number of computed extended stats -> Number of extended stats computed
> 
> 
> Will fix.
> 
> 
>> +       Number of analyzed child tables.  This counter only advances
>> when the phase
>> +       is <literal>computing extended stats</literal>.
>>
>> Regarding, "Number of analyzed child table", note that we don't
>> "analyze" child tables in this phase, only scan its blocks to collect
>> samples for parent's ANALYZE.  Also, the 2nd sentence is wrong -- you
>> meant "when the phase is <literal>acquiring inherited sample
>> rows</literal>.  I suggest to write this as follows:
>>
>> Number of child tables scanned.  This counter only advances when the phase
>> is <literal>acquiring inherited sample rows</literal>.
> 
> 
> Oops, I will fix it. :)
> 
> 
> 
>> +     <entry>OID of the child table currently being scanned.
>> +       It might be different from relid when analyzing tables that
>> have child tables.
>>
>> I suggest:
>>
>> OID of the child table currently being scanned.  This field is only valid when
>> the phase is <literal>computing extended stats</literal>.
> 
> 
> Will fix.
> 
> 
>> +       The command is currently scanning the
>> <structfield>current_relid</structfield>
>> +       to obtain samples.
>>
>> I suggest:
>>
>> The command is currently scanning the the table given by
>> <structfield>current_relid</structfield> to obtain sample rows.
> 
> 
> Will fix.
> 
> 
>> +       The command is currently scanning the
>> <structfield>current_child_table_relid</structfield>
>> +       to obtain samples.
>>
>> I suggest (based on phase description pg_stat_progress_create_index
>> phase descriptions):
>>
>> The command is currently scanning child tables to obtain sample rows.  Columns
>> <structfield>child_tables_total</structfield>,
>> <structfield>child_tables_done</structfield>, and
>> <structfield>current_child_table_relid</structfield> contain the progress
>> information for this phase.
> 
> 
> Will fix.
> 
> 
>> +    <row>
>> +     <entry><literal>computing stats</literal></entry>
>>
>> I think the phase name should really be "computing statistics", that
>> is, use the full word.
> 
> 
> Will fix.
> 
> 
>> +     <entry>
>> +       The command is computing stats from the samples obtained
>> during the table scan.
>> +     </entry>
>> +    </row>
>>
>> So I suggest:
>>
>> The command is computing statistics from the sample rows obtained during
>> the table scan
> 
> 
> Will fix.
> 
> 
>> +     <entry><literal>computing extended stats</literal></entry>
>> +     <entry>
>> +       The command is computing extended stats from the samples
>> obtained in the previous phase.
>> +     </entry>
>>
>> I suggest:
>>
>> The command is computing extended statistics from the sample rows obtained
>> during the table scan.
> 
> 
> Will fix.


I fixed the document based on Amit's comments. :)
Please find attached file.


Thanks,
Tatsuro Yamadas

Вложения

v11-Report-progress-for-ANALYZE.patch

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

15 января 2020 г., 17:11:10

I just pushed this after some small extra tweaks.

Thanks, Yamada-san, for seeing this to completion!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Justin Pryzby

Дата:

16 января 2020 г., 15:19:31

On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:
> I just pushed this after some small extra tweaks.
> 
> Thanks, Yamada-san, for seeing this to completion!

Find attached minor fixes to docs - sorry I didn't look earlier.

Possibly you'd also want to change the other existing instances of "preparing
to begin".

Вложения

v1-0001-Doc-review-for-ANALYZE-progress-a166d408.patch

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

22 января 2020 г., 01:46:11

Hi Alvaro and All reviewers,

On 2020/01/16 2:11, Alvaro Herrera wrote:
> I just pushed this after some small extra tweaks.
> 
> Thanks, Yamada-san, for seeing this to completion!

Thanks for reviewing and committing the patch!
Hope this helps DBA. :-D

P.S.
Next up is progress reporting for query execution?!
To create it, I guess that it needs to improve
progress reporting infrastructure.

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

От

Amit Langote

Дата:

22 января 2020 г., 05:52:39

On Fri, Jan 17, 2020 at 12:19 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:
> > I just pushed this after some small extra tweaks.
> >
> > Thanks, Yamada-san, for seeing this to completion!
>
> Find attached minor fixes to docs - sorry I didn't look earlier.
>
> Possibly you'd also want to change the other existing instances of "preparing
> to begin".

Spotted a few other issues with the docs:

+       Number of computed extended statistics computed.

Should be: "Number of extended statistics computed."

+     <entry>OID of the child table currently being scanned. This
field is only valid when
+       the phase is <literal>computing extended statistics</literal>.

Should be: "This field is only valid when the phase is
<literal>acquiring inherited sample rows</literal>."

+       durring the table scan.

during

Attached a patch.

Thanks,
Amit

Re: progress report for ANALYZE

От

Amit Langote

Дата:

22 января 2020 г., 06:06:52

On Wed, Jan 22, 2020 at 2:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 12:19 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:
> > > I just pushed this after some small extra tweaks.
> > >
> > > Thanks, Yamada-san, for seeing this to completion!
> >
> > Find attached minor fixes to docs - sorry I didn't look earlier.
> >
> > Possibly you'd also want to change the other existing instances of "preparing
> > to begin".
>
> Spotted a few other issues with the docs:
>
> +       Number of computed extended statistics computed.
>
> Should be: "Number of extended statistics computed."
>
> +     <entry>OID of the child table currently being scanned. This
> field is only valid when
> +       the phase is <literal>computing extended statistics</literal>.
>
> Should be: "This field is only valid when the phase is
> <literal>acquiring inherited sample rows</literal>."
>
> +       durring the table scan.
>
> during
>
> Attached a patch.

Oops, really attached this time.

Thanks,
Amit

Вложения

a166d408eb0-doc-fixes.patch

Re: progress report for ANALYZE

От

Michael Paquier

Дата:

23 января 2020 г., 08:20:46

On Wed, Jan 22, 2020 at 03:06:52PM +0900, Amit Langote wrote:
> Oops, really attached this time.

Thanks, applied.  There were clearly two grammar mistakes in the first
patch sent by Justin.  And your suggestions look fine to me.  On top
of that, I have noticed that the indentation of the two tables in the
docs was rather inconsistent.
--
Michael

Вложения

signature.asc

Re: progress report for ANALYZE

От

Alvaro Herrera

Дата:

23 января 2020 г., 21:47:43

On 2020-Jan-22, Tatsuro Yamada wrote:

> Thanks for reviewing and committing the patch!
> Hope this helps DBA. :-D

I'm sure it does!

> P.S.
> Next up is progress reporting for query execution?!

Actually, I think it's ALTER TABLE.

Also, things like VACUUM could report the progress of the index being
processed ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

От

Amit Langote

Дата:

24 января 2020 г., 14:44:34

On Fri, Jan 24, 2020 at 6:47 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2020-Jan-22, Tatsuro Yamada wrote:
> > P.S.
> > Next up is progress reporting for query execution?!
>
> Actually, I think it's ALTER TABLE.

+1.  Existing infrastructure might be enough to cover ALTER TABLE's
needs, whereas we will very likely need to build entirely different
infrastructure for tracking the progress for SQL query execution.

Thanks,
Amit

Re: progress report for ANALYZE

От

Tatsuro Yamada

Дата:

27 января 2020 г., 10:16:25

Hi,

On 2020/01/24 23:44, Amit Langote wrote:
> On Fri, Jan 24, 2020 at 6:47 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> On 2020-Jan-22, Tatsuro Yamada wrote:
>>> P.S.
>>> Next up is progress reporting for query execution?!
>>
>> Actually, I think it's ALTER TABLE.
> 
> +1.  Existing infrastructure might be enough to cover ALTER TABLE's
> needs, whereas we will very likely need to build entirely different
> infrastructure for tracking the progress for SQL query execution.

Yeah, I agree.
I will create a little POC patch after reading tablecmds.c and alter.c to
investigate how to report progress. :)

Regards,
Tatsuro Yamada

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: progress report for ANALYZE

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения