Обсуждение: adding status for COPY progress report

Поиск
Список
Период
Сортировка

adding status for COPY progress report

От
Zhihong Yu
Дата:
Hi,
Please see attached for enhancement to COPY command progress.

The added status column would allow users to get the status of the most recent COPY command.

Below is sample output.

Thanks

yugabyte=# SELECT relid::regclass, command, status,
yugabyte-#           type, bytes_processed, bytes_total,
yugabyte-#           tuples_processed, tuples_excluded FROM pg_stat_progress_copy;  relid   |  command  | status | type | bytes_processed | bytes_total | tuples_processed | tuples_excluded
----------+-----------+--------+------+-----------------+-------------+------------------+----------------- copy_tab | COPY FROM | PASS   | FILE |             152 |         152 |               12 |               0
(1 row)
Вложения

Re: adding status for COPY progress report

От
Zhihong Yu
Дата:
Hi,
Here is the updated patch.

Cheers

On Tue, May 24, 2022 at 10:18 AM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
Please see attached for enhancement to COPY command progress.

The added status column would allow users to get the status of the most recent COPY command.

Below is sample output.

Thanks

yugabyte=# SELECT relid::regclass, command, status,
yugabyte-#           type, bytes_processed, bytes_total,
yugabyte-#           tuples_processed, tuples_excluded FROM pg_stat_progress_copy;  relid   |  command  | status | type | bytes_processed | bytes_total | tuples_processed | tuples_excluded
----------+-----------+--------+------+-----------------+-------------+------------------+----------------- copy_tab | COPY FROM | PASS   | FILE |             152 |         152 |               12 |               0
(1 row)
Вложения

Re: adding status for COPY progress report

От
Matthias van de Meent
Дата:
On Tue, 24 May 2022 at 19:13, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> Please see attached for enhancement to COPY command progress.
>
> The added status column would allow users to get the status of the most recent COPY command.

I fail to see the merit of retaining completed progress reporting
commands in their views after completion, other than making the
behaviour of the pg_stat_progress-views much more complicated and
adding overhead in places where we want the system to have as little
overhead as possible.

Trying to get the status of a COPY command after it finished on a
different connection seems weird, as that other connection is likely
to have already disconnected / started another task. To be certain
that a backend can see the return status of the COPY command, you'd
have to be certain that the connection doesn't run any other
_progress-able commands in the following seconds / minutes, which
implies control over the connection, which means you already have
access to the resulting status of your COPY command.

Regarding the patch: I really do not like that this leaks entries into
all _progress views: I get garbage data from e.g. the _create_index
and _copy views when VACUUM is running, etc, because you removed the
filter on cmdtype.
Also, the added fields in CopyToStateData / CopyFromStateData seem
useless when a pgstat_progress_update_param in the right place should
suffice.

Kind regards,

Matthias van de Meent



Re: adding status for COPY progress report

От
Zhihong Yu
Дата:

On Tue, May 24, 2022 at 12:37 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Tue, 24 May 2022 at 19:13, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> Please see attached for enhancement to COPY command progress.
>
> The added status column would allow users to get the status of the most recent COPY command.

I fail to see the merit of retaining completed progress reporting
commands in their views after completion, other than making the
behaviour of the pg_stat_progress-views much more complicated and
adding overhead in places where we want the system to have as little
overhead as possible.

Trying to get the status of a COPY command after it finished on a
different connection seems weird, as that other connection is likely
to have already disconnected / started another task. To be certain
that a backend can see the return status of the COPY command, you'd
have to be certain that the connection doesn't run any other
_progress-able commands in the following seconds / minutes, which
implies control over the connection, which means you already have
access to the resulting status of your COPY command.

Regarding the patch: I really do not like that this leaks entries into
all _progress views: I get garbage data from e.g. the _create_index
and _copy views when VACUUM is running, etc, because you removed the
filter on cmdtype.
Also, the added fields in CopyToStateData / CopyFromStateData seem
useless when a pgstat_progress_update_param in the right place should
suffice.

Kind regards,

Matthias van de Meent
Hi,
For #2 above, can you let me know where the pgstat_progress_update_param() call(s) should be added ?

In my patch, pgstat_progress_update_param() is called from error callback and EndCopy().

For #1, if I use param18 (which is not used by other views), would that be better ?

Thanks

Re: adding status for COPY progress report

От
Matthias van de Meent
Дата:
On Tue, 24 May 2022 at 22:12, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> On Tue, May 24, 2022 at 12:37 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>
>> On Tue, 24 May 2022 at 19:13, Zhihong Yu <zyu@yugabyte.com> wrote:
>> >
>> > Hi,
>> > Please see attached for enhancement to COPY command progress.
>> >
>> > The added status column would allow users to get the status of the most recent COPY command.
>>
>> I fail to see the merit of retaining completed progress reporting
>> commands in their views after completion, other than making the
>> behaviour of the pg_stat_progress-views much more complicated and
>> adding overhead in places where we want the system to have as little
>> overhead as possible.
>>
>> Trying to get the status of a COPY command after it finished on a
>> different connection seems weird, as that other connection is likely
>> to have already disconnected / started another task. To be certain
>> that a backend can see the return status of the COPY command, you'd
>> have to be certain that the connection doesn't run any other
>> _progress-able commands in the following seconds / minutes, which
>> implies control over the connection, which means you already have
>> access to the resulting status of your COPY command.
>>
>> Regarding the patch: I really do not like that this leaks entries into
>> all _progress views: I get garbage data from e.g. the _create_index
>> and _copy views when VACUUM is running, etc, because you removed the
>> filter on cmdtype.
>> Also, the added fields in CopyToStateData / CopyFromStateData seem
>> useless when a pgstat_progress_update_param in the right place should
>> suffice.
>>
>> Kind regards,
>>
>> Matthias van de Meent
>
> Hi,
> For #2 above, can you let me know where the pgstat_progress_update_param() call(s) should be added ?
> In my patch, pgstat_progress_update_param() is called from error callback and EndCopy().

In the places that the patch currently sets cstate->status it could
instead directly call pgstat_progress_update_param(..., STATUS_VALUE).
I'm fairly certain that EndCopy is not called when the error callback
is called, so status reporting should not be overwritten when
unconditionally setting the status to OK in EndCopy.

> For #1, if I use param18 (which is not used by other views), would that be better ?

No:

         /*
-         * Report values for only those backends which are running the given
-         * command.
+         * Report values for only those backends which are running or have run.
          */
-        if (!beentry || beentry->st_progress_command != cmdtype)
+        if (!beentry || beentry->st_progress_command_target == InvalidOid)
             continue;

This change removes the filter that ensures that we only return the
backends which have a st_progress_command of the correct cmdtype (i.e.
for _progress_copy only those that have st_progress_command ==
PROGRESS_COMMAND_COPY. Without that filter, you'll return all backends
that have (or have had) their progress fields set at any point. Which
means that the expectation of "the backends returned by
pg_stat_get_progress_info are those running the requested command"
will be incorrect - you'll violate the contract / documented behaviour
of the function: "Returns command progress information for the named
command.".

The numerical index of the column thus doesn't matter, what matters is
that you want special behaviour for only the COPY progress reporting
that doesn't fit with the rest of the progress-reporting
infrastructure, and that the patch as-is breaks all progress reporting
views.

Sidenote: The change is also invalid because the rows that we expect
to return for pg_stat_progress_basebackup always have
st_progress_command_target == InvalidOid, so the backends running
BASE_BACKUP would never be visible with the change as-is. COPY (SELECT
stuff) also would not show up, because that too reports a
command_target of InvalidOid.

Either way, I don't think that this idea is worth pursuing: the
progress views are explicitly there to show the progress of currently
active backends, and not to show the last progress state of backends
that at some point ran a progress-reporting-enabled command.

Kind regards,

Matthias van de Meent



Re: adding status for COPY progress report

От
Zhihong Yu
Дата:


On Tue, May 24, 2022 at 2:12 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Tue, 24 May 2022 at 22:12, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> On Tue, May 24, 2022 at 12:37 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>
>> On Tue, 24 May 2022 at 19:13, Zhihong Yu <zyu@yugabyte.com> wrote:
>> >
>> > Hi,
>> > Please see attached for enhancement to COPY command progress.
>> >
>> > The added status column would allow users to get the status of the most recent COPY command.
>>
>> I fail to see the merit of retaining completed progress reporting
>> commands in their views after completion, other than making the
>> behaviour of the pg_stat_progress-views much more complicated and
>> adding overhead in places where we want the system to have as little
>> overhead as possible.
>>
>> Trying to get the status of a COPY command after it finished on a
>> different connection seems weird, as that other connection is likely
>> to have already disconnected / started another task. To be certain
>> that a backend can see the return status of the COPY command, you'd
>> have to be certain that the connection doesn't run any other
>> _progress-able commands in the following seconds / minutes, which
>> implies control over the connection, which means you already have
>> access to the resulting status of your COPY command.
>>
>> Regarding the patch: I really do not like that this leaks entries into
>> all _progress views: I get garbage data from e.g. the _create_index
>> and _copy views when VACUUM is running, etc, because you removed the
>> filter on cmdtype.
>> Also, the added fields in CopyToStateData / CopyFromStateData seem
>> useless when a pgstat_progress_update_param in the right place should
>> suffice.
>>
>> Kind regards,
>>
>> Matthias van de Meent
>
> Hi,
> For #2 above, can you let me know where the pgstat_progress_update_param() call(s) should be added ?
> In my patch, pgstat_progress_update_param() is called from error callback and EndCopy().

In the places that the patch currently sets cstate->status it could
instead directly call pgstat_progress_update_param(..., STATUS_VALUE).
I'm fairly certain that EndCopy is not called when the error callback
is called, so status reporting should not be overwritten when
unconditionally setting the status to OK in EndCopy.

> For #1, if I use param18 (which is not used by other views), would that be better ?

No:

         /*
-         * Report values for only those backends which are running the given
-         * command.
+         * Report values for only those backends which are running or have run.
          */
-        if (!beentry || beentry->st_progress_command != cmdtype)
+        if (!beentry || beentry->st_progress_command_target == InvalidOid)
             continue;

This change removes the filter that ensures that we only return the
backends which have a st_progress_command of the correct cmdtype (i.e.
for _progress_copy only those that have st_progress_command ==
PROGRESS_COMMAND_COPY. Without that filter, you'll return all backends
that have (or have had) their progress fields set at any point. Which
means that the expectation of "the backends returned by
pg_stat_get_progress_info are those running the requested command"
will be incorrect - you'll violate the contract / documented behaviour
of the function: "Returns command progress information for the named
command.".

The numerical index of the column thus doesn't matter, what matters is
that you want special behaviour for only the COPY progress reporting
that doesn't fit with the rest of the progress-reporting
infrastructure, and that the patch as-is breaks all progress reporting
views.

Sidenote: The change is also invalid because the rows that we expect
to return for pg_stat_progress_basebackup always have
st_progress_command_target == InvalidOid, so the backends running
BASE_BACKUP would never be visible with the change as-is. COPY (SELECT
stuff) also would not show up, because that too reports a
command_target of InvalidOid.

Either way, I don't think that this idea is worth pursuing: the
progress views are explicitly there to show the progress of currently
active backends, and not to show the last progress state of backends
that at some point ran a progress-reporting-enabled command.

Kind regards,

Matthias van de Meent
Hi,
Thanks for the comment.
w.r.t. `Returns command progress information for the named command`,
how about introducing PROGRESS_COMMAND_COPY_DONE which signifies that PROGRESS_COMMAND_COPY was the previous command ?

In pgstat_progress_end_command(): 

    if (beentry->st_progress_command == PROGRESS_COMMAND_COPY)
        beentry->st_progress_command = PROGRESS_COMMAND_COPY_DONE;
    else
        beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
    beentry->st_progress_command_target = InvalidOid;

In pg_stat_get_progress_info():

        if (!beentry || (beentry->st_progress_command != cmdtype &&
                (cmdtype == PROGRESS_COMMAND_COPY
                && beentry->st_progress_command != PROGRESS_COMMAND_COPY_DONE)))
            continue;

Cheers

Re: adding status for COPY progress report

От
Zhihong Yu
Дата:


On Tue, May 24, 2022 at 3:02 PM Zhihong Yu <zyu@yugabyte.com> wrote:


On Tue, May 24, 2022 at 2:12 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Tue, 24 May 2022 at 22:12, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> On Tue, May 24, 2022 at 12:37 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>
>> On Tue, 24 May 2022 at 19:13, Zhihong Yu <zyu@yugabyte.com> wrote:
>> >
>> > Hi,
>> > Please see attached for enhancement to COPY command progress.
>> >
>> > The added status column would allow users to get the status of the most recent COPY command.
>>
>> I fail to see the merit of retaining completed progress reporting
>> commands in their views after completion, other than making the
>> behaviour of the pg_stat_progress-views much more complicated and
>> adding overhead in places where we want the system to have as little
>> overhead as possible.
>>
>> Trying to get the status of a COPY command after it finished on a
>> different connection seems weird, as that other connection is likely
>> to have already disconnected / started another task. To be certain
>> that a backend can see the return status of the COPY command, you'd
>> have to be certain that the connection doesn't run any other
>> _progress-able commands in the following seconds / minutes, which
>> implies control over the connection, which means you already have
>> access to the resulting status of your COPY command.
>>
>> Regarding the patch: I really do not like that this leaks entries into
>> all _progress views: I get garbage data from e.g. the _create_index
>> and _copy views when VACUUM is running, etc, because you removed the
>> filter on cmdtype.
>> Also, the added fields in CopyToStateData / CopyFromStateData seem
>> useless when a pgstat_progress_update_param in the right place should
>> suffice.
>>
>> Kind regards,
>>
>> Matthias van de Meent
>
> Hi,
> For #2 above, can you let me know where the pgstat_progress_update_param() call(s) should be added ?
> In my patch, pgstat_progress_update_param() is called from error callback and EndCopy().

In the places that the patch currently sets cstate->status it could
instead directly call pgstat_progress_update_param(..., STATUS_VALUE).
I'm fairly certain that EndCopy is not called when the error callback
is called, so status reporting should not be overwritten when
unconditionally setting the status to OK in EndCopy.

> For #1, if I use param18 (which is not used by other views), would that be better ?

No:

         /*
-         * Report values for only those backends which are running the given
-         * command.
+         * Report values for only those backends which are running or have run.
          */
-        if (!beentry || beentry->st_progress_command != cmdtype)
+        if (!beentry || beentry->st_progress_command_target == InvalidOid)
             continue;

This change removes the filter that ensures that we only return the
backends which have a st_progress_command of the correct cmdtype (i.e.
for _progress_copy only those that have st_progress_command ==
PROGRESS_COMMAND_COPY. Without that filter, you'll return all backends
that have (or have had) their progress fields set at any point. Which
means that the expectation of "the backends returned by
pg_stat_get_progress_info are those running the requested command"
will be incorrect - you'll violate the contract / documented behaviour
of the function: "Returns command progress information for the named
command.".

The numerical index of the column thus doesn't matter, what matters is
that you want special behaviour for only the COPY progress reporting
that doesn't fit with the rest of the progress-reporting
infrastructure, and that the patch as-is breaks all progress reporting
views.

Sidenote: The change is also invalid because the rows that we expect
to return for pg_stat_progress_basebackup always have
st_progress_command_target == InvalidOid, so the backends running
BASE_BACKUP would never be visible with the change as-is. COPY (SELECT
stuff) also would not show up, because that too reports a
command_target of InvalidOid.

Either way, I don't think that this idea is worth pursuing: the
progress views are explicitly there to show the progress of currently
active backends, and not to show the last progress state of backends
that at some point ran a progress-reporting-enabled command.

Kind regards,

Matthias van de Meent
Hi,
Thanks for the comment.
w.r.t. `Returns command progress information for the named command`,
how about introducing PROGRESS_COMMAND_COPY_DONE which signifies that PROGRESS_COMMAND_COPY was the previous command ?

In pgstat_progress_end_command(): 

    if (beentry->st_progress_command == PROGRESS_COMMAND_COPY)
        beentry->st_progress_command = PROGRESS_COMMAND_COPY_DONE;
    else
        beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
    beentry->st_progress_command_target = InvalidOid;

In pg_stat_get_progress_info():

        if (!beentry || (beentry->st_progress_command != cmdtype &&
                (cmdtype == PROGRESS_COMMAND_COPY
                && beentry->st_progress_command != PROGRESS_COMMAND_COPY_DONE)))
            continue;

Cheers
Hi,
Patch v3 follows advice from Matthias (status field has been dropped).

Thanks 
Вложения

Re: adding status for COPY progress report

От
Matthias van de Meent
Дата:
On Wed, 25 May 2022 at 10:15, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> Patch v3 follows advice from Matthias (status field has been dropped).

Could you argue why you think that this should be added to the
pg_stat_progress_copy view? Again, the progress reporting subsystem is
built to "report the progress of certain commands during command
execution". Why do you think we need to go further than that and allow
some commands to retain their report even after they've finished
executing?

Of note: The contents of >st_progress_param are only defined and
guaranteed to be consistent when the reporting command is running.
Even if no other progress-reporting command is running other commands
or processes in that backend may call functions that update the fields
with somewhat arbitrary values when no progress-reporting command is
actively running, thus corrupting the information for the progress
reporting view.

Could you please provide some insights on why you think that we should
change the progress reporting guts to accomodate something that it was
not built for?


Kind regards,

Matthias van de Meent



Re: adding status for COPY progress report

От
Zhihong Yu
Дата:


On Wed, May 25, 2022 at 3:55 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Wed, 25 May 2022 at 10:15, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> Patch v3 follows advice from Matthias (status field has been dropped).

Could you argue why you think that this should be added to the
pg_stat_progress_copy view? Again, the progress reporting subsystem is
built to "report the progress of certain commands during command
execution". Why do you think we need to go further than that and allow
some commands to retain their report even after they've finished
executing?

Of note: The contents of >st_progress_param are only defined and
guaranteed to be consistent when the reporting command is running.
Even if no other progress-reporting command is running other commands
or processes in that backend may call functions that update the fields
with somewhat arbitrary values when no progress-reporting command is
actively running, thus corrupting the information for the progress
reporting view.

Could you please provide some insights on why you think that we should
change the progress reporting guts to accomodate something that it was
not built for?


Kind regards,

Matthias van de Meent
Hi, Matthias:
When I first followed the procedure in https://paquier.xyz/postgresql-2/postgres-14-monitoring-copy/ , I didn't see the output from the view.
This was because the example used 10 rows where the COPY command finishes quickly.
I had to increase the row count in order to see output from the system view.

With my patch, the user would be able to see the result of COPY command even if the duration for command execution is very short.

I made a slight change in patch v4. With patch v3, we would see the following:

   relid |  command  | status_yb | type | bytes_processed | bytes_total | tuples_processed | tuples_excluded
  -------+-----------+-----------+------+-----------------+-------------+------------------+-----------------
   -     | COPY FROM | PASS      | PIPE |               6 |           0 |                1 |               0

It would be desirable to see the relation for the COPY command.

With the updated patch, I think the interference from other commands in progress reporting has been prevented (see logic inside pg_stat_get_progress_info).

Cheers
Вложения

Re: adding status for COPY progress report

От
Matthias van de Meent
Дата:
On Wed, 25 May 2022 at 16:32, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> On Wed, May 25, 2022 at 3:55 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>
>> On Wed, 25 May 2022 at 10:15, Zhihong Yu <zyu@yugabyte.com> wrote:
>> >
>> > Hi,
>> > Patch v3 follows advice from Matthias (status field has been dropped).
>>
>> Could you argue why you think that this should be added to the
>> pg_stat_progress_copy view? Again, the progress reporting subsystem is
>> built to "report the progress of certain commands during command
>> execution". Why do you think we need to go further than that and allow
>> some commands to retain their report even after they've finished
>> executing?
>>
>> Of note: The contents of >st_progress_param are only defined and
>> guaranteed to be consistent when the reporting command is running.
>> Even if no other progress-reporting command is running other commands
>> or processes in that backend may call functions that update the fields
>> with somewhat arbitrary values when no progress-reporting command is
>> actively running, thus corrupting the information for the progress
>> reporting view.
>>
>> Could you please provide some insights on why you think that we should
>> change the progress reporting guts to accomodate something that it was
>> not built for?
>>
>>
>> Kind regards,
>>
>> Matthias van de Meent
>
> Hi, Matthias:
> When I first followed the procedure in https://paquier.xyz/postgresql-2/postgres-14-monitoring-copy/ , I didn't see
theoutput from the view.
 
> This was because the example used 10 rows where the COPY command finishes quickly.
> I had to increase the row count in order to see output from the system view.
>
> With my patch, the user would be able to see the result of COPY command even if the duration for command execution is
veryshort.
 

I see that that indeed now happens, but the point of the _progress
-views is that they show progress on tasks that are expected to take a
very long time while the connection that initiated the task does not
receive any updates. Good examples being REINDEX and CLUSTER, that
need to process tables of data (potentially terabytes in size) without
completing or sending meaningful data to the client. To show that
there is progress for such long-running tasks the pgstat_progress
subsystem was developed so that some long-running tasks now would show
their (lack of) progress.

The patch you sent, however, is not expected to be updated with
progress of the command: it is the final state of the command that
won't change. In my view, a backend that finished it's command
shouldn't be shown in pg_stat_progress -views.

Kind regards,

Matthias van de Meent.



Re: adding status for COPY progress report

От
Zhihong Yu
Дата:


On Wed, May 25, 2022 at 8:20 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Wed, 25 May 2022 at 16:32, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> On Wed, May 25, 2022 at 3:55 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>
>> On Wed, 25 May 2022 at 10:15, Zhihong Yu <zyu@yugabyte.com> wrote:
>> >
>> > Hi,
>> > Patch v3 follows advice from Matthias (status field has been dropped).
>>
>> Could you argue why you think that this should be added to the
>> pg_stat_progress_copy view? Again, the progress reporting subsystem is
>> built to "report the progress of certain commands during command
>> execution". Why do you think we need to go further than that and allow
>> some commands to retain their report even after they've finished
>> executing?
>>
>> Of note: The contents of >st_progress_param are only defined and
>> guaranteed to be consistent when the reporting command is running.
>> Even if no other progress-reporting command is running other commands
>> or processes in that backend may call functions that update the fields
>> with somewhat arbitrary values when no progress-reporting command is
>> actively running, thus corrupting the information for the progress
>> reporting view.
>>
>> Could you please provide some insights on why you think that we should
>> change the progress reporting guts to accomodate something that it was
>> not built for?
>>
>>
>> Kind regards,
>>
>> Matthias van de Meent
>
> Hi, Matthias:
> When I first followed the procedure in https://paquier.xyz/postgresql-2/postgres-14-monitoring-copy/ , I didn't see the output from the view.
> This was because the example used 10 rows where the COPY command finishes quickly.
> I had to increase the row count in order to see output from the system view.
>
> With my patch, the user would be able to see the result of COPY command even if the duration for command execution is very short.

I see that that indeed now happens, but the point of the _progress
-views is that they show progress on tasks that are expected to take a
very long time while the connection that initiated the task does not
receive any updates. Good examples being REINDEX and CLUSTER, that
need to process tables of data (potentially terabytes in size) without
completing or sending meaningful data to the client. To show that
there is progress for such long-running tasks the pgstat_progress
subsystem was developed so that some long-running tasks now would show
their (lack of) progress.

The patch you sent, however, is not expected to be updated with
progress of the command: it is the final state of the command that
won't change. In my view, a backend that finished it's command
shouldn't be shown in pg_stat_progress -views.

Kind regards,

Matthias van de Meent.
 Hi, Matthias:
Thanks for taking time to evaluate my patch.

I understand that pg_stat_progress views should show progress for on-going operation.

Let's look at the sequences of user activity for long running COPY command.
The user would likely issue queries to pg_stat_progress_copy over time.
Let's say on Nth invocation, the user sees X tuples copied.
On (N+1)st invocation, the view returns nothing.
The user knows that the COPY may have completed - but did the operation succeed or end up with some error ?

I would think that the user should be allowed to know the answer to the above question using the same query to pg_stat_progress_copy view.

What do you think ?

Cheers

Re: adding status for COPY progress report

От
Michael Paquier
Дата:
On Wed, May 25, 2022 at 09:34:51AM -0700, Zhihong Yu wrote:
> Let's look at the sequences of user activity for long running COPY command.
> The user would likely issue queries to pg_stat_progress_copy over time.
> Let's say on Nth invocation, the user sees X tuples copied.
> On (N+1)st invocation, the view returns nothing.
> The user knows that the COPY may have completed - but did the operation
> succeed or end up with some error ?

If I am following this thread correctly and after reading the patch,
that's what the status code of the connection issuing the command is
here for.  You have no guarantee either that the status you are trying
to store in the progress view is not going to be quickly overwritten
by a follow-up command, making the window where this information is
available very small in most cases, limiting its value.  The window
gets even smaller if the connection that failed the COPY is used in a
connection pooler by a different command.

The changes in pgstat_progress_end_command() and
pg_stat_get_progress_info() update st_progress_command_target
depending on the command type involved, breaking the existing contract
of  those routines, particularly the fact that the progress fields
*should* be reset in an error stack.
--
Michael

Вложения

Re: adding status for COPY progress report

От
Zhihong Yu
Дата:


On Wed, May 25, 2022 at 5:51 PM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, May 25, 2022 at 09:34:51AM -0700, Zhihong Yu wrote:
> Let's look at the sequences of user activity for long running COPY command.
> The user would likely issue queries to pg_stat_progress_copy over time.
> Let's say on Nth invocation, the user sees X tuples copied.
> On (N+1)st invocation, the view returns nothing.
> The user knows that the COPY may have completed - but did the operation
> succeed or end up with some error ?

If I am following this thread correctly and after reading the patch,
that's what the status code of the connection issuing the command is
here for.  You have no guarantee either that the status you are trying
to store in the progress view is not going to be quickly overwritten
by a follow-up command, making the window where this information is
available very small in most cases, limiting its value.  The window
gets even smaller if the connection that failed the COPY is used in a
connection pooler by a different command.

Hi,
I tend to think that in case of failed COPY command, the user would spend some time trying to find out 
why the COPY command failed. The investigation would make the window longer.
 
The changes in pgstat_progress_end_command() and
pg_stat_get_progress_info() update st_progress_command_target
depending on the command type involved, breaking the existing contract
of  those routines, particularly the fact that the progress fields
*should* be reset in an error stack.

I searched the code base for how st_progress_command_target is used.
In case there is subsequent command following the COPY, st_progress_command_target would be set to the Oid
of the subsequent command.
In case there is no subsequent command following the COPY command, it seems leaving st_progress_command_target as
the Oid of the COPY command wouldn't hurt.

Maybe you can let me know what side effect not resetting st_progress_command_target would have.

As an alternative, upon seeing PROGRESS_COMMAND_COPY_DONE, we can transfer the value of
st_progress_command_target to a new field called, say, st_progress_command_previous_target (
and resetting st_progress_command_target as usual).

Please let me know what you think.

Thanks

Re: adding status for COPY progress report

От
Amit Langote
Дата:
Hi,

On Thu, May 26, 2022 at 11:35 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>> The changes in pgstat_progress_end_command() and
>> pg_stat_get_progress_info() update st_progress_command_target
>> depending on the command type involved, breaking the existing contract
>> of  those routines, particularly the fact that the progress fields
>> *should* be reset in an error stack.

+1 to what Michael said here.  I don't think the following changes are
acceptable:

@@ -106,7 +106,13 @@ pgstat_progress_end_command(void)
        return;

    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
-   beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
-   beentry->st_progress_command_target = InvalidOid;
+   if (beentry->st_progress_command == PROGRESS_COMMAND_COPY)
+       // We want to show the relation for the most recent COPY command
+       beentry->st_progress_command = PROGRESS_COMMAND_COPY_DONE;
+   else
+   {
+       beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
+       beentry->st_progress_command_target = InvalidOid;
+   }
    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

pgstat_progress_end_command() is generic infrastructure and there
shouldn't be anything COPY-specific there.

> I searched the code base for how st_progress_command_target is used.
> In case there is subsequent command following the COPY, st_progress_command_target would be set to the Oid
> of the subsequent command.
> In case there is no subsequent command following the COPY command, it seems leaving st_progress_command_target as
> the Oid of the COPY command wouldn't hurt.
>
> Maybe you can let me know what side effect not resetting st_progress_command_target would have.
>
> As an alternative, upon seeing PROGRESS_COMMAND_COPY_DONE, we can transfer the value of
> st_progress_command_target to a new field called, say, st_progress_command_previous_target (
> and resetting st_progress_command_target as usual).

That doesn't sound like a good idea.

As others have said, there's no point in adding a status field to
pg_stat_progress_copy that only tells whether a COPY is running or
not.  You can already do that by looking at the output of `select *
from pg_stat_progress_copy`.  If the COPY you're interested in is
running, you'll find the corresponding row in the view.  The row is
made to disappear from the view the instance the COPY finishes, either
successfully or due to an error.  Whichever is the case will be known
in the connection that initiated the COPY and you may find it in the
server log.  I don't think we should make Postgres remember anything
about that in the shared memory, or at least not with one-off
adjustments of the shared progress reporting state like in the
proposed patch.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com



Re: adding status for COPY progress report

От
Zhihong Yu
Дата:


On Tue, May 31, 2022 at 1:20 AM Amit Langote <amitlangote09@gmail.com> wrote:
Hi,

On Thu, May 26, 2022 at 11:35 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>> The changes in pgstat_progress_end_command() and
>> pg_stat_get_progress_info() update st_progress_command_target
>> depending on the command type involved, breaking the existing contract
>> of  those routines, particularly the fact that the progress fields
>> *should* be reset in an error stack.

+1 to what Michael said here.  I don't think the following changes are
acceptable:

@@ -106,7 +106,13 @@ pgstat_progress_end_command(void)
        return;

    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
-   beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
-   beentry->st_progress_command_target = InvalidOid;
+   if (beentry->st_progress_command == PROGRESS_COMMAND_COPY)
+       // We want to show the relation for the most recent COPY command
+       beentry->st_progress_command = PROGRESS_COMMAND_COPY_DONE;
+   else
+   {
+       beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
+       beentry->st_progress_command_target = InvalidOid;
+   }
    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

pgstat_progress_end_command() is generic infrastructure and there
shouldn't be anything COPY-specific there.

> I searched the code base for how st_progress_command_target is used.
> In case there is subsequent command following the COPY, st_progress_command_target would be set to the Oid
> of the subsequent command.
> In case there is no subsequent command following the COPY command, it seems leaving st_progress_command_target as
> the Oid of the COPY command wouldn't hurt.
>
> Maybe you can let me know what side effect not resetting st_progress_command_target would have.
>
> As an alternative, upon seeing PROGRESS_COMMAND_COPY_DONE, we can transfer the value of
> st_progress_command_target to a new field called, say, st_progress_command_previous_target (
> and resetting st_progress_command_target as usual).

That doesn't sound like a good idea.

As others have said, there's no point in adding a status field to
pg_stat_progress_copy that only tells whether a COPY is running or
not.  You can already do that by looking at the output of `select *
from pg_stat_progress_copy`.  If the COPY you're interested in is
running, you'll find the corresponding row in the view.  The row is
made to disappear from the view the instance the COPY finishes, either
successfully or due to an error.  Whichever is the case will be known
in the connection that initiated the COPY and you may find it in the
server log.  I don't think we should make Postgres remember anything
about that in the shared memory, or at least not with one-off
adjustments of the shared progress reporting state like in the
proposed patch.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Hi, Matthias, Michael and Amit:
Thanks for your time reviewing my patch.

I took note of what you said.

If I can make the changes more general, I would circle back.

Cheers