Обсуждение: Add WAL read stats to pg_stat_wal

Поиск
Список
Период
Сортировка

Add WAL read stats to pg_stat_wal

От
Bharath Rupireddy
Дата:
Hi,

While working on [1], I was in need to know WAL read stats (number of
times and amount of WAL data read from disk, time taken to read) to
measure the benefit. I had to write a developer patch to capture WAL
read stats as pg_stat_wal currently emits WAL write stats. With recent
works on pg_stat_io which emit data read IO stats too, I think it's
better to not miss WAL read stats. It might help others who keep an
eye on IOPS of the production servers for various reasons. The WAL
read stats I'm thinking useful are wal_read_bytes - total amount of
WAL read, wal_read - total number of times WAL is read from disk,
wal_read_time - total amount of time spent reading WAL (tracked only
when an existing GUC track_wal_io_timing is on).

I came up with a patch and attached it here. The WAL readers that add
to WAL read stats are WAL senders, startup process and other backends
using xlogreader for logical replication or pg_walinspect SQL
functions. They all report stats to shared memory by calling
pgstat_report_wal() in appropriate locations. In standby mode, calling
pgstat_report_wa() for every record seems to be costly. Therefore, I
chose to report stats every 1024 WAL records (a random number,
suggestions for a better a way are welcome here).

Note that the patch needs a bit more work, per [2]. With the patch,
the WAL senders (processes exiting after checkpointer) will generate
stats and we need to either let all or only one WAL sender to write
stats to disk. Allowing one WAL sender to write might be tricky.
Allowing all WAL senders to write might make too many writes to the
stats file. And, we need a lock to let only one process write. I can't
think of a best way here at the moment.

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com
[2]
    /*
     * Write out stats after shutdown. This needs to be called by exactly one
     * process during a normal shutdown, and since checkpointer is shut down
     * very late...
     *
     * Walsenders are shut down after the checkpointer, but currently don't
     * report stats. If that changes, we need a more complicated solution.
     */
    before_shmem_exit(pgstat_before_server_shutdown, 0);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add WAL read stats to pg_stat_wal

От
Andres Freund
Дата:
Hi,

On 2023-02-16 23:39:00 +0530, Bharath Rupireddy wrote:
> While working on [1], I was in need to know WAL read stats (number of
> times and amount of WAL data read from disk, time taken to read) to
> measure the benefit. I had to write a developer patch to capture WAL
> read stats as pg_stat_wal currently emits WAL write stats. With recent
> works on pg_stat_io which emit data read IO stats too, I think it's
> better to not miss WAL read stats. It might help others who keep an
> eye on IOPS of the production servers for various reasons. The WAL
> read stats I'm thinking useful are wal_read_bytes - total amount of
> WAL read, wal_read - total number of times WAL is read from disk,
> wal_read_time - total amount of time spent reading WAL (tracked only
> when an existing GUC track_wal_io_timing is on).

I doesn't really seem useful to have this in pg_stat_wal, because you can't
really figure out where those reads are coming from. Are they crash recovery?
Walsender? ...?

I think this'd better be handled by adding WAL support for pg_stat_io. Then
the WAL reads would be attributed to the relevant backend type, making it
easier to answer such questions.  Going forward I want to add support for
seeing pg_stat_io for individual connections, which'd then automatically
support this feature for the WAL reads as well.

Eventually I think pg_stat_wal should only track wal_records, wal_fpi,
wal_buffers_full and fill the other columns from pg_stat_io.


However, this doesn't "solve" the following issue:

> Note that the patch needs a bit more work, per [2]. With the patch,
> the WAL senders (processes exiting after checkpointer) will generate
> stats and we need to either let all or only one WAL sender to write
> stats to disk. Allowing one WAL sender to write might be tricky.
> Allowing all WAL senders to write might make too many writes to the
> stats file. And, we need a lock to let only one process write. I can't
> think of a best way here at the moment.
> 
> Thoughts?
> 
> [1] https://www.postgresql.org/message-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com
> [2]
>     /*
>      * Write out stats after shutdown. This needs to be called by exactly one
>      * process during a normal shutdown, and since checkpointer is shut down
>      * very late...
>      *
>      * Walsenders are shut down after the checkpointer, but currently don't
>      * report stats. If that changes, we need a more complicated solution.
>      */
>     before_shmem_exit(pgstat_before_server_shutdown, 0);

I wonder if we should keep the checkpointer around for longer. If we have
checkpointer signal postmaster after it wrote the shutdown checkpoint,
postmaster could signal walsenders to shut down, and checkpointer could do
some final work, like writing out the stats.

I suspect this could be useful for other things as well. It's awkward that we
don't have a place to put "just before shutting down" type tasks. And
checkpointer seems well suited for that.

Greetings,

Andres Freund



Re: Add WAL read stats to pg_stat_wal

От
Kyotaro Horiguchi
Дата:
At Thu, 16 Feb 2023 11:11:38 -0800, Andres Freund <andres@anarazel.de> wrote in 
> I wonder if we should keep the checkpointer around for longer. If we have
> checkpointer signal postmaster after it wrote the shutdown checkpoint,
> postmaster could signal walsenders to shut down, and checkpointer could do
> some final work, like writing out the stats.
> I suspect this could be useful for other things as well. It's awkward that we
> don't have a place to put "just before shutting down" type tasks. And
> checkpointer seems well suited for that.

I totally agree that it will be useful, but I'm not quite sure how
checkpointer would be able to let postmaster know about that state
without requiring access to shared memory.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add WAL read stats to pg_stat_wal

От
Bharath Rupireddy
Дата:
On Fri, Feb 17, 2023 at 12:41 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-02-16 23:39:00 +0530, Bharath Rupireddy wrote:
> > While working on [1], I was in need to know WAL read stats (number of
> > times and amount of WAL data read from disk, time taken to read) to
> > measure the benefit. I had to write a developer patch to capture WAL
> > read stats as pg_stat_wal currently emits WAL write stats. With recent
> > works on pg_stat_io which emit data read IO stats too, I think it's
> > better to not miss WAL read stats. It might help others who keep an
> > eye on IOPS of the production servers for various reasons. The WAL
> > read stats I'm thinking useful are wal_read_bytes - total amount of
> > WAL read, wal_read - total number of times WAL is read from disk,
> > wal_read_time - total amount of time spent reading WAL (tracked only
> > when an existing GUC track_wal_io_timing is on).
>
> I doesn't really seem useful to have this in pg_stat_wal, because you can't
> really figure out where those reads are coming from. Are they crash recovery?
> Walsender? ...?

Yes, that's the limitation with what I've proposed.

> I think this'd better be handled by adding WAL support for pg_stat_io. Then
> the WAL reads would be attributed to the relevant backend type, making it
> easier to answer such questions.  Going forward I want to add support for
> seeing pg_stat_io for individual connections, which'd then automatically
> support this feature for the WAL reads as well.
>
> Eventually I think pg_stat_wal should only track wal_records, wal_fpi,
> wal_buffers_full and fill the other columns from pg_stat_io.

pg_stat_io being one place for all IO related information sounds apt
and useful. And similarly, we might want to push write/read/flush info
from pg_stat_slru to pg_stat_io.

> However, this doesn't "solve" the following issue:
>
> > Note that the patch needs a bit more work, per [2]. With the patch,
> > the WAL senders (processes exiting after checkpointer) will generate
> > stats and we need to either let all or only one WAL sender to write
> > stats to disk. Allowing one WAL sender to write might be tricky.
> > Allowing all WAL senders to write might make too many writes to the
> > stats file. And, we need a lock to let only one process write. I can't
> > think of a best way here at the moment.
> >
> > Thoughts?
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com
> > [2]
> >     /*
> >      * Write out stats after shutdown. This needs to be called by exactly one
> >      * process during a normal shutdown, and since checkpointer is shut down
> >      * very late...
> >      *
> >      * Walsenders are shut down after the checkpointer, but currently don't
> >      * report stats. If that changes, we need a more complicated solution.
> >      */
> >     before_shmem_exit(pgstat_before_server_shutdown, 0);
>
> I wonder if we should keep the checkpointer around for longer. If we have
> checkpointer signal postmaster after it wrote the shutdown checkpoint,
> postmaster could signal walsenders to shut down, and checkpointer could do
> some final work, like writing out the stats.
>
> I suspect this could be useful for other things as well. It's awkward that we
> don't have a place to put "just before shutting down" type tasks. And
> checkpointer seems well suited for that.

Yes, there are some places that still assume checkpointer is the last
process to exit, for instance see [1]. If we can truly make it happen,
it'll be useful. I'll come up with more thoughts (and perhaps a patch)
on this soon.

[1]
        /*
         * Checkpointer is the last process to shut down, so we ask it to hold
         * the keys for a range of other tasks required most of which have
         * nothing to do with checkpointing at all.
         *
         * For various reasons, some config values can change dynamically so
         * the primary copy of them is held in shared memory to make sure all
         * backends see the same value.  We make Checkpointer responsible for
         * updating the shared memory copy if the parameter setting changes
         * because of SIGHUP.
         */

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add WAL read stats to pg_stat_wal

От
Bharath Rupireddy
Дата:
On Mon, Feb 20, 2023 at 10:51 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 16 Feb 2023 11:11:38 -0800, Andres Freund <andres@anarazel.de> wrote in
> > I wonder if we should keep the checkpointer around for longer. If we have
> > checkpointer signal postmaster after it wrote the shutdown checkpoint,
> > postmaster could signal walsenders to shut down, and checkpointer could do
> > some final work, like writing out the stats.
> > I suspect this could be useful for other things as well. It's awkward that we
> > don't have a place to put "just before shutting down" type tasks. And
> > checkpointer seems well suited for that.
>
> I totally agree that it will be useful, but I'm not quite sure how
> checkpointer would be able to let postmaster know about that state
> without requiring access to shared memory.

The checkpointer can either set a flag in shared memory
(CheckpointerShmem or XLogCtl) or send a multiplexed SIGUSR1 (of
course, this one too needs shared memory access for PMSignalState) or
SIGUSR2 (pqsignal(SIGUSR2, dummy_handler);   /* unused, reserve for
children */) if we don't want shared memory access after it writes a
shutdown checkpoint.

Having said that, what's the problem if we use shared memory to report
the shutdown checkpoint to the postmaster? In case of abnormal
shutdown where shared memory gets corrupted, we don't even write a
shutdown checkpoint, no? In such a case, the postmaster doesn't send
SIGUSR2 to the checkpointer, instead it sends SIGQUIT. AFICS, using
shared memory doesn't seem to have any problem. Do you have any other
thoughts in mind?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add WAL read stats to pg_stat_wal

От
Andres Freund
Дата:
Hi,

On 2023-02-20 14:21:39 +0900, Kyotaro Horiguchi wrote:
> I totally agree that it will be useful, but I'm not quite sure how
> checkpointer would be able to let postmaster know about that state
> without requiring access to shared memory.

SendPostmasterSignal(PMSIGNAL_SHUTDOWN_CHECKPOINT_COMPLETE);
or such.

Greetings,

Andres Freund



Re: Add WAL read stats to pg_stat_wal

От
Kyotaro Horiguchi
Дата:
At Mon, 20 Feb 2023 08:29:06 -0800, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2023-02-20 14:21:39 +0900, Kyotaro Horiguchi wrote:
> > I totally agree that it will be useful, but I'm not quite sure how
> > checkpointer would be able to let postmaster know about that state
> > without requiring access to shared memory.
> 
> SendPostmasterSignal(PMSIGNAL_SHUTDOWN_CHECKPOINT_COMPLETE);
> or such.

Ah, that's it. Thanks!

regarsd.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add WAL read stats to pg_stat_wal

От
Kyotaro Horiguchi
Дата:
At Mon, 20 Feb 2023 20:15:00 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Having said that, what's the problem if we use shared memory to report
> the shutdown checkpoint to the postmaster? In case of abnormal
> shutdown where shared memory gets corrupted, we don't even write a
> shutdown checkpoint, no? In such a case, the postmaster doesn't send
> SIGUSR2 to the checkpointer, instead it sends SIGQUIT. AFICS, using
> shared memory doesn't seem to have any problem. Do you have any other
> thoughts in mind?

I had a baseless belief that postmaster doesn't touch shared memory,
but as Andres suggested, SendPostmasterSignal() already does that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center