Обсуждение: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

Поиск
Список
Период
Сортировка

[PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Soumya S Murali
Дата:
Hi all,

While debugging checkpointer write behavior, I recently found some of the enhancements related to extending pg_stat_checkpointer by including checkpoint type (manual/timed/immediate), last_checkpoint_time and checkpoint_total_time information to checkpoint completion logs through SQL when `log_checkpoints` is enabled. I am attaching my observations, screenshots and patch in support for this.

1. Log for type of checkpoint occured:

2025-11-20 11:51:06.128 IST [18026] LOG:  checkpoint complete
(immediate): wrote 7286 buffers (44.5%), wrote 4 SLRU buffers; 0 WAL
file(s) added, 0 removed, 27 recycled; write=0.095 s, sync=0.034 s,
total=0.279 s; sync files=17, longest=0.004 s, average=0.002 s;
distance=447382 kB, estimate=531349 kB; lsn=0/7F4EDED8, redo
lsn=0/7F4EDE80

2. Log for the checkpoint_total_time and last_checkpoint_time:

 checkpoint_total_time |       last_checkpoint_time
-----------------------+----------------------------------
                175138 | 2025-11-20 11:58:02.879149+05:30
(1 row)

2025-11-20 11:58:02.879 IST [18026] LOG:  checkpoint complete
(immediate): wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL
file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s,
total=0.019 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=0 kB, estimate=478214 kB; lsn=0/7F4EDFE0, redo lsn=0/7F4EDF88

Looking forward to more feedback.

Regards,
Soumya

Вложения

Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Michael Banck
Дата:
Hi,

On Mon, Nov 24, 2025 at 11:40:44AM +0530, Soumya S Murali wrote:
> While debugging checkpointer write behavior, I recently found some of the
> enhancements related to extending pg_stat_checkpointer by including
> checkpoint type (manual/timed/immediate), last_checkpoint_time and
> checkpoint_total_time information to checkpoint completion logs through SQL
> when `log_checkpoints` is enabled. I am attaching my observations,
> screenshots and patch in support for this.
>
> 1. Log for type of checkpoint occured:
>
> 2025-11-20 11:51:06.128 IST [18026] LOG:  checkpoint complete
> (immediate): wrote 7286 buffers (44.5%), wrote 4 SLRU buffers; 0 WAL
> file(s) added, 0 removed, 27 recycled; write=0.095 s, sync=0.034 s,
> total=0.279 s; sync files=17, longest=0.004 s, average=0.002 s;
> distance=447382 kB, estimate=531349 kB; lsn=0/7F4EDED8, redo
> lsn=0/7F4EDE80

I think that'd be useful; the checkpoint complete log line clearly has
the more interesting output, and having it state the type would make it
easier to answer question like "how many buffers did the last wal-based
checkpoint write?

> 2. Log for the checkpoint_total_time and last_checkpoint_time:
>
>  checkpoint_total_time |       last_checkpoint_time
> -----------------------+----------------------------------
>                 175138 | 2025-11-20 11:58:02.879149+05:30
> (1 row)

Reading throught the patch, it looks like checkpoint_total_time is the
total time of the last checkpoint?

> +  proparallel => 'r', prorettype => 'float8', proargtypes => '',
> +  prosrc => 'pg_stat_get_checkpointer_checkpoint_total_time' },

If so, the naming is pretty confusing, last_checkpoint_duration or
something might be clearer.

In general I doubt how much those gauges (as oppposed to counters) only
pertaining to the last checkpoint are useful in pg_stat_checkpointer.
What would be the use case for those two values?

Also, as a nitpick, your patch adds unnecessary newlines and I think
stats_reset should be kept as last column in pg_stat_checkpointer as
usual.


Michael



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Álvaro Herrera
Дата:
On 2025-Nov-24, Michael Banck wrote:

> In general I doubt how much those gauges (as oppposed to counters) only
> pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> What would be the use case for those two values?

I think it's useful to know how long checkpoint has to work.  It's a bit
lame to have only one duration (the last one), but at least with this
arrangement you can have external monitoring software connect to the
server, extract that value and save it somewhere else.  Monitoring
systems do this all the time, and we've been waiting for a better
implementation to store monitoring data inside Postgres for years.  I
think we shouldn't block this proposal just because of this issue,
because it can clearly be useful.

However, I'm not sure I'm very interested in knowing only the duration
of the checkpoint.  I mean, much of the time the duration is going to be
whatever fraction of the checkpoint timeout you have as
checkpoint_completion_target, right?  Which includes sleeps.  So I think
you really want two durations: one is the duration itself, and the other
is what fraction of that did the checkpointer sleep in order to achieve
that duration.  So you know how much time checkpointer spent trying to
get the operating system do stuff rather than just sit there waiting.
We already have that data, kinda, in write_time and sync_time, but those
are cumulative rather than just for the last one.  (I guess you can have
the monitoring system compute the deltas as it finds each new
checkpoint.)  I'm not sure how good this system is.

In the past, I looked at a couple of monitoring dashboards offered by
cloud vendors, searching for anything valuable in terms of checkpoints.
What I saw was very disappointing -- mostly just "how many checkpoints
per minute", which is mostly flat zero with periodic spikes.  Totally
useless.  Does anybody know if some vendor has good charts for this?
Also, if we were to add this new proposed duration, how could these
charts improve?

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it."         (ncm, http://lwn.net/Articles/174769/)



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Michael Banck
Дата:
Hi,

On Mon, Nov 24, 2025 at 11:07:41AM +0100, Álvaro Herrera wrote:
> On 2025-Nov-24, Michael Banck wrote:
>
> > In general I doubt how much those gauges (as oppposed to counters) only
> > pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> > What would be the use case for those two values?
>
> I think it's useful to know how long checkpoint has to work.  It's a bit
> lame to have only one duration (the last one), but at least with this
> arrangement you can have external monitoring software connect to the
> server, extract that value and save it somewhere else.  Monitoring
> systems do this all the time, and we've been waiting for a better
> implementation to store monitoring data inside Postgres for years.  I
> think we shouldn't block this proposal just because of this issue,
> because it can clearly be useful.

I don't know - what happens if the monitoring systems reads those values
every minute, but then suddenly Postgres checkpoints every 20 seconds
due to a traffic spike? It would just not see those additional
checkpoints in this case, no?

What monitoring systems do (have to do) is query write_time + sync_time
as total_time in pg_stat_checkpointer and store that along with the
timestamp of the query. Then you (maybe awkwardly) generate a graph of
the checkpoint durations over time.

> However, I'm not sure I'm very interested in knowing only the duration
> of the checkpoint.  I mean, much of the time the duration is going to be
> whatever fraction of the checkpoint timeout you have as
> checkpoint_completion_target, right? Which includes sleeps.

Yeah, that is the other thing I was wondering about, but did not mention
in my mail, good point.

> So I think you really want two durations: one is the duration itself,
> and the other is what fraction of that did the checkpointer sleep in
> order to achieve that duration.  So you know how much time
> checkpointer spent trying to get the operating system do stuff rather
> than just sit there waiting.  We already have that data, kinda, in
> write_time and sync_time, but those are cumulative rather than just
> for the last one.

I think that we either have "last timestamp whatever" or "total", but I
think we don't have "last duration" anywhere?

> (I guess you can have the monitoring system compute
> the deltas as it finds each new checkpoint.)  I'm not sure how good
> this system is.

Right, this is what I meant above. But from what I see on PG18,
total_time just seems tbe write_time + sync_time, do we have the sleep
somewhere?

> In the past, I looked at a couple of monitoring dashboards offered by
> cloud vendors, searching for anything valuable in terms of checkpoints.
> What I saw was very disappointing -- mostly just "how many checkpoints
> per minute", which is mostly flat zero with periodic spikes.  Totally
> useless.  Does anybody know if some vendor has good charts for this?
> Also, if we were to add this new proposed duration, how could these
> charts improve?

I don't have a good answer here.


Michael



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Soumya S Murali
Дата:
On Mon, Nov 24, 2025 at 2:48 PM Michael Banck <mbanck@gmx.net> wrote:
>
> Hi,
>
> On Mon, Nov 24, 2025 at 11:40:44AM +0530, Soumya S Murali wrote:
> > While debugging checkpointer write behavior, I recently found some of the
> > enhancements related to extending pg_stat_checkpointer by including
> > checkpoint type (manual/timed/immediate), last_checkpoint_time and
> > checkpoint_total_time information to checkpoint completion logs through SQL
> > when `log_checkpoints` is enabled. I am attaching my observations,
> > screenshots and patch in support for this.
> >
> > 1. Log for type of checkpoint occured:
> >
> > 2025-11-20 11:51:06.128 IST [18026] LOG:  checkpoint complete
> > (immediate): wrote 7286 buffers (44.5%), wrote 4 SLRU buffers; 0 WAL
> > file(s) added, 0 removed, 27 recycled; write=0.095 s, sync=0.034 s,
> > total=0.279 s; sync files=17, longest=0.004 s, average=0.002 s;
> > distance=447382 kB, estimate=531349 kB; lsn=0/7F4EDED8, redo
> > lsn=0/7F4EDE80
>
> I think that'd be useful; the checkpoint complete log line clearly has
> the more interesting output, and having it state the type would make it
> easier to answer question like "how many buffers did the last wal-based
> checkpoint write?

Thank you for the feedback and glad to hear that exposing the
checkpoint type in the completion log seems useful. My main motivation
was exactly this kind of analysis: being able to know the buffer write
patterns with the type of checkpoint that triggered them.

> > 2. Log for the checkpoint_total_time and last_checkpoint_time:
> >
> >  checkpoint_total_time |       last_checkpoint_time
> > -----------------------+----------------------------------
> >                 175138 | 2025-11-20 11:58:02.879149+05:30
> > (1 row)
>
> Reading throught the patch, it looks like checkpoint_total_time is the
> total time of the last checkpoint?
>
> > +  proparallel => 'r', prorettype => 'float8', proargtypes => '',
> > +  prosrc => 'pg_stat_get_checkpointer_checkpoint_total_time' },
>
> If so, the naming is pretty confusing, last_checkpoint_duration or
> something might be clearer.
>
> In general I doubt how much those gauges (as oppposed to counters) only
> pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> What would be the use case for those two values?
>
> Also, as a nitpick, your patch adds unnecessary newlines and I think
> stats_reset should be kept as last column in pg_stat_checkpointer as
> usual.
>
>
> Michael

Yes, the field is intended to represent the duration of the most
recently completed checkpoint, and I agree that renaming it to
last_checkpoint_duration would make the purpose more clear. Even
though it is a single value, it can still help monitoring tools
capture and store each duration over time, so I’ll refine the naming,
remove the unnecessary newlines, and keep stats_reset as the last
column as suggested.

Regards
Soumya



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Soumya S Murali
Дата:
On Mon, Nov 24, 2025 at 3:37 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Nov-24, Michael Banck wrote:
>
> > In general I doubt how much those gauges (as oppposed to counters) only
> > pertaining to the last checkpoint are useful in pg_stat_checkpointer.
> > What would be the use case for those two values?
>
> I think it's useful to know how long checkpoint has to work.  It's a bit
> lame to have only one duration (the last one), but at least with this
> arrangement you can have external monitoring software connect to the
> server, extract that value and save it somewhere else.  Monitoring
> systems do this all the time, and we've been waiting for a better
> implementation to store monitoring data inside Postgres for years.  I
> think we shouldn't block this proposal just because of this issue,
> because it can clearly be useful.
>
> However, I'm not sure I'm very interested in knowing only the duration
> of the checkpoint.  I mean, much of the time the duration is going to be
> whatever fraction of the checkpoint timeout you have as
> checkpoint_completion_target, right?  Which includes sleeps.  So I think
> you really want two durations: one is the duration itself, and the other
> is what fraction of that did the checkpointer sleep in order to achieve
> that duration.  So you know how much time checkpointer spent trying to
> get the operating system do stuff rather than just sit there waiting.
> We already have that data, kinda, in write_time and sync_time, but those
> are cumulative rather than just for the last one.  (I guess you can have
> the monitoring system compute the deltas as it finds each new
> checkpoint.)  I'm not sure how good this system is.

Thank you for the detailed thoughts. I agree that having only the last
checkpoint’s duration is limited, but it still gives monitoring tools
a concrete value they can sample and store over time, which is better
than relying only on counters and logs. I will try whether separating
total duration and actual active write/sync time (vs. sleep time) can
be exposed in a more clearer way, as that seems useful for deeper
diagnosis.

> In the past, I looked at a couple of monitoring dashboards offered by
> cloud vendors, searching for anything valuable in terms of checkpoints.
> What I saw was very disappointing -- mostly just "how many checkpoints
> per minute", which is mostly flat zero with periodic spikes.  Totally
> useless.  Does anybody know if some vendor has good charts for this?
> Also, if we were to add this new proposed duration, how could these
> charts improve?

I will look into this in more depth. Will let you know if I find
something concrete.

Regards
Soumya



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Juan José Santamaría Flecha
Дата:


El mié, 26 nov 2025, 11:14, Soumya S Murali <soumyamurali.work@gmail.com> escribió:
On Mon, Nov 24, 2025 at 3:37 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

> In the past, I looked at a couple of monitoring dashboards offered by
> cloud vendors, searching for anything valuable in terms of checkpoints.
> What I saw was very disappointing -- mostly just "how many checkpoints
> per minute", which is mostly flat zero with periodic spikes.  Totally
> useless.  Does anybody know if some vendor has good charts for this?
> Also, if we were to add this new proposed duration, how could these
> charts improve?

I will look into this in more depth. Will let you know if I find
something concrete.

There is a "Checkpoints" section in the pgbadger reports, and that's probably the most widely used tool.

Regards

Juan José Santamaría Flecha

Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Michael Banck
Дата:
Hi,

On Wed, Nov 26, 2025 at 06:23:08PM +0100, Juan José Santamaría Flecha wrote:
> El mié, 26 nov 2025, 11:14, Soumya S Murali <soumyamurali.work@gmail.com>
> escribió:
> There is a "Checkpoints" section in the pgbadger reports, and that's
> probably the most widely used tool.

That one parses the Postgres logs, so is unaffected by the changes to
pg_stat_checkpointer discussed here.


Michael



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Soumya S Murali
Дата:
Hi all,

On Wed, Nov 26, 2025 at 11:11 PM Michael Banck <mbanck@gmx.net> wrote:
>
> Hi,
>
> On Wed, Nov 26, 2025 at 06:23:08PM +0100, Juan José Santamaría Flecha wrote:
> > El mié, 26 nov 2025, 11:14, Soumya S Murali <soumyamurali.work@gmail.com>
> > escribió:
> > There is a "Checkpoints" section in the pgbadger reports, and that's
> > probably the most widely used tool.
>
> That one parses the Postgres logs, so is unaffected by the changes to
> pg_stat_checkpointer discussed here.

Thank you for the suggestions. I will refer to how pgbadger visualizes
checkpoints and also will check whether any other monitoring tools
provide meaningful checkpoint charts. If I find anything useful, I’ll
share it.

Regards,
Soumya



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Soumya S Murali
Дата:
Hi all,

I have updated the code based on the feedback received to my earlier
mails and prepared a patch for further review. In this patch, I have
renamed the checkpoint_total_time to last_checkpoint_duration,
stats_reset has been kept as the last column following the usual
pattern, last_checkpoint_duration and last_checkpoint_time will now be
overwritten per checkpoint and also have removed unnecessary lines as
per the usual format. I had successfully verified the checkpointer
duration with different write loads and I am  attaching the
observations for further reference.

pgbench -c 8 -j 8 -T 30 -p 55432 postgres
pgbench (19devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 30 s
number of transactions actually processed: 55936
number of failed transactions: 0 (0.000%)
latency average = 4.290 ms
initial connection time = 7.107 ms
tps = 1864.846690 (without initial connection time)

pgbench -c 16 -j 8 -T 60 -p 55432 postgres
pgbench (19devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 16
number of threads: 8
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 196974
number of failed transactions: 0 (0.000%)
latency average = 4.873 ms
initial connection time = 12.535 ms
tps = 3283.407286 (without initial connection time)
postgres=# SELECT last_checkpoint_duration, last_checkpoint_time,
write_time, sync_time, buffers_written FROM pg_stat_checkpoint
er;
 last_checkpoint_duration |       last_checkpoint_time       |
write_time | sync_time | buffers_written
--------------------------+----------------------------------+------------+-----------+-----------------
                    23940 | 2025-11-28 10:02:29.298905+05:30 |
104873 |       811 |            3468
(1 row)
CHECKPOINT
sleep 1
postgres=# SELECT last_checkpoint_duration, last_checkpoint_time,
write_time, sync_time, buffers_written FROM pg_stat_checkpointer;
 last_checkpoint_duration |       last_checkpoint_time       |
write_time | sync_time | buffers_written
--------------------------+----------------------------------+------------+-----------+-----------------
                      332 | 2025-11-28 10:03:57.828072+05:30 |
104979 |       857 |           10453
(1 row)
2025-11-28 10:03:57.828 IST [11343] LOG:  checkpoint complete
(immediate): wrote 6985 buffers (42.6%), wrote 11 SLRU buffers; 0 WAL
file(s) added, 0 removed, 32 recycled; write=0.106 s, sync=0.046 s,
total=0.332 s; sync files=23, longest=0.004 s, average=0.002 s;
distance=538440 kB, estimate=540445 kB; lsn=0/84DDA138, redo
lsn=0/84DDA0E0

I hope these observations are helpful for further analysis. Thank you
for the earlier reviews and helpful suggestions. Looking forward to
more feedback.

Regards,
Soumya

Вложения

Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Michael Banck
Дата:
Hi,

On Fri, Nov 28, 2025 at 10:23:54AM +0530, Soumya S Murali wrote:
> I have updated the code based on the feedback received to my earlier
> mails and prepared a patch for further review.

I think the logging change and the pg_stat_checkpointer changes are
different enough that they should be separate patches. If not just
because the logging change seems to not have had any non-positive
feedback.

> In this patch, I have renamed the checkpoint_total_time to
> last_checkpoint_duration, stats_reset has been kept as the last column
> following the usual pattern, last_checkpoint_duration and
> last_checkpoint_time will now be overwritten per checkpoint and also
> have removed unnecessary lines as per the usual format. I had
> successfully verified the checkpointer duration with different write
> loads and I am  attaching the observations for further reference.

I am still not convinced of the usefulness of those changes to
pg_stat_checkpointer, but some feedback on the patch:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9217508917..4a45f4f708 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6778,7 +6778,7 @@ LogCheckpointEnd(bool restartpoint, int flags)


     /* Store in PendingCheckpointerStats */
-    PendingCheckpointerStats.checkpoint_total_time += (double) total_msecs;
+    PendingCheckpointerStats.last_checkpoint_duration = (double) total_msecs;
     PendingCheckpointerStats.last_checkpoint_time = CheckpointStats.ckpt_end_t;

[...]

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a8eb1f8add..73688041c8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -263,7 +263,7 @@ typedef struct PgStat_CheckpointerStats
     PgStat_Counter sync_time;
     PgStat_Counter buffers_written;
     PgStat_Counter slru_written;
-    PgStat_Counter checkpoint_total_time;  /* new: total ms of last checkpoint */
+    PgStat_Counter last_checkpoint_duration;  /* new: total ms of last checkpoint */
     TimestampTz    last_checkpoint_time;   /* new: end time of last checkpoint */
     TimestampTz    stat_reset_timestamp;
 } PgStat_CheckpointerStats;

This looks like an incremental patch based on your original one? It is
customary to send the full, updated, patch again.


Michael



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Soumya S Murali
Дата:
Hi all,

> On Fri, Nov 28, 2025 at 10:23:54AM +0530, Soumya S Murali wrote:
> > I have updated the code based on the feedback received to my earlier
> > mails and prepared a patch for further review.
>
> I think the logging change and the pg_stat_checkpointer changes are
> different enough that they should be separate patches. If not just
> because the logging change seems to not have had any non-positive
> feedback.

Thank you for the review and for the clarification. I understand the
point about separating the logging change and the pg_stat_checkpointer
additions. As per the suggestion, I will make sure to split them into
two independent patches before sending the updated one.

> > In this patch, I have renamed the checkpoint_total_time to
> > last_checkpoint_duration, stats_reset has been kept as the last column
> > following the usual pattern, last_checkpoint_duration and
> > last_checkpoint_time will now be overwritten per checkpoint and also
> > have removed unnecessary lines as per the usual format. I had
> > successfully verified the checkpointer duration with different write
> > loads and I am  attaching the observations for further reference.
>
> I am still not convinced of the usefulness of those changes to
> pg_stat_checkpointer, but some feedback on the patch:

According to my understanding, The monitoring systems can already poll
pg_stat_checkpointer at a reasonable frequency but with the checkpoint
duration values exposed, I think it will be easier to compute - the
checkpoint deltas, fluctuations in duration, notice unusualities and
the timing instabilities in WAL-driven checkpoints etc. These may seem
simple but are useful signals that many existing monitoring dashboards
lack today.

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 9217508917..4a45f4f708 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -6778,7 +6778,7 @@ LogCheckpointEnd(bool restartpoint, int flags)
>
>
>         /* Store in PendingCheckpointerStats */
> -       PendingCheckpointerStats.checkpoint_total_time += (double) total_msecs;
> +       PendingCheckpointerStats.last_checkpoint_duration = (double) total_msecs;
>         PendingCheckpointerStats.last_checkpoint_time = CheckpointStats.ckpt_end_t;
>
> [...]
>
> diff --git a/src/include/pgstat.h b/src/include/pgstat.h
> index a8eb1f8add..73688041c8 100644
> --- a/src/include/pgstat.h
> +++ b/src/include/pgstat.h
> @@ -263,7 +263,7 @@ typedef struct PgStat_CheckpointerStats
>         PgStat_Counter sync_time;
>         PgStat_Counter buffers_written;
>         PgStat_Counter slru_written;
> -       PgStat_Counter checkpoint_total_time;  /* new: total ms of last checkpoint */
> +       PgStat_Counter last_checkpoint_duration;  /* new: total ms of last checkpoint */
>         TimestampTz    last_checkpoint_time;   /* new: end time of last checkpoint */
>         TimestampTz    stat_reset_timestamp;
>  } PgStat_CheckpointerStats;
>
> This looks like an incremental patch based on your original one? It is
> customary to send the full, updated, patch again.
>
>
> Michael

Ok noted. I will resend a full updated patch set soon and will make
sure every updation goes as per the intended flow.
Thank you for the guidance. Looking forward to more feedback.

Regards,
Soumya



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Michael Banck
Дата:
Hi,

On Mon, Dec 01, 2025 at 11:05:19AM +0530, Soumya S Murali wrote:
> > On Fri, Nov 28, 2025 at 10:23:54AM +0530, Soumya S Murali wrote:
> > I am still not convinced of the usefulness of those changes to
> > pg_stat_checkpointer, but some feedback on the patch:
>
> According to my understanding, The monitoring systems can already poll
> pg_stat_checkpointer at a reasonable frequency but with the checkpoint
> duration values exposed, I think it will be easier to compute - the
> checkpoint deltas, fluctuations in duration, notice unusualities and
> the timing instabilities in WAL-driven checkpoints etc. These may seem
> simple but are useful signals that many existing monitoring dashboards
> lack today.

How would such a computation look like? Maybe if you give an example, it
would be easier to understand how this would make things better/more
robust.

I mentioned up-thread that one problem would be multiple checkpoints
having happened between two monitoring runs, where the monitoring system
sees the duration of the last checkpoint, but maybe more than one
happened. Should they keep track of the number of overall checkpoints
and adjust in that case?

To be more general: we don't store the last duration anywhere else (as
far as I can see, happy to be prove wrong), why is this essential for
checkpoint duration, and not other things? Or to put it another way: why
does the patch change it for checkpoint but not all the other places?


Michael



Re: [PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

От
Soumya S Murali
Дата:
Hi all,

Thank you for the review and kind feedback.

On Mon, Dec 1, 2025 at 1:45 PM Michael Banck <mbanck@gmx.net> wrote:
>
> Hi,
>
> On Mon, Dec 01, 2025 at 11:05:19AM +0530, Soumya S Murali wrote:
> > > On Fri, Nov 28, 2025 at 10:23:54AM +0530, Soumya S Murali wrote:
> > > I am still not convinced of the usefulness of those changes to
> > > pg_stat_checkpointer, but some feedback on the patch:
> >
> > According to my understanding, The monitoring systems can already poll
> > pg_stat_checkpointer at a reasonable frequency but with the checkpoint
> > duration values exposed, I think it will be easier to compute - the
> > checkpoint deltas, fluctuations in duration, notice unusualities and
> > the timing instabilities in WAL-driven checkpoints etc. These may seem
> > simple but are useful signals that many existing monitoring dashboards
> > lack today.
>
> How would such a computation look like? Maybe if you give an example, it
> would be easier to understand how this would make things better/more
> robust.

Consider a monitoring agent polls pg_stat_checkpointer every 30
seconds, It will read total write_time, total sync_time, counters and
the last checkpoint duration and timestamp (as in my proposal). Even
if multiple checkpoints happen between two samples, having the last
duration and last timestamp allows the monitoring system to spot
sudden slow checkpoints. For eg:- Imagine if the
last_checkpoint_duration suddenly jumps from approx (300 ms to 5000
ms), the monitoring system can alert immediately, even if multiple
checkpoints happened in between. But this is hard to find out purely
from cumulative write_time/sync_time without doing  complex delta
calculations. And also If the timestamp shows checkpoints happening
much closer together than expected, the tool can alert it as “unusual
high checkpoint frequency” indicating any of the cases like an
aggressive WAL-producing workload errors, checkpoint_completion_target
not being met or I/O layer becoming saturated. This type of detection
becomes easier when the last checkpoint’s end time is visible
directly.

> I mentioned up-thread that one problem would be multiple checkpoints
> having happened between two monitoring runs, where the monitoring system
> sees the duration of the last checkpoint, but maybe more than one
> happened. Should they keep track of the number of overall checkpoints
> and adjust in that case?
>
> To be more general: we don't store the last duration anywhere else (as
> far as I can see, happy to be prove wrong), why is this essential for
> checkpoint duration, and not other things? Or to put it another way: why
> does the patch change it for checkpoint but not all the other places?
>
>
> Michael

You are right that the last duration has not been stored anywhere else
so far and it is a fact that most pg_stat views expose only cumulative
counters. The reason this patch focuses specifically on checkpoints is
that checkpoint timing is one of the few parameters where a single
reading of an event can directly indicate instability and other
irregularities. A single unusual long checkpoint often implies some of
the conditions like backend stalls, WAL flush bottlenecks, extended
buffer recycling, slowdowns in bgwriter or checkpointer I/O
instabilities. So storing the last checkpoint duration is indeed a
small extension, but it offers a direct signal that many monitoring
dashboards currently lack.
I hope this explanation will be helpful to understand more clearly
regarding the patch. Looking forward to more feedback.

Regards,
Soumya