Обсуждение: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

Поиск

Список

Период

Сортировка

Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

17 октября 2025 г., 06:56:55

Hi,

While testing, I noticed that write_lag and flush_lag in pg_stat_replication
initially advanced but eventually stopped updating. This happened when
I started pg_receivewal, ran pgbench, and periodically monitored
pg_stat_replication.

My analysis shows that this issue occurs when any of the write, flush,
or replay LSNs in the standby’s feedback message stop updating for some time.
In the case of pg_receivewal, the replay LSN is always invalid (never updated),
which triggers the problem. Similarly, in regular streaming replication,
if the replay LSN remains unchanged for a long time—such as during
a recovery conflict—the lag values for both write and flush can stop advancing.

The root cause seems to be that when any of the LSNs stop updating,
the lag tracker's cyclic buffer becomes full (the write head reaches
the slowest read head). In this situation, LagTrackerWrite() and
LagTrackerRead() didn't handle the full-buffer condition properly.
For instance, if the replay LSN stalls, the buffer fills up and the read heads
for "write" and "flush" end up at the same position as the write head.
This causes LagTrackerRead() to return -1 for both, preventing write_lag
and flush_lag from advancing.

The attached patch fixes the problem by treating the slowest read entry
(the one causing the buffer to fill up) as a separate overflow entry,
allowing the lag tracker to continue operating correctly.

Thoughts?

Regards,

--
Fujii Masao

Вложения

v1-0001-Fix-lag-columns-in-pg_stat_replication-not-advanc.patch

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Chao Li

Дата:

17 октября 2025 г., 11:11:20


> On Oct 17, 2025, at 11:56, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Hi,
>
> While testing, I noticed that write_lag and flush_lag in pg_stat_replication
> initially advanced but eventually stopped updating. This happened when
> I started pg_receivewal, ran pgbench, and periodically monitored
> pg_stat_replication.
>
> My analysis shows that this issue occurs when any of the write, flush,
> or replay LSNs in the standby’s feedback message stop updating for some time.
> In the case of pg_receivewal, the replay LSN is always invalid (never updated),
> which triggers the problem. Similarly, in regular streaming replication,
> if the replay LSN remains unchanged for a long time—such as during
> a recovery conflict—the lag values for both write and flush can stop advancing.
>
> The root cause seems to be that when any of the LSNs stop updating,
> the lag tracker's cyclic buffer becomes full (the write head reaches
> the slowest read head). In this situation, LagTrackerWrite() and
> LagTrackerRead() didn't handle the full-buffer condition properly.
> For instance, if the replay LSN stalls, the buffer fills up and the read heads
> for "write" and "flush" end up at the same position as the write head.
> This causes LagTrackerRead() to return -1 for both, preventing write_lag
> and flush_lag from advancing.
>
> The attached patch fixes the problem by treating the slowest read entry
> (the one causing the buffer to fill up) as a separate overflow entry,
> allowing the lag tracker to continue operating correctly.
>
> --
> Fujii Masao
> <v1-0001-Fix-lag-columns-in-pg_stat_replication-not-advanc.patch>

It took me some time to understand this fix. My most confusing was that once overwrite happens, how a reader head to
catchup again? Finally I figured it out: 

```
+        lag_tracker->read_heads[head] =
+            (lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE;
```

"(lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE” points to the oldest LSN in the ring, from where an
overflowedreader head starts to catch up. 

I have no comment on the code change. Nice patch!

All I wonder is if we can add a TAP test for this fix?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

17 октября 2025 г., 17:28:17

On Fri, Oct 17, 2025 at 5:11 PM Chao Li <li.evan.chao@gmail.com> wrote:
> It took me some time to understand this fix. My most confusing was that once overwrite happens, how a reader head to
catchup again? Finally I figured it out: 
>
> ```
> +               lag_tracker->read_heads[head] =
> +                       (lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE;
> ```
>
> "(lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE” points to the oldest LSN in the ring, from where an
overflowedreader head starts to catch up. 
>
> I have no comment on the code change. Nice patch!

Thanks for the review!

I've updated the source comment to make the code easier to understand.
The updated patch is attached.

> All I wonder is if we can add a TAP test for this fix?

I think it would be good to add a test for this fix, but reproducing
the condition
where the buffer fills up and the slowest read entry overflows takes a time.
Because of that, I'm not sure adding such a potentially slow test is a
good idea.

Regards,

--
Fujii Masao

Вложения

v2-0001-Fix-stalled-lag-columns-in-pg_stat_replication-wh.patch

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Chao Li

Дата:

18 октября 2025 г., 03:16:01


> On Oct 17, 2025, at 22:28, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Oct 17, 2025 at 5:11 PM Chao Li <li.evan.chao@gmail.com> wrote:
>> It took me some time to understand this fix. My most confusing was that once overwrite happens, how a reader head to
catchup again? Finally I figured it out: 
>>
>> ```
>> +               lag_tracker->read_heads[head] =
>> +                       (lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE;
>> ```
>>
>> "(lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE” points to the oldest LSN in the ring, from where an
overflowedreader head starts to catch up. 
>>
>> I have no comment on the code change. Nice patch!
>
> Thanks for the review!
>
> I've updated the source comment to make the code easier to understand.
> The updated patch is attached.
>
> <v2-0001-Fix-stalled-lag-columns-in-pg_stat_replication-wh.patch>

Thanks for adding the comment.

I think I put all concentration on the big picture yesterday, so I ignored this implementation detail:

```
+        if (lag_tracker->overflowed[head].lsn > lsn)
+            return now - lag_tracker->overflowed[head].time;
```

Should this “>” be “>=“?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Shinya Kato

Дата:

18 октября 2025 г., 20:04:18

Hi,

On Fri, Oct 17, 2025 at 12:57 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Hi,
>
> While testing, I noticed that write_lag and flush_lag in pg_stat_replication
> initially advanced but eventually stopped updating. This happened when
> I started pg_receivewal, ran pgbench, and periodically monitored
> pg_stat_replication.

Nice catch! I reproduced the same issue.

>
> My analysis shows that this issue occurs when any of the write, flush,
> or replay LSNs in the standby’s feedback message stop updating for some time.
> In the case of pg_receivewal, the replay LSN is always invalid (never updated),
> which triggers the problem. Similarly, in regular streaming replication,
> if the replay LSN remains unchanged for a long time—such as during
> a recovery conflict—the lag values for both write and flush can stop advancing.
>
> The root cause seems to be that when any of the LSNs stop updating,
> the lag tracker's cyclic buffer becomes full (the write head reaches
> the slowest read head). In this situation, LagTrackerWrite() and
> LagTrackerRead() didn't handle the full-buffer condition properly.
> For instance, if the replay LSN stalls, the buffer fills up and the read heads
> for "write" and "flush" end up at the same position as the write head.
> This causes LagTrackerRead() to return -1 for both, preventing write_lag
> and flush_lag from advancing.
>
> The attached patch fixes the problem by treating the slowest read entry
> (the one causing the buffer to fill up) as a separate overflow entry,
> allowing the lag tracker to continue operating correctly.

Thank you for the patch. I have one comment.

+       if (lag_tracker->overflowed[head].lsn > lsn)
+           return now - lag_tracker->overflowed[head].time;

Could this return a negative value if the clock somehow went
backwards? The original code returns -1 in this case, so I'm curious
about this.

--
Best regards,
Shinya Kato
NTT OSS Center

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

20 октября 2025 г., 04:12:07

On Sat, Oct 18, 2025 at 9:16 AM Chao Li <li.evan.chao@gmail.com> wrote:
> I think I put all concentration on the big picture yesterday, so I ignored this implementation detail:
>
> ```
> +               if (lag_tracker->overflowed[head].lsn > lsn)
> +                       return now - lag_tracker->overflowed[head].time;
> ```
>
> Should this “>” be “>=“?

I think either ">" or ">=" would work. However, the following loop
in LagTrackerRead() advances the read head when the LSN
in the current buffer entry is equal to the given LSN, so I followed
the same logic for the overflow entry and used ">" there.

Regards,

--
Fujii Masao

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

20 октября 2025 г., 04:17:21

On Sun, Oct 19, 2025 at 2:04 AM Shinya Kato <shinya11.kato@gmail.com> wrote:
> Thank you for the patch. I have one comment.
>
> +       if (lag_tracker->overflowed[head].lsn > lsn)
> +           return now - lag_tracker->overflowed[head].time;
>
> Could this return a negative value if the clock somehow went
> backwards? The original code returns -1 in this case, so I'm curious
> about this.

Thanks for the review!

Yes, you're right. So I've updated the patch so that -1 is returned
when the current time is earlier than the time in the overflow entry,
treating it as "no new sample found".

Regards,

--
Fujii Masao

Вложения

v3-0001-Fix-stalled-lag-columns-in-pg_stat_replication-wh.patch

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

20 октября 2025 г., 04:20:41

On Mon, Oct 20, 2025 at 10:12 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Sat, Oct 18, 2025 at 9:16 AM Chao Li <li.evan.chao@gmail.com> wrote:
> > I think I put all concentration on the big picture yesterday, so I ignored this implementation detail:
> >
> > ```
> > +               if (lag_tracker->overflowed[head].lsn > lsn)
> > +                       return now - lag_tracker->overflowed[head].time;
> > ```
> >
> > Should this “>” be “>=“?
>
> I think either ">" or ">=" would work. However, the following loop
> in LagTrackerRead() advances the read head when the LSN
> in the current buffer entry is equal to the given LSN

I forgot to mention which loop actually advances the read head in that case.
It's the following code in LagTrackerRead().

------------------------
/* Read all unread samples up to this LSN or end of buffer. */
while (lag_tracker->read_heads[head] != lag_tracker->write_head &&
   lag_tracker->buffer[lag_tracker->read_heads[head]].lsn <= lsn)
{
time = lag_tracker->buffer[lag_tracker->read_heads[head]].time;
lag_tracker->last_read[head] =
lag_tracker->buffer[lag_tracker->read_heads[head]];
lag_tracker->read_heads[head] =
(lag_tracker->read_heads[head] + 1) % LAG_TRACKER_BUFFER_SIZE;
}
------------------------

Regards,

--
Fujii Masao

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Shinya Kato

Дата:

20 октября 2025 г., 07:45:01

On Mon, Oct 20, 2025 at 10:17 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Sun, Oct 19, 2025 at 2:04 AM Shinya Kato <shinya11.kato@gmail.com> wrote:
> > Thank you for the patch. I have one comment.
> >
> > +       if (lag_tracker->overflowed[head].lsn > lsn)
> > +           return now - lag_tracker->overflowed[head].time;
> >
> > Could this return a negative value if the clock somehow went
> > backwards? The original code returns -1 in this case, so I'm curious
> > about this.
>
> Thanks for the review!
>
> Yes, you're right. So I've updated the patch so that -1 is returned
> when the current time is earlier than the time in the overflow entry,
> treating it as "no new sample found".

Thank you for updating the patch. It looks nice to me.

--
Best regards,
Shinya Kato
NTT OSS Center

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

21 октября 2025 г., 02:44:48

On Mon, Oct 20, 2025 at 1:45 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 10:17 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Sun, Oct 19, 2025 at 2:04 AM Shinya Kato <shinya11.kato@gmail.com> wrote:
> > > Thank you for the patch. I have one comment.
> > >
> > > +       if (lag_tracker->overflowed[head].lsn > lsn)
> > > +           return now - lag_tracker->overflowed[head].time;
> > >
> > > Could this return a negative value if the clock somehow went
> > > backwards? The original code returns -1 in this case, so I'm curious
> > > about this.
> >
> > Thanks for the review!
> >
> > Yes, you're right. So I've updated the patch so that -1 is returned
> > when the current time is earlier than the time in the overflow entry,
> > treating it as "no new sample found".
>
> Thank you for updating the patch. It looks nice to me.

Thanks for the review!
Unless there are any objections, I'll commit the patch and
backpatch it to all supported branches.

Regards,

--
Fujii Masao

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Xuneng Zhou

Дата:

21 октября 2025 г., 17:52:39

Hi Fujii-san,

On Tue, Oct 21, 2025 at 7:45 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 1:45 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
> >
> > On Mon, Oct 20, 2025 at 10:17 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> > >
> > > On Sun, Oct 19, 2025 at 2:04 AM Shinya Kato <shinya11.kato@gmail.com> wrote:
> > > > Thank you for the patch. I have one comment.
> > > >
> > > > +       if (lag_tracker->overflowed[head].lsn > lsn)
> > > > +           return now - lag_tracker->overflowed[head].time;
> > > >
> > > > Could this return a negative value if the clock somehow went
> > > > backwards? The original code returns -1 in this case, so I'm curious
> > > > about this.
> > >
> > > Thanks for the review!
> > >
> > > Yes, you're right. So I've updated the patch so that -1 is returned
> > > when the current time is earlier than the time in the overflow entry,
> > > treating it as "no new sample found".
> >
> > Thank you for updating the patch. It looks nice to me.
>
> Thanks for the review!
> Unless there are any objections, I'll commit the patch and
> backpatch it to all supported branches.
>

Thanks for working on this.

The patch LGTM.  I am wondering whether it is helpful to add some
comments for this overflowed array
WalTimeSample overflowed[NUM_SYNC_REP_WAIT_MODE];

and replacing literal zeros with the constant InvalidXLogRecPtr for
better readability.

/* InvalidXLogRecPtr means no overflow yet */
if (lag_tracker->overflowed[i].lsn == InvalidXLogRecPtr)

Best,
Xuneng

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

22 октября 2025 г., 05:34:13

On Tue, Oct 21, 2025 at 11:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> The patch LGTM.

Thanks for the review!
I've pushed the patch and backpatched it to all supported versions.

> I am wondering whether it is helpful to add some
> comments for this overflowed array

Yes, do you have any specific suggestions?

> and replacing literal zeros with the constant InvalidXLogRecPtr for
> better readability.
>
> /* InvalidXLogRecPtr means no overflow yet */
> if (lag_tracker->overflowed[i].lsn == InvalidXLogRecPtr)

I couldn't find any code like "lag_tracker->overflowed[i].lsn == 0",
so I'm not sure which part should be replaced with InvalidXLogRecPtr.
Could you point me to the exact location?

Regards,

--
Fujii Masao

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Xuneng Zhou

Дата:

22 октября 2025 г., 10:49:18

Hi,

On Wed, Oct 22, 2025 at 10:34 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Tue, Oct 21, 2025 at 11:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > The patch LGTM.
>
> Thanks for the review!
> I've pushed the patch and backpatched it to all supported versions.
>
>
> > I am wondering whether it is helpful to add some
> > comments for this overflowed array
>
> Yes, do you have any specific suggestions?
>

How about something like:

/*
 * Overflow entries for read heads that collide with the write head.
 *
 * When the cyclic buffer fills (write head is about to collide with a read
 * head), we save that read head's current sample here and mark it as using
 * overflow (read_heads[i] = -1). This allows the write head to continue
 * advancing while the overflowed mode continues lag computation using the
 * saved sample.
 *
 * Once the standby's reported LSN advances past the overflow entry's LSN,
 * we transition back to normal buffer-based tracking.
 */

> > and replacing literal zeros with the constant InvalidXLogRecPtr for
> > better readability.
> >
> > /* InvalidXLogRecPtr means no overflow yet */
> > if (lag_tracker->overflowed[i].lsn == InvalidXLogRecPtr)
>
> I couldn't find any code like "lag_tracker->overflowed[i].lsn == 0",
> so I'm not sure which part should be replaced with InvalidXLogRecPtr.
> Could you point me to the exact location?
>

Sorry for the noise here, I mean
if (lag_tracker->read_heads[head] == InvalidXLogRecPtr) before.
But with this change, position 0 would be treated as overflow mode,
which is clearly wrong...

Best,
Xuneng

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

22 октября 2025 г., 17:45:39

On Wed, Oct 22, 2025 at 4:49 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> How about something like:
>
> /*
>  * Overflow entries for read heads that collide with the write head.
>  *
>  * When the cyclic buffer fills (write head is about to collide with a read
>  * head), we save that read head's current sample here and mark it as using
>  * overflow (read_heads[i] = -1). This allows the write head to continue
>  * advancing while the overflowed mode continues lag computation using the
>  * saved sample.
>  *
>  * Once the standby's reported LSN advances past the overflow entry's LSN,
>  * we transition back to normal buffer-based tracking.
>  */

LGTM. Thanks!

I've created a patch adding your suggested comments (attached).
Since this is a follow-up to commit 883a95646a8, which was recently applied
and backpatched to all supported branches, I think we should backpatch
this one as well. Thought?

Regards,

--
Fujii Masao

Вложения

v1-0001-Add-comments-explaining-overflow-entries-in-the-r.patch

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Xuneng Zhou

Дата:

22 октября 2025 г., 18:25:51

Hi,

On Wed, Oct 22, 2025 at 10:45 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Wed, Oct 22, 2025 at 4:49 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > How about something like:
> >
> > /*
> >  * Overflow entries for read heads that collide with the write head.
> >  *
> >  * When the cyclic buffer fills (write head is about to collide with a read
> >  * head), we save that read head's current sample here and mark it as using
> >  * overflow (read_heads[i] = -1). This allows the write head to continue
> >  * advancing while the overflowed mode continues lag computation using the
> >  * saved sample.
> >  *
> >  * Once the standby's reported LSN advances past the overflow entry's LSN,
> >  * we transition back to normal buffer-based tracking.
> >  */
>
> LGTM. Thanks!
>
> I've created a patch adding your suggested comments (attached).
> Since this is a follow-up to commit 883a95646a8, which was recently applied
> and backpatched to all supported branches, I think we should backpatch
> this one as well. Thought?
>

LGTM. Thanks for the patch!

Best,
Xuneng

Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

От

Fujii Masao

Дата:

23 октября 2025 г., 07:30:21

On Thu, Oct 23, 2025 at 12:26 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > I've created a patch adding your suggested comments (attached).
> > Since this is a follow-up to commit 883a95646a8, which was recently applied
> > and backpatched to all supported branches, I think we should backpatch
> > this one as well. Thought?
> >
>
> LGTM. Thanks for the patch!

I've pushed the patch. Thanks!

Regards,

--
Fujii Masao

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls

Вложения

Вложения

Вложения

Вложения