Обсуждение: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
Hi, While testing, I noticed that write_lag and flush_lag in pg_stat_replication initially advanced but eventually stopped updating. This happened when I started pg_receivewal, ran pgbench, and periodically monitored pg_stat_replication. My analysis shows that this issue occurs when any of the write, flush, or replay LSNs in the standby’s feedback message stop updating for some time. In the case of pg_receivewal, the replay LSN is always invalid (never updated), which triggers the problem. Similarly, in regular streaming replication, if the replay LSN remains unchanged for a long time—such as during a recovery conflict—the lag values for both write and flush can stop advancing. The root cause seems to be that when any of the LSNs stop updating, the lag tracker's cyclic buffer becomes full (the write head reaches the slowest read head). In this situation, LagTrackerWrite() and LagTrackerRead() didn't handle the full-buffer condition properly. For instance, if the replay LSN stalls, the buffer fills up and the read heads for "write" and "flush" end up at the same position as the write head. This causes LagTrackerRead() to return -1 for both, preventing write_lag and flush_lag from advancing. The attached patch fixes the problem by treating the slowest read entry (the one causing the buffer to fill up) as a separate overflow entry, allowing the lag tracker to continue operating correctly. Thoughts? Regards, -- Fujii Masao
Вложения
> On Oct 17, 2025, at 11:56, Fujii Masao <masao.fujii@gmail.com> wrote: > > Hi, > > While testing, I noticed that write_lag and flush_lag in pg_stat_replication > initially advanced but eventually stopped updating. This happened when > I started pg_receivewal, ran pgbench, and periodically monitored > pg_stat_replication. > > My analysis shows that this issue occurs when any of the write, flush, > or replay LSNs in the standby’s feedback message stop updating for some time. > In the case of pg_receivewal, the replay LSN is always invalid (never updated), > which triggers the problem. Similarly, in regular streaming replication, > if the replay LSN remains unchanged for a long time—such as during > a recovery conflict—the lag values for both write and flush can stop advancing. > > The root cause seems to be that when any of the LSNs stop updating, > the lag tracker's cyclic buffer becomes full (the write head reaches > the slowest read head). In this situation, LagTrackerWrite() and > LagTrackerRead() didn't handle the full-buffer condition properly. > For instance, if the replay LSN stalls, the buffer fills up and the read heads > for "write" and "flush" end up at the same position as the write head. > This causes LagTrackerRead() to return -1 for both, preventing write_lag > and flush_lag from advancing. > > The attached patch fixes the problem by treating the slowest read entry > (the one causing the buffer to fill up) as a separate overflow entry, > allowing the lag tracker to continue operating correctly. > > -- > Fujii Masao > <v1-0001-Fix-lag-columns-in-pg_stat_replication-not-advanc.patch> It took me some time to understand this fix. My most confusing was that once overwrite happens, how a reader head to catchup again? Finally I figured it out: ``` + lag_tracker->read_heads[head] = + (lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE; ``` "(lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE” points to the oldest LSN in the ring, from where an overflowedreader head starts to catch up. I have no comment on the code change. Nice patch! All I wonder is if we can add a TAP test for this fix? Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Fujii Masao
Дата:
On Fri, Oct 17, 2025 at 5:11 PM Chao Li <li.evan.chao@gmail.com> wrote: > It took me some time to understand this fix. My most confusing was that once overwrite happens, how a reader head to catchup again? Finally I figured it out: > > ``` > + lag_tracker->read_heads[head] = > + (lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE; > ``` > > "(lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE” points to the oldest LSN in the ring, from where an overflowedreader head starts to catch up. > > I have no comment on the code change. Nice patch! Thanks for the review! I've updated the source comment to make the code easier to understand. The updated patch is attached. > All I wonder is if we can add a TAP test for this fix? I think it would be good to add a test for this fix, but reproducing the condition where the buffer fills up and the slowest read entry overflows takes a time. Because of that, I'm not sure adding such a potentially slow test is a good idea. Regards, -- Fujii Masao
Вложения
> On Oct 17, 2025, at 22:28, Fujii Masao <masao.fujii@gmail.com> wrote: > > On Fri, Oct 17, 2025 at 5:11 PM Chao Li <li.evan.chao@gmail.com> wrote: >> It took me some time to understand this fix. My most confusing was that once overwrite happens, how a reader head to catchup again? Finally I figured it out: >> >> ``` >> + lag_tracker->read_heads[head] = >> + (lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE; >> ``` >> >> "(lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE” points to the oldest LSN in the ring, from where an overflowedreader head starts to catch up. >> >> I have no comment on the code change. Nice patch! > > Thanks for the review! > > I've updated the source comment to make the code easier to understand. > The updated patch is attached. > > <v2-0001-Fix-stalled-lag-columns-in-pg_stat_replication-wh.patch> Thanks for adding the comment. I think I put all concentration on the big picture yesterday, so I ignored this implementation detail: ``` + if (lag_tracker->overflowed[head].lsn > lsn) + return now - lag_tracker->overflowed[head].time; ``` Should this “>” be “>=“? Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Shinya Kato
Дата:
Hi, On Fri, Oct 17, 2025 at 12:57 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > Hi, > > While testing, I noticed that write_lag and flush_lag in pg_stat_replication > initially advanced but eventually stopped updating. This happened when > I started pg_receivewal, ran pgbench, and periodically monitored > pg_stat_replication. Nice catch! I reproduced the same issue. > > My analysis shows that this issue occurs when any of the write, flush, > or replay LSNs in the standby’s feedback message stop updating for some time. > In the case of pg_receivewal, the replay LSN is always invalid (never updated), > which triggers the problem. Similarly, in regular streaming replication, > if the replay LSN remains unchanged for a long time—such as during > a recovery conflict—the lag values for both write and flush can stop advancing. > > The root cause seems to be that when any of the LSNs stop updating, > the lag tracker's cyclic buffer becomes full (the write head reaches > the slowest read head). In this situation, LagTrackerWrite() and > LagTrackerRead() didn't handle the full-buffer condition properly. > For instance, if the replay LSN stalls, the buffer fills up and the read heads > for "write" and "flush" end up at the same position as the write head. > This causes LagTrackerRead() to return -1 for both, preventing write_lag > and flush_lag from advancing. > > The attached patch fixes the problem by treating the slowest read entry > (the one causing the buffer to fill up) as a separate overflow entry, > allowing the lag tracker to continue operating correctly. Thank you for the patch. I have one comment. + if (lag_tracker->overflowed[head].lsn > lsn) + return now - lag_tracker->overflowed[head].time; Could this return a negative value if the clock somehow went backwards? The original code returns -1 in this case, so I'm curious about this. -- Best regards, Shinya Kato NTT OSS Center
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Fujii Masao
Дата:
On Sat, Oct 18, 2025 at 9:16 AM Chao Li <li.evan.chao@gmail.com> wrote: > I think I put all concentration on the big picture yesterday, so I ignored this implementation detail: > > ``` > + if (lag_tracker->overflowed[head].lsn > lsn) > + return now - lag_tracker->overflowed[head].time; > ``` > > Should this “>” be “>=“? I think either ">" or ">=" would work. However, the following loop in LagTrackerRead() advances the read head when the LSN in the current buffer entry is equal to the given LSN, so I followed the same logic for the overflow entry and used ">" there. Regards, -- Fujii Masao
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Fujii Masao
Дата:
On Sun, Oct 19, 2025 at 2:04 AM Shinya Kato <shinya11.kato@gmail.com> wrote: > Thank you for the patch. I have one comment. > > + if (lag_tracker->overflowed[head].lsn > lsn) > + return now - lag_tracker->overflowed[head].time; > > Could this return a negative value if the clock somehow went > backwards? The original code returns -1 in this case, so I'm curious > about this. Thanks for the review! Yes, you're right. So I've updated the patch so that -1 is returned when the current time is earlier than the time in the overflow entry, treating it as "no new sample found". Regards, -- Fujii Masao
Вложения
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Fujii Masao
Дата:
On Mon, Oct 20, 2025 at 10:12 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Sat, Oct 18, 2025 at 9:16 AM Chao Li <li.evan.chao@gmail.com> wrote:
> > I think I put all concentration on the big picture yesterday, so I ignored this implementation detail:
> >
> > ```
> > + if (lag_tracker->overflowed[head].lsn > lsn)
> > + return now - lag_tracker->overflowed[head].time;
> > ```
> >
> > Should this “>” be “>=“?
>
> I think either ">" or ">=" would work. However, the following loop
> in LagTrackerRead() advances the read head when the LSN
> in the current buffer entry is equal to the given LSN
I forgot to mention which loop actually advances the read head in that case.
It's the following code in LagTrackerRead().
------------------------
/* Read all unread samples up to this LSN or end of buffer. */
while (lag_tracker->read_heads[head] != lag_tracker->write_head &&
lag_tracker->buffer[lag_tracker->read_heads[head]].lsn <= lsn)
{
time = lag_tracker->buffer[lag_tracker->read_heads[head]].time;
lag_tracker->last_read[head] =
lag_tracker->buffer[lag_tracker->read_heads[head]];
lag_tracker->read_heads[head] =
(lag_tracker->read_heads[head] + 1) % LAG_TRACKER_BUFFER_SIZE;
}
------------------------
Regards,
--
Fujii Masao
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Shinya Kato
Дата:
On Mon, Oct 20, 2025 at 10:17 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > On Sun, Oct 19, 2025 at 2:04 AM Shinya Kato <shinya11.kato@gmail.com> wrote: > > Thank you for the patch. I have one comment. > > > > + if (lag_tracker->overflowed[head].lsn > lsn) > > + return now - lag_tracker->overflowed[head].time; > > > > Could this return a negative value if the clock somehow went > > backwards? The original code returns -1 in this case, so I'm curious > > about this. > > Thanks for the review! > > Yes, you're right. So I've updated the patch so that -1 is returned > when the current time is earlier than the time in the overflow entry, > treating it as "no new sample found". Thank you for updating the patch. It looks nice to me. -- Best regards, Shinya Kato NTT OSS Center
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Fujii Masao
Дата:
On Mon, Oct 20, 2025 at 1:45 PM Shinya Kato <shinya11.kato@gmail.com> wrote: > > On Mon, Oct 20, 2025 at 10:17 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > On Sun, Oct 19, 2025 at 2:04 AM Shinya Kato <shinya11.kato@gmail.com> wrote: > > > Thank you for the patch. I have one comment. > > > > > > + if (lag_tracker->overflowed[head].lsn > lsn) > > > + return now - lag_tracker->overflowed[head].time; > > > > > > Could this return a negative value if the clock somehow went > > > backwards? The original code returns -1 in this case, so I'm curious > > > about this. > > > > Thanks for the review! > > > > Yes, you're right. So I've updated the patch so that -1 is returned > > when the current time is earlier than the time in the overflow entry, > > treating it as "no new sample found". > > Thank you for updating the patch. It looks nice to me. Thanks for the review! Unless there are any objections, I'll commit the patch and backpatch it to all supported branches. Regards, -- Fujii Masao
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Xuneng Zhou
Дата:
Hi Fujii-san, On Tue, Oct 21, 2025 at 7:45 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > On Mon, Oct 20, 2025 at 1:45 PM Shinya Kato <shinya11.kato@gmail.com> wrote: > > > > On Mon, Oct 20, 2025 at 10:17 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > > > On Sun, Oct 19, 2025 at 2:04 AM Shinya Kato <shinya11.kato@gmail.com> wrote: > > > > Thank you for the patch. I have one comment. > > > > > > > > + if (lag_tracker->overflowed[head].lsn > lsn) > > > > + return now - lag_tracker->overflowed[head].time; > > > > > > > > Could this return a negative value if the clock somehow went > > > > backwards? The original code returns -1 in this case, so I'm curious > > > > about this. > > > > > > Thanks for the review! > > > > > > Yes, you're right. So I've updated the patch so that -1 is returned > > > when the current time is earlier than the time in the overflow entry, > > > treating it as "no new sample found". > > > > Thank you for updating the patch. It looks nice to me. > > Thanks for the review! > Unless there are any objections, I'll commit the patch and > backpatch it to all supported branches. > Thanks for working on this. The patch LGTM. I am wondering whether it is helpful to add some comments for this overflowed array WalTimeSample overflowed[NUM_SYNC_REP_WAIT_MODE]; and replacing literal zeros with the constant InvalidXLogRecPtr for better readability. /* InvalidXLogRecPtr means no overflow yet */ if (lag_tracker->overflowed[i].lsn == InvalidXLogRecPtr) Best, Xuneng
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Fujii Masao
Дата:
On Tue, Oct 21, 2025 at 11:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > The patch LGTM. Thanks for the review! I've pushed the patch and backpatched it to all supported versions. > I am wondering whether it is helpful to add some > comments for this overflowed array Yes, do you have any specific suggestions? > and replacing literal zeros with the constant InvalidXLogRecPtr for > better readability. > > /* InvalidXLogRecPtr means no overflow yet */ > if (lag_tracker->overflowed[i].lsn == InvalidXLogRecPtr) I couldn't find any code like "lag_tracker->overflowed[i].lsn == 0", so I'm not sure which part should be replaced with InvalidXLogRecPtr. Could you point me to the exact location? Regards, -- Fujii Masao
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Xuneng Zhou
Дата:
Hi, On Wed, Oct 22, 2025 at 10:34 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > On Tue, Oct 21, 2025 at 11:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > The patch LGTM. > > Thanks for the review! > I've pushed the patch and backpatched it to all supported versions. > > > > I am wondering whether it is helpful to add some > > comments for this overflowed array > > Yes, do you have any specific suggestions? > How about something like: /* * Overflow entries for read heads that collide with the write head. * * When the cyclic buffer fills (write head is about to collide with a read * head), we save that read head's current sample here and mark it as using * overflow (read_heads[i] = -1). This allows the write head to continue * advancing while the overflowed mode continues lag computation using the * saved sample. * * Once the standby's reported LSN advances past the overflow entry's LSN, * we transition back to normal buffer-based tracking. */ > > and replacing literal zeros with the constant InvalidXLogRecPtr for > > better readability. > > > > /* InvalidXLogRecPtr means no overflow yet */ > > if (lag_tracker->overflowed[i].lsn == InvalidXLogRecPtr) > > I couldn't find any code like "lag_tracker->overflowed[i].lsn == 0", > so I'm not sure which part should be replaced with InvalidXLogRecPtr. > Could you point me to the exact location? > Sorry for the noise here, I mean if (lag_tracker->read_heads[head] == InvalidXLogRecPtr) before. But with this change, position 0 would be treated as overflow mode, which is clearly wrong... Best, Xuneng
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Fujii Masao
Дата:
On Wed, Oct 22, 2025 at 4:49 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > How about something like: > > /* > * Overflow entries for read heads that collide with the write head. > * > * When the cyclic buffer fills (write head is about to collide with a read > * head), we save that read head's current sample here and mark it as using > * overflow (read_heads[i] = -1). This allows the write head to continue > * advancing while the overflowed mode continues lag computation using the > * saved sample. > * > * Once the standby's reported LSN advances past the overflow entry's LSN, > * we transition back to normal buffer-based tracking. > */ LGTM. Thanks! I've created a patch adding your suggested comments (attached). Since this is a follow-up to commit 883a95646a8, which was recently applied and backpatched to all supported branches, I think we should backpatch this one as well. Thought? Regards, -- Fujii Masao
Вложения
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Xuneng Zhou
Дата:
Hi, On Wed, Oct 22, 2025 at 10:45 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > On Wed, Oct 22, 2025 at 4:49 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > How about something like: > > > > /* > > * Overflow entries for read heads that collide with the write head. > > * > > * When the cyclic buffer fills (write head is about to collide with a read > > * head), we save that read head's current sample here and mark it as using > > * overflow (read_heads[i] = -1). This allows the write head to continue > > * advancing while the overflowed mode continues lag computation using the > > * saved sample. > > * > > * Once the standby's reported LSN advances past the overflow entry's LSN, > > * we transition back to normal buffer-based tracking. > > */ > > LGTM. Thanks! > > I've created a patch adding your suggested comments (attached). > Since this is a follow-up to commit 883a95646a8, which was recently applied > and backpatched to all supported branches, I think we should backpatch > this one as well. Thought? > LGTM. Thanks for the patch! Best, Xuneng
Re: Fix lag columns in pg_stat_replication not advancing when replay LSN stalls
От
Fujii Masao
Дата:
On Thu, Oct 23, 2025 at 12:26 AM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > I've created a patch adding your suggested comments (attached). > > Since this is a follow-up to commit 883a95646a8, which was recently applied > > and backpatched to all supported branches, I think we should backpatch > > this one as well. Thought? > > > > LGTM. Thanks for the patch! I've pushed the patch. Thanks! Regards, -- Fujii Masao