Re: standby apply lag on inactive servers
От | Alvaro Herrera |
---|---|
Тема | Re: standby apply lag on inactive servers |
Дата | |
Msg-id | 20200127203419.GA15216@alvherre.pgsql обсуждение исходный текст |
Ответ на | standby apply lag on inactive servers (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Ответы |
Re: standby apply lag on inactive servers
|
Список | pgsql-hackers |
On 2020-Jan-10, Alvaro Herrera wrote: > A customer of ours complained that if you have an inactive primary, > monitoring the apply lag on a standby reports monotonically increasing > lag. The reason for this is that the apply lag is only updated on > COMMIT records, which of course don't occur in inactive servers. > But CHECKPOINT records do occur, so the WAL insert pointer continues to > move forward, which is what causes the spurious lag. > > (I think newer releases are protected from this problem because they > don't emit checkpoints during periods of inactivity. I didn't verify > this.) > > This patch fixes the problem by using the checkpoint timestamp to update > the lag tracker in the standby. This requires a little change in where > this update is invoked, because previously it was done only for the XACT > rmgr; this makes the patch a little bigger than it should. Here's a version of the patch that applies to current master. It does fix the problem that CHECKPOINT wal records are not considered when determining time-of-latest-record. However, it does *not* fix the monitoring problem I mentioned (which relied on comparing pg_last_xact_replay_timestamp() to 'now()') ... because commit 6ef2eba3f57f (pg10) made an idle server not emit checkpoint records anymore. That is, my parenthical remark was completely wrong: the new versions not only are "protected", but also this fix doesn't fix them. Luckily, the way to fix monitoring for servers of versions 10 and later is to use the new replay_lag (etc) columns in pg_stat_replication, commit 6912acc04f0b (also pg10). I am inclined to apply this to all branches unless there are strong objections, because the current code seems pretty arbitrary anyway. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: