Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL

From BharatDB
Subject Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL
Date
Msg-id CAAh00ERqqAhgA_BJJccwE0BXxUWMk+FHzMoLo1kWcsm+qdNVjw@mail.gmail.com
In reply to [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL (Srinath Reddy Sadipiralla <srinath2133@gmail.com>)
Replies Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL
List pgsql-hackers

Dear Srinath,

Subject: [PATCH] pg_rewind: Ignore shutdown checkpoints when determining rewind necessity.

While working with pg_rewind, I noticed that it can sometimes request a rewind even when no real changes exist after a failover. This happens because pg_rewind currently determines the end-of-WAL on the target using the last shutdown checkpoint (or minRecoveryPoint for a standby). In a clean failover scenario, where a standby is promoted and the old primary is later shut down, the only WAL record generated after divergence may be a shutdown checkpoint. Although the data on both nodes is identical, pg_rewind treats this shutdown record as meaningful and forces an unnecessary rewind.

The proposed patch fixes this by ignoring shutdown checkpoints (XLOG_CHECKPOINT_SHUTDOWN) when determining the end-of-WAL, scanning backward until a non-shutdown record is found. Rewinds are then triggered only when actual modifications exist after divergence, which avoids unnecessary rewinds in clean failover situations.

With the proposed fix applied, my local test script gives the following results:

  • Old primary shuts down cleanly.

  • Standby is promoted successfully.

  • pg_rewind correctly detects no rewind is needed.

  • Data on both clusters matches perfectly.

I believe this change will prevent unnecessary rewinds in production environments, improve reliability, and avoid potential confusion during failovers. 

Thank you for your consideration.

Best regards,
Soumya.



On Sat, Sep 6, 2025 at 10:04 PM Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Hi all,

While working with pg_rewind, I noticed that it can sometimes request a rewind even when no actual changes exist after a failover.

Problem:
Currently, pg_rewind determines the end-of-WAL on the target by using the last shutdown checkpoint (or minRecoveryPoint for a standby). This creates a false positive scenario:

1) Suppose a standby is promoted to become the new primary.
2) Later, the old primary is cleanly shut down.
3) The only WAL record generated on the old primary after divergence is a shutdown checkpoint.

At this point, the old primary and new primary contain identical data. However, since the shutdown checkpoint extends the WAL past the divergence point, pg_rewind concludes:

if (target_wal_endrec > divergerec)
    rewind_needed = true;

That forces a rewind even though there are no meaningful changes.
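
To make the false positive concrete, here is a minimal standalone sketch of the comparison quoted above. It is not the pg_rewind source: WalPos is a hypothetical stand-in for a 64-bit WAL position, the function name is invented for illustration, and the precondition (the target's WAL never ends before the divergence point) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit WAL position (LSN). */
typedef uint64_t WalPos;

/*
 * Model of the check quoted above: any record on the target that lies
 * past the divergence point requests a rewind, regardless of what kind
 * of record it is.
 */
bool
rewind_needed_current(WalPos target_wal_endrec, WalPos divergerec)
{
    /* Assumption of this sketch: the target's WAL contains the divergence point. */
    assert(target_wal_endrec >= divergerec);

    return target_wal_endrec > divergerec;
}
```

With a divergence point at 0x3000000 and a lone shutdown checkpoint written at 0x3000028 (both values made up), rewind_needed_current(0x3000028, 0x3000000) returns true even though no data changed.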

To reproduce this scenario, use the attached script.

Fix:
The attached patch changes the logic so that pg_rewind no longer treats shutdown checkpoints as meaningful records when determining the end-of-WAL. Instead, we scan backward from the last checkpoint until we find the most recent valid WAL record that is not a shutdown-related record.

This ensures rewind is only triggered when there are actual modifications after divergence, avoiding unnecessary rewinds in clean failover scenarios.
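
The backward scan described above can be sketched roughly as follows. This is a toy model over an in-memory array, not the patch itself: the WalRec struct, the RecKind enum, and both function names are hypothetical, and the real code would read records from WAL rather than from an array. Returning 0 when nothing but shutdown checkpoints is found is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit WAL position (LSN). */
typedef uint64_t WalPos;

/* Hypothetical record kinds; only the shutdown checkpoint is special here. */
typedef enum
{
    REC_DATA,
    REC_CHECKPOINT_ONLINE,
    REC_CHECKPOINT_SHUTDOWN
} RecKind;

typedef struct
{
    WalPos  lsn;    /* start position of the record */
    RecKind kind;
} WalRec;

/*
 * Walk backward from the newest record and return the position of the
 * most recent record that is not a shutdown checkpoint. Returns 0 if
 * every record is a shutdown checkpoint or the array is empty.
 */
WalPos
effective_wal_end(const WalRec *recs, size_t nrecs)
{
    assert(recs != NULL || nrecs == 0);

    for (size_t i = nrecs; i > 0; i--)
    {
        if (recs[i - 1].kind != REC_CHECKPOINT_SHUTDOWN)
            return recs[i - 1].lsn;
    }
    return 0;
}

/* Same comparison as before, but against the filtered end-of-WAL. */
bool
rewind_needed_proposed(const WalRec *recs, size_t nrecs, WalPos divergerec)
{
    return effective_wal_end(recs, nrecs) > divergerec;
}
```

In this model, a target whose only record after the divergence point is a shutdown checkpoint no longer requests a rewind, while any other record past that point still does.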


--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/