Re: Clear logical slot's 'synced' flag on promotion of standby
От | Ashutosh Sharma |
---|---|
Тема | Re: Clear logical slot's 'synced' flag on promotion of standby |
Дата | |
Msg-id | CAE9k0PnzW7CWk-hCnedS1BDM7c7KJHjOxVfKMf3PYQq-WCYTMQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Clear logical slot's 'synced' flag on promotion of standby (Ajin Cherian <itsajin@gmail.com>) |
Ответы |
Re: Clear logical slot's 'synced' flag on promotion of standby
Re: Clear logical slot's 'synced' flag on promotion of standby |
Список | pgsql-hackers |
Hi Ajin, On Thu, Sep 18, 2025 at 4:16 PM Ajin Cherian <itsajin@gmail.com> wrote: > > On Fri, Sep 12, 2025 at 1:56 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > The approach seems valid and should work, but introducing a new file > > like promote.inprogress for this purpose might be excessive. We can > > first try analyzing existing information to determine whether we can > > distinguish between the two scenarios -- a primary in crash recovery > > immediately after a promotion attempt versus a regular primary. If we > > are unable to find any way, we can revisit the idea. > > > > I needed a way to reset slots not only during promotion, but also > after a crash that occurs while slots are being reset, so there would > be a fallback mechanism to clear them again on startup. As Shveta > pointed out, it wasn’t trivial to tell apart a standby restarting > after crashing during promotion from a primary restarting after a > crash. So I decided to just reset slots every time primary (or a > standby after promotion) restarts. > > Because this fallback logic will run on every primary restart, it was > important to minimize overhead added by the patch. After some > discussion, I placed the reset logic in RestoreSlotFromDisk(), which > is invoked by StartupReplicationSlots() whenever the server starts. > Because RestoreSlotFromDisk() already loops through all slots, this > adds minimum extra work; but also ensures the synced flag is cleared > when running on a primary. > > The next challenge was finding a reliable flag to distinguish > primaries from standbys, since we really don’t want to reset the flag > on a standby. I tested StandbyMode, RecoveryInProgress(), and > InRecovery. But during restarts, both RecoveryInProgress() and > InRecovery are always true on both primary and standby. In all my > testing, StandbyMode was the only variable that consistently > differentiated between the two, which is what I used. > +/* + * ResetSyncedSlots() + * + * Reset all replication slots that have synced=true to synced=false. + */ I feel this is not correct, we are force resetting sync flag status for all logical slots, not just the one that is set to true. -- @@ -2664,6 +2715,10 @@ RestoreSlotFromDisk(const char *name) memcpy(&slot->data, &cp.slotdata, sizeof(ReplicationSlotPersistentData)); + /* reset synced flag if this is a primary server */ + if (!StandbyMode) + slot->data.synced = false; + I think you also need to ensure that you are only doing this for the logical slots, it's currently just checking if the slot is in-use or not. -- With Regards, Ashutosh Sharma.
В списке pgsql-hackers по дате отправления: