Re: Clear logical slot's 'synced' flag on promotion of standby

From Masahiko Sawada
Subject Re: Clear logical slot's 'synced' flag on promotion of standby
Date
Msg-id CAD21AoCpj0Sr7hYJXgF2Ata-zfoovO9OBF_QhreYxx23L2S9Ew@mail.gmail.com
In response to Re: Clear logical slot's 'synced' flag on promotion of standby  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Wed, Sep 10, 2025 at 9:00 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Sep 10, 2025 at 5:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Sep 8, 2025 at 11:21 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > This is a spin-off thread from [1].
> > >
> > > Currently, in the slot-sync worker, we have an error scenario [2]
> > > where, during slot synchronization, if we detect a slot with the same
> > > name and its synced flag is set to false, we emit an error. The
> > > rationale is to avoid potentially overwriting a user-created slot.
> > >
> > > But while analyzing [1], we observed that this error can lead to
> > > inconsistent behavior during switchovers. On the first switchover, the
> > > new standby logs an error: "Exiting from slot synchronization because
> > > a slot with the same name already exists on the standby."   But during
> > > a double switchover, this error does not occur.
> > >
> > > Upon re-evaluating this, it seems more appropriate to clear the synced
> > > flag after promotion, as the flag does not hold any meaning on the
> > > primary. Doing so would ensure consistent behavior across all
> > > switchovers, as the same error will be raised avoiding the risk of
> > > overwriting user's slots.
> >
> > There is the following comment in FinishWalRecovery():
> >
> > /*
> >  * Shutdown the slot sync worker to drop any temporary slots acquired by
> >  * it and to prevent it from keep trying to fetch the failover slots.
> >  *
> >  * We do not update the 'synced' column in 'pg_replication_slots' system
> >  * view from true to false here, as any failed update could leave 'synced'
> >  * column false for some slots. This could cause issues during slot sync
> >  * after restarting the server as a standby. While updating the 'synced'
> >  * column after switching to the new timeline is an option, it does not
> >  * simplify the handling for the 'synced' column. Therefore, we retain the
> >  * 'synced' column as true after promotion as it may provide useful
> >  * information about the slot origin.
> >  */
> > ShutDownSlotSync();
> >
> > Does the patch address the above concerns?
> >
>
> Yes, the patch is attempting to address the above concern. it is
> trying to Reset synced-column after switching to a new timeline. There
> is an issue though as pointed out by Ashutosh in [1], which needs to
> be addressed.

Nice.

There's an ongoing discussion about a patch that would allow users to
overwrite slot properties[1]. IIUC, the reported inconsistency during
switchover would be resolved by that slot-overwriting patch. I'm
looking into the relationship between the patch discussed in this
thread and the slot-overwriting patch. I'm not yet convinced that the
proposed slot-overwriting patch is the right approach. But suppose
that we do allow slot overwriting somehow: what value would the patch
proposed in this thread add? Would its only benefit be ensuring that
the 'synced' flag is set to false on the primary?

Regards,

[1] https://www.postgresql.org/message-id/CAA5-nLAqGpBFEAr2XNYMj3E%2B39caQra_SJeB5MCtp7PCyLTiOg%40mail.gmail.com


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


