Обсуждение: Re: failover logical replication slots

Поиск
Список
Период
Сортировка

Re: failover logical replication slots

От
Amit Kapila
Дата:
On Wed, Jun 11, 2025 at 10:17 PM Fabrice Chapuis
<fabrice636861@gmail.com> wrote:
>
> Thanks for your reply.
> The problem I see is that after creating a new subscription, we have:
>
> 1) if a failover occurs, on the new primary node, the failover and sync flags are both set to true, so there's no
problem.
>
> 2) when the old node returns as a secondary in the cluster, the failover flag is set to true and the sync flag is set
tofalse then 
> the error message is generated:  ERROR: exiting from slot synchronization because same name slot "sub_test" already
existson the standby 
>
> Why not change the value of the synced flag when the standby is joining the cluster ? If the slot on the primary node
hasthe same name as the slot on the secondary node and the failover flag is set to true, 
>
> if ((slot = SearchNamedReplicationSlot(remote_slot->name, true))) {
> slot->data.synced = true
> ...

IIUC, Hou-san also mentioned the same idea, but it is not that
straightforward because the user may have created a logical slot with
the same name but with a few other different properties like
two_phase, slot_type, etc. I think we can try to compare all such slot
properties to ensure that we can overwrite the same name slot, but
there is still a chance that we may overwrite a slot that the user has
created for some other purpose. Now, we may want to extend this
functionality such that we give some knob to user which allows us to
overwrite the existing slots with same name. Then user can use this
knob (GUC or something else) when starting the node as standby after
switchover and allow the overwrite for existing slots.

As mentioned by Hou-San and Dilip, I also think it is more important
for the old node that comes as a standby to remove logical slots to
avoid WAL accumulation. For example, we can provide a function like
pg_drop_all_slots() with a type parameter indicating logical or
physical, and then utilities like patroni that provide switchover
functionality can use that function to remove all existing slots
(maybe keep the slots that are required for failover) when starting
the node as a standby.

--
With Regards,
Amit Kapila.



Re: failover logical replication slots

От
Fabrice Chapuis
Дата:
Thanks for the reply Amit, 

I don't really understand the logic of the implementation. If the slot name matches that of the primary slot and this slot is in failover mode, how could it be any different on the standby slot?
After the first failover, the following failovers will work given that the sync flag is true on both the primary and standby slots.

After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check if a failover slot exists on the standby, if so we drop it before recreating  a new one for syncing?

Regards,

Fabrice



On Thu, Jun 12, 2025 at 5:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jun 11, 2025 at 10:17 PM Fabrice Chapuis
<fabrice636861@gmail.com> wrote:
>
> Thanks for your reply.
> The problem I see is that after creating a new subscription, we have:
>
> 1) if a failover occurs, on the new primary node, the failover and sync flags are both set to true, so there's no problem.
>
> 2) when the old node returns as a secondary in the cluster, the failover flag is set to true and the sync flag is set to false then
> the error message is generated:  ERROR: exiting from slot synchronization because same name slot "sub_test" already exists on the standby
>
> Why not change the value of the synced flag when the standby is joining the cluster ? If the slot on the primary node has the same name as the slot on the secondary node and the failover flag is set to true,
>
> if ((slot = SearchNamedReplicationSlot(remote_slot->name, true))) {
> slot->data.synced = true
> ...

IIUC, Hou-san also mentioned the same idea, but it is not that
straightforward because the user may have created a logical slot with
the same name but with a few other different properties like
two_phase, slot_type, etc. I think we can try to compare all such slot
properties to ensure that we can overwrite the same name slot, but
there is still a chance that we may overwrite a slot that the user has
created for some other purpose. Now, we may want to extend this
functionality such that we give some knob to user which allows us to
overwrite the existing slots with same name. Then user can use this
knob (GUC or something else) when starting the node as standby after
switchover and allow the overwrite for existing slots.

As mentioned by Hou-San and Dilip, I also think it is more important
for the old node that comes as a standby to remove logical slots to
avoid WAL accumulation. For example, we can provide a function like
pg_drop_all_slots() with a type parameter indicating logical or
physical, and then utilities like patroni that provide switchover
functionality can use that function to remove all existing slots
(maybe keep the slots that are required for failover) when starting
the node as a standby.

--
With Regards,
Amit Kapila.

Re: failover logical replication slots

От
Amit Kapila
Дата:
On Thu, Jun 12, 2025 at 2:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Thanks for the reply Amit,
>
> I don't really understand the logic of the implementation. If the slot name matches that of the primary slot and this
slotis in failover mode, how could it be any different on the standby slot? 
>

On the standby's we do allow creating logical slots (For example, one
can use pg_create_logical_replication_slot()). So, the same name slot
can be created on standby by the user before we start sync. As of now,
we don't allow setting the failover option for slots on standby's but
in future, it could be supported to allow syncing slots from standbys
(something like cascaded replication).

> After the first failover, the following failovers will work given that the sync flag is true on both the primary and
standbyslots. 
>
> After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check if
afailover slot exists on the standby, if so we drop it before recreating  a new one for syncing? 
>

This has the risk of dropping an unwarranted slot.

--
With Regards,
Amit Kapila.



Re: failover logical replication slots

От
Amit Kapila
Дата:
On Thu, Jun 12, 2025 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 12, 2025 at 2:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
> >
>
> > After the first failover, the following failovers will work given that the sync flag is true on both the primary
andstandby slots. 
> >
> > After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check
ifa failover slot exists on the standby, if so we drop it before recreating  a new one for syncing? 
> >
>
> This has the risk of dropping an unwarranted slot.
>

On thinking further, even if we decide to support this functionality
of overwriting the existing slots in some way, what is guarantee that
the new standby will enable syncslot functionality (via
sync_replication_slots)? If standby doesn't enable the
sync_replication_slots then such slots will remain dangling and lead
to the accumulation of WAL. So, I think the first thing to do is to
avoid such cases, both for failover and non-failover slots. Then we
should consider ways to allow overwriting existing slots on standby in
the scenario you explained.

--
With Regards,
Amit Kapila.



Re: failover logical replication slots

От
Fabrice Chapuis
Дата:

However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a new subscription.

Would it be reasonable to consider adding a GUC to address this issue?
I can propose a patch in that sense if it seems appropriate.

What is your opinion

Regards,

Fabrice


On Thu, Jun 12, 2025 at 11:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 12, 2025 at 2:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Thanks for the reply Amit,
>
> I don't really understand the logic of the implementation. If the slot name matches that of the primary slot and this slot is in failover mode, how could it be any different on the standby slot?
>

On the standby's we do allow creating logical slots (For example, one
can use pg_create_logical_replication_slot()). So, the same name slot
can be created on standby by the user before we start sync. As of now,
we don't allow setting the failover option for slots on standby's but
in future, it could be supported to allow syncing slots from standbys
(something like cascaded replication).

> After the first failover, the following failovers will work given that the sync flag is true on both the primary and standby slots.
>
> After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check if a failover slot exists on the standby, if so we drop it before recreating  a new one for syncing?
>

This has the risk of dropping an unwarranted slot.

--
With Regards,
Amit Kapila.

Re: failover logical replication slots

От
Amit Kapila
Дата:
On Thu, Jun 12, 2025 at 3:53 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a
newsubscription. 
>
> Would it be reasonable to consider adding a GUC to address this issue?
> I can propose a patch in that sense if it seems appropriate.
>

Yeah, we can consider that, though I don't know at this stage if GUC
is the only way, but I hope you understand that it will be for PG19.

--
With Regards,
Amit Kapila.



Re: failover logical replication slots

От
Fabrice Chapuis
Дата:
The parameter sync_replication_slots could be tested if it is set to true before doing any action on failover slots.

Regards,
Fabrice

On Thu, Jun 12, 2025 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 12, 2025 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 12, 2025 at 2:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
> >
>
> > After the first failover, the following failovers will work given that the sync flag is true on both the primary and standby slots.
> >
> > After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check if a failover slot exists on the standby, if so we drop it before recreating  a new one for syncing?
> >
>
> This has the risk of dropping an unwarranted slot.
>

On thinking further, even if we decide to support this functionality
of overwriting the existing slots in some way, what is guarantee that
the new standby will enable syncslot functionality (via
sync_replication_slots)? If standby doesn't enable the
sync_replication_slots then such slots will remain dangling and lead
to the accumulation of WAL. So, I think the first thing to do is to
avoid such cases, both for failover and non-failover slots. Then we
should consider ways to allow overwriting existing slots on standby in
the scenario you explained.

--
With Regards,
Amit Kapila.

Re: failover logical replication slots

От
Fabrice Chapuis
Дата:
yes of course, maybe for PG 19

Regards,
Fabrice

On Thu, Jun 12, 2025 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 12, 2025 at 3:53 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a new subscription.
>
> Would it be reasonable to consider adding a GUC to address this issue?
> I can propose a patch in that sense if it seems appropriate.
>

Yeah, we can consider that, though I don't know at this stage if GUC
is the only way, but I hope you understand that it will be for PG19.

--
With Regards,
Amit Kapila.

Re: failover logical replication slots

От
Fabrice Chapuis
Дата:
Hi Amit,
Here is a proposed solution to handle the problem of creating the logical replication slot on standby after a switchover.
Thank you for your comments and help on this issue

Regards

Fabrice

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e..296840a 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
        ReplicationSlot *slot;
        XLogRecPtr      latestFlushPtr;
        bool            slot_updated = false;
+       bool            overwriting_failover_slot = true; /* could be a GUC */

        /*
         * Make sure that concerned WAL is received and flushed before syncing
@@ -654,19 +655,37 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
        if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
        {
                bool            synced;
+               bool            failover_status = remote_slot->failover;;

                SpinLockAcquire(&slot->mutex);
                synced = slot->data.synced;
                SpinLockRelease(&slot->mutex);

-               /* User-created slot with the same name exists, raise ERROR. */
-               if (!synced)
-                       ereport(ERROR,
+               if (!synced){
+                       /*
+                        * Check if we need to overwrite an existing failover slot and
+                        * if slot has the failover flag set to true
+                        * and the sync_replication_slots is on,
+                        * other check could be added here */
+                       if (overwriting_failover_slot && failover_status && sync_replication_slots){
+
+                               /* Get rid of a replication slot that is no longer wanted */
+                               ReplicationSlotDrop(remote_slot->name, true);
+                               ereport(WARNING,
+                                       errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                       errmsg("slot \"%s\" already exists"
+                                               " on the standby but it will be dropped because overwriting_failover_slot is set to true",
+                                               remote_slot->name));
+                               return false; /* Going back to the main loop after droping the failover slot */
+                       }
+                       /* User-created slot with the same name exists, raise ERROR. */
+                       else
+                               ereport(ERROR,
                                        errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                                        errmsg("exiting from slot synchronization because same"
                                                   " name slot \"%s\" already exists on the standby",
                                                   remote_slot->name));
-
+               }
                /*
                 * The slot has been synchronized before.
                 *


On Thu, Jun 12, 2025 at 4:27 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
yes of course, maybe for PG 19

Regards,
Fabrice

On Thu, Jun 12, 2025 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 12, 2025 at 3:53 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a new subscription.
>
> Would it be reasonable to consider adding a GUC to address this issue?
> I can propose a patch in that sense if it seems appropriate.
>

Yeah, we can consider that, though I don't know at this stage if GUC
is the only way, but I hope you understand that it will be for PG19.

--
With Regards,
Amit Kapila.

Re: failover logical replication slots

От
Amit Kapila
Дата:
On Fri, Jul 11, 2025 at 8:42 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Hi Amit,
> Here is a proposed solution to handle the problem of creating the logical replication slot on standby after a
switchover.
> Thank you for your comments and help on this issue
>
> Regards
>
> Fabrice
>
> diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
> index 656e66e..296840a 100644
> --- a/src/backend/replication/logical/slotsync.c
> +++ b/src/backend/replication/logical/slotsync.c
> @@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
>         ReplicationSlot *slot;
>         XLogRecPtr      latestFlushPtr;
>         bool            slot_updated = false;
> +       bool            overwriting_failover_slot = true; /* could be a GUC */
>
>         /*
>          * Make sure that concerned WAL is received and flushed before syncing
> @@ -654,19 +655,37 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
>         if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
>         {
>                 bool            synced;
> +               bool            failover_status = remote_slot->failover;;
>
>                 SpinLockAcquire(&slot->mutex);
>                 synced = slot->data.synced;
>                 SpinLockRelease(&slot->mutex);
>
> -               /* User-created slot with the same name exists, raise ERROR. */
> -               if (!synced)
> -                       ereport(ERROR,
> +               if (!synced){
> +                       /*
> +                        * Check if we need to overwrite an existing failover slot and
> +                        * if slot has the failover flag set to true
> +                        * and the sync_replication_slots is on,
> +                        * other check could be added here */
> +                       if (overwriting_failover_slot && failover_status && sync_replication_slots){
> +

I think we don't need to explicitly check sync_replication_slots, as
we should reach here only when that flag is set. I think we should
introduce an pg_alter_replication_slot which allows to overwrite
existing slots during sync by setting a parameter like allow_overwrite
(or something like that). This API will be useful for other purposes,
like changing two_phase or failover properties of the slot after the
creation of the slot. BTW, we also discussed supporting
pg_drop_all_slots kind of API as well. See if you are interested in
implementing that API as well.

Note: I suggest starting a new thread with the concrete proposal for
the new API or GUC, stating how it will be helpful. It might help in
getting suggestions from others as well.

--
With Regards,
Amit Kapila.



Re: failover logical replication slots

От
Fabrice Chapuis
Дата:
Thanks for this feedback,
I'll remove the check on the sync_replication_slots parameter.
I think it is interesting as you suggest to start with the idea of the pg_alter_replication_slot API, I will make a new proposal by opening a new thread, hoping to be supported in my approach. Is there already an initiative about the pg_drop_all_slots API?

Best Regards,

Fabrice

On Sun, Jul 13, 2025 at 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jul 11, 2025 at 8:42 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Hi Amit,
> Here is a proposed solution to handle the problem of creating the logical replication slot on standby after a switchover.
> Thank you for your comments and help on this issue
>
> Regards
>
> Fabrice
>
> diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
> index 656e66e..296840a 100644
> --- a/src/backend/replication/logical/slotsync.c
> +++ b/src/backend/replication/logical/slotsync.c
> @@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
>         ReplicationSlot *slot;
>         XLogRecPtr      latestFlushPtr;
>         bool            slot_updated = false;
> +       bool            overwriting_failover_slot = true; /* could be a GUC */
>
>         /*
>          * Make sure that concerned WAL is received and flushed before syncing
> @@ -654,19 +655,37 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
>         if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
>         {
>                 bool            synced;
> +               bool            failover_status = remote_slot->failover;;
>
>                 SpinLockAcquire(&slot->mutex);
>                 synced = slot->data.synced;
>                 SpinLockRelease(&slot->mutex);
>
> -               /* User-created slot with the same name exists, raise ERROR. */
> -               if (!synced)
> -                       ereport(ERROR,
> +               if (!synced){
> +                       /*
> +                        * Check if we need to overwrite an existing failover slot and
> +                        * if slot has the failover flag set to true
> +                        * and the sync_replication_slots is on,
> +                        * other check could be added here */
> +                       if (overwriting_failover_slot && failover_status && sync_replication_slots){
> +

I think we don't need to explicitly check sync_replication_slots, as
we should reach here only when that flag is set. I think we should
introduce an pg_alter_replication_slot which allows to overwrite
existing slots during sync by setting a parameter like allow_overwrite
(or something like that). This API will be useful for other purposes,
like changing two_phase or failover properties of the slot after the
creation of the slot. BTW, we also discussed supporting
pg_drop_all_slots kind of API as well. See if you are interested in
implementing that API as well.

Note: I suggest starting a new thread with the concrete proposal for
the new API or GUC, stating how it will be helpful. It might help in
getting suggestions from others as well.

--
With Regards,
Amit Kapila.

Re: failover logical replication slots

От
Amit Kapila
Дата:
On Mon, Jul 14, 2025 at 9:10 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Thanks for this feedback,
> I'll remove the check on the sync_replication_slots parameter.
> I think it is interesting as you suggest to start with the idea of the pg_alter_replication_slot API, I will make a
newproposal by opening a new thread, hoping to be supported in my approach. Is there already an initiative about the
pg_drop_all_slotsAPI? 
>

No, to my knowledge, there is no other initiative for the
pg_drop_all_slots() API. I think we first got the use case where users
can drop all slots on new standby after failover to avoid excessive
resource usage.

--
With Regards,
Amit Kapila.