Обсуждение: Re: failover logical replication slots
On Wed, Jun 11, 2025 at 10:17 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote: > > Thanks for your reply. > The problem I see is that after creating a new subscription, we have: > > 1) if a failover occurs, on the new primary node, the failover and sync flags are both set to true, so there's no problem. > > 2) when the old node returns as a secondary in the cluster, the failover flag is set to true and the sync flag is set tofalse then > the error message is generated: ERROR: exiting from slot synchronization because same name slot "sub_test" already existson the standby > > Why not change the value of the synced flag when the standby is joining the cluster ? If the slot on the primary node hasthe same name as the slot on the secondary node and the failover flag is set to true, > > if ((slot = SearchNamedReplicationSlot(remote_slot->name, true))) { > slot->data.synced = true > ... IIUC, Hou-san also mentioned the same idea, but it is not that straightforward because the user may have created a logical slot with the same name but with a few other different properties like two_phase, slot_type, etc. I think we can try to compare all such slot properties to ensure that we can overwrite the same name slot, but there is still a chance that we may overwrite a slot that the user has created for some other purpose. Now, we may want to extend this functionality such that we give some knob to user which allows us to overwrite the existing slots with same name. Then user can use this knob (GUC or something else) when starting the node as standby after switchover and allow the overwrite for existing slots. As mentioned by Hou-San and Dilip, I also think it is more important for the old node that comes as a standby to remove logical slots to avoid WAL accumulation. For example, we can provide a function like pg_drop_all_slots() with a type parameter indicating logical or physical, and then utilities like patroni that provide switchover functionality can use that function to remove all existing slots (maybe keep the slots that are required for failover) when starting the node as a standby. -- With Regards, Amit Kapila.
Thanks for the reply Amit,
I don't really understand the logic of the implementation. If the slot name matches that of the primary slot and this slot is in failover mode, how could it be any different on the standby slot?
After the first failover, the following failovers will work given that the sync flag is true on both the primary and standby slots.
After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check if a failover slot exists on the standby, if so we drop it before recreating a new one for syncing?
Regards,
Fabrice
On Thu, Jun 12, 2025 at 5:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jun 11, 2025 at 10:17 PM Fabrice Chapuis
<fabrice636861@gmail.com> wrote:
>
> Thanks for your reply.
> The problem I see is that after creating a new subscription, we have:
>
> 1) if a failover occurs, on the new primary node, the failover and sync flags are both set to true, so there's no problem.
>
> 2) when the old node returns as a secondary in the cluster, the failover flag is set to true and the sync flag is set to false then
> the error message is generated: ERROR: exiting from slot synchronization because same name slot "sub_test" already exists on the standby
>
> Why not change the value of the synced flag when the standby is joining the cluster ? If the slot on the primary node has the same name as the slot on the secondary node and the failover flag is set to true,
>
> if ((slot = SearchNamedReplicationSlot(remote_slot->name, true))) {
> slot->data.synced = true
> ...
IIUC, Hou-san also mentioned the same idea, but it is not that
straightforward because the user may have created a logical slot with
the same name but with a few other different properties like
two_phase, slot_type, etc. I think we can try to compare all such slot
properties to ensure that we can overwrite the same name slot, but
there is still a chance that we may overwrite a slot that the user has
created for some other purpose. Now, we may want to extend this
functionality such that we give some knob to user which allows us to
overwrite the existing slots with same name. Then user can use this
knob (GUC or something else) when starting the node as standby after
switchover and allow the overwrite for existing slots.
As mentioned by Hou-San and Dilip, I also think it is more important
for the old node that comes as a standby to remove logical slots to
avoid WAL accumulation. For example, we can provide a function like
pg_drop_all_slots() with a type parameter indicating logical or
physical, and then utilities like patroni that provide switchover
functionality can use that function to remove all existing slots
(maybe keep the slots that are required for failover) when starting
the node as a standby.
--
With Regards,
Amit Kapila.
On Thu, Jun 12, 2025 at 2:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote: > > Thanks for the reply Amit, > > I don't really understand the logic of the implementation. If the slot name matches that of the primary slot and this slotis in failover mode, how could it be any different on the standby slot? > On the standby's we do allow creating logical slots (For example, one can use pg_create_logical_replication_slot()). So, the same name slot can be created on standby by the user before we start sync. As of now, we don't allow setting the failover option for slots on standby's but in future, it could be supported to allow syncing slots from standbys (something like cascaded replication). > After the first failover, the following failovers will work given that the sync flag is true on both the primary and standbyslots. > > After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check if afailover slot exists on the standby, if so we drop it before recreating a new one for syncing? > This has the risk of dropping an unwarranted slot. -- With Regards, Amit Kapila.
On Thu, Jun 12, 2025 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jun 12, 2025 at 2:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote: > > > > > After the first failover, the following failovers will work given that the sync flag is true on both the primary andstandby slots. > > > > After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check ifa failover slot exists on the standby, if so we drop it before recreating a new one for syncing? > > > > This has the risk of dropping an unwarranted slot. > On thinking further, even if we decide to support this functionality of overwriting the existing slots in some way, what is guarantee that the new standby will enable syncslot functionality (via sync_replication_slots)? If standby doesn't enable the sync_replication_slots then such slots will remain dangling and lead to the accumulation of WAL. So, I think the first thing to do is to avoid such cases, both for failover and non-failover slots. Then we should consider ways to allow overwriting existing slots on standby in the scenario you explained. -- With Regards, Amit Kapila.
However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a new subscription.
Would it be reasonable to consider adding a GUC to address this issue?
I can propose a patch in that sense if it seems appropriate.
What is your opinion
Regards,
Fabrice
On Thu, Jun 12, 2025 at 11:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 12, 2025 at 2:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Thanks for the reply Amit,
>
> I don't really understand the logic of the implementation. If the slot name matches that of the primary slot and this slot is in failover mode, how could it be any different on the standby slot?
>
On the standby's we do allow creating logical slots (For example, one
can use pg_create_logical_replication_slot()). So, the same name slot
can be created on standby by the user before we start sync. As of now,
we don't allow setting the failover option for slots on standby's but
in future, it could be supported to allow syncing slots from standbys
(something like cascaded replication).
> After the first failover, the following failovers will work given that the sync flag is true on both the primary and standby slots.
>
> After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check if a failover slot exists on the standby, if so we drop it before recreating a new one for syncing?
>
This has the risk of dropping an unwarranted slot.
--
With Regards,
Amit Kapila.
On Thu, Jun 12, 2025 at 3:53 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote: > > However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a newsubscription. > > Would it be reasonable to consider adding a GUC to address this issue? > I can propose a patch in that sense if it seems appropriate. > Yeah, we can consider that, though I don't know at this stage if GUC is the only way, but I hope you understand that it will be for PG19. -- With Regards, Amit Kapila.
The parameter sync_replication_slots could be tested if it is set to true before doing any action on failover slots.
Regards,
Fabrice
On Thu, Jun 12, 2025 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 12, 2025 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 12, 2025 at 2:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
> >
>
> > After the first failover, the following failovers will work given that the sync flag is true on both the primary and standby slots.
> >
> > After new sandby is attached to the primary, can we imagine that when the sync worker process is started we check if a failover slot exists on the standby, if so we drop it before recreating a new one for syncing?
> >
>
> This has the risk of dropping an unwarranted slot.
>
On thinking further, even if we decide to support this functionality
of overwriting the existing slots in some way, what is guarantee that
the new standby will enable syncslot functionality (via
sync_replication_slots)? If standby doesn't enable the
sync_replication_slots then such slots will remain dangling and lead
to the accumulation of WAL. So, I think the first thing to do is to
avoid such cases, both for failover and non-failover slots. Then we
should consider ways to allow overwriting existing slots on standby in
the scenario you explained.
--
With Regards,
Amit Kapila.
yes of course, maybe for PG 19
Regards,
Fabrice
On Thu, Jun 12, 2025 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 12, 2025 at 3:53 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a new subscription.
>
> Would it be reasonable to consider adding a GUC to address this issue?
> I can propose a patch in that sense if it seems appropriate.
>
Yeah, we can consider that, though I don't know at this stage if GUC
is the only way, but I hope you understand that it will be for PG19.
--
With Regards,
Amit Kapila.
Hi Amit,
Here is a proposed solution to handle the problem of creating the logical replication slot on standby after a switchover.
Thank you for your comments and help on this issue
Here is a proposed solution to handle the problem of creating the logical replication slot on standby after a switchover.
Thank you for your comments and help on this issue
Regards
Fabrice
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e..296840a 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
ReplicationSlot *slot;
XLogRecPtr latestFlushPtr;
bool slot_updated = false;
+ bool overwriting_failover_slot = true; /* could be a GUC */
/*
* Make sure that concerned WAL is received and flushed before syncing
@@ -654,19 +655,37 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
bool synced;
+ bool failover_status = remote_slot->failover;;
SpinLockAcquire(&slot->mutex);
synced = slot->data.synced;
SpinLockRelease(&slot->mutex);
- /* User-created slot with the same name exists, raise ERROR. */
- if (!synced)
- ereport(ERROR,
+ if (!synced){
+ /*
+ * Check if we need to overwrite an existing failover slot and
+ * if slot has the failover flag set to true
+ * and the sync_replication_slots is on,
+ * other check could be added here */
+ if (overwriting_failover_slot && failover_status && sync_replication_slots){
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name, true);
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but it will be dropped because overwriting_failover_slot is set to true",
+ remote_slot->name));
+ return false; /* Going back to the main loop after droping the failover slot */
+ }
+ /* User-created slot with the same name exists, raise ERROR. */
+ else
+ ereport(ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("exiting from slot synchronization because same"
" name slot \"%s\" already exists on the standby",
remote_slot->name));
-
+ }
/*
* The slot has been synchronized before.
*
On Thu, Jun 12, 2025 at 4:27 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
yes of course, maybe for PG 19Regards,FabriceOn Thu, Jun 12, 2025 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:On Thu, Jun 12, 2025 at 3:53 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a new subscription.
>
> Would it be reasonable to consider adding a GUC to address this issue?
> I can propose a patch in that sense if it seems appropriate.
>
Yeah, we can consider that, though I don't know at this stage if GUC
is the only way, but I hope you understand that it will be for PG19.
--
With Regards,
Amit Kapila.
On Fri, Jul 11, 2025 at 8:42 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote: > > Hi Amit, > Here is a proposed solution to handle the problem of creating the logical replication slot on standby after a switchover. > Thank you for your comments and help on this issue > > Regards > > Fabrice > > diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c > index 656e66e..296840a 100644 > --- a/src/backend/replication/logical/slotsync.c > +++ b/src/backend/replication/logical/slotsync.c > @@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) > ReplicationSlot *slot; > XLogRecPtr latestFlushPtr; > bool slot_updated = false; > + bool overwriting_failover_slot = true; /* could be a GUC */ > > /* > * Make sure that concerned WAL is received and flushed before syncing > @@ -654,19 +655,37 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) > if ((slot = SearchNamedReplicationSlot(remote_slot->name, true))) > { > bool synced; > + bool failover_status = remote_slot->failover;; > > SpinLockAcquire(&slot->mutex); > synced = slot->data.synced; > SpinLockRelease(&slot->mutex); > > - /* User-created slot with the same name exists, raise ERROR. */ > - if (!synced) > - ereport(ERROR, > + if (!synced){ > + /* > + * Check if we need to overwrite an existing failover slot and > + * if slot has the failover flag set to true > + * and the sync_replication_slots is on, > + * other check could be added here */ > + if (overwriting_failover_slot && failover_status && sync_replication_slots){ > + I think we don't need to explicitly check sync_replication_slots, as we should reach here only when that flag is set. I think we should introduce an pg_alter_replication_slot which allows to overwrite existing slots during sync by setting a parameter like allow_overwrite (or something like that). This API will be useful for other purposes, like changing two_phase or failover properties of the slot after the creation of the slot. BTW, we also discussed supporting pg_drop_all_slots kind of API as well. See if you are interested in implementing that API as well. Note: I suggest starting a new thread with the concrete proposal for the new API or GUC, stating how it will be helpful. It might help in getting suggestions from others as well. -- With Regards, Amit Kapila.
Thanks for this feedback,
I'll remove the check on the sync_replication_slots parameter.
I think it is interesting as you suggest to start with the idea of the pg_alter_replication_slot API, I will make a new proposal by opening a new thread, hoping to be supported in my approach. Is there already an initiative about the pg_drop_all_slots API?
I'll remove the check on the sync_replication_slots parameter.
I think it is interesting as you suggest to start with the idea of the pg_alter_replication_slot API, I will make a new proposal by opening a new thread, hoping to be supported in my approach. Is there already an initiative about the pg_drop_all_slots API?
Best Regards,
Fabrice
On Sun, Jul 13, 2025 at 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jul 11, 2025 at 8:42 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Hi Amit,
> Here is a proposed solution to handle the problem of creating the logical replication slot on standby after a switchover.
> Thank you for your comments and help on this issue
>
> Regards
>
> Fabrice
>
> diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
> index 656e66e..296840a 100644
> --- a/src/backend/replication/logical/slotsync.c
> +++ b/src/backend/replication/logical/slotsync.c
> @@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> ReplicationSlot *slot;
> XLogRecPtr latestFlushPtr;
> bool slot_updated = false;
> + bool overwriting_failover_slot = true; /* could be a GUC */
>
> /*
> * Make sure that concerned WAL is received and flushed before syncing
> @@ -654,19 +655,37 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
> {
> bool synced;
> + bool failover_status = remote_slot->failover;;
>
> SpinLockAcquire(&slot->mutex);
> synced = slot->data.synced;
> SpinLockRelease(&slot->mutex);
>
> - /* User-created slot with the same name exists, raise ERROR. */
> - if (!synced)
> - ereport(ERROR,
> + if (!synced){
> + /*
> + * Check if we need to overwrite an existing failover slot and
> + * if slot has the failover flag set to true
> + * and the sync_replication_slots is on,
> + * other check could be added here */
> + if (overwriting_failover_slot && failover_status && sync_replication_slots){
> +
I think we don't need to explicitly check sync_replication_slots, as
we should reach here only when that flag is set. I think we should
introduce an pg_alter_replication_slot which allows to overwrite
existing slots during sync by setting a parameter like allow_overwrite
(or something like that). This API will be useful for other purposes,
like changing two_phase or failover properties of the slot after the
creation of the slot. BTW, we also discussed supporting
pg_drop_all_slots kind of API as well. See if you are interested in
implementing that API as well.
Note: I suggest starting a new thread with the concrete proposal for
the new API or GUC, stating how it will be helpful. It might help in
getting suggestions from others as well.
--
With Regards,
Amit Kapila.
On Mon, Jul 14, 2025 at 9:10 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote: > > Thanks for this feedback, > I'll remove the check on the sync_replication_slots parameter. > I think it is interesting as you suggest to start with the idea of the pg_alter_replication_slot API, I will make a newproposal by opening a new thread, hoping to be supported in my approach. Is there already an initiative about the pg_drop_all_slotsAPI? > No, to my knowledge, there is no other initiative for the pg_drop_all_slots() API. I think we first got the use case where users can drop all slots on new standby after failover to avoid excessive resource usage. -- With Regards, Amit Kapila.