Thread: Proposal: GUC to control starting/stopping logical subscription workers

Proposal: GUC to control starting/stopping logical subscription workers

From: SATYANARAYANA NARLAPURAM
Date:
Hi all,

I couldn't find a previous discussion on a new GUC to globally enable or disable logical subscription workers at the instance level, so I am starting a new thread on this.

In multi-region or high-availability setups, a promoted standby often requires a controlled switchover before it should start applying logical replication changes from upstream. Without such control, a promoted standby may immediately attempt to connect to the publisher as a logical subscriber, which can cause it to unexpectedly take over replication slots, start pulling changes before the setup is ready, or even conflict with the original primary that is still using those slots. Disabling the subscription on the primary before promoting a standby is not possible in all cases, for example during PITR or data center outages.

Providing a way to keep logical subscriptions globally disabled—via a GUC setting—prior to promotion ensures that no changes are accidentally pulled or applied before the system is fully prepared. This avoids race conditions and the risk of data divergence.

I would like to propose adding a GUC with the following behavior:
  1. The default value of the GUC is ON, which gives the same behavior as today without the GUC.
  2. When it is off, no new apply workers start, and existing ones exit gracefully, as they do when a subscription is disabled.
  3. When it is turned on again, behavior is the same as the current behavior.
  4. Changing the GUC shouldn't require a restart.
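
To make the four points concrete, here is a minimal sketch of the intended semantics as a toy Python model. It is not the attached patch and does not reflect the actual launcher code; the class, field, and method names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Launcher:
    """Toy model of the proposed switch (all names are hypothetical)."""

    subscriptions: list
    workers_enabled: bool = True  # point 1: the default is ON
    running: set = field(default_factory=set)

    def set_enabled(self, value: bool) -> None:
        # Point 4: the flag is flipped at runtime, without a restart.
        self.workers_enabled = value

    def tick(self) -> None:
        """One launcher cycle."""
        if not self.workers_enabled:
            # Point 2: existing workers exit and no new ones start.
            self.running.clear()
            return
        # Points 1 and 3: each subscription gets a worker, as today.
        self.running.update(self.subscriptions)
```

In this model, turning the flag off and running one cycle empties the worker set, and turning it back on restores a worker for each subscription on the next cycle.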

Attaching a draft patch. Please let me know your thoughts.

Thanks,
Satya



Re: Proposal: GUC to control starting/stopping logical subscription workers

From: Bharath Rupireddy
Date:
Hi,

On Tue, Aug 12, 2025 at 8:40 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
> I couldn't find a previous discussion on a new GUC to globally enable
> or disable logical subscription workers at the instance level. So
> starting a new thread on this.
>
> In multi-region or high-availability setups, a promoted standby often
> requires a controlled switchover before it should start applying
> logical replication changes from upstream. Without such control, a
> promoted standby may immediately attempt to connect to the publisher as
> a logical subscriber, which can cause it to unexpectedly take over
> replication slots, start pulling changes before the setup is ready, or
> even conflict with the original primary that is still using those
> slots. Disabling the subscription on the primary before promoting a
> standby is not possible in all cases, for example during PITR or data
> center outages.
>
> Providing a way to keep logical subscriptions globally disabled, via a
> GUC setting, prior to promotion ensures that no changes are accidentally
> pulled or applied before the system is fully prepared. This avoids race
> conditions and the risk of data divergence.
>
> I would like to propose adding a GUC with the following behavior:
>
> 1. Default value for the GUC is ON, same behavior as now without the GUC
> 2. When off, no new apply workers start and existing ones exit
>    gracefully similar to when subscription disabled
> 3. When turned on again, behavior will be the same as the current
>    behavior
> 4. This GUC shouldn't require a restart

If I understand correctly, the end effect is similar to disabling all
subscriptions. Why not just add ALTER SUBSCRIPTION ... DISABLE for all
subscriptions to the failover workflow? The documentation on migrating
logical replication slots says so:
https://www.postgresql.org/docs/18/logical-replication-upgrade.html.
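
A minimal sketch of what that step could look like in a failover script, assuming the subscription names have already been fetched (for example with SELECT subname FROM pg_subscription). The helper names are hypothetical, and the sketch only builds the SQL text; it does not connect to a server.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def disable_statements(subscription_names):
    """Build one ALTER SUBSCRIPTION ... DISABLE statement per subscription.

    Hypothetical helper: running the statements is left to the caller's
    own client or tooling.
    """
    return [
        f"ALTER SUBSCRIPTION {quote_ident(name)} DISABLE;"
        for name in subscription_names
    ]
```

Note that the statements have to be executed on a server that accepts writes.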

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal: GUC to control starting/stopping logical subscription workers

From: SATYANARAYANA NARLAPURAM
Date:
Hi Bharath,
 

> If I understand correctly, the end effect is similar to disabling all
> subscriptions. Why not just add ALTER SUBSCRIPTION ... DISABLE for all
> subscriptions in the failover work flow? Migration of logical
> replication slots docs says so -
> https://www.postgresql.org/docs/18/logical-replication-upgrade.html.

The scenarios I am talking about here are not major version upgrades, but PITR and standby promotion.
Before promotion the server is in read-only mode (the catalog cannot be updated), so subscriptions cannot be disabled.

 

Re: Proposal: GUC to control starting/stopping logical subscription workers

From: Bharath Rupireddy
Date:
Hi,

On Tue, Sep 9, 2025 at 1:16 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
>> If I understand correctly, the end effect is similar to disabling all
>> subscriptions. Why not just add ALTER SUBSCRIPTION ... DISABLE for all
>> subscriptions in the failover work flow? Migration of logical
>> replication slots docs says so -
>> https://www.postgresql.org/docs/18/logical-replication-upgrade.html.
>
> The scenarios I am talking in this case are no major version upgrade, but PITR and Standby promotion cases.
> Server is in read only mode (catalog cannot be updated) before promotion and subscriptions cannot be disabled.

Thanks for clarifying. AFAICS, failover slots won't have this issue.
All the replication connections start to fail during the standby's
promotion (StartLogicalReplication -> CreateDecodingContext ->
errmsg("cannot use replication slot \"%s\" for logical decoding")),
and replication from the publisher resumes automatically once promotion
is complete and the slots are fully synced.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Proposal: GUC to control starting/stopping logical subscription workers

From: Euler Taveira
Date:
On Wed, Aug 13, 2025, at 12:40 AM, SATYANARAYANA NARLAPURAM wrote:
> I couldn't find a previous discussion on a new GUC to globally enable
> or disable logical subscription workers at the instance level. So
> starting a new thread on this.
>

max_logical_replication_workers.

> In multi-region or high-availability setups, a promoted standby often
> requires a controlled switchover before it should start applying
> logical replication changes from upstream. Without such control, a
> promoted standby may immediately attempt to connect to the publisher as
> a logical subscriber, which can cause it to unexpectedly take over
> replication slots, start pulling changes before the setup is ready, or
> even conflict with the original primary that is still using those
> slots. Disabling the subscription on the primary before promoting a
> standby is not possible in all cases, for example during PITR or data
> center outages.
> Providing a way to keep logical subscriptions globally disabled—via a
> GUC setting—prior to promotion ensures that no changes are accidentally
> pulled or applied before the system is fully prepared. This avoids race
> conditions and the risk of data divergence.
>

Why do you need another GUC? The max_logical_replication_workers parameter is
useful for this exact scenario. For example, pg_createsubscriber uses it to not
start logical replication while converting a physical replica into a logical
one.
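
For illustration, one way to bring up an instance with logical replication workers held off is to pass the setting on the server command line through pg_ctl's -o option. The sketch below only builds the argument list; the function name and the data directory are hypothetical. Since max_logical_replication_workers is read at server start, switching it back requires a restart.

```python
def start_command(data_dir: str, allow_logical_workers: bool) -> list:
    """Build a pg_ctl argument list (hypothetical helper).

    When logical workers are not allowed, the server is started with
    max_logical_replication_workers=0 so that no logical replication
    workers are launched.
    """
    cmd = ["pg_ctl", "start", "-D", data_dir]
    if not allow_logical_workers:
        # -o passes the quoted options through to the postgres command.
        cmd += ["-o", "-c max_logical_replication_workers=0"]
    return cmd
```

The resulting list can be handed to subprocess.run() once the rest of the setup is ready.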

> I would like to propose adding a GUC with the following behavior:
>  1. Default value for the GUC is ON, same behavior as now without the
> GUC
>  2. When off, no new apply workers start and existing ones exit
> gracefully similar to when subscription disabled
>  3. When turned on again, behavior will be the same as the current
> behavior
>  4. This GUC shouldn't require a restart
>

That's the only point not covered by the current behavior. You don't explain
why it is a requirement.


--
Euler Taveira
EDB   https://www.enterprisedb.com/



Re: Proposal: GUC to control starting/stopping logical subscription workers

From: SATYANARAYANA NARLAPURAM
Date:
Hi Euler,

On Wed, Sep 10, 2025 at 5:11 PM Euler Taveira <euler@eulerto.com> wrote:
> On Wed, Aug 13, 2025, at 12:40 AM, SATYANARAYANA NARLAPURAM wrote:
>> I couldn't find a previous discussion on a new GUC to globally enable
>> or disable logical subscription workers at the instance level. So
>> starting a new thread on this.
>
> max_logical_replication_workers.

Thanks for the pointer; it was not obvious to me earlier. This should work in my scenario. Should the documentation state that setting this to zero has the same effect as disabling the publishers and subscribers?
 

>> In multi-region or high-availability setups, a promoted standby often
>> requires a controlled switchover before it should start applying
>> logical replication changes from upstream. Without such control, a
>> promoted standby may immediately attempt to connect to the publisher as
>> a logical subscriber, which can cause it to unexpectedly take over
>> replication slots, start pulling changes before the setup is ready, or
>> even conflict with the original primary that is still using those
>> slots. Disabling the subscription on the primary before promoting a
>> standby is not possible in all cases, for example during PITR or data
>> center outages.
>> Providing a way to keep logical subscriptions globally disabled, via a
>> GUC setting, prior to promotion ensures that no changes are accidentally
>> pulled or applied before the system is fully prepared. This avoids race
>> conditions and the risk of data divergence.
>
> Why do you need another GUC? The max_logical_replication_workers parameter is
> useful for this exact scenario. For example, pg_createsubscriber uses it to not
> start logical replication while converting a physical replica into a logical
> one.

As mentioned above, given this explanation I don't have a scenario that needs a separate GUC.
 

>> I would like to propose adding a GUC with the following behavior:
>>  1. Default value for the GUC is ON, same behavior as now without the
>> GUC
>>  2. When off, no new apply workers start and existing ones exit
>> gracefully similar to when subscription disabled
>>  3. When turned on again, behavior will be the same as the current
>> behavior
>>  4. This GUC shouldn't require a restart
>
> That's the only point not covered by the current behavior. You don't explain
> why it is a requirement.

Avoiding an instance restart is perhaps the only remaining use case, but I can live with a restart.