Обсуждение: Unexpected Standby Shutdown on sync_replication_slots change

Поиск
Список
Период
Сортировка

Unexpected Standby Shutdown on sync_replication_slots change

От
Hugo DUBOIS
Дата:

Hello,

I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the sync_replication_slots parameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run, the standby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter change, especially when the documentation doesn't indicate such an outcome.

Steps to Reproduce

  1. Set up a physical replication between two PostgreSQL 17.5 instances.

  2. Ensure wal_level on the primary (and consequently on the standby) is set to replica.

  3. Start both the primary and standby instances, confirming replication is active.

  4. On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query: ALTER SYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();)

Expected Behavior

I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback behaves when not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be enabled). A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the anticipated behavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or incompatible wal_level would lead to a shutdown.

Actual Behavior

Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby instance immediately shuts down with the following log messages:

LOG:  database system is ready to accept read-only connections
LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "sync_replication_slots" changed to "on"
FATAL:  replication slot synchronization requires "wal_level" >= "logical"

Environment

  • PostgreSQL Version: 17.5

Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Fujii Masao
Дата:
On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
>
> Hello,
>
> I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the
sync_replication_slotsparameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run,
thestandby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter
change,especially when the documentation doesn't indicate such an outcome. 
>
> Steps to Reproduce
>
> Set up a physical replication between two PostgreSQL 17.5 instances.
>
> Ensure wal_level on the primary (and consequently on the standby) is set to replica.
>
> Start both the primary and standby instances, confirming replication is active.
>
> On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query:
ALTERSYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();) 
>
> Expected Behavior
>
> I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback
behaveswhen not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be
enabled).A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the
anticipatedbehavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or
incompatiblewal_level would lead to a shutdown. 
>
> Actual Behavior
>
> Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby
instanceimmediately shuts down with the following log messages: 
>
> LOG:  database system is ready to accept read-only connections
> LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
> LOG:  received SIGHUP, reloading configuration files
> LOG:  parameter "sync_replication_slots" changed to "on"
> FATAL:  replication slot synchronization requires "wal_level" >= "logical"
>
> Environment
>
> PostgreSQL Version: 17.5

Thanks for the report!

I was able to reproduce the issue even on the latest master (v19dev).
I agree that the current behavior—where changing a GUC parameter can
cause the server to shut down—is unexpected and should be avoided.

From what I’ve seen in the code, the problem stems from postmaster
calling ValidateSlotSyncParams() before starting the slot sync worker.
That function raises an ERROR if wal_level is not logical while
sync_replication_slots is enabled. Since ERROR is treated as FATAL
in postmaster, it causes the server to exit.

To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
raise an ERROR in this case, as follows.

 ValidateSlotSyncParams(int elevel)
 {
  /*
  * Logical slot sync/creation requires wal_level >= logical.
- *
- * Since altering the wal_level requires a server restart, so error out in
- * this case regardless of elevel provided by caller.
  */
  if (wal_level < WAL_LEVEL_LOGICAL)
- ereport(ERROR,
+ {
+ ereport(elevel,
  errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("replication slot synchronization requires \"wal_level\" >=
\"logical\""));
+ return false;
+ }

Regards,

--
Fujii Masao



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Fujii Masao
Дата:
On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> >
> > Hello,
> >
> > I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the
sync_replication_slotsparameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run,
thestandby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter
change,especially when the documentation doesn't indicate such an outcome. 
> >
> > Steps to Reproduce
> >
> > Set up a physical replication between two PostgreSQL 17.5 instances.
> >
> > Ensure wal_level on the primary (and consequently on the standby) is set to replica.
> >
> > Start both the primary and standby instances, confirming replication is active.
> >
> > On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query:
ALTERSYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();) 
> >
> > Expected Behavior
> >
> > I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback
behaveswhen not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be
enabled).A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the
anticipatedbehavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or
incompatiblewal_level would lead to a shutdown. 
> >
> > Actual Behavior
> >
> > Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby
instanceimmediately shuts down with the following log messages: 
> >
> > LOG:  database system is ready to accept read-only connections
> > LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
> > LOG:  received SIGHUP, reloading configuration files
> > LOG:  parameter "sync_replication_slots" changed to "on"
> > FATAL:  replication slot synchronization requires "wal_level" >= "logical"
> >
> > Environment
> >
> > PostgreSQL Version: 17.5
>
> Thanks for the report!
>
> I was able to reproduce the issue even on the latest master (v19dev).
> I agree that the current behavior—where changing a GUC parameter can
> cause the server to shut down—is unexpected and should be avoided.
>
> From what I’ve seen in the code, the problem stems from postmaster
> calling ValidateSlotSyncParams() before starting the slot sync worker.
> That function raises an ERROR if wal_level is not logical while
> sync_replication_slots is enabled. Since ERROR is treated as FATAL
> in postmaster, it causes the server to exit.
>
> To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
> raise an ERROR in this case, as follows.
>
>  ValidateSlotSyncParams(int elevel)
>  {
>   /*
>   * Logical slot sync/creation requires wal_level >= logical.
> - *
> - * Since altering the wal_level requires a server restart, so error out in
> - * this case regardless of elevel provided by caller.
>   */
>   if (wal_level < WAL_LEVEL_LOGICAL)
> - ereport(ERROR,
> + {
> + ereport(elevel,
>   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>   errmsg("replication slot synchronization requires \"wal_level\" >=
> \"logical\""));
> + return false;
> + }

I've created a patch to implement the above—attached.

Note that this patch does not change the existing behavior when
the misconfiguration (sync_replication_slots enabled but wal_level not
set to logical) is detected at server startup. In that case, the server
still shuts down with a FATAL error, which is consistent with other
settings like summarize_wal.

Regards,

--
Fujii Masao

Вложения

Re: Unexpected Standby Shutdown on sync_replication_slots change

От
shveta malik
Дата:
On Fri, Jul 25, 2025 at 12:20 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> > >
> > > Hello,
> > >
> > > I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the
sync_replication_slotsparameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run,
thestandby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter
change,especially when the documentation doesn't indicate such an outcome. 
> > >
> > > Steps to Reproduce
> > >
> > > Set up a physical replication between two PostgreSQL 17.5 instances.
> > >
> > > Ensure wal_level on the primary (and consequently on the standby) is set to replica.
> > >
> > > Start both the primary and standby instances, confirming replication is active.
> > >
> > > On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query:
ALTERSYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();) 
> > >
> > > Expected Behavior
> > >
> > > I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback
behaveswhen not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be
enabled).A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the
anticipatedbehavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or
incompatiblewal_level would lead to a shutdown. 
> > >
> > > Actual Behavior
> > >
> > > Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby
instanceimmediately shuts down with the following log messages: 
> > >
> > > LOG:  database system is ready to accept read-only connections
> > > LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
> > > LOG:  received SIGHUP, reloading configuration files
> > > LOG:  parameter "sync_replication_slots" changed to "on"
> > > FATAL:  replication slot synchronization requires "wal_level" >= "logical"
> > >
> > > Environment
> > >
> > > PostgreSQL Version: 17.5
> >
> > Thanks for the report!
> >
> > I was able to reproduce the issue even on the latest master (v19dev).
> > I agree that the current behavior—where changing a GUC parameter can
> > cause the server to shut down—is unexpected and should be avoided.
> >
> > From what I’ve seen in the code, the problem stems from postmaster
> > calling ValidateSlotSyncParams() before starting the slot sync worker.
> > That function raises an ERROR if wal_level is not logical while
> > sync_replication_slots is enabled. Since ERROR is treated as FATAL
> > in postmaster, it causes the server to exit.
> >
> > To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
> > raise an ERROR in this case, as follows.
> >
> >  ValidateSlotSyncParams(int elevel)
> >  {
> >   /*
> >   * Logical slot sync/creation requires wal_level >= logical.
> > - *
> > - * Since altering the wal_level requires a server restart, so error out in
> > - * this case regardless of elevel provided by caller.
> >   */
> >   if (wal_level < WAL_LEVEL_LOGICAL)
> > - ereport(ERROR,
> > + {
> > + ereport(elevel,
> >   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >   errmsg("replication slot synchronization requires \"wal_level\" >=
> > \"logical\""));
> > + return false;
> > + }
>
> I've created a patch to implement the above—attached.

Thank You for the patch.

> Note that this patch does not change the existing behavior when
> the misconfiguration (sync_replication_slots enabled but wal_level not
> set to logical) is detected at server startup. In that case, the server
> still shuts down with a FATAL error, which is consistent with other
> settings like summarize_wal.
>

Validated the behaviour, the patch looks good to me.

thanks
Shveta



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Hugo DUBOIS
Дата:

Thanks for the quick patch. I've tested it on the REL_17_STABLE branch, and it's working fine.

On a standby node, after dynamically setting the sync_replication_slots parameter, I observed the following logs. The instance did not shut down, which seems correct:

2025-07-25 09:26:34.613 UTC [4420] LOG:  received SIGHUP, reloading configuration files
2025-07-25 09:26:34.614 UTC [4420] LOG:  parameter "sync_replication_slots" changed to "on"
2025-07-25 09:26:34.615 UTC [4420] LOG:  replication slot synchronization requires "wal_level" >= "logical"
2025-07-25 09:27:34.662 UTC [4420] LOG:  replication slot synchronization requires "wal_level" >= "logical"

The instance did not restart as expected, showing this fatal log:

2025-07-25 09:27:45.668 UTC [4430] FATAL:  replication slot synchronization ("sync_replication_slots" = on) requires "wal_level" >= "logical"

I have a couple of observations:

  • With this patch, a primary instance will not restart if the configuration is incorrect.

  • Only wal_level is checked, but the ValidateSlotSyncParams function includes other mandatory parameters. These are not being checked during startup.

Regards,

Hugo

Le ven. 25 juil. 2025 à 08:16, shveta malik <shveta.malik@gmail.com> a écrit :
On Fri, Jul 25, 2025 at 12:20 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> > >
> > > Hello,
> > >
> > > I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the sync_replication_slots parameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run, the standby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter change, especially when the documentation doesn't indicate such an outcome.
> > >
> > > Steps to Reproduce
> > >
> > > Set up a physical replication between two PostgreSQL 17.5 instances.
> > >
> > > Ensure wal_level on the primary (and consequently on the standby) is set to replica.
> > >
> > > Start both the primary and standby instances, confirming replication is active.
> > >
> > > On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query: ALTER SYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();)
> > >
> > > Expected Behavior
> > >
> > > I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback behaves when not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be enabled). A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the anticipated behavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or incompatible wal_level would lead to a shutdown.
> > >
> > > Actual Behavior
> > >
> > > Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby instance immediately shuts down with the following log messages:
> > >
> > > LOG:  database system is ready to accept read-only connections
> > > LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
> > > LOG:  received SIGHUP, reloading configuration files
> > > LOG:  parameter "sync_replication_slots" changed to "on"
> > > FATAL:  replication slot synchronization requires "wal_level" >= "logical"
> > >
> > > Environment
> > >
> > > PostgreSQL Version: 17.5
> >
> > Thanks for the report!
> >
> > I was able to reproduce the issue even on the latest master (v19dev).
> > I agree that the current behavior—where changing a GUC parameter can
> > cause the server to shut down—is unexpected and should be avoided.
> >
> > From what I’ve seen in the code, the problem stems from postmaster
> > calling ValidateSlotSyncParams() before starting the slot sync worker.
> > That function raises an ERROR if wal_level is not logical while
> > sync_replication_slots is enabled. Since ERROR is treated as FATAL
> > in postmaster, it causes the server to exit.
> >
> > To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
> > raise an ERROR in this case, as follows.
> >
> >  ValidateSlotSyncParams(int elevel)
> >  {
> >   /*
> >   * Logical slot sync/creation requires wal_level >= logical.
> > - *
> > - * Since altering the wal_level requires a server restart, so error out in
> > - * this case regardless of elevel provided by caller.
> >   */
> >   if (wal_level < WAL_LEVEL_LOGICAL)
> > - ereport(ERROR,
> > + {
> > + ereport(elevel,
> >   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >   errmsg("replication slot synchronization requires \"wal_level\" >=
> > \"logical\""));
> > + return false;
> > + }
>
> I've created a patch to implement the above—attached.

Thank You for the patch.

> Note that this patch does not change the existing behavior when
> the misconfiguration (sync_replication_slots enabled but wal_level not
> set to logical) is detected at server startup. In that case, the server
> still shuts down with a FATAL error, which is consistent with other
> settings like summarize_wal.
>

Validated the behaviour, the patch looks good to me.

thanks
Shveta


--

Cordialement,

Scaleway

Hugo DUBOIS
Devops Engineer

hdubois@scaleway.com

Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Fujii Masao
Дата:
On Fri, Jul 25, 2025 at 6:40 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
>
> Thanks for the quick patch. I've tested it on the REL_17_STABLE branch, and it's working fine.
>
> On a standby node, after dynamically setting the sync_replication_slots parameter, I observed the following logs. The
instancedid not shut down, which seems correct: 
>
> 2025-07-25 09:26:34.613 UTC [4420] LOG:  received SIGHUP, reloading configuration files
> 2025-07-25 09:26:34.614 UTC [4420] LOG:  parameter "sync_replication_slots" changed to "on"
> 2025-07-25 09:26:34.615 UTC [4420] LOG:  replication slot synchronization requires "wal_level" >= "logical"
> 2025-07-25 09:27:34.662 UTC [4420] LOG:  replication slot synchronization requires "wal_level" >= "logical"
>
> The instance did not restart as expected, showing this fatal log:
>
> 2025-07-25 09:27:45.668 UTC [4430] FATAL:  replication slot synchronization ("sync_replication_slots" = on) requires
"wal_level">= "logical" 

Thanks for testing!


> I have a couple of observations:
>
> With this patch, a primary instance will not restart if the configuration is incorrect.

Right. In HEAD, the server shuts down due to the misconfiguration only when
starting up as a standby and entering hot standby mode. With this patch,
the server shuts down earlier—during startup—even when starting as a primary.
If starting as a standby, the shutdown now happens before entering hot standby.

It seems hard to match the current behavior exactly in the patch, and I’m not
sure it’s worth trying. I think we have two options:

#1. Make the server always shut down on misconfiguration, regardless of
    whether it starts as a primary or standby, and before reaching hot standby.

#2. Allow the server to start even if the configuration is invalid.

The current patch follows option #1. But if there are valid use cases for
starting a primary with sync_replication_slots enabled and wal_level not set
to logical, then option #2 might be better. What do you think?


> Only wal_level is checked, but the ValidateSlotSyncParams function includes other mandatory parameters. These are not
beingchecked during startup. 

The other parameters checked by ValidateSlotSyncParams() can be changed
without restarting the server, so users can fix them on the fly while
the server is running with the misconfiguration. But wal_level requires
a server restart, so it's important to catch that upfront.

Regards,

--
Fujii Masao



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Hugo DUBOIS
Дата:


Le ven. 25 juil. 2025 à 12:32, Fujii Masao <masao.fujii@gmail.com> a écrit :
On Fri, Jul 25, 2025 at 6:40 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
>
> Thanks for the quick patch. I've tested it on the REL_17_STABLE branch, and it's working fine.
>
> On a standby node, after dynamically setting the sync_replication_slots parameter, I observed the following logs. The instance did not shut down, which seems correct:
>
> 2025-07-25 09:26:34.613 UTC [4420] LOG:  received SIGHUP, reloading configuration files
> 2025-07-25 09:26:34.614 UTC [4420] LOG:  parameter "sync_replication_slots" changed to "on"
> 2025-07-25 09:26:34.615 UTC [4420] LOG:  replication slot synchronization requires "wal_level" >= "logical"
> 2025-07-25 09:27:34.662 UTC [4420] LOG:  replication slot synchronization requires "wal_level" >= "logical"
>
> The instance did not restart as expected, showing this fatal log:
>
> 2025-07-25 09:27:45.668 UTC [4430] FATAL:  replication slot synchronization ("sync_replication_slots" = on) requires "wal_level" >= "logical"

Thanks for testing!


> I have a couple of observations:
>
> With this patch, a primary instance will not restart if the configuration is incorrect.

Right. In HEAD, the server shuts down due to the misconfiguration only when
starting up as a standby and entering hot standby mode. With this patch,
the server shuts down earlier—during startup—even when starting as a primary.
If starting as a standby, the shutdown now happens before entering hot standby.

It seems hard to match the current behavior exactly in the patch, and I’m not
sure it’s worth trying. I think we have two options:

#1. Make the server always shut down on misconfiguration, regardless of
    whether it starts as a primary or standby, and before reaching hot standby.

#2. Allow the server to start even if the configuration is invalid.

The current patch follows option #1. But if there are valid use cases for
starting a primary with sync_replication_slots enabled and wal_level not set
to logical, then option #2 might be better. What do you think?

I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary. So, for me, Option 1 seems correct.
 
 
> Only wal_level is checked, but the ValidateSlotSyncParams function includes other mandatory parameters. These are not being checked during startup.

The other parameters checked by ValidateSlotSyncParams() can be changed
without restarting the server, so users can fix them on the fly while
the server is running with the misconfiguration. But wal_level requires
a server restart, so it's important to catch that upfront.

Ok.
 
Regards,

--
Fujii Masao

Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Fujii Masao
Дата:
On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary. So,
forme, Option 1 seems correct. 

I also prefer option #1.

However, on second thought, if some users are already running a server
(non-standby) with sync_replication_slots enabled and wal_level != logical
in v17, switching to option #1 could break their setup after a minor
version update. That would be surprising and confusing.

To avoid that, I think we should go with option #2—at least for v17.

Attached is an updated patch implementing option #2.

Regards,

--
Fujii Masao

Вложения

Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Hugo DUBOIS
Дата:
I'm okay with that too, as long as the standby doesn't unexpectedly quit. The patch looks good to me.

Le ven. 25 juil. 2025 à 18:01, Fujii Masao <masao.fujii@gmail.com> a écrit :
On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary. So, for me, Option 1 seems correct.

I also prefer option #1.

However, on second thought, if some users are already running a server
(non-standby) with sync_replication_slots enabled and wal_level != logical
in v17, switching to option #1 could break their setup after a minor
version update. That would be surprising and confusing.

To avoid that, I think we should go with option #2—at least for v17.

Attached is an updated patch implementing option #2.

Regards,

--
Fujii Masao

Re: Unexpected Standby Shutdown on sync_replication_slots change

От
shveta malik
Дата:
On Fri, Jul 25, 2025 at 9:31 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> > I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary.
So,for me, Option 1 seems correct. 
>
> I also prefer option #1.
>
> However, on second thought, if some users are already running a server
> (non-standby) with sync_replication_slots enabled and wal_level != logical
> in v17, switching to option #1 could break their setup after a minor
> version update. That would be surprising and confusing.
>
> To avoid that, I think we should go with option #2—at least for v17.
>

I still prefer option #1. On HEAD, option #1 makes sense because
wal_level is a GUC that requires a server restart and thus using
ERROR/FATAL in this context is appropriate. It is also consistent with
the rest of the code (other modules) wherever wal_level checks are
performed during startup.

Regarding the back branches, in the case of a primary server, if
sync_replication_slots is left ON while wal_level < logical, then
issuing a ERROR/FATAL is still more suitable. The user can simply
correct the configuration (disable sync_replication_slots) and proceed
with the startup after the minor version upgrade. Also, given that
option #1 is the better fit for HEAD, I don't think it's worth having
different behavior in the back branches. Thoughts?

thanks
Shveta



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Amit Kapila
Дата:
On Mon, Jul 28, 2025 at 10:18 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Jul 25, 2025 at 9:31 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> > > I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a primary.
So,for me, Option 1 seems correct. 
> >
> > I also prefer option #1.
> >
> > However, on second thought, if some users are already running a server
> > (non-standby) with sync_replication_slots enabled and wal_level != logical
> > in v17, switching to option #1 could break their setup after a minor
> > version update. That would be surprising and confusing.
> >
> > To avoid that, I think we should go with option #2—at least for v17.
> >
>
> I still prefer option #1. On HEAD, option #1 makes sense because
> wal_level is a GUC that requires a server restart and thus using
> ERROR/FATAL in this context is appropriate. It is also consistent with
> the rest of the code (other modules) wherever wal_level checks are
> performed during startup.
>
> Regarding the back branches, in the case of a primary server, if
> sync_replication_slots is left ON while wal_level < logical, then
> issuing a ERROR/FATAL is still more suitable. The user can simply
> correct the configuration (disable sync_replication_slots) and proceed
> with the startup after the minor version upgrade. Also, given that
> option #1 is the better fit for HEAD, I don't think it's worth having
> different behavior in the back branches.
>

+1.

--
With Regards,
Amit Kapila.



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Fujii Masao
Дата:
On Mon, Jul 28, 2025 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 28, 2025 at 10:18 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Fri, Jul 25, 2025 at 9:31 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> > >
> > > On Fri, Jul 25, 2025 at 9:13 PM Hugo DUBOIS <hdubois@scaleway.com> wrote:
> > > > I'm not sure if there's a particular use case for wal_level and sync_replication_slots not matching on a
primary.So, for me, Option 1 seems correct. 
> > >
> > > I also prefer option #1.
> > >
> > > However, on second thought, if some users are already running a server
> > > (non-standby) with sync_replication_slots enabled and wal_level != logical
> > > in v17, switching to option #1 could break their setup after a minor
> > > version update. That would be surprising and confusing.
> > >
> > > To avoid that, I think we should go with option #2—at least for v17.
> > >
> >
> > I still prefer option #1. On HEAD, option #1 makes sense because
> > wal_level is a GUC that requires a server restart and thus using
> > ERROR/FATAL in this context is appropriate. It is also consistent with
> > the rest of the code (other modules) wherever wal_level checks are
> > performed during startup.
> >
> > Regarding the back branches, in the case of a primary server, if
> > sync_replication_slots is left ON while wal_level < logical, then
> > issuing a ERROR/FATAL is still more suitable. The user can simply
> > correct the configuration (disable sync_replication_slots) and proceed
> > with the startup after the minor version upgrade. Also, given that
> > option #1 is the better fit for HEAD, I don't think it's worth having
> > different behavior in the back branches.
> >
>
> +1.

I think it's basically not acceptable for a server to start up
successfully before a minor version update, but then fail to start with
the same configuration after the update. If that's absolutely necessary
to fix a bug, it might be justifiable. But in this case, I don't think
it's required.

Blocking startup when sync_replication_slots is enabled and wal_level
is not logical could be helpful. But that feels more like an improvement
than a bug fix. I'm fine adding that to master, but I don't think we should
apply it to old branches.

That said, both Shveta and Amit support backpatching this change,
so I'd like to hear more opinions before we decide.

Regards,

--
Fujii Masao



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Laurenz Albe
Дата:
On Tue, 2025-07-29 at 13:40 +0900, Fujii Masao wrote:
> I think it's basically not acceptable for a server to start up
> successfully before a minor version update, but then fail to start with
> the same configuration after the update. If that's absolutely necessary
> to fix a bug, it might be justifiable. But in this case, I don't think
> it's required.
>
> Blocking startup when sync_replication_slots is enabled and wal_level
> is not logical could be helpful. But that feels more like an improvement
> than a bug fix. I'm fine adding that to master, but I don't think we should
> apply it to old branches.
>
> That said, both Shveta and Amit support backpatching this change,
> so I'd like to hear more opinions before we decide.

I side with you on that one.

I think it would be fine to backpatch the breaking change to v18,
if we mention the behavior change in the release notes.

Yours,
Laurenz Albe



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
shveta malik
Дата:
On Tue, Jul 29, 2025 at 12:04 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2025-07-29 at 13:40 +0900, Fujii Masao wrote:
> > I think it's basically not acceptable for a server to start up
> > successfully before a minor version update, but then fail to start with
> > the same configuration after the update. If that's absolutely necessary
> > to fix a bug, it might be justifiable. But in this case, I don't think
> > it's required.
> >
> > Blocking startup when sync_replication_slots is enabled and wal_level
> > is not logical could be helpful. But that feels more like an improvement
> > than a bug fix. I'm fine adding that to master, but I don't think we should
> > apply it to old branches.
> >
> > That said, both Shveta and Amit support backpatching this change,
> > so I'd like to hear more opinions before we decide.
>
> I side with you on that one.
>
> I think it would be fine to backpatch the breaking change to v18,
> if we mention the behavior change in the release notes.
>

I am fine with the suggestion.

thanks
Shveta



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Xuneng Zhou
Дата:
Hi,

On Wed, Jul 30, 2025 at 11:15 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jul 29, 2025 at 12:04 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> >
> > On Tue, 2025-07-29 at 13:40 +0900, Fujii Masao wrote:
> > > I think it's basically not acceptable for a server to start up
> > > successfully before a minor version update, but then fail to start with
> > > the same configuration after the update. If that's absolutely necessary
> > > to fix a bug, it might be justifiable. But in this case, I don't think
> > > it's required.
> > >
> > > Blocking startup when sync_replication_slots is enabled and wal_level
> > > is not logical could be helpful. But that feels more like an improvement
> > > than a bug fix. I'm fine adding that to master, but I don't think we should
> > > apply it to old branches.
> > >
> > > That said, both Shveta and Amit support backpatching this change,
> > > so I'd like to hear more opinions before we decide.
> >
> > I side with you on that one.
> >
> > I think it would be fine to backpatch the breaking change to v18,
> > if we mention the behavior change in the release notes.
> >
>
> I am fine with the suggestion.

+ 1

This fix does feel more like an enhancement to user feedback and
safety nets, not a correction of a broken or unsafe code path. Unable
to start up with the same settings for it might cause
more-than-necessary surprise.

Best,
Xuneng



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Fujii Masao
Дата:
On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> This fix does feel more like an enhancement to user feedback and
> safety nets, not a correction of a broken or unsafe code path. Unable
> to start up with the same settings for it might cause
> more-than-necessary surprise.

Agreed.

At first, since there seems to be rough consensus on applying the current patch
(which prevents unexpected shutdown while the server is running),
I've pushed it to master and back-patched it to v17.

As for the follow-up change that prevents the server from starting with
an invalid configuration, let's continue the discussion. I've attached
the patch and agree it's a good fit for master. However, I'm afraid
it's too late to include it in v18, as beta2 has already been released
and this change is more of an improvement than a bug fix.

Regards,

--
Fujii Masao

Вложения

Re: Unexpected Standby Shutdown on sync_replication_slots change

От
shveta malik
Дата:
On Mon, Aug 4, 2025 at 5:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > This fix does feel more like an enhancement to user feedback and
> > safety nets, not a correction of a broken or unsafe code path. Unable
> > to start up with the same settings for it might cause
> > more-than-necessary surprise.
>
> Agreed.
>
> At first, since there seems to be rough consensus on applying the current patch
> (which prevents unexpected shutdown while the server is running),
> I've pushed it to master and back-patched it to v17.
>
> As for the follow-up change that prevents the server from starting with
> an invalid configuration, let's continue the discussion. I've attached
> the patch and agree it's a good fit for master.

+1. The patch looks good.

> However, I'm afraid
> it's too late to include it in v18, as beta2 has already been released
> and this change is more of an improvement than a bug fix.
>

If it's not possible to get it to v18 now, then I am okay to have it
on master alone.

thanks
Shveta



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
shveta malik
Дата:
On Tue, Aug 5, 2025 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Aug 4, 2025 at 5:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Sat, Aug 2, 2025 at 1:28 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > This fix does feel more like an enhancement to user feedback and
> > > safety nets, not a correction of a broken or unsafe code path. Unable
> > > to start up with the same settings for it might cause
> > > more-than-necessary surprise.
> >
> > Agreed.
> >
> > At first, since there seems to be rough consensus on applying the current patch
> > (which prevents unexpected shutdown while the server is running),
> > I've pushed it to master and back-patched it to v17.
> >
> > As for the follow-up change that prevents the server from starting with
> > an invalid configuration, let's continue the discussion. I've attached
> > the patch and agree it's a good fit for master.
>
> +1. The patch looks good.
>
> > However, I'm afraid
> > it's too late to include it in v18, as beta2 has already been released
> > and this change is more of an improvement than a bug fix.
> >
>
> If it's not possible to get it to v18 now, then I am okay to have it
> on master alone.
>

Just wanted to confirm, do we plan to do it on master at-least?

thanks
Shveta



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Fujii Masao
Дата:
On Wed, Aug 20, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> Just wanted to confirm, do we plan to do it on master at-least?

Yeah, barring any objections, I'll push the patch to the master.

Regards,

--
Fujii Masao



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
shveta malik
Дата:
On Thu, Aug 21, 2025 at 7:41 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Wed, Aug 20, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> > Just wanted to confirm, do we plan to do it on master at-least?
>
> Yeah, barring any objections, I'll push the patch to the master.
>

Okay, thanks!



Re: Unexpected Standby Shutdown on sync_replication_slots change

От
Fujii Masao
Дата:
On Thu, Aug 21, 2025 at 1:29 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 21, 2025 at 7:41 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Wed, Aug 20, 2025 at 7:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > Just wanted to confirm, do we plan to do it on master at-least?
> >
> > Yeah, barring any objections, I'll push the patch to the master.
> >
>
> Okay, thanks!

I've pushed the patch to master. Thanks!

Regards,

--
Fujii Masao