Обсуждение: issue with synchronized_standby_slots

Поиск
Список
Период
Сортировка

issue with synchronized_standby_slots

От
Fabrice Chapuis
Дата:
Hi,
With PG 17.5 and using logical replication failover slots.
When trying to change the value of synchronized_standby_slots, node2 was not running then the error  invalid value for parameter "synchronized_standby_slots": "node1,node2"  was generated.
The problem is that statement were affected by this and they can't execute.

STATEMENT:  select service_period,sp1_0.address_line_1 from tbl1  where sp1_0.vn=$1 order by sp1_0.start_of_period
2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR:  invalid value for parameter "synchronized_standby_slots": "node1,node2"
2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL:  replication slot "s029054a" does not exist
2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT:  while setting parameter "synchronized_standby_slots" to "node1,node2"
2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG:  background worker "parallel worker" (PID 848476) exited with exit code 1
2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG:  background worker "parallel worker" (PID 848477) exited with exit code 1

Is this issue already observed

Thanks for your feedback

Fabrice


RE: issue with synchronized_standby_slots

От
"Zhijie Hou (Fujitsu)"
Дата:
On Thursday, September 4, 2025 9:27 PM Fabrice Chapuis <fabrice636861@gmail.com>  wrote:
> With PG 17.5 and using logical replication failover slots. When trying to
> change the value of synchronized_standby_slots, node2 was not running then the
> error  invalid value for parameter "synchronized_standby_slots": "node1,node2"
> was generated. The problem is that statement were affected by this and they
> can't execute.
>
> STATEMENT:  select service_period,sp1_0.address_line_1 from tbl1  where http://sp1_0.vn=$1 order by
sp1_0.start_of_period
> 2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR:  invalid value for parameter
"synchronized_standby_slots":"node1,node2"
 
> 2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL:  replication slot "s029054a" does
notexist
 
> 2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT:  while setting parameter
"synchronized_standby_slots"to "node1,node2"
 
> 2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG:  background worker "parallel
worker"(PID 848476) exited with exit code 1
 
> 2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG:  background worker "parallel
worker"(PID 848477) exited with exit code 1
 
> 
> Is this issue already observed

Thank you for reporting this issue. It seems you've added a nonexistent slot to
synchronized_standby_slots before the server startup. The server does not verify
the existence of slots at startup due to the absence of slot shared information,
allowing the server to start successfully. However, when the parallel apply
worker starts, it re-verifies the GUC setting, resulting in the ERROR you saw.

I think this scenario is not necessarily a bug, as adding nonexistent slots to GUC is
disallowed. Such slots can block the logical failover slot's advancement,
increasing the risk of disk bloat due to WAL or dead rows, which is why we added
the ERROR. There are precedents for this kind of behavior, like
default_table_access_method and default_tablespace, which prevent queries if
invalid values are set before server startup.

To resolve the issue, you can remove the invalid slot from the GUC and add it
back after creating the physical slot.

I also thought about how to improve user experience for this, but it's not
feasible to verify slot existence at startup because replication has not been
restored to shared memory during GUC checks. Another option might be to simply
remove slot existence/type checks from GUC validation.

Best Regards,
Hou zj

Re: issue with synchronized_standby_slots

От
Fabrice Chapuis
Дата:
Thanks for your reply Zhijie,

I understand that the error  invalid value for parameter will be diplayed in case of bad value for the GUC synchronized_standby_slots or if a standby node configured is not up and running.
But the problem I noticed is that statements could not execute normally and error code is returned to the applcation.
This append after an upgrade from PG 14 to PG 17.
I could try to reproduce the issue

Regards,

Fabrice

On Fri, Sep 5, 2025 at 6:07 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
On Thursday, September 4, 2025 9:27 PM Fabrice Chapuis <fabrice636861@gmail.com>  wrote:
> With PG 17.5 and using logical replication failover slots. When trying to
> change the value of synchronized_standby_slots, node2 was not running then the
> error  invalid value for parameter "synchronized_standby_slots": "node1,node2"
> was generated. The problem is that statement were affected by this and they
> can't execute.
>
> STATEMENT:  select service_period,sp1_0.address_line_1 from tbl1  where http://sp1_0.vn=$1 order by sp1_0.start_of_period
> 2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR:  invalid value for parameter "synchronized_standby_slots": "node1,node2"
> 2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL:  replication slot "s029054a" does not exist
> 2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT:  while setting parameter "synchronized_standby_slots" to "node1,node2"
> 2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG:  background worker "parallel worker" (PID 848476) exited with exit code 1
> 2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG:  background worker "parallel worker" (PID 848477) exited with exit code 1
>
> Is this issue already observed

Thank you for reporting this issue. It seems you've added a nonexistent slot to
synchronized_standby_slots before the server startup. The server does not verify
the existence of slots at startup due to the absence of slot shared information,
allowing the server to start successfully. However, when the parallel apply
worker starts, it re-verifies the GUC setting, resulting in the ERROR you saw.

I think this scenario is not necessarily a bug, as adding nonexistent slots to GUC is
disallowed. Such slots can block the logical failover slot's advancement,
increasing the risk of disk bloat due to WAL or dead rows, which is why we added
the ERROR. There are precedents for this kind of behavior, like
default_table_access_method and default_tablespace, which prevent queries if
invalid values are set before server startup.

To resolve the issue, you can remove the invalid slot from the GUC and add it
back after creating the physical slot.

I also thought about how to improve user experience for this, but it's not
feasible to verify slot existence at startup because replication has not been
restored to shared memory during GUC checks. Another option might be to simply
remove slot existence/type checks from GUC validation.

Best Regards,
Hou zj

Re: issue with synchronized_standby_slots

От
Alexander Kukushkin
Дата:
Hi,


On Sun, 7 Sept 2025 at 10:15, Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Thanks for your reply Zhijie,

I understand that the error  invalid value for parameter will be diplayed in case of bad value for the GUC synchronized_standby_slots or if a standby node configured is not up and running.
But the problem I noticed is that statements could not execute normally and error code is returned to the applcation.
This append after an upgrade from PG 14 to PG 17.
I could try to reproduce the issue
 
> STATEMENT:  select service_period,sp1_0.address_line_1 from tbl1  where http://sp1_0.vn=$1 order by sp1_0.start_of_period
> 2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR:  invalid value for parameter "synchronized_standby_slots": "node1,node2"
> 2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL:  replication slot "s029054a" does not exist
> 2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT:  while setting parameter "synchronized_standby_slots" to "node1,node2"
> 2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG:  background worker "parallel worker" (PID 848476) exited with exit code 1
> 2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG:  background worker "parallel worker" (PID 848477) exited with exit code 1
>
> Is this issue already observed

Recently we also hit this problem.

I think in a current state check_synchronized_standby_slots() and validate_sync_standby_slots() functions are not very useful:
- When the hook is executed from postmaster it only checks that synchronized_standby_slots contains a valid list, but doesn't check that replication slots exists, because MyProc is NULL. It happens both, on start and on reload.
- When executed from other backends set_config_with_handle() is called with elevel = 0, and therefore elevel becomes DEBUG3, which results in no useful error/warning messages.

There are a couple of places where check_synchronized_standby_slots() failure is not ignored:
1. alter system set synchronized_standby_slots='invalid value';
2. Parallel workers, because RestoreGUCState() calls set_config_option_ext()->set_config_with_handle() with elevel=ERROR. As a result, parallel workers fail to start with the error.

With parallel workers it is actually even worse - we get the error even in case of standby:
1. start standby with synchronized_standby_slots referring to non-existing slots
2. SET parallel_setup_cost, parallel_tuple_cost, and min_parallel_table_scan_size to 0
3. Run  select * from pg_namespace; and observe following error:
ERROR:  invalid value for parameter "synchronized_standby_slots": "a1,b1"
DETAIL:  replication slot "a1" does not exist
CONTEXT:  while setting parameter "synchronized_standby_slots" to "a1,b1"
parallel worker

We may argue a lot that invalid configuration must not be used, but the thing is that the main problem being solved by synchronized_standby_slots is delaying logical decoding at certains LSN until it was sent to enough physical standby.
This feature must not affect normal queries, only logical replication.

Please find attached patch fixing a problem for parallel workers.

Regards,
--
Alexander Kukushkin
Вложения

Re: issue with synchronized_standby_slots

От
Fujii Masao
Дата:
On Mon, Sep 8, 2025 at 6:26 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> Hi,
>
>
> On Sun, 7 Sept 2025 at 10:15, Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>>
>> Thanks for your reply Zhijie,
>>
>> I understand that the error  invalid value for parameter will be diplayed in case of bad value for the GUC
synchronized_standby_slotsor if a standby node configured is not up and running. 
>> But the problem I noticed is that statements could not execute normally and error code is returned to the
applcation.
>> This append after an upgrade from PG 14 to PG 17.
>> I could try to reproduce the issue
>
>
>>
>> > STATEMENT:  select service_period,sp1_0.address_line_1 from tbl1  where http://sp1_0.vn=$1 order by
sp1_0.start_of_period
>>>
>>> > 2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR:  invalid value for parameter
"synchronized_standby_slots":"node1,node2" 
>>> > 2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL:  replication slot "s029054a"
doesnot exist 
>>> > 2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT:  while setting parameter
"synchronized_standby_slots"to "node1,node2" 
>>> > 2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG:  background worker "parallel
worker"(PID 848476) exited with exit code 1 
>>> > 2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG:  background worker "parallel
worker"(PID 848477) exited with exit code 1 
>>> >
>>> > Is this issue already observed
>
>
> Recently we also hit this problem.
>
> I think in a current state check_synchronized_standby_slots() and validate_sync_standby_slots() functions are not
veryuseful: 
> - When the hook is executed from postmaster it only checks that synchronized_standby_slots contains a valid list, but
doesn'tcheck that replication slots exists, because MyProc is NULL. It happens both, on start and on reload. 

This looks quite problematic. If a non-existent slot is specified in
synchronized_standby_slots in postgresql.conf and the configuration file
is reloaded, no error is reported. This happens because the postmaster
cannot detect that the slot doesn't exist (since it has no MyProc).
As a result, synchronized_standby_slots in the postmaster is set to
that slot. New backends then inherit this setting from the postmaster,
while already running backends correctly detect that the slot doesn't
exist and fail to apply it.

This leads to an inconsistent state: the reload succeeds with no error,
but some backends apply the new setting while others do not.
That inconsistency seems like an issue.

To fix this, ISTM that the GUC check hook for synchronized_standby_slots
should be revised so it doesn't rely on MyProc or perform slot existence
checks there....

Regards,

--
Fujii Masao



Re: issue with synchronized_standby_slots

От
Amit Kapila
Дата:
On Mon, Sep 8, 2025 at 2:56 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> Recently we also hit this problem.
>
> I think in a current state check_synchronized_standby_slots() and validate_sync_standby_slots() functions are not
veryuseful: 
> - When the hook is executed from postmaster it only checks that synchronized_standby_slots contains a valid list, but
doesn'tcheck that replication slots exists, because MyProc is NULL. It happens both, on start and on reload. 
> - When executed from other backends set_config_with_handle() is called with elevel = 0, and therefore elevel becomes
DEBUG3,which results in no useful error/warning messages. 
>
> There are a couple of places where check_synchronized_standby_slots() failure is not ignored:
> 1. alter system set synchronized_standby_slots='invalid value';
> 2. Parallel workers, because RestoreGUCState() calls set_config_option_ext()->set_config_with_handle() with
elevel=ERROR.As a result, parallel workers fail to start with the error. 
>
> With parallel workers it is actually even worse - we get the error even in case of standby:
> 1. start standby with synchronized_standby_slots referring to non-existing slots
> 2. SET parallel_setup_cost, parallel_tuple_cost, and min_parallel_table_scan_size to 0
> 3. Run  select * from pg_namespace; and observe following error:
> ERROR:  invalid value for parameter "synchronized_standby_slots": "a1,b1"
> DETAIL:  replication slot "a1" does not exist
> CONTEXT:  while setting parameter "synchronized_standby_slots" to "a1,b1"
> parallel worker
>

I see the same behaviour for default_table_access_method and
default_tablespace. For example, see failure cases:
postgres=# Alter system set default_table_access_method='missing';
ERROR:  invalid value for parameter "default_table_access_method": "missing"
DETAIL:  Table access method "missing" does not exist.

postgres=# SET parallel_setup_cost=0;
SET
postgres=# SET parallel_tuple_cost=0;
SET
postgres=# Set min_parallel_table_scan_size to 0;
SET
postgres=# select * from pg_namespace;
ERROR:  invalid value for parameter "default_table_access_method": "missing"
DETAIL:  Table access method "missing" does not exist.
CONTEXT:  while setting parameter "default_table_access_method" to "missing"
parallel worker

OTOH, there is no ERROR on reload or restart.

It is fair to argue that invalid GUC values should be ignored in
certain cases like parallel query but we should have the same solution
for other similar parameters as well.

As for the synchronized_standby_slots, we can follow the behavior
similar to check_synchronous_standby_names and just give parsing
ERRORs. Any non-existent slot related errors can be given when that
parameter is later used.

--
With Regards,
Amit Kapila.



RE: issue with synchronized_standby_slots

От
"Zhijie Hou (Fujitsu)"
Дата:
On Tuesday, September 9, 2025 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Mon, Sep 8, 2025 at 2:56 PM Alexander Kukushkin
> <cyberdemn@gmail.com> wrote:
> >
> > Recently we also hit this problem.
> >
> > I think in a current state check_synchronized_standby_slots() and
> validate_sync_standby_slots() functions are not very useful:
> > - When the hook is executed from postmaster it only checks that
> synchronized_standby_slots contains a valid list, but doesn't check that
> replication slots exists, because MyProc is NULL. It happens both, on start and
> on reload.
> > - When executed from other backends set_config_with_handle() is called
> with elevel = 0, and therefore elevel becomes DEBUG3, which results in no
> useful error/warning messages.
> >
> > There are a couple of places where check_synchronized_standby_slots()
> failure is not ignored:
> > 1. alter system set synchronized_standby_slots='invalid value'; 2.
> > Parallel workers, because RestoreGUCState() calls
> set_config_option_ext()->set_config_with_handle() with elevel=ERROR. As a
> result, parallel workers fail to start with the error.
> >
> > With parallel workers it is actually even worse - we get the error even in case
> of standby:
> > 1. start standby with synchronized_standby_slots referring to
> > non-existing slots 2. SET parallel_setup_cost, parallel_tuple_cost,
> > and min_parallel_table_scan_size to 0 3. Run  select * from pg_namespace;
> and observe following error:
> > ERROR:  invalid value for parameter "synchronized_standby_slots": "a1,b1"
> > DETAIL:  replication slot "a1" does not exist
> > CONTEXT:  while setting parameter "synchronized_standby_slots" to
> "a1,b1"
> > parallel worker
> >
> 
> I see the same behaviour for default_table_access_method and
> default_tablespace. For example, see failure cases:
> postgres=# Alter system set default_table_access_method='missing';
> ERROR:  invalid value for parameter "default_table_access_method":
> "missing"
> DETAIL:  Table access method "missing" does not exist.
> 
> postgres=# SET parallel_setup_cost=0;
> SET
> postgres=# SET parallel_tuple_cost=0;
> SET
> postgres=# Set min_parallel_table_scan_size to 0; SET postgres=# select *
> from pg_namespace;
> ERROR:  invalid value for parameter "default_table_access_method":
> "missing"
> DETAIL:  Table access method "missing" does not exist.
> CONTEXT:  while setting parameter "default_table_access_method" to
> "missing"
> parallel worker
> 
> OTOH, there is no ERROR on reload or restart.
> 
> It is fair to argue that invalid GUC values should be ignored in certain cases like
> parallel query but we should have the same solution for other similar
> parameters as well.
> 
> As for the synchronized_standby_slots, we can follow the behavior similar to
> check_synchronous_standby_names and just give parsing ERRORs. Any
> non-existent slot related errors can be given when that parameter is later used.

I agree. For synchronized_standby_slots, I think it is acceptable to report only
parsing errors, because slots could be dropped even after validating the slot
existence during GUC loading. Additionally, we would report WARNINGs for
non-existent slots during the wait function anyway (e.g., in
StandbySlotsHaveCaughtup()).

Best Regards,
Hou zj

Re: issue with synchronized_standby_slots

От
Fujii Masao
Дата:
On Tue, Sep 9, 2025 at 2:39 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
> I agree. For synchronized_standby_slots, I think it is acceptable to report only
> parsing errors, because slots could be dropped even after validating the slot
> existence during GUC loading. Additionally, we would report WARNINGs for
> non-existent slots during the wait function anyway (e.g., in
> StandbySlotsHaveCaughtup()).

+1, to also address the issue I reported upthread.

As I understand it, only the walsender and the slot sync worker actually
use synchronized_standby_slots. But in the current code, all processes check
for slot existence via the GUC check hook, which wastes cycles. The proposed
change would eliminate that overhead.

Regards,

--
Fujii Masao



Re: issue with synchronized_standby_slots

От
Rahila Syed
Дата:
Hi,


 As for the synchronized_standby_slots, we can follow the behavior
similar to check_synchronous_standby_names and just give parsing
ERRORs. Any non-existent slot related errors can be given when that
parameter is later used.

--

Please find attached a patch that implements this. I will work on adding a test for it.  

Thank you,
Rahila Syed
Вложения

Re: issue with synchronized_standby_slots

От
Ashutosh Sharma
Дата:
On Wed, Sep 10, 2025 at 3:33 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi,
>
>
>>  As for the synchronized_standby_slots, we can follow the behavior
>> similar to check_synchronous_standby_names and just give parsing
>> ERRORs. Any non-existent slot related errors can be given when that
>> parameter is later used.
>>
>> --
>
>
> Please find attached a patch that implements this. I will work on adding a test for it.
>

I suggest removing validate_sync_standby_slots() entirely since there
isn’t much left in it.

The below logic currently in validate_sync_standby_slots() to parse
the list of slots can be directly moved into
check_synchronized_standby_slots().

/* Verify syntax and parse string into a list of identifiers */
ok = SplitIdentifierString(rawname, ',', elemlist);

if (!ok)
GUC_check_errdetail("List syntax is invalid.");

--
With Regards,
Ashutosh Sharma.



Re: issue with synchronized_standby_slots

От
Shlok Kyal
Дата:
On Wed, 10 Sept 2025 at 15:33, Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi,
>
>
>>  As for the synchronized_standby_slots, we can follow the behavior
>> similar to check_synchronous_standby_names and just give parsing
>> ERRORs. Any non-existent slot related errors can be given when that
>> parameter is later used.
>>
>> --
>
>
> Please find attached a patch that implements this. I will work on adding a test for it.
>
Hi Rahila,

I think we should also add a parsing check for slot names specified in
the GUC synchronize_standby_slots as suggested by Amit in [1].
I think we should also update the comment message in function
StandbySlotsHaveCaughtup
       /*
         * If a slot name provided in synchronized_standby_slots does not
         * exist, report a message and exit the loop.
         *
         * Though validate_sync_standby_slots (the GUC check_hook) tries to
         * avoid this, it can nonetheless happen because the user can specify
         * a nonexistent slot name before server startup. That function cannot
         * validate such a slot during startup, as ReplicationSlotCtl is not
         * initialized by then.  Also, the user might have dropped one slot.
         */
I made the changes in the above for the same and attached the updated patch.

[1]: https://www.postgresql.org/message-id/CAA4eK1Jprqfs_URBmZ5%3DOOL98D05rPiGup1sscWgcCGcWU%3D9iA%40mail.gmail.com

Thanks,
Shlok Kyal

Вложения

Re: issue with synchronized_standby_slots

От
Alexander Kukushkin
Дата:
Hi,

On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
I think we should also add a parsing check for slot names specified in
the GUC synchronize_standby_slots as suggested by Amit in [1].
I made the changes in the above for the same and attached the updated patch.

I agree, validating that list contains valid replication slot names is a good idea.
However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is called with elevel=ERROR in postmaster. 

--
Regards,
--
Alexander Kukushkin

Re: issue with synchronized_standby_slots

От
Amit Kapila
Дата:
On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>>
>> I think we should also add a parsing check for slot names specified in
>> the GUC synchronize_standby_slots as suggested by Amit in [1].
>> I made the changes in the above for the same and attached the updated patch.
>
>
> I agree, validating that list contains valid replication slot names is a good idea.
> However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is
calledwith elevel=ERROR in postmaster. 
>

Can you please explain why you think so? And what is your proposal for the same?

BTW, we should also try to conclude on my yesterday's point as to why
it is okay to have the same behavior for default_tablespace and
default_table_access_method and not for this parameter? I am asking
because if we change the current behavior, tomorrow, we can get
complaints that one expects the old behaviour as that was similar to
other GUCs like default_tablespace and default_table_access_method.

--
With Regards,
Amit Kapila.



Re: issue with synchronized_standby_slots

От
Amit Kapila
Дата:
On Thu, Sep 11, 2025 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> >
> > On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >>
> >> I think we should also add a parsing check for slot names specified in
> >> the GUC synchronize_standby_slots as suggested by Amit in [1].
> >> I made the changes in the above for the same and attached the updated patch.
> >
> >
> > I agree, validating that list contains valid replication slot names is a good idea.
> > However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is
calledwith elevel=ERROR in postmaster. 
> >
>
> Can you please explain why you think so? And what is your proposal for the same?
>

You are right and I think we should use WARNING here as is used in
check_primary_slot_name() for the same function call. For ERROR
reporting, we need to use GUC_check_* functions. Also, probably the
ERROR during startup could lead to shutdown.

--
With Regards,
Amit Kapila.



Re: issue with synchronized_standby_slots

От
Shlok Kyal
Дата:
On Thu, 11 Sept 2025 at 09:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 11, 2025 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> > >
> > > On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >>
> > >> I think we should also add a parsing check for slot names specified in
> > >> the GUC synchronize_standby_slots as suggested by Amit in [1].
> > >> I made the changes in the above for the same and attached the updated patch.
> > >
> > >
> > > I agree, validating that list contains valid replication slot names is a good idea.
> > > However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is
calledwith elevel=ERROR in postmaster. 
> > >
> >
> > Can you please explain why you think so? And what is your proposal for the same?
> >
>
> You are right and I think we should use WARNING here as is used in
> check_primary_slot_name() for the same function call. For ERROR
> reporting, we need to use GUC_check_* functions. Also, probably the
> ERROR during startup could lead to shutdown.
>
I tested by setting elevel=ERROR and elevel=WARNING in the function
ReplicationSlotValidateName.

For elevel=ERROR,
After hitting ereport inside function ReplicationSlotValidateName, the
PG_CATCH() in 'call_string_check_hook' and process is terminated.
Server logs are
2025-09-11 10:01:17.909 IST [1995206] FATAL:  replication slot name
"myslot1*" contains invalid character
2025-09-11 10:01:17.909 IST [1995206] HINT:  Replication slot names
may only contain lower case letters, numbers, and the underscore
character.

For level=WARNING,
Even after hitting the ereport, it is continuing with the normal flow of code.
Server logs are:
2025-09-11 10:27:30.195 IST [2013341] WARNING:  replication slot name
"myslot1*" contains invalid character
2025-09-11 10:27:30.195 IST [2013341] HINT:  Replication slot names
may only contain lower case letters, numbers, and the underscore
character.
2025-09-11 10:28:13.863 IST [2013341] LOG:  invalid value for
parameter "synchronized_standby_slots": "myslot1*"
2025-09-11 10:28:13.863 IST [2013341] FATAL:  configuration file
"/home/ubuntu/Project/inst/pg_11_9_tmp_4/bin/primary/postgresql.conf"
contains errors

I think we can use ReplicationSlotValidateName with elevel=WARNING here.
I have attached an updated patch with this change.

Thanks,
Shlok Kyal

Вложения

Re: issue with synchronized_standby_slots

От
Ashutosh Sharma
Дата:

On Thu, Sep 11, 2025 at 11:00 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 11 Sept 2025 at 09:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 11, 2025 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> > > >
> > > > On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > >>
> > > >> I think we should also add a parsing check for slot names specified in
> > > >> the GUC synchronize_standby_slots as suggested by Amit in [1].
> > > >> I made the changes in the above for the same and attached the updated patch.
> > > >
> > > >
> > > > I agree, validating that list contains valid replication slot names is a good idea.
> > > > However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is called with elevel=ERROR in postmaster.
> > > >
> > >
> > > Can you please explain why you think so? And what is your proposal for the same?
> > >
> >
> > You are right and I think we should use WARNING here as is used in
> > check_primary_slot_name() for the same function call. For ERROR
> > reporting, we need to use GUC_check_* functions. Also, probably the
> > ERROR during startup could lead to shutdown.
> >
> I tested by setting elevel=ERROR and elevel=WARNING in the function
> ReplicationSlotValidateName.
>
> For elevel=ERROR,
> After hitting ereport inside function ReplicationSlotValidateName, the
> PG_CATCH() in 'call_string_check_hook' and process is terminated.
> Server logs are
> 2025-09-11 10:01:17.909 IST [1995206] FATAL:  replication slot name
> "myslot1*" contains invalid character
> 2025-09-11 10:01:17.909 IST [1995206] HINT:  Replication slot names
> may only contain lower case letters, numbers, and the underscore
> character.
>
> For level=WARNING,
> Even after hitting the ereport, it is continuing with the normal flow of code.
> Server logs are:
> 2025-09-11 10:27:30.195 IST [2013341] WARNING:  replication slot name
> "myslot1*" contains invalid character
> 2025-09-11 10:27:30.195 IST [2013341] HINT:  Replication slot names
> may only contain lower case letters, numbers, and the underscore
> character.
> 2025-09-11 10:28:13.863 IST [2013341] LOG:  invalid value for
> parameter "synchronized_standby_slots": "myslot1*"
> 2025-09-11 10:28:13.863 IST [2013341] FATAL:  configuration file
> "/home/ubuntu/Project/inst/pg_11_9_tmp_4/bin/primary/postgresql.conf"
> contains errors
>
> I think we can use ReplicationSlotValidateName with elevel=WARNING here.
> I have attached an updated patch with this change.
>

I would suggest getting rid of the "ok" flag, it’s probably not needed. I’d rather rewrite validate_sync_standby_slots() like this:

static bool
validate_sync_standby_slots(char *rawname, List **elemlist)
{
    /* Verify syntax and parse string into a list of identifiers */
    if (!SplitIdentifierString(rawname, ',', elemlist))
        GUC_check_errdetail("List syntax is invalid.");
    else
    {
        foreach_ptr(char, name, *elemlist)
        {
            if (!ReplicationSlotValidateName(name, false, WARNING))
            return false;
        }
    }

    return true;
}

--
With Regards,
Ashutosh Sharma.

Re: issue with synchronized_standby_slots

От
Ashutosh Sharma
Дата:

On Thu, Sep 11, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Thu, Sep 11, 2025 at 11:00 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Thu, 11 Sept 2025 at 09:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 11, 2025 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Sep 10, 2025 at 5:23 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> > > >
> > > > On Wed, 10 Sept 2025 at 13:34, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > >>
> > > >> I think we should also add a parsing check for slot names specified in
> > > >> the GUC synchronize_standby_slots as suggested by Amit in [1].
> > > >> I made the changes in the above for the same and attached the updated patch.
> > > >
> > > >
> > > > I agree, validating that list contains valid replication slot names is a good idea.
> > > > However, you used ReplicationSlotValidateName() function, which is not a good fit for it, especially when it is called with elevel=ERROR in postmaster.
> > > >
> > >
> > > Can you please explain why you think so? And what is your proposal for the same?
> > >
> >
> > You are right and I think we should use WARNING here as is used in
> > check_primary_slot_name() for the same function call. For ERROR
> > reporting, we need to use GUC_check_* functions. Also, probably the
> > ERROR during startup could lead to shutdown.
> >
> I tested by setting elevel=ERROR and elevel=WARNING in the function
> ReplicationSlotValidateName.
>
> For elevel=ERROR,
> After hitting ereport inside function ReplicationSlotValidateName, the
> PG_CATCH() in 'call_string_check_hook' and process is terminated.
> Server logs are
> 2025-09-11 10:01:17.909 IST [1995206] FATAL:  replication slot name
> "myslot1*" contains invalid character
> 2025-09-11 10:01:17.909 IST [1995206] HINT:  Replication slot names
> may only contain lower case letters, numbers, and the underscore
> character.
>
> For level=WARNING,
> Even after hitting the ereport, it is continuing with the normal flow of code.
> Server logs are:
> 2025-09-11 10:27:30.195 IST [2013341] WARNING:  replication slot name
> "myslot1*" contains invalid character
> 2025-09-11 10:27:30.195 IST [2013341] HINT:  Replication slot names
> may only contain lower case letters, numbers, and the underscore
> character.
> 2025-09-11 10:28:13.863 IST [2013341] LOG:  invalid value for
> parameter "synchronized_standby_slots": "myslot1*"
> 2025-09-11 10:28:13.863 IST [2013341] FATAL:  configuration file
> "/home/ubuntu/Project/inst/pg_11_9_tmp_4/bin/primary/postgresql.conf"
> contains errors
>
> I think we can use ReplicationSlotValidateName with elevel=WARNING here.
> I have attached an updated patch with this change.
>

I would suggest getting rid of the "ok" flag, it’s probably not needed. I’d rather rewrite validate_sync_standby_slots() like this:

static bool
validate_sync_standby_slots(char *rawname, List **elemlist)
{
    /* Verify syntax and parse string into a list of identifiers */
    if (!SplitIdentifierString(rawname, ',', elemlist))
        GUC_check_errdetail("List syntax is invalid.");
    else
    {
        foreach_ptr(char, name, *elemlist)
        {
            if (!ReplicationSlotValidateName(name, false, WARNING))
            return false;
        }
    }

    return true;
}

 
Apart from this change, if you agree, then you may also add a test-case that sets synchronized_standby_slots to some reserved name like "pg_conflict_detection" to verify for the WARNING message.

--
With Regards,
Ashutosh Sharma.

Re: issue with synchronized_standby_slots

От
Rahila Syed
Дата:
Hi,

 

BTW, we should also try to conclude on my yesterday's point as to why
it is okay to have the same behavior for default_tablespace and
default_table_access_method and not for this parameter? I am asking
because if we change the current behavior, tomorrow, we can get
complaints that one expects the old behaviour as that was similar to
other GUCs like default_tablespace and default_table_access_method.


Fair point. I haven't examined the validation of GUCs in parallel workers closely,
but one argument for preventing parallel workers from failing due to an incorrect
value of synchronized_standby_slots is that a select query works in this situation
without parallel workers.

Whereas, for incorrect values of default_tablespace and default_table_access_method,
most commands would fail regardless of whether parallel workers are enabled.

PFA a test for the original bug report on this thread.  This applies on the v3 version of the patch
that was shared.

Thank you,
Rahila Syed

Вложения

Re: issue with synchronized_standby_slots

От
Shlok Kyal
Дата:
On Thu, 11 Sept 2025 at 12:00, Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi,
>
>
>>
>>
>> BTW, we should also try to conclude on my yesterday's point as to why
>> it is okay to have the same behavior for default_tablespace and
>> default_table_access_method and not for this parameter? I am asking
>> because if we change the current behavior, tomorrow, we can get
>> complaints that one expects the old behaviour as that was similar to
>> other GUCs like default_tablespace and default_table_access_method.
>>
>
> Fair point. I haven't examined the validation of GUCs in parallel workers closely,
> but one argument for preventing parallel workers from failing due to an incorrect
> value of synchronized_standby_slots is that a select query works in this situation
> without parallel workers.
>
> Whereas, for incorrect values of default_tablespace and default_table_access_method,
> most commands would fail regardless of whether parallel workers are enabled.
>
> PFA a test for the original bug report on this thread.  This applies on the v3 version of the patch
> that was shared.
>
Thanks for sharing the patch. I checked the test and it looks good to
me. But I am not sure if we should have a new file for the test. I
have added the test in the '040_standby_failover_slots_sync.pl' file
along with some other tests.
Also I have addressed the comments by Ashutosh in [1][2].

I have attached the updated v4 patch

[1]: https://www.postgresql.org/message-id/CAE9k0P%3Dx3J3nmSmYKmTkiFXTDKLxJkXFO4%2BVHJyNu01Od6CZfg%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAE9k0P%3DOFMFCRy9aDGWZ3bt91tbB1WnzsAbzXN72iWBaGVuMrw%40mail.gmail.com

Thanks,
Shlok Kyal

Вложения

Re: issue with synchronized_standby_slots

От
Amit Kapila
Дата:
On Fri, Sep 12, 2025 at 2:34 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> I have attached the updated v4 patch
>

+# Cannot be set synchronized_standby_slots to a reserved slot name
+($result, $stdout, $stderr) = $primary->psql('postgres',
+ "ALTER SYSTEM SET synchronized_standby_slots='pg_conflict_detection'");
+ok( $stderr =~
+   m/WARNING:  replication slot name "pg_conflict_detection" is reserved/,
+ "Cannot use a reserverd slot name");
+
+# Cannot be set synchronized_standby_slots to slot name with invalid characters
+($result, $stdout, $stderr) = $primary->psql('postgres',
+ "ALTER SYSTEM SET synchronized_standby_slots='invalid*'");
+ok( $stderr =~
+   m/WARNING:  replication slot name "invalid\*" contains invalid character/,
+ "Cannot use a invalid slot name");

These tests can be present in some sql file. I think you have kept
these in the .pl file to keep it along with other tests but I think
these are better suited for some .sql file.

--
With Regards,
Amit Kapila.



Re: issue with synchronized_standby_slots

От
Shlok Kyal
Дата:
On Tue, 23 Sept 2025 at 09:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 12, 2025 at 2:34 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > I have attached the updated v4 patch
> >
>
> +# Cannot be set synchronized_standby_slots to a reserved slot name
> +($result, $stdout, $stderr) = $primary->psql('postgres',
> + "ALTER SYSTEM SET synchronized_standby_slots='pg_conflict_detection'");
> +ok( $stderr =~
> +   m/WARNING:  replication slot name "pg_conflict_detection" is reserved/,
> + "Cannot use a reserverd slot name");
> +
> +# Cannot be set synchronized_standby_slots to slot name with invalid characters
> +($result, $stdout, $stderr) = $primary->psql('postgres',
> + "ALTER SYSTEM SET synchronized_standby_slots='invalid*'");
> +ok( $stderr =~
> +   m/WARNING:  replication slot name "invalid\*" contains invalid character/,
> + "Cannot use a invalid slot name");
>
> These tests can be present in some sql file. I think you have kept
> these in the .pl file to keep it along with other tests but I think
> these are better suited for some .sql file.
>
Thanks for reviewing the patch.
I have moved the tests to the guc.sql file. I have attached the updated patch.

Thanks,
Shlok Kyal

Вложения

Re: issue with synchronized_standby_slots

От
Ashutosh Sharma
Дата:
Hi Amit,

On Tue, Sep 23, 2025 at 1:00 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Tue, 23 Sept 2025 at 09:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Sep 12, 2025 at 2:34 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > I have attached the updated v4 patch
> > >
> >
> > +# Cannot be set synchronized_standby_slots to a reserved slot name
> > +($result, $stdout, $stderr) = $primary->psql('postgres',
> > + "ALTER SYSTEM SET synchronized_standby_slots='pg_conflict_detection'");
> > +ok( $stderr =~
> > +   m/WARNING:  replication slot name "pg_conflict_detection" is reserved/,
> > + "Cannot use a reserverd slot name");
> > +
> > +# Cannot be set synchronized_standby_slots to slot name with invalid characters
> > +($result, $stdout, $stderr) = $primary->psql('postgres',
> > + "ALTER SYSTEM SET synchronized_standby_slots='invalid*'");
> > +ok( $stderr =~
> > +   m/WARNING:  replication slot name "invalid\*" contains invalid character/,
> > + "Cannot use a invalid slot name");
> >
> > These tests can be present in some sql file. I think you have kept
> > these in the .pl file to keep it along with other tests but I think
> > these are better suited for some .sql file.
> >
> Thanks for reviewing the patch.
> I have moved the tests to the guc.sql file. I have attached the updated patch.
>

Are we planning to wait for [1] to go in first, since this also
depends on ReplicationSlotValidateName?

[1] - https://www.postgresql.org/message-id/flat/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA%40mail.gmail.com

--
With Regards,
Ashutosh Sharma.



Re: issue with synchronized_standby_slots

От
Amit Kapila
Дата:
On Wed, Sep 24, 2025 at 12:14 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi Amit,
>
> On Tue, Sep 23, 2025 at 1:00 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Tue, 23 Sept 2025 at 09:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Sep 12, 2025 at 2:34 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > >
> > > > I have attached the updated v4 patch
> > > >
> > >
> > > +# Cannot be set synchronized_standby_slots to a reserved slot name
> > > +($result, $stdout, $stderr) = $primary->psql('postgres',
> > > + "ALTER SYSTEM SET synchronized_standby_slots='pg_conflict_detection'");
> > > +ok( $stderr =~
> > > +   m/WARNING:  replication slot name "pg_conflict_detection" is reserved/,
> > > + "Cannot use a reserverd slot name");
> > > +
> > > +# Cannot be set synchronized_standby_slots to slot name with invalid characters
> > > +($result, $stdout, $stderr) = $primary->psql('postgres',
> > > + "ALTER SYSTEM SET synchronized_standby_slots='invalid*'");
> > > +ok( $stderr =~
> > > +   m/WARNING:  replication slot name "invalid\*" contains invalid character/,
> > > + "Cannot use a invalid slot name");
> > >
> > > These tests can be present in some sql file. I think you have kept
> > > these in the .pl file to keep it along with other tests but I think
> > > these are better suited for some .sql file.
> > >
> > Thanks for reviewing the patch.
> > I have moved the tests to the guc.sql file. I have attached the updated patch.
> >
>
> Are we planning to wait for [1] to go in first, since this also
> depends on ReplicationSlotValidateName?
>

I think we can go either way. It somewhat depends on whether we want
to backpatch this change. The argument for having this as a HEAD-only
patch is that, we have a similar behavior for other variables like
default_table_access_method and default_tablespace in HEAD and
back-branches. We want to change to a better behavior for this GUC, so
better to keep this as a HEAD-only patch. Do you or others have
thoughts on this matter?

If we decide to do this as a HEAD-only patch, then I think we can push
this without waiting for the other patch to commit as we have ample
time to get that done and we already have a solution as well for it.
OTOH, if we want to backpatch this then I would prefer to wait for the
other patch to be committed first.

--
With Regards,
Amit Kapila.



Re: issue with synchronized_standby_slots

От
Ashutosh Sharma
Дата:
Hi,

On Wed, Sep 24, 2025 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 12:14 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Hi Amit,
> >
> > On Tue, Sep 23, 2025 at 1:00 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > On Tue, 23 Sept 2025 at 09:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Fri, Sep 12, 2025 at 2:34 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > >
> > > > > I have attached the updated v4 patch
> > > > >
> > > >
> > > > +# Cannot be set synchronized_standby_slots to a reserved slot name
> > > > +($result, $stdout, $stderr) = $primary->psql('postgres',
> > > > + "ALTER SYSTEM SET synchronized_standby_slots='pg_conflict_detection'");
> > > > +ok( $stderr =~
> > > > +   m/WARNING:  replication slot name "pg_conflict_detection" is reserved/,
> > > > + "Cannot use a reserverd slot name");
> > > > +
> > > > +# Cannot be set synchronized_standby_slots to slot name with invalid characters
> > > > +($result, $stdout, $stderr) = $primary->psql('postgres',
> > > > + "ALTER SYSTEM SET synchronized_standby_slots='invalid*'");
> > > > +ok( $stderr =~
> > > > +   m/WARNING:  replication slot name "invalid\*" contains invalid character/,
> > > > + "Cannot use a invalid slot name");
> > > >
> > > > These tests can be present in some sql file. I think you have kept
> > > > these in the .pl file to keep it along with other tests but I think
> > > > these are better suited for some .sql file.
> > > >
> > > Thanks for reviewing the patch.
> > > I have moved the tests to the guc.sql file. I have attached the updated patch.
> > >
> >
> > Are we planning to wait for [1] to go in first, since this also
> > depends on ReplicationSlotValidateName?
> >
>
> I think we can go either way. It somewhat depends on whether we want
> to backpatch this change. The argument for having this as a HEAD-only
> patch is that, we have a similar behavior for other variables like
> default_table_access_method and default_tablespace in HEAD and
> back-branches. We want to change to a better behavior for this GUC, so
> better to keep this as a HEAD-only patch. Do you or others have
> thoughts on this matter?
>
> If we decide to do this as a HEAD-only patch, then I think we can push
> this without waiting for the other patch to commit as we have ample
> time to get that done and we already have a solution as well for it.
> OTOH, if we want to backpatch this then I would prefer to wait for the
> other patch to be committed first.
>

Although it's a behaviour change at GUC level, to me it doesn't look
like something that would break the user applications, so I'm inclined
toward backpatching it, but I'd definitely want to hear if others see
potential compatibility risks that I might be missing.

--
With Regards,
Ashutosh Sharma.