Re: Synchronizing slots from primary to standby

From: Drouvot, Bertrand
Subject: Re: Synchronizing slots from primary to standby
Msg-id: da2d3264-7049-48b1-914a-9c8631c8e384@gmail.com
In reply to: Re: Synchronizing slots from primary to standby  (Ajin Cherian <itsajin@gmail.com>)
Responses: Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hi,

On 10/24/23 7:44 AM, Ajin Cherian wrote:
> On Mon, Oct 23, 2023 at 11:22 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
>>
>> @@ -602,6 +602,9 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>>           SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn);
>>       }
>>
>> +   /* set failover in the slot, as requested */
>> +   slot->data.failover = ctx->failover;
>> +
>>
>> I think we can get rid of this change in CreateDecodingContext().
>>
> Yes, I too noticed this in my testing, however just removing this from
> CreateDecodingContext will not allow us to change the slot's failover flag
> using Alter subscription.

Oh right.

> I am thinking of moving this change to
> StartLogicalReplication prior to calling CreateDecodingContext by
> parsing the command options in StartReplicationCmd
> without adding it to the LogicalDecodingContext.
> 

Yeah, that looks like a good place to update "failover".
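
To make the idea concrete, here is a rough, self-contained sketch of applying the option to the slot before the decoding context is created. It is not the patch's code: SketchOption, SketchSlot and sketch_apply_failover_option() are hypothetical stand-ins for the parsed START_REPLICATION options and the slot's persistent data, and the way the value is interpreted is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one parsed START_REPLICATION option. */
typedef struct SketchOption
{
    const char *name;
    const char *value;      /* NULL means the option was given without a value */
} SketchOption;

/* Hypothetical stand-in for the persistent part of a replication slot. */
typedef struct SketchSlot
{
    bool        failover;
} SketchSlot;

/*
 * Look for a "failover" option and, if present, store it in the slot.
 * Returns true only when the slot's flag actually changed, so a caller
 * could decide whether the slot needs to be marked dirty and saved.
 * Assumption: a missing value or "true" enables the flag, anything else
 * disables it.
 */
bool
sketch_apply_failover_option(SketchSlot *slot,
                             const SketchOption *opts, size_t nopts)
{
    assert(slot != NULL);
    assert(opts != NULL || nopts == 0);

    for (size_t i = 0; i < nopts; i++)
    {
        bool        requested;

        if (strcmp(opts[i].name, "failover") != 0)
            continue;

        requested = (opts[i].value == NULL ||
                     strcmp(opts[i].value, "true") == 0);

        if (slot->failover == requested)
            return false;   /* nothing to update */

        slot->failover = requested;
        return true;
    }

    /* Option not given: leave the slot untouched. */
    return false;
}
```

With something along these lines the flag would be set from the command options alone, and LogicalDecodingContext would not need to carry a failover field.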

While doing more testing, I came up with a couple of remarks about the current behavior.

1) Let's imagine that:

- there is no standby
- standby_slot_names is set to a valid slot on the primary (but, given the above, not linked to any standby)
- then a CREATE SUBSCRIPTION on a subscriber WITH (failover = true) would start the
synchronisation but never finish (meaning it leaves a "synchronisation" slot like
"pg_32811_sync_24576_7293415241672430356"
in place, coming from ReplicationSlotNameForTablesync()).

That's expected, but maybe we should emit a warning in WalSndWaitForStandbyConfirmation() on the primary when
a slot listed in standby_slot_names is not active (i.e. does not have an active_pid attached to it)?
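
A rough sketch of the kind of check I have in mind follows. The struct, the function name and the message wording are all made up for illustration; the real code would look at the actual slots and report through ereport() rather than fprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of one slot listed in standby_slot_names. */
typedef struct SketchStandbySlot
{
    const char *name;
    int         active_pid; /* 0 when no process has acquired the slot */
} SketchStandbySlot;

/*
 * Write one WARNING line per listed slot that has no active_pid and
 * return how many such slots were found. The message text is only a
 * placeholder.
 */
size_t
sketch_warn_inactive_standby_slots(const SketchStandbySlot *slots,
                                   size_t nslots, FILE *out)
{
    size_t      ninactive = 0;

    assert(slots != NULL || nslots == 0);
    assert(out != NULL);

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].active_pid != 0)
            continue;

        fprintf(out,
                "WARNING:  replication slot \"%s\" listed in "
                "standby_slot_names is not active\n",
                slots[i].name);
        ninactive++;
    }

    return ninactive;
}
```

The wait itself would stay as it is; the warning would just tell the user why the subscription synchronisation is not making progress.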

2) When we create a subscription, another slot is created during the subscription synchronization, with a name
like "pg_16397_sync_16388_7293447291374081805" (coming from ReplicationSlotNameForTablesync()).

This extra slot appears to have failover also set to true.

So, if the standby refreshes the list of slots to sync while the subscription is still synchronizing, we'd see things
like this on the standby:

LOG:  waiting for remote slot "mysub" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C0034840) and and catalog xmin (756)

LOG:  wait over for remote slot "mysub" as its LSN (0/C00368B0) and catalog xmin (756) has now passed local slot LSN (0/C0034840) and catalog xmin (756)

LOG:  waiting for remote slot "pg_16397_sync_16388_7293447291374081805" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C00368E8) and and catalog xmin (756)

WARNING:  slot "pg_16397_sync_16388_7293447291374081805" disappeared from the primary, aborting slot creation

I'm not sure this "pg_16397_sync_16388_7293447291374081805" slot should have failover set to true. If there is a
failover during the subscription creation, wouldn't it be better to re-launch the subscription instead?
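
If we wanted to leave those slots out, one option (just a sketch, not something the patch does) would be to recognize them by name. Judging from the names above, ReplicationSlotNameForTablesync() produces "pg_<number>_sync_<number>_<number>"; sketch_is_tablesync_slot_name() below is a hypothetical helper that matches this shape. A name-based check is only a heuristic, since a user could create a slot with a similar name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Advance *p past one or more decimal digits; false if there are none. */
static bool
skip_digits(const char **p)
{
    const char *start = *p;

    while (isdigit((unsigned char) **p))
        (*p)++;

    return *p != start;
}

/*
 * Return true if "name" has the shape pg_<digits>_sync_<digits>_<digits>,
 * i.e. it looks like a tablesync slot name.
 */
bool
sketch_is_tablesync_slot_name(const char *name)
{
    const char *p = name;

    assert(name != NULL);

    if (strncmp(p, "pg_", 3) != 0)
        return false;
    p += 3;
    if (!skip_digits(&p))
        return false;

    if (strncmp(p, "_sync_", 6) != 0)
        return false;
    p += 6;
    if (!skip_digits(&p))
        return false;

    if (*p != '_')
        return false;
    p++;
    if (!skip_digits(&p))
        return false;

    /* Nothing may follow the last number. */
    return *p == '\0';
}
```

Not setting failover on these slots in the first place would of course be cleaner than filtering them by name on the standby.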

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


