Re: Synchronizing slots from primary to standby

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uD0t7MA=3L1v+bhANE2BpSRiwaJpJ4fA_RUQWif7RAYNQ@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On 10/3/23 12:54 PM, Amit Kapila wrote:
> > > > On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand
> > > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >>
> > > >> On 9/29/23 1:33 PM, Amit Kapila wrote:
> > > >>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand
> > > >>> <bertranddrouvot.pg@gmail.com> wrote:
> > > >>>>
> > > >>>
> > > >>>> - probably open corner cases like: what if a standby is down? would that mean
> > > >>>> that synchronize_slot_names not being send to the primary would allow the decoding
> > > >>>> on the primary to go ahead?
> > > >>>>
> > > >>>
> > > >>> Good question. BTW, irrespective of whether we have
> > > >>> 'standby_slot_names' parameters or not, how should we behave if
> > > >>> standby is down? Say, if 'synchronize_slot_names' is only specified on
> > > >>> standby then in such a situation primary won't be even aware that some
> > > >>> of the logical walsenders need to wait.
> > > >>
> > > >> Exactly, that's why I was thinking keeping standby_slot_names to address
> > > >> this scenario. In such a case one could simply decide to keep or remove
> > > >> the associated physical replication slot from standby_slot_names. Keep would
> > > >> mean "wait" and removing would mean allow to decode on the primary.
> > > >>
> > > >>> OTOH, one can say that users
> > > >>> should configure 'synchronize_slot_names' on both primary and standby
> > > >>> but note that this value could be different for different standby's,
> > > >>> so we can't configure it on primary.
> > > >>>
> > > >>
> > > >> Yeah, I think that's a good use case for standby_slot_names, what do you think?
> > > >>
> > > >
> > > > But, even if we keep 'standby_slot_names' for this purpose, the
> > > > primary doesn't know the value of 'synchronize_slot_names' once the
> > > > standby is down and or the primary is restarted. So, how will we know
> > > > which logical WAL senders needs to wait for 'standby_slot_names'?
> > > >
> > >
> > > Yeah right, I also think we'd need:
> > >
> > > - synchronize_slot_names on both primary and standby
> > >
> > > But now we would need to take care of different standby having different values (
> > > as you said up-thread)....
> > >
> > > Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor
> > > synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say
> > > "logical_slot_name:physical_slot".
> > >
> > > I think this GUC would help us define each walsender behavior (should the standby(s)
> > > be up or down):
> > >
> >
> > It may help in defining the walsender's behaviour better for sure. But
> > the problem I see once we start defining sync-slot-names on primary
> > (in any form whether as independent GUC or as above mapping GUC) is
> > that it needs to be then in sync with standbys, as each standby for
> > sure needs to maintain its own sync-slot-names GUC to make it aware of
> > what all it needs to sync.
>
> Yes, I also think so. Also, defining such a GUC where user wants to
> sync all the slots which would normally be the case would be a night
> mare for the users.
>
> >
> > This brings us to the original question of
> > how do we actually keep these configurations in sync between primary
> > and standby if we plan to maintain it on both?
> >
> >
> > > - don't wait if its associated logical_slot is not listed in this GUC
> > > - or wait based on its associated "list" of mapped physical slots (would probably
> > > have to deal with the min restart_lsn for all the corresponding mapped ones).
> > >
> > > I don't think we can avoid having to define at least one GUC on the primary (at least to
> > > handle the case of standby(s) being down).
> > >
>
> How about an alternate scheme where we define sync_slot_names on
> standby but then store the physical_slot_name in the corresponding
> logical slot (ReplicationSlotPersistentData) to be synced? So, the
> standby will send the list of 'sync_slot_names' and the primary will
> add the physical standby's slot_name in each of the corresponding
> sync_slot. Now, if we do this then even after restart, we should be
> able to know for which physical slot each logical slot needs to wait.
> We can even provide an SQL API to reset the value of
> standby_slot_names in logical slots as a way to unblock decoding in
> case of emergency (for example, corresponding when physical standby
> never comes up).
>


Looks like a better approach to me. It solves most of the pain points:
1) Avoids the need for multiple GUCs
2) Primary and standby need not worry about staying in sync, as they
would if we maintained a sync-slot-names GUC on both
3) User still gets the flexibility to remove a standby from the wait-list
of the primary's logical walsenders using the reset SQL API.

Now some initial thoughts:
1) Since each logical slot may need to be synced by multiple
physical standbys, ReplicationSlotPersistentData needs to hold a
list of standby names. This raises the question of how much shared
memory we should allocate initially. Should each
ReplicationSlotPersistentData reserve room for max_replication_slots
physical-standby names (the worst-case scenario)?
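
To make the sizing question in (1) concrete, here is a rough standalone sketch, not patch code. DemoSlotPersistentData, DEMO_MAX_STANDBYS and the demo_* functions are hypothetical names made up for illustration, and the fixed per-slot cap is only an assumption about one way the list could be stored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAMEDATALEN  64   /* mirrors PostgreSQL's default NAMEDATALEN */
#define DEMO_MAX_STANDBYS 4    /* hypothetical fixed per-slot capacity */

/* Hypothetical stand-in for the persistent part of a logical slot. */
typedef struct DemoSlotPersistentData
{
    char name[DEMO_NAMEDATALEN];
    int  nstandbys;            /* entries used in standby_slots */
    char standby_slots[DEMO_MAX_STANDBYS][DEMO_NAMEDATALEN];
} DemoSlotPersistentData;

/* Record a physical slot name; true if stored or already present. */
static bool
demo_slot_add_standby(DemoSlotPersistentData *slot, const char *standby)
{
    if (strlen(standby) >= DEMO_NAMEDATALEN)
        return false;          /* name would not fit */
    for (int i = 0; i < slot->nstandbys; i++)
        if (strcmp(slot->standby_slots[i], standby) == 0)
            return true;       /* already recorded */
    if (slot->nstandbys >= DEMO_MAX_STANDBYS)
        return false;          /* fixed-size list is full */
    strcpy(slot->standby_slots[slot->nstandbys++], standby);
    return true;
}

/* Forget a physical slot name (the "reset" case); true if it was present. */
static bool
demo_slot_remove_standby(DemoSlotPersistentData *slot, const char *standby)
{
    for (int i = 0; i < slot->nstandbys; i++)
    {
        if (strcmp(slot->standby_slots[i], standby) == 0)
        {
            int last = slot->nstandbys - 1;

            if (i != last)     /* fill the hole with the last entry */
                strcpy(slot->standby_slots[i], slot->standby_slots[last]);
            slot->nstandbys--;
            return true;
        }
    }
    return false;
}

/* Bytes needed if every slot reserves room for max_replication_slots names. */
static size_t
demo_worst_case_bytes(size_t max_replication_slots)
{
    return max_replication_slots * max_replication_slots * DEMO_NAMEDATALEN;
}
```

With max_replication_slots = 10 and 64-byte names, the worst case in this sketch works out to 10 * 10 * 64 = 6400 bytes, and it grows quadratically with the setting.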

2) If the standby sends '*', then we need to update each logical slot with
that standby's name. Or do we have a better way to deal with '*'? Need to
think more on this.
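
For (2), the straightforward expansion could look roughly like the standalone sketch below. DemoSlot and the demo_* functions are hypothetical names for illustration only, and treating '*' as "every logical slot that exists right now" is an assumption, not a settled design.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_NAMEDATALEN  64
#define DEMO_MAX_STANDBYS 4    /* hypothetical fixed per-slot capacity */

/* Hypothetical, simplified view of a replication slot on the primary. */
typedef struct DemoSlot
{
    char name[DEMO_NAMEDATALEN];
    bool is_logical;
    int  nstandbys;
    char standbys[DEMO_MAX_STANDBYS][DEMO_NAMEDATALEN];
} DemoSlot;

/* Store the standby name in one slot; true only if newly stored. */
static bool
demo_record_standby(DemoSlot *slot, const char *standby)
{
    for (int i = 0; i < slot->nstandbys; i++)
        if (strcmp(slot->standbys[i], standby) == 0)
            return false;      /* already recorded */
    if (slot->nstandbys >= DEMO_MAX_STANDBYS ||
        strlen(standby) >= DEMO_NAMEDATALEN)
        return false;          /* no room, or name too long */
    strcpy(slot->standbys[slot->nstandbys++], standby);
    return true;
}

/*
 * Apply the list of slot names sent by one standby.  A "*" entry matches
 * every logical slot; physical slots are never touched.  Returns the
 * number of slots that were newly updated.
 */
static int
demo_apply_sync_request(DemoSlot *slots, int nslots, const char *standby,
                        const char *const *wanted, int nwanted)
{
    int updated = 0;

    for (int i = 0; i < nslots; i++)
    {
        bool match = false;

        if (!slots[i].is_logical)
            continue;
        for (int j = 0; j < nwanted && !match; j++)
            match = strcmp(wanted[j], "*") == 0 ||
                    strcmp(wanted[j], slots[i].name) == 0;
        if (match && demo_record_standby(&slots[i], standby))
            updated++;
    }
    return updated;
}
```

One limitation of this expansion: it only covers logical slots that exist when the request arrives, so a slot created afterwards would need the same treatment at creation time.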

JFYI, along similar lines, ReplicationSlotPersistentData currently
maintains a flag for the slot-sync feature:

        bool            synced; /* Is this a slot created by a sync-slot worker? */

This flag currently holds significance only on the physical standby. It
was added to distinguish slots created by the user for logical decoding
from the ones being synced from the primary. It is needed when we have
to choose obsolete (synced) slots to drop on the standby, or to block
get_changes on the standby for synced slots. It can be reused on the
primary for the above approach if needed.
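
As a rough illustration of the two uses of the flag described above (standalone sketch, not the patch code): DemoStandbySlot and the demo_* functions are hypothetical names, and "obsolete" is assumed here to mean a synced slot that the primary no longer reports.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of a slot as seen on the standby. */
typedef struct DemoStandbySlot
{
    const char *name;
    bool        synced;        /* created by a sync-slot worker? */
} DemoStandbySlot;

/* get_changes is allowed only on slots the user created locally. */
static bool
demo_can_get_changes(const DemoStandbySlot *slot)
{
    return !slot->synced;
}

/*
 * A synced slot is treated as obsolete (a candidate for dropping) when the
 * primary no longer lists it.  User-created slots are never candidates.
 */
static bool
demo_is_obsolete(const DemoStandbySlot *slot,
                 const char *const *primary_slots, int nprimary)
{
    if (!slot->synced)
        return false;
    for (int i = 0; i < nprimary; i++)
        if (strcmp(primary_slots[i], slot->name) == 0)
            return false;      /* still present on the primary */
    return true;
}
```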

thanks
Shveta


