Re: Synchronizing slots from primary to standby

From: Amit Kapila
Subject: Re: Synchronizing slots from primary to standby
Msg-id CAA4eK1+AumKenLjtVW2y4CpxBr_bo_AVZ67RWdDeJFt+Kgrj0A@mail.gmail.com
In reply to: Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Responses: Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List: pgsql-hackers
On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > >
> > >> ~~~
> > >> 4. primary_slot_name GUC value test:
> > >>
> > >> When standby is started with a non-existing primary_slot_name, the
> > >> wal-receiver gives an error but the slot-sync worker does not raise
> > >> any error/warning. It is no-op though as it has a check 'if
> > >> (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'.   Is this
> > >> okay or shall the slot-sync worker too raise an error and exit?
> > >>
> > >> In another case, when the standby is started with a valid
> > >> primary_slot_name that is changed to an invalid value at runtime, the
> > >> walreceiver starts giving errors but the slot-sync worker keeps
> > >> running. Unlike the previous case, it does not even go into no-op
> > >> mode (as it sees a valid WalRcv->latestWalEnd from the earlier run)
> > >> and keeps pinging the primary repeatedly for slots. Should it error
> > >> out here, or at least be a no-op until we give a valid
> > >> primary_slot_name?
> > >>
> > >
> >
> > Nice catch, thanks!
> >
> > > > I reviewed it. There is no way to test the existence/validity of
> > > > 'primary_slot_name' on the standby without making a connection to the
> > > > primary. If primary_slot_name is invalid from the start, the slot-sync
> > > > worker will be a no-op (as you tested) because WalRcv->latestWalEnd
> > > > will be invalid, and if 'primary_slot_name' is changed to an invalid
> > > > value at runtime, the slot-sync worker will still keep pinging the
> > > > primary. But that should be okay (in fact needed) as it needs to sync
> > > > at least the previous slot's positions (in case it was delayed in
> > > > doing so for some reason earlier). And once the slots are up to date
> > > > on the standby, even if the worker pings the primary, it will not see
> > > > any change in the slots' LSNs and will thus go for a longer nap. I
> > > > think it is not worth the effort to introduce the complexity of
> > > > checking the validity of 'primary_slot_name' on the primary from the
> > > > standby for this rare scenario.
> > >
> >
> > > Maybe another option could be to give the walreceiver a way to let the slot sync
> > > worker know that it (the walreceiver) was not able to start due to a non-existent
> > > replication slot on the primary? (That way we'd avoid the slot sync worker having
> > > to talk to the primary.)
>
> A few points:
> 1) I think if we do it, we should do it in a generic way, i.e. the
> slotsync worker should go to no-op if the walreceiver is not able to
> start for any reason, not only due to an invalid primary_slot_name.
> 2) Secondly, the slotsync worker needs to make sure it has synced the
> slots so far, i.e. the worker should not go to no-op immediately on
> seeing a missing WalRcv process if there are pending slots to be synced.
>

Wouldn't it be better to just ping the primary and check the validity
of 'primary_slot_name' at the start of slot-sync and whenever it is
changed? I think it would be better to avoid adding a dependency on
walreceiver state, as that sounds like needless complexity.
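
To make the idea concrete, here is a rough standalone sketch in C of
"validate at start and whenever the name changes". It is not PostgreSQL
code: SlotSyncState, slot_exists_on_primary(), slotsync_can_sync() and
the hard-coded slot list are all hypothetical names invented for
illustration. In a real worker the lookup would presumably be a query to
the primary over the worker's existing connection (for example against
pg_replication_slots), which is an assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SLOT_NAME_MAX 64    /* assumed buffer size for a slot name */

/* Hypothetical stand-in for the set of slots that exist on the primary. */
static const char *const primary_slots[] = {"standby_1_slot", "standby_2_slot"};

/*
 * Hypothetical stand-in for "ping the primary and ask whether the slot
 * exists". Here it only searches the hard-coded list above.
 */
static bool
slot_exists_on_primary(const char *slot_name)
{
    size_t nslots = sizeof(primary_slots) / sizeof(primary_slots[0]);

    for (size_t i = 0; i < nslots; i++)
    {
        if (strcmp(primary_slots[i], slot_name) == 0)
            return true;
    }
    return false;
}

/* Hypothetical per-worker state: the last name checked and the result. */
typedef struct SlotSyncState
{
    bool checked;                       /* has any name been validated yet? */
    bool name_is_valid;                 /* result of the last validation */
    int  validation_pings;              /* how many times we asked the primary */
    char checked_name[SLOT_NAME_MAX];   /* name used in the last validation */
} SlotSyncState;

/*
 * Returns true if the worker may sync. The primary is asked only on the
 * first call and when current_name differs from the last name checked;
 * otherwise the cached result is reused, with no walreceiver state involved.
 */
static bool
slotsync_can_sync(SlotSyncState *state, const char *current_name)
{
    assert(state != NULL && current_name != NULL);

    if (!state->checked || strcmp(state->checked_name, current_name) != 0)
    {
        snprintf(state->checked_name, sizeof(state->checked_name),
                 "%s", current_name);
        state->name_is_valid = slot_exists_on_primary(current_name);
        state->validation_pings++;
        state->checked = true;
    }
    return state->name_is_valid;
}
```

Starting from a zero-initialized SlotSyncState, the first call validates
the name, repeated calls with the same name reuse the cached answer, and
a changed name costs exactly one more check.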

--
With Regards,
Amit Kapila.


