Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAJpy0uAt7c8ccND7yejoH=SmawCDON-DN2W=qhrKbNJ506ovYQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Список pgsql-hackers
On Tue, Dec 5, 2023 at 2:18 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 12/5/23 6:08 AM, shveta malik wrote:
> > On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> > <bertranddrouvot.pg@gmail.com> wrote:
> >> Maybe another option could be to have the walreceiver a way to let the slot sync
> >> worker knows that it (the walreceiver) was not able to start due to non existing
> >> replication slot on the primary? (that way we'd avoid the slot sync worker having
> >> to talk to the primary).
> >
> > Few points:
> > 1) I think if we do it, we should do it in generic way i.e. slotsync
> > worker should go to no-op if walreceiver is not able to start due to
> > any reason and not only due to invalid primary_slot_name.
>
> Agree.
>
> > 2) Secondly, slotsync worker needs to make sure it has synced the
> > slots so far i.e. worker should not go to no-op immediately on seeing
> > missing WalRcv process if there are pending slots to be synced.
>
> Agree.
>
> > So the generic way I see to have this optimization is:
> > 1) Slotsync worker can use 'WalRcv->pid' to figure out if WalReceiver
> > is running or not.
>
> Not sure that would work because the walreceiver keeps try re-starting
> and so get a pid before reaching the "could not start WAL streaming: ERROR:  replication slot "XXXX" does not exist"
> error.
>

yes, right. pid will keep on toggling.

> We may want to add an extra check on walrcv->walRcvState (or should/could be enough by its own).
> But walrcv->walRcvState is set to WALRCV_STREAMING way before walrcv_startstreaming().
>

Agree. Check on 'walrcv->walRcvState' alone should suffice.

> Wouldn't that make sense to move it once we are sure that
> walrcv_startstreaming() returns true and first_stream is true, here?
>
> "
>                          if (first_stream)
> +                       {
>                                  ereport(LOG,
>                                                  (errmsg("started streaming WAL from primary at %X/%X on timeline
%u",
>                                                                  LSN_FORMAT_ARGS(startpoint), startpointTLI)));
> +                               SpinLockAcquire(&walrcv->mutex);
> +                               walrcv->walRcvState = WALRCV_STREAMING;
> +                               SpinLockRelease(&walrcv->mutex);
> +                       }
> "
>

Yes, it makes sense and is the basis for current slot-sync worker
changes being discussed.

> > 2) Slotsync worker should check null 'WalRcv->pid' only when
> > no-activity is observed for threshold time i.e. it can do it during
> > existing logic of increasing naptime.
> > 3) On finding null  'WalRcv->pid', worker can mark a flag to go to
> > no-op unless WalRcv->pid becomes valid again. Marking this flag during
> > increasing naptime will guarantee that the worker has taken all the
> > changes so far i.e. standby is not lagging in terms of slots.
> >
>
> 2) and 3) looks good to me but with a check on walrcv->walRcvState
> looking for WALRCV_STREAMING state instead of looking for a non null
> WalRcv->pid.

yes. But I think, the worker should enter no-op, when walRcvState is
WALRCV_STOPPED and not when walRcvState != WALRCV_STREAMING as it is
okay to have WALRCV_WAITING/STARTING/RESTARTING. But the worker should
exit no-op only when it finds walRcvState switched back to
WALRCV_STREAMING.

>
> And only if it makes sense to move the walrcv->walRcvState = WALRCV_STREAMING as
> mentioned above (I think it does).
>

yes, I agree.

thanks
Shveta



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nikita Malakhov
Дата:
Сообщение: Re: Avoid detoast overhead when possible
Следующее
От: Alexander Pyhalov
Дата:
Сообщение: Re: Add semi-join pushdown to postgres_fdw