Re: Synchronizing slots from primary to standby

From: Masahiko Sawada
Subject: Re: Synchronizing slots from primary to standby
Msg-id: CAD21AoCo7CGYSRk+Z_o_0UH+sXC5e22HakYTNR7JF-mYhGJNsQ@mail.gmail.com
In reply to: Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Synchronizing slots from primary to standby (Dilip Kumar <dilipbalaut@gmail.com>)
Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>)
RE: Synchronizing slots from primary to standby ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List: pgsql-hackers
On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > ---
> > Since two processes (e.g. the slotsync worker and
> > pg_sync_replication_slots()) can concurrently fetch and update the slot
> > information, there is a race condition in which the slot's
> > confirmed_flush_lsn goes backward.
> >
>
> Right, this is possible, though it shouldn't be a problem because
> slotsync is an asynchronous process anyway: as long as we hold
> restart_lsn, the required WAL won't be removed. Having said that, I
> can think of two ways to avoid it:
> (a) We can have a flag in shared memory with which we can detect
> whether any other process is doing slot synchronization, and then
> either error out, simply wait, or take a nowait-style parameter from
> the user to decide what to do. If this is feasible, we can simply
> error out in the first version and extend it later if we see any use
> cases for the same.
> (b) Similar to restart_lsn, if confirmed_flush_lsn is getting moved
> back, raise an error. This is good for now, but in the future we may
> still run into another similar issue.
> So I would prefer (a) among these, but I am fine if you prefer (b) or
> have some other idea, such as just noting in comments that this is a
> harmless case that can happen only very rarely.
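Option (b) amounts to a monotonicity check before the local synced slot is overwritten. The following is a minimal, single-process sketch in C, not the patch's code: `SyncedSlot` and `apply_remote_slot_lsns()` are hypothetical names, and `XLogRecPtr` is redeclared here as a plain 64-bit integer, which is how PostgreSQL represents an LSN. In a real implementation the check and the update would have to happen together under the slot's lock; otherwise another process could still slip in between them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* An LSN is a 64-bit WAL position in PostgreSQL. */
typedef uint64_t XLogRecPtr;

/* Hypothetical local copy of a synced slot's LSN fields. */
typedef struct SyncedSlot
{
    XLogRecPtr  restart_lsn;
    XLogRecPtr  confirmed_flush_lsn;
} SyncedSlot;

/*
 * Apply LSNs fetched from the primary to the local synced slot, refusing to
 * move either LSN backward. Returns false and leaves the slot untouched if the
 * fetched values are older than what is already stored (e.g. because another
 * process synced newer values in the meantime).
 */
static bool
apply_remote_slot_lsns(SyncedSlot *local, XLogRecPtr remote_restart_lsn,
                       XLogRecPtr remote_confirmed_flush_lsn)
{
    assert(local != NULL);

    if (remote_restart_lsn < local->restart_lsn ||
        remote_confirmed_flush_lsn < local->confirmed_flush_lsn)
    {
        fprintf(stderr, "ERROR: fetched slot LSNs are older than the local synced slot\n");
        return false;
    }

    local->restart_lsn = remote_restart_lsn;
    local->confirmed_flush_lsn = remote_confirmed_flush_lsn;
    return true;
}
```

A check like this only protects the fields it explicitly compares, which is the limitation Amit mentions: any other synced field that can regress would need its own check.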

Thank you for sharing the ideas. I would prefer (a). With (b), the same
issue could still happen for other fields.
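A minimal sketch of option (a), assuming a flag that lives in shared memory and is claimed with an atomic compare-and-swap. `SlotSyncShared`, `slotsync_try_begin()`, `slotsync_end()` and `sync_slots_once()` are hypothetical names; inside PostgreSQL the flag would sit in a shared-memory struct and would more likely be guarded by a spinlock or LWLock than by C11 atomics. A second caller simply errors out here, matching the suggestion to error out in the first version.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a struct placed in shared memory. */
typedef struct SlotSyncShared
{
    atomic_bool syncing;    /* true while some process is synchronizing slots */
} SlotSyncShared;

/* Try to claim the right to synchronize; false if another process holds it. */
static bool
slotsync_try_begin(SlotSyncShared *shared)
{
    bool        expected = false;

    return atomic_compare_exchange_strong(&shared->syncing, &expected, true);
}

/* Release the claim taken by slotsync_try_begin(). */
static void
slotsync_end(SlotSyncShared *shared)
{
    assert(atomic_load(&shared->syncing));
    atomic_store(&shared->syncing, false);
}

/*
 * One synchronization cycle, as either the slotsync worker or a
 * pg_sync_replication_slots() call might run it. Returns -1 ("error out")
 * if another process is already synchronizing.
 */
static int
sync_slots_once(SlotSyncShared *shared)
{
    if (!slotsync_try_begin(shared))
    {
        fprintf(stderr, "ERROR: replication slot synchronization is already in progress\n");
        return -1;
    }

    /* ... fetch slot information from the primary and update local slots ... */

    slotsync_end(shared);
    return 0;
}
```

Waiting instead of erroring out, or honoring a nowait-style option, would only change what the caller does when `slotsync_try_begin()` returns false; the flag itself stays the same.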

>
> >
> > ---
> > +     It is recommended that subscriptions are first disabled before promoting
> > +     the standby and are enabled back after altering the connection string.
> >
> > I think it's better to describe the reason why it's recommended to
> > disable subscriptions before the standby promotion.
> >
>
> Agreed. The reason I see for this is that if we don't disable the
> subscription before promoting the standby and changing the connection
> string, there is a chance that the old primary comes back and the
> subscriber ends up with some additional data from it, though the
> chances of that are low.
>
> > ---
> > +/* Slot sync worker objects */
> > +extern PGDLLIMPORT char *PrimaryConnInfo;
> > +extern PGDLLIMPORT char *PrimarySlotName;
> >
> > These two variables are also declared in xlogrecovery.h. Is that
> > intentional? If so, I think it's better to add comments explaining why.
> >
> > ---
> > Global functions and variables used by the slotsync worker are
> > declared in logicalworker.h and worker_internal.h. But is it really
> > okay to create a dependency between the slotsync worker and the logical
> > replication workers? IIUC the slotsync worker is conceptually a
> > separate feature from logical replication. I think the slotsync
> > worker can have its own header file.
> >
>
> +1.
>
> >
> > ---
> > +     Confirm that the standby server is not lagging behind the subscribers.
> > +     This step can be skipped if
> > +     <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>
> > +     has been correctly configured.
> >
> > How can the user confirm if standby_slot_names is correctly configured?
> >
>
> I think users can refer to the server LOG to see whether it has changed
> since it was first configured. I tried this with an existing parameter
> and saw the following in the LOG:
> LOG:  received SIGHUP, reloading configuration files
> 2024-02-06 11:38:59.069 IST [9240] LOG:  parameter "autovacuum" changed to "on"
>
> If the user can't confirm it, then it is better to follow the steps
> mentioned in the patch. Do you want something else to be written in
> the docs for this? If so, what?

IIUC, even if a wrong slot name is specified in standby_slot_names, or
standby_slot_names is empty, the standby server might not be lagging
behind the subscribers, depending on the timing. But when checking it
the next time, the standby server might lag behind the subscribers. So
what I wanted to know is how the user can confirm that a
failover-enabled subscription is guaranteed not to get ahead of the
failover-candidate standbys (i.e., standbys using the slots listed in
standby_slot_names).
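For a single point-in-time check, a user could compare, on the primary, the `restart_lsn` of each physical slot listed in standby_slot_names (assumed here to track the WAL position the standby has confirmed flushing) with the `confirmed_flush_lsn` of the failover-enabled logical slot, both read from the `pg_replication_slots` view. The C sketch below only performs the comparison on the textual `pg_lsn` values; `parse_lsn()` and `standby_caught_up()` are hypothetical helpers. As the paragraph above points out, a passing check describes only that instant, so it cannot replace a correctly configured standby_slot_names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse the textual pg_lsn format "XXXXXXXX/XXXXXXXX" (high and low 32 bits
 * in hex) into a 64-bit LSN. Returns false on malformed input.
 */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;
    int         consumed = 0;

    assert(text != NULL && lsn != NULL);

    /* %n records how many characters were read, so trailing junk is rejected. */
    if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2 || text[consumed] != '\0')
        return false;

    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * Point-in-time check: has the standby (through its physical slot) confirmed
 * WAL at least up to the logical slot's confirmed_flush_lsn? A true result
 * only describes this instant; the standby may fall behind again later.
 */
static bool
standby_caught_up(const char *physical_restart_lsn,
                  const char *logical_confirmed_flush_lsn)
{
    uint64_t    standby_lsn;
    uint64_t    logical_lsn;

    /* Treat unparsable input as "not confirmed". */
    if (!parse_lsn(physical_restart_lsn, &standby_lsn) ||
        !parse_lsn(logical_confirmed_flush_lsn, &logical_lsn))
        return false;

    return standby_lsn >= logical_lsn;
}
```

For example, `standby_caught_up("0/3000060", "0/3000060")` is true, while `standby_caught_up("0/2FFFFFF", "0/3000000")` is false.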

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


