Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAA4eK1JaHpcS9ZpAGYHSeDQAtLbas55HLqcfnn7n3uWwH-Ko8A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Список pgsql-hackers
On Thu, Feb 29, 2024 at 8:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Feb 28, 2024 at 1:23 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tuesday, February 27, 2024 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote:
> ...
> > > 20.
> > > +#
> > > +# | ----> standby1 (primary_slot_name = sb1_slot) # | ----> standby2
> > > +(primary_slot_name = sb2_slot) # primary ----- | # | ----> subscriber1
> > > +(failover = true) # | ----> subscriber2 (failover = false)
> > >
> > > In the diagram, the "--->" means a mixture of physical standbys and logical
> > > pub/sub replication. Maybe it can be a bit clearer?
> > >
> > > SUGGESTION
> > >
> > > # primary (publisher)
> > > #
> > > #     (physical standbys)
> > > #     | ----> standby1 (primary_slot_name = sb1_slot)
> > > #     | ----> standby2 (primary_slot_name = sb2_slot)
> > > #
> > > #     (logical replication)
> > > #     | ----> subscriber1 (failover = true, slot_name = lsub1_slot)
> > > #     | ----> subscriber2 (failover = false, slot_name = lsub2_slot)
> > >
> >
> > I think one can distinguish it based on the 'standby' and 'subscriber' as well, because
> > 'standby' normally refer to physical standby while the other refer to logical.
> >

I think Peter's suggestion will make the setup clear.

>
> Ok, but shouldn't it at least include info about the logical slot
> names associated with the subscribers (slot_name = lsub1_slot,
> slot_name = lsub2_slot) like suggested above?
>
> ======
>
> Here are some more review comments for v100-0001
>
> ======
> doc/src/sgml/config.sgml
>
> 1.
> +       <para>
> +        Lists the streaming replication standby server slot names that logical
> +        WAL sender processes will wait for. Logical WAL sender processes will
> +        send decoded changes to plugins only after the specified replication
> +        slots confirm receiving WAL. This guarantees that logical replication
> +        slots with failover enabled do not consume changes until those changes
> +        are received and flushed to corresponding physical standbys. If a
> +        logical replication connection is meant to switch to a physical standby
> +        after the standby is promoted, the physical replication slot for the
> +        standby should be listed here. Note that logical replication will not
> +        proceed if the slots specified in the standby_slot_names do
> not exist or
> +        are invalidated.
> +       </para>
>
> Is that note ("Note that logical replication will not proceed if the
> slots specified in the standby_slot_names do not exist or are
> invalidated") meant only for subscriptions marked for 'failover' or
> any subscription? Maybe wording can be modified to help clarify it?
>

I think it is implicit that here we are talking about failover slots.
I think clarifying again the same could be repetitive considering the
previous sentence: "This guarantees that logical replication slots
with failover enabled do not consume .." have mentioned it.

> ======
> src/backend/replication/slot.c
>
> 2.
> +/*
> + * A helper function to validate slots specified in GUC standby_slot_names.
> + */
> +static bool
> +validate_standby_slots(char **newval)
> +{
> + char    *rawname;
> + List    *elemlist;
> + bool ok;
> +
> + /* Need a modifiable copy of string */
> + rawname = pstrdup(*newval);
> +
> + /* Verify syntax and parse string into a list of identifiers */
> + ok = SplitIdentifierString(rawname, ',', &elemlist);
> +
> + if (!ok)
> + {
> + GUC_check_errdetail("List syntax is invalid.");
> + }
> +
> + /*
> + * If the replication slots' data have been initialized, verify if the
> + * specified slots exist and are logical slots.
> + */
> + else if (ReplicationSlotCtl)
> + {
> + foreach_ptr(char, name, elemlist)
> + {
> + ReplicationSlot *slot;
> +
> + slot = SearchNamedReplicationSlot(name, true);
> +
> + if (!slot)
> + {
> + GUC_check_errdetail("replication slot \"%s\" does not exist",
> + name);
> + ok = false;
> + break;
> + }
> +
> + if (!SlotIsPhysical(slot))
> + {
> + GUC_check_errdetail("\"%s\" is not a physical replication slot",
> + name);
> + ok = false;
> + break;
> + }
> + }
> + }
> +
> + pfree(rawname);
> + list_free(elemlist);
> + return ok;
> +}
>
> 2a.
> I didn't mention this previously because I thought this function was
> not going to change anymore, but since Bertrand suggested some changes
> [1], I will say IMO the { } are fine here for the single statement,
> but I think it will be better to rearrange this code to be like below.
> Having a 2nd NOP 'else' gives a much better place where you can put
> your ReplicationSlotCtl comment.
>
> if (!ok)
> {
>   GUC_check_errdetail("List syntax is invalid.");
> }
> else if (!ReplicationSlotCtl)
> {
>   <Insert big comment here about why it is OK to skip when
> ReplicationSlotCtl is NULL>
> }
> else
> {
>   foreach_ptr ...
> }
>

+1. This will make the code and reasoning to skip clear.

Few additional comments on the latest patch:
=================================
1.
 static XLogRecPtr
 WalSndWaitForWal(XLogRecPtr loc)
{
...
+ if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr &&
+ (!replication_active || StandbyConfirmedFlush(loc, WARNING)))
+ {
+ /*
+ * Fast path to avoid acquiring the spinlock in case we already know
+ * we have enough WAL available and all the standby servers have
+ * confirmed receipt of WAL up to RecentFlushPtr. This is particularly
+ * interesting if we're far behind.
+ */
  return RecentFlushPtr;
+ }
...
...
+ * Wait for WALs to be flushed to disk only if the postmaster has not
+ * asked us to stop.
+ */
+ if (loc > RecentFlushPtr && !got_STOPPING)
+ wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL;
+
+ /*
+ * Check if the standby slots have caught up to the flushed position.
+ * It is good to wait up to RecentFlushPtr and then let it send the
+ * changes to logical subscribers one by one which are already covered
+ * in RecentFlushPtr without needing to wait on every change for
+ * standby confirmation. Note that after receiving the shutdown signal,
+ * an ERROR is reported if any slots are dropped, invalidated, or
+ * inactive. This measure is taken to prevent the walsender from
+ * waiting indefinitely.
+ */
+ else if (replication_active &&
+ !StandbyConfirmedFlush(RecentFlushPtr, WARNING))
+ {
+ wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION;
+ wait_for_standby = true;
+ }
+ else
+ {
+ /* Already caught up and doesn't need to wait for standby_slots. */
  break;
+ }
...
}

Can we try to move these checks into a separate function that returns
a boolean and has an out parameter as wait_event?

2. How about naming StandbyConfirmedFlush() as StandbySlotsAreCaughtup?

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Supporting MERGE on updatable views
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: Remove AIX Support (was: Re: Relation bulk write facility)