Re: Synchronizing slots from primary to standby
От | Amit Kapila |
---|---|
Тема | Re: Synchronizing slots from primary to standby |
Дата | |
Msg-id | CAA4eK1JaHpcS9ZpAGYHSeDQAtLbas55HLqcfnn7n3uWwH-Ko8A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Synchronizing slots from primary to standby (Peter Smith <smithpb2250@gmail.com>) |
Ответы |
Re: Synchronizing slots from primary to standby
(Amit Kapila <amit.kapila16@gmail.com>)
RE: Synchronizing slots from primary to standby ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Список | pgsql-hackers |
On Thu, Feb 29, 2024 at 8:29 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Feb 28, 2024 at 1:23 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Tuesday, February 27, 2024 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote: > ... > > > 20. > > > +# > > > +# | ----> standby1 (primary_slot_name = sb1_slot) # | ----> standby2 > > > +(primary_slot_name = sb2_slot) # primary ----- | # | ----> subscriber1 > > > +(failover = true) # | ----> subscriber2 (failover = false) > > > > > > In the diagram, the "--->" means a mixture of physical standbys and logical > > > pub/sub replication. Maybe it can be a bit clearer? > > > > > > SUGGESTION > > > > > > # primary (publisher) > > > # > > > # (physical standbys) > > > # | ----> standby1 (primary_slot_name = sb1_slot) > > > # | ----> standby2 (primary_slot_name = sb2_slot) > > > # > > > # (logical replication) > > > # | ----> subscriber1 (failover = true, slot_name = lsub1_slot) > > > # | ----> subscriber2 (failover = false, slot_name = lsub2_slot) > > > > > > > I think one can distinguish it based on the 'standby' and 'subscriber' as well, because > > 'standby' normally refer to physical standby while the other refer to logical. > > I think Peter's suggestion will make the setup clear. > > Ok, but shouldn't it at least include info about the logical slot > names associated with the subscribers (slot_name = lsub1_slot, > slot_name = lsub2_slot) like suggested above? > > ====== > > Here are some more review comments for v100-0001 > > ====== > doc/src/sgml/config.sgml > > 1. > + <para> > + Lists the streaming replication standby server slot names that logical > + WAL sender processes will wait for. Logical WAL sender processes will > + send decoded changes to plugins only after the specified replication > + slots confirm receiving WAL. This guarantees that logical replication > + slots with failover enabled do not consume changes until those changes > + are received and flushed to corresponding physical standbys. If a > + logical replication connection is meant to switch to a physical standby > + after the standby is promoted, the physical replication slot for the > + standby should be listed here. Note that logical replication will not > + proceed if the slots specified in the standby_slot_names do > not exist or > + are invalidated. > + </para> > > Is that note ("Note that logical replication will not proceed if the > slots specified in the standby_slot_names do not exist or are > invalidated") meant only for subscriptions marked for 'failover' or > any subscription? Maybe wording can be modified to help clarify it? > I think it is implicit that here we are talking about failover slots. I think clarifying again the same could be repetitive considering the previous sentence: "This guarantees that logical replication slots with failover enabled do not consume .." have mentioned it. > ====== > src/backend/replication/slot.c > > 2. > +/* > + * A helper function to validate slots specified in GUC standby_slot_names. > + */ > +static bool > +validate_standby_slots(char **newval) > +{ > + char *rawname; > + List *elemlist; > + bool ok; > + > + /* Need a modifiable copy of string */ > + rawname = pstrdup(*newval); > + > + /* Verify syntax and parse string into a list of identifiers */ > + ok = SplitIdentifierString(rawname, ',', &elemlist); > + > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); > + } > + > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + else if (ReplicationSlotCtl) > + { > + foreach_ptr(char, name, elemlist) > + { > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (!slot) > + { > + GUC_check_errdetail("replication slot \"%s\" does not exist", > + name); > + ok = false; > + break; > + } > + > + if (!SlotIsPhysical(slot)) > + { > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > + name); > + ok = false; > + break; > + } > + } > + } > + > + pfree(rawname); > + list_free(elemlist); > + return ok; > +} > > 2a. > I didn't mention this previously because I thought this function was > not going to change anymore, but since Bertrand suggested some changes > [1], I will say IMO the { } are fine here for the single statement, > but I think it will be better to rearrange this code to be like below. > Having a 2nd NOP 'else' gives a much better place where you can put > your ReplicationSlotCtl comment. > > if (!ok) > { > GUC_check_errdetail("List syntax is invalid."); > } > else if (!ReplicationSlotCtl) > { > <Insert big comment here about why it is OK to skip when > ReplicationSlotCtl is NULL> > } > else > { > foreach_ptr ... > } > +1. This will make the code and reasoning to skip clear. Few additional comments on the latest patch: ================================= 1. static XLogRecPtr WalSndWaitForWal(XLogRecPtr loc) { ... + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) + { + /* + * Fast path to avoid acquiring the spinlock in case we already know + * we have enough WAL available and all the standby servers have + * confirmed receipt of WAL up to RecentFlushPtr. This is particularly + * interesting if we're far behind. + */ return RecentFlushPtr; + } ... ... + * Wait for WALs to be flushed to disk only if the postmaster has not + * asked us to stop. + */ + if (loc > RecentFlushPtr && !got_STOPPING) + wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; + + /* + * Check if the standby slots have caught up to the flushed position. + * It is good to wait up to RecentFlushPtr and then let it send the + * changes to logical subscribers one by one which are already covered + * in RecentFlushPtr without needing to wait on every change for + * standby confirmation. Note that after receiving the shutdown signal, + * an ERROR is reported if any slots are dropped, invalidated, or + * inactive. This measure is taken to prevent the walsender from + * waiting indefinitely. + */ + else if (replication_active && + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) + { + wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; + wait_for_standby = true; + } + else + { + /* Already caught up and doesn't need to wait for standby_slots. */ break; + } ... } Can we try to move these checks into a separate function that returns a boolean and has an out parameter as wait_event? 2. How about naming StandbyConfirmedFlush() as StandbySlotsAreCaughtup? -- With Regards, Amit Kapila.
В списке pgsql-hackers по дате отправления:
Следующее
От: Daniel GustafssonДата:
Сообщение: Re: Remove AIX Support (was: Re: Relation bulk write facility)