RE: Synchronizing slots from primary to standby
От | Zhijie Hou (Fujitsu) |
---|---|
Тема | RE: Synchronizing slots from primary to standby |
Дата | |
Msg-id | OS0PR01MB5716E281A55E5053DF4976D1945E2@OS0PR01MB5716.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
RE: Synchronizing slots from primary to standby
("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Re: Synchronizing slots from primary to standby (Peter Smith <smithpb2250@gmail.com>) |
Список | pgsql-hackers |
On Thursday, February 29, 2024 7:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 29, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > Few additional comments on the latest patch: > > ================================= > > 1. > > static XLogRecPtr > > WalSndWaitForWal(XLogRecPtr loc) > > { > > ... > > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && > > + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) { > > + /* > > + * Fast path to avoid acquiring the spinlock in case we already know > > + * we have enough WAL available and all the standby servers have > > + * confirmed receipt of WAL up to RecentFlushPtr. This is > > + particularly > > + * interesting if we're far behind. > > + */ > > return RecentFlushPtr; > > + } > > ... > > ... > > + * Wait for WALs to be flushed to disk only if the postmaster has not > > + * asked us to stop. > > + */ > > + if (loc > RecentFlushPtr && !got_STOPPING) wait_event = > > + WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > > + > > + /* > > + * Check if the standby slots have caught up to the flushed position. > > + * It is good to wait up to RecentFlushPtr and then let it send the > > + * changes to logical subscribers one by one which are already > > + covered > > + * in RecentFlushPtr without needing to wait on every change for > > + * standby confirmation. Note that after receiving the shutdown > > + signal, > > + * an ERROR is reported if any slots are dropped, invalidated, or > > + * inactive. This measure is taken to prevent the walsender from > > + * waiting indefinitely. > > + */ > > + else if (replication_active && > > + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) { wait_event = > > + WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > > + wait_for_standby = true; > > + } > > + else > > + { > > + /* Already caught up and doesn't need to wait for standby_slots. */ > > break; > > + } > > ... > > } > > > > Can we try to move these checks into a separate function that returns > > a boolean and has an out parameter as wait_event? > > > > 2. How about naming StandbyConfirmedFlush() as > StandbySlotsAreCaughtup? > > > > Some more comments: > ================== > 1. > + foreach_ptr(char, name, elemlist) > + { > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (!slot) > + { > + GUC_check_errdetail("replication slot \"%s\" does not exist", name); > + ok = false; break; } > + > + if (!SlotIsPhysical(slot)) > + { > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > + name); ok = false; break; } } > > Won't the second check need protection via ReplicationSlotControlLock? Yes, added. > > 2. In WaitForStandbyConfirmation(), we are anyway calling > StandbyConfirmedFlush() before the actual sleep in the loop, so does calling it > at the beginning of the function will serve any purpose? If so, it is better to add > some comments explaining the same. It is used as a fast-path to avoid calling condition variable stuff, I think we can directly put failover check and list check in the beginning instead of calling that function. > > 3. Also do we need to perform the validation checks done in > StandbyConfirmedFlush() repeatedly when it is invoked in a loop? We can > probably separate those checks and perform them just once. I have removed slot.failover check from the StandbyConfirmedFlush function, so that we can do it when necessary. I didn’t change the check for standby_slot_names_list because the list could be changed in the loop when reloading config. > > 4. > + * > + * XXX: If needed, this can be attempted in future. > > Remove this part of the comment. Removed. Attach the V102 patch set which addressed Amit and Shveta's comments. Thanks Shveta for helping addressing the comments off-list. Best Regards, Hou zj
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Michael PaquierДата:
Сообщение: Re: Propagate sanity checks of ProcessUtility() to standard_ProcessUtility()?
Следующее
От: "Zhijie Hou (Fujitsu)"Дата:
Сообщение: RE: Synchronizing slots from primary to standby