RE: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Synchronizing slots from primary to standby
Дата
Msg-id OS0PR01MB5716E281A55E5053DF4976D1945E2@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
Список pgsql-hackers
On Thursday, February 29, 2024 7:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Thu, Feb 29, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > Few additional comments on the latest patch:
> > =================================
> > 1.
> >  static XLogRecPtr
> >  WalSndWaitForWal(XLogRecPtr loc)
> > {
> > ...
> > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr &&
> > + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) {
> > + /*
> > + * Fast path to avoid acquiring the spinlock in case we already know
> > + * we have enough WAL available and all the standby servers have
> > + * confirmed receipt of WAL up to RecentFlushPtr. This is
> > + particularly
> > + * interesting if we're far behind.
> > + */
> >   return RecentFlushPtr;
> > + }
> > ...
> > ...
> > + * Wait for WALs to be flushed to disk only if the postmaster has not
> > + * asked us to stop.
> > + */
> > + if (loc > RecentFlushPtr && !got_STOPPING) wait_event =
> > + WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL;
> > +
> > + /*
> > + * Check if the standby slots have caught up to the flushed position.
> > + * It is good to wait up to RecentFlushPtr and then let it send the
> > + * changes to logical subscribers one by one which are already
> > + covered
> > + * in RecentFlushPtr without needing to wait on every change for
> > + * standby confirmation. Note that after receiving the shutdown
> > + signal,
> > + * an ERROR is reported if any slots are dropped, invalidated, or
> > + * inactive. This measure is taken to prevent the walsender from
> > + * waiting indefinitely.
> > + */
> > + else if (replication_active &&
> > + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) { wait_event =
> > + WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION;
> > + wait_for_standby = true;
> > + }
> > + else
> > + {
> > + /* Already caught up and doesn't need to wait for standby_slots. */
> >   break;
> > + }
> > ...
> > }
> >
> > Can we try to move these checks into a separate function that returns
> > a boolean and has an out parameter as wait_event?
> >
> > 2. How about naming StandbyConfirmedFlush() as
> StandbySlotsAreCaughtup?
> >
> 
> Some more comments:
> ==================
> 1.
> + foreach_ptr(char, name, elemlist)
> + {
> + ReplicationSlot *slot;
> +
> + slot = SearchNamedReplicationSlot(name, true);
> +
> + if (!slot)
> + {
> + GUC_check_errdetail("replication slot \"%s\" does not exist", name);
> + ok = false; break; }
> +
> + if (!SlotIsPhysical(slot))
> + {
> + GUC_check_errdetail("\"%s\" is not a physical replication slot",
> + name); ok = false; break; } }
> 
> Won't the second check need protection via ReplicationSlotControlLock?

Yes, added.

> 
> 2. In WaitForStandbyConfirmation(), we are anyway calling
> StandbyConfirmedFlush() before the actual sleep in the loop, so does calling it
> at the beginning of the function will serve any purpose? If so, it is better to add
> some comments explaining the same.

It is used as a fast-path to avoid calling condition variable stuff, I think we can directly
put failover check and list check in the beginning instead of calling that function.

> 
> 3. Also do we need to perform the validation checks done in
> StandbyConfirmedFlush() repeatedly when it is invoked in a loop? We can
> probably separate those checks and perform them just once.

I have removed slot.failover check from the StandbyConfirmedFlush
function, so that we can do it when necessary. I didn’t change the check
for standby_slot_names_list because the list could be changed in the loop
when reloading config.

> 
> 4.
> +   *
> +   * XXX: If needed, this can be attempted in future.
> 
> Remove this part of the comment.

Removed.

Attach the V102 patch set which addressed Amit and Shveta's comments.
Thanks Shveta for helping addressing the comments off-list.

Best Regards,
Hou zj

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Propagate sanity checks of ProcessUtility() to standard_ProcessUtility()?
Следующее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: RE: Synchronizing slots from primary to standby