RE: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Synchronizing slots from primary to standby
Дата
Msg-id OS0PR01MB57169D5B90FA6B981BD7226794B6A@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Thursday, November 16, 2023 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Nov 15, 2023 at 5:21 PM shveta malik <shveta.malik@gmail.com>
> wrote:
> >
> > PFA v34.
> >
> 
> Few comments on v34-0001*
> =======================
> 1.
> + char buf[100];
> +
> + buf[0] = '\0';
> +
> + if (MySubscription->twophasestate ==
> + LOGICALREP_TWOPHASE_STATE_PENDING)
> + strcat(buf, "twophase");
> + if (MySubscription->failoverstate ==
> + LOGICALREP_FAILOVER_STATE_PENDING)
> + {
> + if (buf[0] != '\0')
> + strcat(buf, " and ");
> + strcat(buf, "failover");
> + }
> +
>   ereport(LOG,
> - (errmsg("logical replication apply worker for subscription \"%s\"
> will restart so that two_phase can be enabled",
> - MySubscription->name)));
> + (errmsg("logical replication apply worker for subscription \"%s\"
> will restart so that %s can be enabled",
> + MySubscription->name, buf)));
> 
> I feel it is better to separate elogs rather than construct the string. It would be
> easier for the translation.
> 
> 2.
> -
>  /* Initialize walsender process before entering the main command loop */
> 
> Spurious line removal
> 
> 3.
> @@ -440,17 +448,8 @@ pg_physical_replication_slot_advance(XLogRecPtr
> moveto)
> 
>   if (startlsn < moveto)
>   {
> - SpinLockAcquire(&MyReplicationSlot->mutex);
> - MyReplicationSlot->data.restart_lsn = moveto;
> - SpinLockRelease(&MyReplicationSlot->mutex);
> + PhysicalConfirmReceivedLocation(moveto);
>   retlsn = moveto;
> -
> - /*
> - * Dirty the slot so as it is written out at the next checkpoint. Note
> - * that the LSN position advanced may still be lost in the event of a
> - * crash, but this makes the data consistent after a clean shutdown.
> - */
> - ReplicationSlotMarkDirty();
>   }
> 
> I think this change has been made so that we can wakeup logical walsenders
> from a central location. In general, this is a good idea but it seems calling
> PhysicalConfirmReceivedLocation() would make an additional call to
> ReplicationSlotsComputeRequiredLSN() which is already called in the caller of
> pg_physical_replication_slot_advance(), so not sure such unification is a good
> idea here.
> 
> 4.
> + * Here logical walsender associated with failover logical slot waits
> + * for physical standbys corresponding to physical slots specified in
> + * standby_slot_names GUC.
> + */
> +void
> +WalSndWaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
> 
> In the above comments, we don't seem to follow the 80-col limit.
> Please check all other comments in the patch for similar problem.
> 
> 5.
> +static void
> +WalSndRereadConfigAndSlots(List **standby_slots) {
> + char    *pre_standby_slot_names = pstrdup(standby_slot_names);
> +
> + ProcessConfigFile(PGC_SIGHUP);
> +
> + if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) {
> + list_free(*standby_slots); *standby_slots = GetStandbySlotList(true);
> + }
> +
> + pfree(pre_standby_slot_names);
> +}
> 
> The function name is misleading w.r.t the functionality. Can we name it on the
> lines of WalSndRereadConfigAndReInitSlotList()? I know it is a bit longer but
> couldn't come up with anything better.
> 
> 6.
> + /*
> + * Fast path to entering the loop in case we already know we have
> + * enough WAL available and all the standby servers has confirmed
> + * receipt of WAL upto RecentFlushPtr.
> 
> I think this comment is a bit misleading because it is a fast path to avoid
> entering the loop. I think we can keep the existing comment
> here: "Fast path to avoid acquiring the spinlock in case we already know ..."
> 
> 7.
> @@ -3381,7 +3673,9 @@ WalSndWait(uint32 socket_events, long timeout,
> uint32 wait_event)
>   * And, we use separate shared memory CVs for physical and logical
>   * walsenders for selective wake ups, see WalSndWakeup() for more details.
>   */
> - if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL)
> + if (wait_for_standby)
> + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv);
> + else if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL)
> 
> The comment above this change needs to be updated for the usage of this new
> CV.
> 
> 8.
> +WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION "Waiting for physical
> standby confirmation in WAL sender process."
> 
> I feel the above description is not clear. How about being more specific with
> something along the lines of: "Waiting for the WAL to be received by physical
> standby in WAL sender process."
> 
> 9.
> + {"standby_slot_names", PGC_SIGHUP, REPLICATION_PRIMARY,
> + gettext_noop("List of streaming replication standby server slot "
> + "names that logical walsenders waits for."),
> 
> I think we slightly simplify it by saying: "Lists streaming replication standby
> server slot names that logical WAL sender processes wait for.". It would be
> more consistent with a few other similar variables.
> 
> 10.
> + gettext_noop("List of streaming replication standby server slot "
> + "names that logical walsenders waits for."), gettext_noop("Decoded
> + changes are sent out to plugins by logical "
> + "walsenders only after specified replication slots "
> + "confirm receiving WAL."),
> 
> Instead of walsenders, let's use WAL sender processes.
> 
> 11.
> @@ -6622,10 +6623,12 @@ describeSubscriptions(const char *pattern, bool
> verbose)
>   appendPQExpBuffer(&buf,
>     ", suborigin AS \"%s\"\n"
>     ", subpasswordrequired AS \"%s\"\n"
> -   ", subrunasowner AS \"%s\"\n",
> +   ", subrunasowner AS \"%s\"\n"
> +   ", subfailoverstate AS \"%s\"\n",
>     gettext_noop("Origin"),
>     gettext_noop("Password required"),
> -   gettext_noop("Run as owner?"));
> +   gettext_noop("Run as owner?"),
> +   gettext_noop("Enable failover?"));
> 
> Let's name the new column as "Failover" and also it should be displayed only
> when pset.sversion is >=17.
> 
> 12.
> @@ -93,6 +97,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>   bool subrunasowner; /* True if replication should execute as the
>   * subscription owner */
> 
> + char subfailoverstate; /* Enable Failover State */
> 
> This should be listed in system_views.sql in the below GRANT statement:
> GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled,
>               subbinary, substream, subtwophasestate, subdisableonerr,
>   subpasswordrequired, subrunasowner,
>               subslotname, subsynccommit, subpublications, suborigin)
> 
> 13.
> + ConditionVariable wal_confirm_rcv_cv;
> +
>   WalSnd walsnds[FLEXIBLE_ARRAY_MEMBER];  } WalSndCtlData;
> 
> It is better to add a comment for this new variable explaining its use.

Thanks for the comments.
Here is the new version patch set which addressed all above comments and the comment in [1].

[1] https://www.postgresql.org/message-id/1e0b2eb4-c977-482d-b16e-c52711c34d6c%40gmail.com

Best Regards,
Hou zj

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [PoC] Reducing planning time when tables have many partitions
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock