Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAA4eK1J5zTmm4NE4os59WgU4AZPNb74X-n67pY8SkoDfzsN_jA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Список pgsql-hackers
On Tue, Dec 19, 2023 at 5:17 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Dec 18, 2023 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Dec 15, 2023 at 11:03 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > Sorry, I missed attaching the patch. PFA v48.
> > >
> >
> > Few comments on v48_0002
> > ========================
>
> Thanks for reviewing. These are addressed in v50.
>

I was still reviewing the v48 version and have a few comments as
below. If some of these are already addressed or not relevant, feel
free to ignore them.
1.
+ /*
+ * Slot sync worker can be stopped at any time.
+ * Use exit status 1 so the background worker is restarted.

We don't need to start the second line of comment in a separate line.

2.
+ * The assumption is that these dropped local invalidated slots will get
+ * recreated in next sync-cycle and it is okay to drop and recreate such slots

In the above line '.. local invalidated ..' sounds redundant. Shall we
remove it?

3.
+ if (remote_slot->confirmed_lsn > WalRcv->latestWalEnd)
+ {
+ SpinLockRelease(&WalRcv->mutex);
+ elog(ERROR, "skipping sync of slot \"%s\" as the received slot sync "


This error message looks odd to me. At least, it should be exiting
instead of skipping because we won't continue after this.

4.
+ /* User created slot with the same name exists, raise ERROR. */
+ if (sync_state == SYNCSLOT_STATE_NONE)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipping sync of slot \"%s\" as it is a user created"
+ " slot", remote_slot->name),
+ errdetail("This slot has failover enabled on the primary and"
+    " thus is sync candidate but user created slot with"
+    " the same name already exists on the standby")));

Same problem as above. The skipping in error message doesn't seem to
be suitable for the purpose. Additionally, errdetail message should
end with a full stop.

5.
+ /* Slot ready for sync, so sync it. */
+ if (sync_state == SYNCSLOT_STATE_READY)
+ {
+ /*
+ * Sanity check: With hot_standby_feedback enabled and
+ * invalidations handled appropriately as above, this should never
+ * happen.
+ */
+ if (remote_slot->restart_lsn < slot->data.restart_lsn)
+ elog(ERROR,
+ "not synchronizing local slot \"%s\" LSN(%X/%X)"
+ " to remote slot's LSN(%X/%X) as synchronization "

The start of the error message sounds odd. Shall we say 'cannot
synchronize ...'?

6. All except one of the callers of local_slot_update() marks the slot
dirty and the same is required as well. I think the remaining caller
should also mark it dirty and we should move
ReplicationSlotMarkDirty() in the caller space.

7.
+synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
{
...
+ /*
+ * Copy the invalidation cause from remote only if local slot is not
+ * invalidated locally, we don't want to overwrite existing one.
+ */
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->data.invalidated = remote_slot->invalidated;
+ SpinLockRelease(&slot->mutex);
+ }
...

It doesn't seem that after changing the invalidated flag, we always
mark the slot dirty. Am, I missing something?

8.
+ /*
+ * Drop local slots that no longer need to be synced. Do it before
+ * synchronize_one_slot to allow dropping of slots before actual sync
+ * which are invalidated locally while still valid on the primary server.
+ */
+ drop_obsolete_slots(remote_slot_list);

The second part of the above comment seems redundant as that is obvious.

9.
+static WalReceiverConn *
+remote_connect(void)
+{
+ WalReceiverConn *wrconn = NULL;
+ char    *err;
+
+ wrconn = walrcv_connect(PrimaryConnInfo, true, false,
+ cluster_name[0] ? cluster_name : "slotsyncworker",
+ &err);
+ if (wrconn == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("could not connect to the primary server: %s", err)));
+ return wrconn;
+}

Do we need a function for this? It appears to be called from just one
place, so not sure if it is helpful to have a function for this.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: Synchronizing slots from primary to standby
Следующее
От: Karina Litskevich
Дата:
Сообщение: Correct the documentation of extension building infrastructure