Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Peter Smith
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAHut+Pv9=bw3BL3w9XcvvO+Lr=JsX6xo9DNUuqOD2ywWsvSujw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Список pgsql-hackers
On Mon, Oct 9, 2023 at 9:34 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Oct 4, 2023 at 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are some review comments for v20-0002.
> >
>
> Thanks Peter for the feedback. Comments from 31 till end are addressed
> in v22. First 30 comments will be addressed in the next version.
>

Thanks for addressing my previous comments.

I checked those and went through other changes in v22-0002 to give a
few more review comments below.

I understand there are some design changes coming soon regarding the
use of GUCs so maybe a few of these comments will become redundant.

======
doc/src/sgml/config.sgml

1.
           A password needs to be provided too, if the sender demands password
           authentication.  It can be provided in the
           <varname>primary_conninfo</varname> string, or in a separate
-          <filename>~/.pgpass</filename> file on the standby server (use
-          <literal>replication</literal> as the database name).
-          Do not specify a database name in the
-          <varname>primary_conninfo</varname> string.
+          <filename>~/.pgpass</filename> file on the standby server.
+         </para>
+         <para>
+          Specify a database name in <varname>primary_conninfo</varname> string
+          to allow synchronization of slots from the primary to standby. This
+          dbname will only be used for slots synchronization purpose and will
+          be irrelevant for streaming.
          </para>

1a.
"Specify a database name in...". Shouldn't that say "Specify dbname in..."?

~

1b.
BEFORE
This dbname will only be used for slots synchronization purpose and
will be irrelevant for streaming.

SUGGESTION
This will only be used for slot synchronization. It is ignored for streaming.

======
doc/src/sgml/system-views.sgml

2. pg_replication_slots

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>synced_slot</structfield> <type>bool</type>
+      </para>
+      <para>
+       True if this logical slot is created on physical standby as part of
+       slot-synchronization from primary server. Always false for
physical slots.
+      </para></entry>
+     </row>

/on physical standby/on the physical standby/

/from primary server/from the primary server/

======
src/backend/replication/logical/launcher.c

3. LaunchSlotSyncWorkers

+ /*
+ * If we failed to launch this slotsync worker, return and try
+ * launching rest of the workers in next sync cycle. But change
+ * launcher's wait time to minimum of wal_retrieve_retry_interval and
+ * default wait time to try next sync-cycle sooner.
+ */

3a.
Use consistent terms -- choose "sync cycle" or "sync-cycle"

~

3b.
Is it correct to just say "rest of the workers"; won't it also try to
relaunch this same failed worker again?

~~~

4. LauncherMain

+ /*
+ * Stop the slot-sync workers if any of the related GUCs changed.
+ * These will be relaunched using the new values during next
+ * sync-cycle. Also revalidate the new configurations and
+ * reconnect.
+ */
+ if (SlotSyncConfigsChanged())
+ {
+ slotsync_workers_stop();
+
+ if (wrconn)
+ walrcv_disconnect(wrconn);
+
+ if (RecoveryInProgress())
+ wrconn = slotsync_remote_connect();
+ }

Was it overkill to disconnect/reconnect every time any of those GUCs
changed? Or is it enough to do that only if the
PrimaryConnInfoPreReload was changed?

======
src/backend/replication/logical/logical.c

5. CreateDecodingContext

+ /*
+ * Do not allow consumption of a "synchronized" slot until the standby
+ * gets promoted.
+ */
+ if (RecoveryInProgress() && slot->data.synced)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot use replication slot \"%s\" for logical decoding",
+ NameStr(slot->data.name)),
+ errdetail("This slot is being synced from primary."),
+ errhint("Specify another replication slot.")));
+

/from primary/from the primary/

======
src/backend/replication/logical/slotsync.c

6. use_slot_in_query

+ /*
+ * Return TRUE if either slot is not yet created on standby or if it
+ * belongs to one of the dbs passed in dbids.
+ */
+ if (!slot_found || relevant_db)
+ return true;
+
+ return false;

Same as single line:

return (!slot_found || relevant_db);

~~~

7. synchronize_one_slot

+ /*
+ * If the local restart_lsn and/or local catalog_xmin is ahead of
+ * those on the remote then we cannot create the local slot in sync
+ * with primary because that would mean moving local slot backwards
+ * and we might not have WALs retained for old lsns. In this case we
+ * will wait for primary's restart_lsn and catalog_xmin to catch up
+ * with the local one before attempting the sync.
+ */

/moving local slot/moving the local slot/

/with primary/with the primary/

/wait for primary's/wait for the primary's/

~~~

8. ProcessSlotSyncInterrupts

+ if (ConfigReloadPending)
+ {
+ ConfigReloadPending = false;
+
+ /* Save the PrimaryConnInfo before reloading */
+ *conninfo_prev = pstrdup(PrimaryConnInfo);

If the configuration keeps changing then there might be a slow leak
here because I didn't notice anywhere where this strdup'ed string is
getting freed. Is that something worth worrying about?

======
src/backend/replication/slot.c

9. ReplicationSlotDrop

+ /*
+ * Do not allow users to drop the slots which are currently being synced
+ * from the primary to standby.
+ */
+ if (user_cmd && RecoveryInProgress() && MyReplicationSlot->data.synced)
+ {
+ ReplicationSlotRelease();
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot drop replication slot"),
+ errdetail("This slot is being synced from primary.")));
+ }
+

9a.
/to standby/to the standby/

~

9b.
Shouldn't the errmsg name the slot? Otherwise, the message might not
be so useful.

~

9c.
/synced from primary/synced from the primary/

======
src/backend/replication/walsender.c


10. ListSlotDatabaseOIDs

+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (slotno = 0; slotno < max_replication_slots; slotno++)
+ {
+ ReplicationSlot *slot = &ReplicationSlotCtl->replication_slots[slotno];

This is all new code so you can use C99 for loop variable declaration here.

~~~

11.
+ /* If synchronize_slot_names is '*', then skip physical slots */
+ if (SlotIsPhysical(slot))
+ continue;
+


Some mental gymnastics are needed to understand how this code means "
synchronize_slot_names is '*'".

IMO it would be easier to understand if the previous "if
(numslot_names)" was rewritten as if/else.

======
.../utils/activity/wait_event_names.txt

12.
 RECOVERY_WAL_STREAM "Waiting in main loop of startup process for WAL
to arrive, during streaming recovery."
+REPL_SLOTSYNC_MAIN "Waiting in main loop of worker for synchronizing
slots to a standby from primary."
+REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for primary to catch-up in
worker for synchronizing slots to a standby from primary."
 SYSLOGGER_MAIN "Waiting in main loop of syslogger process."

12a.
Maybe those descriptions can be simplified a bit?

SUGGESTION
REPL_SLOTSYNC_MAIN "Waiting in the main loop of slot-sync worker."
REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for the primary to catch up, in
slot-sync worker."

~

12b.
typo?

/REPL_SLOTSYNC_PRIMARY_CATCHP/REPL_SLOTSYNC_PRIMARY_CATCHUP/

======
src/include/replication/walreceiver.h

13. WalRcvRepSlotDbData

+/*
+ * Slot's DBid related data
+ */
+typedef struct WalRcvRepSlotDbData
+{
+ Oid database; /* Slot's DBid received from remote */
+} WalRcvRepSlotDbData;

Just calling this new field 'database' seems odd. Searching PG src I
found typical fields/variables like this one are called 'databaseid',
or 'dboid', or 'dbid', or 'db_id' etc.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag
Следующее
От: "Drouvot, Bertrand"
Дата:
Сообщение: Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag