Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Peter Smith
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAHut+PtRJ_4x3re0bPn791PTL6kc2TRm1A2EPY1kjTCax_9F=A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Список pgsql-hackers
Here are some review comments for v750002.

(this is a WIP but this is what I found so far...)

======
doc/src/sgml/protocol.sgml

1.
> > 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) #
> >
> >           <para>
> > -          If true, the slot is enabled to be synced to the standbys.
> > +          If true, the slot is enabled to be synced to the standbys
> > +          so that logical replication can be resumed after failover.
> >           </para>
> >
> > This also should have the sentence "The default is false.", e.g. the
> > same as the same option in CREATE_REPLICATION_SLOT says.
>
> I have not added this. I feel the default value related details should
> be present in the 'CREATE' part, it is not meaningful for the "ALTER"
> part. ALTER does not have any defaults, it just modifies the options
> given by the user.

You are correct. My mistake.

======
src/backend/postmaster/bgworker.c

2.
 #include "replication/logicalworker.h"
+#include "replication/worker_internal.h"
 #include "storage/dsm.h"

Is this change needed when the rest of the code is removed?

======
src/backend/replication/logical/slotsync.c

3. synchronize_one_slot

> > 3.
> > + /*
> > + * Make sure that concerned WAL is received and flushed before syncing
> > + * slot to target lsn received from the primary server.
> > + *
> > + * This check will never pass if on the primary server, user has
> > + * configured standby_slot_names GUC correctly, otherwise this can hit
> > + * frequently.
> > + */
> > + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> > + if (remote_slot->confirmed_lsn > latestFlushPtr)
> >
> > BEFORE
> > This check will never pass if on the primary server, user has
> > configured standby_slot_names GUC correctly, otherwise this can hit
> > frequently.
> >
> > SUGGESTION (simpler way to say the same thing?)
> > This will always be the case unless the standby_slot_names GUC is not
> > correctly configured on the primary server.
>
> It is not true. It will not hit this condition "always" but has higher
> chances to hit it when standby_slot_names is not configured. I think
> you meant 'unless the standby_slot_names GUC is correctly configured'.
> I feel the current comment gives clear info (less confusing) and thus
> I have not changed it for the time being. I can consider if I get more
> comments there.

Hmm. I meant what I wrote. The "This" of my suggested text refers to
the previous sentence in the comment (not about "hitting" ?? your
condition).

TBH, regardless of the wording you choose, I think it will be much
clearer to move the comment to be inside the if.

SUGGESTION
/*
 * Make sure that concerned WAL is received and flushed before syncing
 * slot to target lsn received from the primary server.
 */
latestFlushPtr = GetStandbyFlushRecPtr(NULL);
if (remote_slot->confirmed_lsn > latestFlushPtr)
{
  /*
   * Can get here only when if GUC 'standby_slot_names' on the primary
   * server was not configured correctly.
   */
...
}

~~~

4.
+static bool
+validate_parameters_and_get_dbname(char **dbname)
+{
+ /*
+ * A physical replication slot(primary_slot_name) is required on the
+ * primary to ensure that the rows needed by the standby are not removed
+ * after restarting, so that the synchronized slot on the standby will not
+ * be invalidated.
+ */
+ if (PrimarySlotName == NULL || *PrimarySlotName == '\0')
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_slot_name"));
+ return false;
+ }
+
+ /*
+ * hot_standby_feedback must be enabled to cooperate with the physical
+ * replication slot, which allows informing the primary about the xmin and
+ * catalog_xmin values on the standby.
+ */
+ if (!hot_standby_feedback)
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be enabled.", "hot_standby_feedback"));
+ return false;
+ }
+
+ /*
+ * Logical decoding requires wal_level >= logical and we currently only
+ * synchronize logical slots.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ {
+ ereport(LOG,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"wal_level\" must be >= logical."));
+ return false;
+ }
+
+ /*
+ * The primary_conninfo is required to make connection to primary for
+ * getting slots information.
+ */
+ if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0')
+ {
+ ereport(LOG,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_conninfo"));
+ return false;
+ }
+
+ /*
+ * The slot sync worker needs a database connection for walrcv_exec to
+ * work.
+ */
+ *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo);
+ if (*dbname == NULL)
+ {
+ ereport(LOG,
+
+ /*
+ * translator: 'dbname' is a specific option; %s is a GUC variable
+ * name
+ */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("'dbname' must be specified in \"%s\".", "primary_conninfo"));
+ return false;
+ }
+
+ return true;
+}

I wonder if it is better to log all the problems in one go instead of
making users stumble onto them one at a time after fixing one and then
hitting the next problem. e.g. just set some variable "all_ok =
false;" each time instead of all the "return false;"

Then at the end of the function just "return all_ok;"

======
Kind Regards,
Peter Smith.
Fujitsu Australia



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Commitfest 2024-01 is now closed
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Synchronizing slots from primary to standby