Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAJpy0uCDzWfD_eazk_ayDdC8XOz6FfMy7E7rJEkpC7bYeUxx8Q@mail.gmail.com
обсуждение исходный текст
Ответ на RE: Synchronizing slots from primary to standby  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Список pgsql-hackers
On Thu, Dec 21, 2023 at 6:37 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shveta,
>
> Thanks for updating the patch! Here is my comments for v52-0002.

Thanks for the feedback Kuroda-san. I have addressed these in v53.

> ~~~~~
> system-views.sgml
>
> 01.
>
> ```
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>sync_state</structfield> <type>char</type>
> +      </para>
> +      <para>
> +      Defines slot synchronization state. This is meaningful on the physical
> +      standby which has configured <xref linkend="guc-enable-syncslot"/> = true.
> +      Possible values are:
> +       <itemizedlist>
> +        <listitem>
> +         <para><literal>n</literal> = none for user created slots,
> ...
> ```
>
> Hmm. I'm not sure why we must show a single character to a user. I'm OK for
> pg_subscription.srsubstate because it is a "catalog" - the actual value would be
> recorded in the heap. But pg_replication_slot is just a view so that we can replace
> internal representations to other strings. E.g., pg_replication_slots.wal_status.
> How about using {none, initialized, ready} or something?

Done.

> ~~~~~
> postmaster.c
>
> 02. bgworker_should_start_now
>
> ```
> +            if (start_time == BgWorkerStart_ConsistentState_HotStandby &&
> +                     pmState != PM_RUN)
> +                return true;
> ```
>
> I'm not sure the second condition is really needed. The line will be executed when
> pmState is PM_HOT_STANDBY. Is there a possibility that pmState is changed around here?

'case PM_RUN:' is a fall-through and thus we need to have this second
condition under 'case PM_HOT_STANDBY' for
BgWorkerStart_ConsistentState_HotStandby to avoid the worker getting
started on non-standby.

> ~~~~~
> libpqwalreceiver.c
>
> 03. PQWalReceiverFunctions
>
> ```
> +    .walrcv_get_dbname_from_conninfo = libpqrcv_get_dbname_from_conninfo,
> ```
>
> Just to confirm - is there a rule for ordering?

No, I think. I am not aware of any.

> ~~~~~
> slotsync.c
>
> 04. SlotSyncWorkerCtx
>
> ```
> typedef struct SlotSyncWorkerCtx
> {
>         pid_t           pid;
>         slock_t         mutex;
> } SlotSyncWorkerCtx;
>
> SlotSyncWorkerCtx *SlotSyncWorker = NULL;
> ```
>
> Per other files like launcher.c, should we use a name like "SlotSyncWorkerCtxStruct"?

Modified.

> 05. SlotSyncWorkerRegister()
>
> Your coding will work well, but there is another approach which validates
> slotsync parameters here. In this case, the postmaster should exit ASAP. This can
> notify that there are some wrong settings to users earlier. Thought?

I think the postmaster should not exit. IMO, slot-sync worker being a
child process of postmaster, should not control start or exit of
postmaster. The worker should only exit itself if slot-sync GUCs are
not set. Have you seen any other case where postmaster exits if any of
its bgworker processes has invalid GUCs?

> 06. wait_for_primary_slot_catchup
>
> ```
> +        CHECK_FOR_INTERRUPTS();
> +
> +        /* Handle any termination request if any */
> +        ProcessSlotSyncInterrupts(wrconn);
> ```
>
> ProcessSlotSyncInterrupts() also has CHECK_FOR_INTERRUPTS(), so no need to call.

yes, removed.

> 07. wait_for_primary_slot_catchup
>
> ```
> +        /*
> +         * XXX: Is waiting for 2 seconds before retrying enough or more or
> +         * less?
> +         */
> +        rc = WaitLatch(MyLatch,
> +                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> +                       2000L,
> +                       WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP);
> +
> +        ResetLatch(MyLatch);
> +
> +        /* Emergency bailout if postmaster has died */
> +        if (rc & WL_POSTMASTER_DEATH)
> +            proc_exit(1);
> ```
>
> Is there any reasons not to use WL_EXIT_ON_PM_DEATH event? If not, you can use.

I think we should use WL_EXIT_ON_PM_DEATH. Corrected now.

> 08. synchronize_slots
>
> ```
> +    SpinLockAcquire(&WalRcv->mutex);
> +    if (!WalRcv ||
> +        (WalRcv->slotname[0] == '\0') ||
> +        XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
> +    {
> ...
> ```
>
> Assuming that WalRcv is still NULL. In this case, does the first SpinLockAcquire()
> lead a segmentation fault?

It may. Thanks for pointing this out. Modified.

> 09. synchronize_slots
>
> ```
> +    elog(DEBUG2, "slot sync worker's query:%s \n", s.data);
> ```
>
> The query is not dynamical one, so I think no need to print even if the debug
> mode.

Okay. Removed.

> 10. synchronize_one_slot
>
> IIUC, this function can synchronize slots even if the used plugin on primary is
> not installed on the secondary server. If the slot is created by the slotsync
> worker, users will recognize it after the server is promoted and the decode is
> starting. I felt it is not good specification. Can we detect in the validation
> phase?

Noted the concern. Let me review more on this. I will revert back.

> ~~~~~
> not the source code
>
> 11.
>
> I tested the typical case - promoting a publisher from a below diagram.
> A physical replication slot "physical" was specified as standby_slot_names.
>
> ```
> node A (primary) --> node B (secondary)
> |
> |
> node C (subscriber)
> ```
>
> And after the promoting, below lines were periodically output on logfiles for
> node B and C.
>
> ```
> WARNING:  replication slot "physical" specified in parameter "standby_slot_names" does not exist, ignoring
> ```

It seems like you have set standby_slot_names on the standby, that is
why promoted standby is emitting this warning. It is not recommended
to set it on standby as it is the primary GUC.  Having said that, I
understand that even on primary, we may get this repeated warning if
standby_slot_names is not set correctly. This WARNING is intentional,
as the user should know that this setting is wrong. So I am not sure
if we should suppress this. I would like to know what others think on
this.

> Do you have idea to suppress the warning? IIUC it is a normal behavior of the
> walsender so that we cannot avoid the periodical outputs.
>
> The steps of the test was as follows:
>
> 1. stop the node A via pg_ctl stop
> 2. promota the node B via pg_ctl promote
> 3. change the connection string of the subscription via ALTER SUBSCRIPTION ... CONNECTION ...
>

thanks
Shveta



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Christoph Berg
Дата:
Сообщение: Re: Set log_lock_waits=on by default
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Avoid computing ORDER BY junk columns unnecessarily