Re: [HACKERS] make async slave to wait for lsn to be replayed

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: [HACKERS] make async slave to wait for lsn to be replayed
Дата
Msg-id CALj2ACUhgYVHYH5JeNKro1v+SqmDcGyNHn0smWpsQMDLikrN=w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] make async slave to wait for lsn to be replayed  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: [HACKERS] make async slave to wait for lsn to be replayed  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On Sun, Mar 31, 2024 at 7:41 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi!

Thanks for the patch. I have a few comments on the v16 patch.

1. Is there any specific reason for pg_wal_replay_wait() being a
procedure rather than a function? I haven't read the full thread, but
I vaguely noticed the discussion on the new wait mechanism holding up
a snapshot or some other resource. Is that the reason to use a stored
procedure over a function? If yes, can we specify it somewhere in the
commit message and just before the procedure definition in
system_functions.sql?

2. Is the pg_wal_replay_wait first procedure that postgres provides
out of the box?

3. Defining a procedure for the first time in system_functions.sql
which is supposed to be for functions seems a bit unusual to me.

4.
+
+    endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), timeout);
+

+        if (timeout > 0)
+        {
+            delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
+            latch_events |= WL_TIMEOUT;
+            if (delay_ms <= 0)
+                break;
+        }

Why is endtime calculated even for timeout <= 0 only to just skip it
later? Can't we just do a fastpath exit if timeout = 0 and targetLSN <

5.
 Parameter
+        <parameter>timeout</parameter> is the time in milliseconds to wait
+        for the <parameter>target_lsn</parameter>
+        replay. When <parameter>timeout</parameter> value equals to zero no
+        timeout is applied.
+        replay. When <parameter>timeout</parameter> value equals to zero no
+        timeout is applied.

It turns out to be "When timeout value equals to zero no timeout is
applied." I guess, we can just say something like the following which
I picked up from pg_terminate_backend timeout parameter description.

       <para>
        If <parameter>timeout</parameter> is not specified or zero, this
        function returns if  the WAL upto
<literal>target_lsn</literal>  is replayed.
        If the <parameter>timeout</parameter> is specified (in
        milliseconds) and greater than zero, the function waits until the
        server actually replays the WAL upto <literal>target_lsn</literal>  or
        until the given time has passed. On timeout, an error is emitted.
       </para></entry>

6.
+        ereport(ERROR,
+                (errcode(ERRCODE_QUERY_CANCELED),
+                 errmsg("canceling waiting for LSN due to timeout")));

We can be a bit more informative here and say targetLSN and currentLSN
something like - "timed out while waiting for target LSN %X/%X to be
replayed; current LSN %X/%X"?

7.
+    if (context->atomic)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("pg_wal_replay_wait() must be only called in
non-atomic context")));
+

Can we say what a "non-atomic context '' is in a user understandable
way like explicit txn or whatever that might  be? "non-atomic context'
might not sound great to the end -user.

8.
+    the <literal>movie</literal> table and get the <acronym>lsn</acronym> after
+    changes just made.  This example uses
<function>pg_current_wal_insert_lsn</function>
+    to get the <acronym>lsn</acronym> given that
<varname>synchronous_commit</varname>
+    could be set to <literal>off</literal>.

Can we just mention that run pg_current_wal_insert_lsn on the primary?

9. To me the following query blocks even though I didn't mention timeout.
CALL pg_wal_replay_wait('0/fffffff');

10. Can't we do some input validation on the timeout parameter and
emit an error for negative values just like pg_terminate_backend?
CALL pg_wal_replay_wait('0/ffffff', -100);

11.
+
+        if (timeout > 0)
+        {
+            delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
+            latch_events |= WL_TIMEOUT;
+            if (delay_ms <= 0)
+                break;
+        }
+

Can we avoid calling GetCurrentTimestamp in a for loop which can be
costly at times especially when pg_wal_replay_wait is called with
larger timeouts on multiple backends? Can't we reuse
pg_terminate_backend's timeout logic in
pg_wait_until_termination, perhaps reducing waittime to 1msec or so?

12. Why should we let every user run pg_wal_replay_wait procedure?
Can't we revoke execute from the public in system_functions.sql so
that one can decide who to run this function? Per comment #11, one can
easily cause a lot of activity by running this function on hundreds of
sessions.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: pg_combinebackup --copy-file-range
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: Schema variables - new implementation for Postgres 15