Обсуждение: Re: Implement waiting for wal lsn replay: reloaded

Поиск

Список

Период

Сортировка

Re: Implement waiting for wal lsn replay: reloaded

От

Yura Sokolov

Дата:

06 февраля 2025 г., 11:31:28

27.11.2024 07:08, Alexander Korotkov wrote:
> Present solution
> 
> The present patch implements a new utility command WAIT FOR LSN
> 'target_lsn' [, TIMEOUT 'timeout'][, THROW 'throw'].  Unlike previous
> attempts to implement custom syntax, it uses only one extra unreserved
> keyword.  The parameters are implemented as generic_option_list.
> 
> Custom syntax eliminates the problem of running within an empty
> transaction of REPEATABLE READ level or higher.  We don't need to
> lookup a system catalog.  Thus, we have to set a transaction snapshot.
> 
> Also, revising PlannedStmtRequiresSnapshot() allows us to avoid
> holding a snapshot to return a value.  Therefore, the WAIT command in
> the attached patch returns its result status.
> 
> Also, the attached patch explicitly checks if the standby has been
> promoted to throw the most relevant form of an error.  The issue of
> inaccurate error messages has been previously spotted in [5].
> 
> Any comments?

Good day, Alexander.

I briefly looked into patch and have couple of minor remarks:

1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue
problems, but still don't like it. I'd prefer to see local fixed array, say
of 16 elements, and loop around remaining function body acting in batch of
16 wakeups. Doubtfully there will be more than 16 waiting clients often,
and even then it wont be much heavier than fetching all at once.

2. I'd move `inHeap` field between `procno` and `phNode` to fill the gap
between fields on 64bit platforms.
Well, I believe, it would be better to tweak `pairingheap_node` to make it
clear if it is in heap or not. But such change would be unrelated to
current patch's sense. So lets stick with `inHeap`, but move it a bit.

Non-code question: do you imagine for `WAIT` command reuse for other cases?
Is syntax rule in gram.y convenient enough for such reuse? I believe, `LSN`
is not part of syntax to not introduce new keyword. But is it correct way?
I have no answer or strong opinion.

Otherwise, the patch looks quite strong to me.

-------
regards
Yura Sokolov

Re: Implement waiting for wal lsn replay: reloaded

От

Yura Sokolov

Дата:

28 февраля 2025 г., 16:03:33

17.02.2025 00:27, Alexander Korotkov wrote:
> On Thu, Feb 6, 2025 at 10:31 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>> I briefly looked into patch and have couple of minor remarks:
>>
>> 1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue
>> problems, but still don't like it. I'd prefer to see local fixed array, say
>> of 16 elements, and loop around remaining function body acting in batch of
>> 16 wakeups. Doubtfully there will be more than 16 waiting clients often,
>> and even then it wont be much heavier than fetching all at once.
> 
> OK, I've refactored this to use static array of 16 size.  palloc() is
> used only if we don't fit static array.

I've rebased patch and:
- fixed compiler warning in wait.c ("maybe uninitialized 'result'").
- made a loop without call to palloc in WaitLSNWakeup. It is with "goto" to
keep indentation, perhaps `do {} while` would be better?

-------
regards
Yura Sokolov aka funny-falcon

Вложения

v3-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Yura Sokolov

Дата:

28 февраля 2025 г., 16:55:21

28.02.2025 16:03, Yura Sokolov пишет:
> 17.02.2025 00:27, Alexander Korotkov wrote:
>> On Thu, Feb 6, 2025 at 10:31 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>>> I briefly looked into patch and have couple of minor remarks:
>>>
>>> 1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue
>>> problems, but still don't like it. I'd prefer to see local fixed array, say
>>> of 16 elements, and loop around remaining function body acting in batch of
>>> 16 wakeups. Doubtfully there will be more than 16 waiting clients often,
>>> and even then it wont be much heavier than fetching all at once.
>>
>> OK, I've refactored this to use static array of 16 size.  palloc() is
>> used only if we don't fit static array.
> 
> I've rebased patch and:
> - fixed compiler warning in wait.c ("maybe uninitialized 'result'").
> - made a loop without call to palloc in WaitLSNWakeup. It is with "goto" to
> keep indentation, perhaps `do {} while` would be better?

And fixed:
   'WAIT' is marked as BARE_LABEL in kwlist.h, but it is missing from
gram.y's bare_label_keyword rule

-------
regards
Yura Sokolov aka funny-falcon

Вложения

v4-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

10 марта 2025 г., 14:30:31

On Fri, Feb 28, 2025 at 3:55 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 28.02.2025 16:03, Yura Sokolov пишет:
> > 17.02.2025 00:27, Alexander Korotkov wrote:
> >> On Thu, Feb 6, 2025 at 10:31 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> >>> I briefly looked into patch and have couple of minor remarks:
> >>>
> >>> 1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue
> >>> problems, but still don't like it. I'd prefer to see local fixed array, say
> >>> of 16 elements, and loop around remaining function body acting in batch of
> >>> 16 wakeups. Doubtfully there will be more than 16 waiting clients often,
> >>> and even then it wont be much heavier than fetching all at once.
> >>
> >> OK, I've refactored this to use static array of 16 size.  palloc() is
> >> used only if we don't fit static array.
> >
> > I've rebased patch and:
> > - fixed compiler warning in wait.c ("maybe uninitialized 'result'").
> > - made a loop without call to palloc in WaitLSNWakeup. It is with "goto" to
> > keep indentation, perhaps `do {} while` would be better?
>
> And fixed:
>    'WAIT' is marked as BARE_LABEL in kwlist.h, but it is missing from
> gram.y's bare_label_keyword rule

Thank you, Yura.  I've further revised the patch.  Mostly added the
documentation including SQL command reference and few paragraphs in
the high availability chapter explaining the read-your-writes
consistency concept.

------
Regards,
Alexander Korotkov
Supabase

Вложения

v5-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Yura Sokolov

Дата:

12 марта 2025 г., 17:44:28

10.03.2025 14:30, Alexander Korotkov пишет:
> On Fri, Feb 28, 2025 at 3:55 PM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>> 28.02.2025 16:03, Yura Sokolov пишет:
>>> 17.02.2025 00:27, Alexander Korotkov wrote:
>>>> On Thu, Feb 6, 2025 at 10:31 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>>>>> I briefly looked into patch and have couple of minor remarks:
>>>>>
>>>>> 1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue
>>>>> problems, but still don't like it. I'd prefer to see local fixed array, say
>>>>> of 16 elements, and loop around remaining function body acting in batch of
>>>>> 16 wakeups. Doubtfully there will be more than 16 waiting clients often,
>>>>> and even then it wont be much heavier than fetching all at once.
>>>>
>>>> OK, I've refactored this to use static array of 16 size.  palloc() is
>>>> used only if we don't fit static array.
>>>
>>> I've rebased patch and:
>>> - fixed compiler warning in wait.c ("maybe uninitialized 'result'").
>>> - made a loop without call to palloc in WaitLSNWakeup. It is with "goto" to
>>> keep indentation, perhaps `do {} while` would be better?
>>
>> And fixed:
>>    'WAIT' is marked as BARE_LABEL in kwlist.h, but it is missing from
>> gram.y's bare_label_keyword rule
> 
> Thank you, Yura.  I've further revised the patch.  Mostly added the
> documentation including SQL command reference and few paragraphs in
> the high availability chapter explaining the read-your-writes
> consistency concept.

Good day, Alexander.

Looking "for the last time" to the patch I found there remains
`pg_wal_replay_wait` function in documentation and one comment.
So I fixed it in documentation, and removed sentence from comment.

Otherwise v6 is just rebased v5.

-------
regards
Yura Sokolov aka funny-falcon

Вложения

v6-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Tomas Vondra

Дата:

13 марта 2025 г., 17:15:01

Hi,

I did a quick look at this patch. I haven't found any correctness
issues, but I have some general review comments and questions about the
grammar / syntax.

1) The sgml docs don't really show the syntax very nicely, it only shows
this at the beginning of wait_for.sgml:

   WAIT FOR ( <replaceable class="parameter">parameter</replaceable>
'<replaceable class="parameter">value</replaceable>' [, ... ] ) ]

I kinda understand this comes from using the generic option list (I'll
get to that shortly), but I think it'd be much better to actually show
the "full" syntax here, instead of leaving the "parameters" to later.


2) The syntax description suggests "(" and ")" are required, but that
does not seem to be the case - in fact, it's not even optional, and when
I try using that, I get syntax error.


3) I have my doubts about using the generic_option_list for this. Yes, I
understand this allows using fewer reserved keywords, but it leads to
some weirdness and I'm not sure it's worth it. Not sure what the right
trade off is here.

Anyway, some examples of the weird stuff implied by this approach:

- it forces "," between the options, which is a clear difference from
what we do for every other command

- it forces everything to be a string, i.e. you can' say "TIMEOUT 10",
it has to be "TIMEOUT '10'"

I don't have a very strong opinion on this, but the result seems a bit
strange to me.


4) I'm not sure I understand the motivation of the "throw false" mode,
and I'm not sure I understand this description in the sgml docs:

    On timeout, or if the server is promoted before
    <parameter>lsn</parameter> is reached, an error is emitted,
    as soon as <parameter>throw</parameter> is not specified or set to
    true.
    If <parameter>throw</parameter> is set to false, then the command
    doesn't throw errors.

I find it a bit confusing. What is the use case for this mode?


5) One place in the docs says:

      The target log sequence number to wait for.

   Thie is literally the only place using "log sequence number" in our
   code base, I'd just use "LSN" just like every other place.


6) The docs for the TIMEOUT parameter say this:

   <varlistentry>
    <term><replaceable class="parameter">timeout</replaceable></term>
    <listitem>
     <para>
      When specified and greater than zero, the command waits until
      <parameter>lsn</parameter> is reached or the specified
      <parameter>timeout</parameter> has elapsed.  Must be a non-
      negative integer, the default is zero.
     </para>
    </listitem>
   </varlistentry>

   That doesn't say what unit does the option use. Is is seconds,
   milliseconds or what?

   In fact, it'd be nice to let users specify that in the value, similar
   to other options (e.g. SET statement_timeout = '10s').


7) One place in the docs says this:

    That is, after this function execution, the value returned by
    <function>pg_last_wal_replay_lsn</function> should be greater ...

  I think the reference to "function execution" is obsolete?


8) I find this confusing:

    However, if <command>WAIT FOR</command> is
    called on primary promoted from standby and <literal>lsn</literal>
    was already replayed, then the <command>WAIT FOR</command> command
    just exits immediately.

  Does this mean running the WAIT command on a primary (after it was
  already promoted) will exit immediately? Why does it matter that it
  was promoted from a standby? Shouldn't it exit immediately even for
  a standalone instance?


9) xlogwait.c

I think this should start with a basic "design" description of how the
wait is implemented, in a comment at the top of the file. That is, what
we keep in the shared memory, what happens during a wait, how it uses
the pairing heap, etc. After reading this comment I should understand
how it all fits together.


10) WaitForLSNReplay / WaitLSNWakeup

I think the function comment should document the important stuff (e.g.
return values for various situations, how it groups waiters into chunks
of 16 elements during wakeup, ...).


11) WaitLSNProcInfo / WaitLSNState

Does this need to be exposed in xlogwait.h? These structs seem private
to xlogwait.c, so maybe declare it there?


regards

-- 
Tomas Vondra

Re: Implement waiting for wal lsn replay: reloaded

От

vignesh C

Дата:

16 марта 2025 г., 16:32:11

On Wed, 12 Mar 2025 at 20:14, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>
> Otherwise v6 is just rebased v5.

I noticed that Tomas's comments from [1] are not yet addressed, I have
changed the commitfest status to Waiting on Author, please address
them and update it to Needs review.
[1] - https://www.postgresql.org/message-id/09a98dc9-eeb1-471d-b990-072513c3d584@vondra.me

Regards,
Vignesh

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

29 апреля 2025 г., 14:27:25

Hi, Tomas.

Thank you so much for your review!  Please find the revised patchset.

On Thu, Mar 13, 2025 at 4:15 PM Tomas Vondra <tomas@vondra.me> wrote:
> I did a quick look at this patch. I haven't found any correctness
> issues, but I have some general review comments and questions about the
> grammar / syntax.
>
> 1) The sgml docs don't really show the syntax very nicely, it only shows
> this at the beginning of wait_for.sgml:
>
>    WAIT FOR ( <replaceable class="parameter">parameter</replaceable>
> '<replaceable class="parameter">value</replaceable>' [, ... ] ) ]
>
> I kinda understand this comes from using the generic option list (I'll
> get to that shortly), but I think it'd be much better to actually show
> the "full" syntax here, instead of leaving the "parameters" to later.

Sounds reasonable, changed to show the full syntax in the synopsis.

> 2) The syntax description suggests "(" and ")" are required, but that
> does not seem to be the case - in fact, it's not even optional, and when
> I try using that, I get syntax error.

Good catch, fixed.

> 3) I have my doubts about using the generic_option_list for this. Yes, I
> understand this allows using fewer reserved keywords, but it leads to
> some weirdness and I'm not sure it's worth it. Not sure what the right
> trade off is here.
>
> Anyway, some examples of the weird stuff implied by this approach:
>
> - it forces "," between the options, which is a clear difference from
> what we do for every other command
>
> - it forces everything to be a string, i.e. you can' say "TIMEOUT 10",
> it has to be "TIMEOUT '10'"
>
> I don't have a very strong opinion on this, but the result seems a bit
> strange to me.

I've improved the syntax.  I still tried to keep the number of new
keywords and grammar rules minimal.  That leads to moving some parser
login into wait.c.  This is probably a bit awkward, but saves our
grammar from bloat.  Let me know what do you think about this
approach.

> 4) I'm not sure I understand the motivation of the "throw false" mode,
> and I'm not sure I understand this description in the sgml docs:
>
>     On timeout, or if the server is promoted before
>     <parameter>lsn</parameter> is reached, an error is emitted,
>     as soon as <parameter>throw</parameter> is not specified or set to
>     true.
>     If <parameter>throw</parameter> is set to false, then the command
>     doesn't throw errors.
>
> I find it a bit confusing. What is the use case for this mode?

The idea here is that application could do some handling of these
errors without having to parse the error messages (parsing error
messages is inconvenient because of localization etc).

> 5) One place in the docs says:
>
>       The target log sequence number to wait for.
>
>    Thie is literally the only place using "log sequence number" in our
>    code base, I'd just use "LSN" just like every other place.

OK fixed.

> 6) The docs for the TIMEOUT parameter say this:
>
>    <varlistentry>
>     <term><replaceable class="parameter">timeout</replaceable></term>
>     <listitem>
>      <para>
>       When specified and greater than zero, the command waits until
>       <parameter>lsn</parameter> is reached or the specified
>       <parameter>timeout</parameter> has elapsed.  Must be a non-
>       negative integer, the default is zero.
>      </para>
>     </listitem>
>    </varlistentry>
>
>    That doesn't say what unit does the option use. Is is seconds,
>    milliseconds or what?
>
>    In fact, it'd be nice to let users specify that in the value, similar
>    to other options (e.g. SET statement_timeout = '10s').

The default unit of milliseconds is specified.  Also, an alternative
way to specify timeout is now supported.  Timeout might be a string
literal consisting of numeric and unit specifier.

> 7) One place in the docs says this:
>
>     That is, after this function execution, the value returned by
>     <function>pg_last_wal_replay_lsn</function> should be greater ...
>
>   I think the reference to "function execution" is obsolete?

Actually, this is just the function, which reports current replay LSN,
not function introduced by previous version of this patch.  We refer
it to just express the constraint that LSN must be replayed after
execution of the command.

> 8) I find this confusing:
>
>     However, if <command>WAIT FOR</command> is
>     called on primary promoted from standby and <literal>lsn</literal>
>     was already replayed, then the <command>WAIT FOR</command> command
>     just exits immediately.
>
>   Does this mean running the WAIT command on a primary (after it was
>   already promoted) will exit immediately? Why does it matter that it
>   was promoted from a standby? Shouldn't it exit immediately even for
>   a standalone instance?

I think the previous sentence should give an idea that otherwise error
gets thrown.  That also happens immediately for sure.

> 9) xlogwait.c
>
> I think this should start with a basic "design" description of how the
> wait is implemented, in a comment at the top of the file. That is, what
> we keep in the shared memory, what happens during a wait, how it uses
> the pairing heap, etc. After reading this comment I should understand
> how it all fits together.

OK, I've added the header comment.

> 10) WaitForLSNReplay / WaitLSNWakeup
>
> I think the function comment should document the important stuff (e.g.
> return values for various situations, how it groups waiters into chunks
> of 16 elements during wakeup, ...).

Revised header comments for those functions too.

> 11) WaitLSNProcInfo / WaitLSNState
>
> Does this need to be exposed in xlogwait.h? These structs seem private
> to xlogwait.c, so maybe declare it there?

Hmm, I don't remember why I moved them to xlogwait.h.  OK, moved them
back to xlogwait.c.


------
Regards,
Alexander Korotkov
Supabase

Вложения

v6-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Álvaro Herrera

Дата:

05 августа 2025 г., 16:47:07

On 2025-Apr-29, Alexander Korotkov wrote:

> > 11) WaitLSNProcInfo / WaitLSNState
> >
> > Does this need to be exposed in xlogwait.h? These structs seem private
> > to xlogwait.c, so maybe declare it there?
> 
> Hmm, I don't remember why I moved them to xlogwait.h.  OK, moved them
> back to xlogwait.c.

This change made the code no longer compile, because
WaitLSNState->minWaitedLSN is used in xlogrecovery.c which no longer has
access to the field definition.  A rebased version with that change
reverted is attached.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Thou shalt study thy libraries and strive not to reinvent them without
cause, that thy code may be short and readable and thy days pleasant
and productive. (7th Commandment for C Programmers)

Вложения

v7-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

07 августа 2025 г., 18:00:50

Hi,

Thanks for working on this.

I’ve just come across this thread and haven’t had a chance to dig into
the patch yet, but I’m keen to review it soon. In the meantime, I have
a quick question: is WAIT FOR REPLY intended mainly for user-defined
functions, or can internal code invoke it as well?

During a recent performance run [1] I noticed heavy polling in
read_local_xlog_page_guts(). Heikki’s comment from a few months ago
also hints that we could replace this check–sleep–repeat loop with the
condition-variable (CV) infrastructure used by walsender:

/*
* Loop waiting for xlog to be available if necessary
*
* TODO: The walsender has its own version of this function, which uses a
* condition variable to wake up whenever WAL is flushed. We could use the
* same infrastructure here, instead of the check/sleep/repeat style of
* loop.
*/

Because read_local_xlog_page_guts() waits for a specific flush or
replay LSN, polling becomes inefficient when the wait is long. I built
a POC patch that swaps polling for CVs, but a single global CV (or
even separate “flush” and “replay” CVs) isn’t ideal:

The wake-up routines don’t know which LSN each waiter cares about, so
they’d have to broadcast on every flush/replay. Caching the minimum
outstanding LSN could reduce spuriously awakened waiters, yet wouldn’t
eliminate them—multiple backends might wait for different LSNs
simultaneously. A more precise solution would require a request queue
that maps waiters to target LSNs and issues targeted wake-ups, adding
complexity.

Walsender accepts the potential broadcast overhead by using two cvs
for different waiters, so it might be acceptable for
read_local_xlog_page_guts() as well. However, if WAIT FOR REPLY
becomes available to backend code, we might leverage it to eliminate
the polling for waiting replay in read_local_xlog_page_guts() without
introducing a bespoke dispatcher. I’d appreciate any thoughts on
whether that use case is in scope.

Best,
Xuneng

[1] https://www.postgresql.org/message-id/CABPTF7VuFYm9TtA9vY8ZtS77qsT+yL_HtSDxUFnW3XsdB5b9ew@mail.gmail.com

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

08 августа 2025 г., 09:54:18

Hello, Álvaro!

On Wed, Aug 6, 2025 at 6:01 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Apr-29, Alexander Korotkov wrote:
>
> > > 11) WaitLSNProcInfo / WaitLSNState
> > >
> > > Does this need to be exposed in xlogwait.h? These structs seem private
> > > to xlogwait.c, so maybe declare it there?
> >
> > Hmm, I don't remember why I moved them to xlogwait.h.  OK, moved them
> > back to xlogwait.c.
>
> This change made the code no longer compile, because
> WaitLSNState->minWaitedLSN is used in xlogrecovery.c which no longer has
> access to the field definition.  A rebased version with that change
> reverted is attached.

Thank you!  The rebased version looks correct for me.

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

08 августа 2025 г., 10:08:49

Hi, Xuneng Zhou!

On Thu, Aug 7, 2025 at 6:01 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> Thanks for working on this.
>
> I’ve just come across this thread and haven’t had a chance to dig into
> the patch yet, but I’m keen to review it soon.

Great.  Thank you for your attention to this patch.  I appreciate your
intention to review it.

> In the meantime, I have
> a quick question: is WAIT FOR REPLY intended mainly for user-defined
> functions, or can internal code invoke it as well?

Currently, WaitForLSNReplay() is assumed to only be called from
backend, as corresponding shmem is allocated only per-backend.  But
there is absolutely no problem to tweak the patch to allocate shmem
for every Postgres process.  This would enable to call
WaitForLSNReplay() wherever it is needed.  There is only no problem to
extend this approach to support other kinds of LSNs not just replay
LSN.


> During a recent performance run [1] I noticed heavy polling in
> read_local_xlog_page_guts(). Heikki’s comment from a few months ago
> also hints that we could replace this check–sleep–repeat loop with the
> condition-variable (CV) infrastructure used by walsender:
>
> /*
>  * Loop waiting for xlog to be available if necessary
>  *
>  * TODO: The walsender has its own version of this function, which uses a
>  * condition variable to wake up whenever WAL is flushed. We could use the
>  * same infrastructure here, instead of the check/sleep/repeat style of
>  * loop.
>  */
>
> Because read_local_xlog_page_guts() waits for a specific flush or
> replay LSN, polling becomes inefficient when the wait is long. I built
> a POC patch that swaps polling for CVs, but a single global CV (or
> even separate “flush” and “replay” CVs) isn’t ideal:
>
> The wake-up routines don’t know which LSN each waiter cares about, so
> they’d have to broadcast on every flush/replay. Caching the minimum
> outstanding LSN could reduce spuriously awakened waiters, yet wouldn’t
> eliminate them—multiple backends might wait for different LSNs
> simultaneously. A more precise solution would require a request queue
> that maps waiters to target LSNs and issues targeted wake-ups, adding
> complexity.
>
> Walsender accepts the potential broadcast overhead by using two cvs
> for different waiters, so it might be acceptable for
> read_local_xlog_page_guts() as well. However, if WAIT FOR REPLY
> becomes available to backend code, we might leverage it to eliminate
> the polling for waiting replay in read_local_xlog_page_guts() without
> introducing a bespoke dispatcher. I’d appreciate any thoughts on
> whether that use case is in scope.

This looks like a great new use-case for facilities developed in this
patch!  I'll remove the restriction to use WaitForLSNReplay() only in
backend.  I think you can write a patch with additional pairing heap
for flush LSN and include that into thread about
read_local_xlog_page_guts() optimization.  Let me know if you need any
assistance.

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

09 августа 2025 г., 13:52:37

Hi Alexander!

> > In the meantime, I have
> > a quick question: is WAIT FOR REPLY intended mainly for user-defined
> > functions, or can internal code invoke it as well?
>
> Currently, WaitForLSNReplay() is assumed to only be called from
> backend, as corresponding shmem is allocated only per-backend.  But
> there is absolutely no problem to tweak the patch to allocate shmem
> for every Postgres process.  This would enable to call
> WaitForLSNReplay() wherever it is needed.  There is only no problem to
> extend this approach to support other kinds of LSNs not just replay
> LSN.

Thanks for extending the functionality of the Wait For Replay patch!

> This looks like a great new use-case for facilities developed in this
> patch!  I'll remove the restriction to use WaitForLSNReplay() only in
> backend.  I think you can write a patch with additional pairing heap
> for flush LSN and include that into thread about
> read_local_xlog_page_guts() optimization.  Let me know if you need any
> assistance.

This could be a more elegant approach which would solve the polling
issue well. I'll prepare a follow-up patch for it.

Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

09 августа 2025 г., 14:27:25

Hi,

> On Thu, Aug 7, 2025 at 6:01 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > Thanks for working on this.
> >
> > I’ve just come across this thread and haven’t had a chance to dig into
> > the patch yet, but I’m keen to review it soon.
>
> Great.  Thank you for your attention to this patch.  I appreciate your
> intention to review it.

I did a quick pass over v7. There are a few thoughts to share—mostly
around documentation, build, and tests, plus some minor nits. The core
logic looks solid to me. I’ll take a deeper look as I work on a
follow‑up patch to add waiting for flush LSNs. And the patch seems to
need rebase; it can't be applied to HEAD cleanly for now.

Build
1) Consider adding a comma in `src/test/recovery/meson.build` after
`'t/048_vacuum_horizon_floor.pl'` so the list remains valid.

Core code
2) It may be safer for `WaitLSNWakeup()` to assert against the stack array size:
) Perhaps `Assert(numWakeUpProcs < WAKEUP_PROC_STATIC_ARRAY_SIZE);`
rather than `MaxBackends`.
For option parsing UX in `wait.c`, we might prefer:
3) Using `ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR),
errmsg(...)))` instead of `elog(ERROR, ...)` for consistency and
translatability.
4) Explicitly rejecting duplicate `LSN`/`TIMEOUT` options with a syntax error.
5) The result column label could align better with other utility
outputs if shortened to `status` (lowercase, no space).
6) After `parse_real()`, it could help to validate/clamp the timeout
to avoid overflow when converting to `int64` and when passing a `long`
to `WaitLatch()`.
7) If `nodes/print.h` in `src/backend/commands/wait.c` isn’t used, we
might drop the include.
8) A couple of comment nits: “do it this outside” → “do this outside”.

Tests
9) We might consider adding cases for:
- Negative `TIMEOUT` (to exercise the error path).
- Syntax errors (unknown option; duplicate `LSN`/`TIMEOUT`; missing `LSN`).

Documentation
`doc/src/sgml/ref/wait_for.sgml`
10) The index term could be updated to `<primary>WAIT FOR</primary>`.
11) The synopsis might read more clearly as:
- WAIT FOR LSN '<lsn>' [ TIMEOUT <milliseconds |
'duration-with-units'> ] [ NO_THROW ]
12) The purpose line might be smoother as “wait for a target LSN to be
replayed, optionally with a timeout”.
13) Return values might use `<literal>` for `success`, `timeout`, `not
in recovery`.
14) Consistently calling this a “command” (rather than
function/procedure) could reduce confusion.
15) The example text might read more cleanly as “If the target LSN is
not reached before the timeout …”.

`doc/src/sgml/high-availability.sgml`
16) The sentence could read “However, it is possible to address this
without switching to synchronous replication.”

`src/backend/utils/activity/wait_event_names.txt`
17) The description for `WAIT_FOR_WAL_REPLAY` might be clearer as
“Waiting for WAL replay to reach a target LSN on a standby.”

Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

27 августа 2025 г., 18:54:25

Hi all,

I did a rebase for the patch to v8 and incorporated a few changes:

1) Updated documentation, added new tests, and applied minor code
adjustments based on prior review comments.
2) Tweaked the initialization of waitReplayLSNState so that
non-backend processes can call wait for replay.

Started a new thread [1] and attached a patch addressing the polling
issue in the function
read_local_xlog_page_guts built on the infra of patch v8.

[1] https://www.postgresql.org/message-id/CABPTF7Vr99gZ5GM_ZYbYnd9MMnoVW3pukBEviVoHKRvJW-dE3g@mail.gmail.com

Feedbacks welcome.

Best,
Xuneng

Вложения

v8-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

13 сентября 2025 г., 22:31:32

Hi, Xuneng!

On Wed, Aug 27, 2025 at 6:54 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> I did a rebase for the patch to v8 and incorporated a few changes:
>
> 1) Updated documentation, added new tests, and applied minor code
> adjustments based on prior review comments.
> 2) Tweaked the initialization of waitReplayLSNState so that
> non-backend processes can call wait for replay.
>
> Started a new thread [1] and attached a patch addressing the polling
> issue in the function
> read_local_xlog_page_guts built on the infra of patch v8.
>
> [1] https://www.postgresql.org/message-id/CABPTF7Vr99gZ5GM_ZYbYnd9MMnoVW3pukBEviVoHKRvJW-dE3g@mail.gmail.com
>
> Feedbacks welcome.

Thank you for your reviewing and revising this patch.

I see you've integrated most of your points expressed in [1].  I went
though them and I've integrated the rest of them.  Except this one.

> 11) The synopsis might read more clearly as:
> - WAIT FOR LSN '<lsn>' [ TIMEOUT <milliseconds | 'duration-with-units'> ] [ NO_THROW ]

I didn't find examples on how we do the similar things on other places
of docs.  This is why I decided to leave this place as it currently
is.

Also, I found some mess up with typedefs.list.  I've returned the
changes to typdefs.list back and re-indented the sources.

I'd like to ask your opinion of the way this feature is implemented in
terms of grammar: generic parsing implemented in gram.y and the rest
is done in wait.c.  I think this approach should minimize additional
keywords and states for parsing code.  This comes at the price of more
complex code in wait.c, but I think this is a fair price.

Links.
1. https://www.postgresql.org/message-id/CABPTF7VsoGDMBq34MpLrMSZyxNZvVbgH6-zxtJOg5AwOoYURbw%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Вложения

v9-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

14 сентября 2025 г., 16:51:21

Hi Alexander,

On Sun, Sep 14, 2025 at 3:31 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi, Xuneng!
>
> On Wed, Aug 27, 2025 at 6:54 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > I did a rebase for the patch to v8 and incorporated a few changes:
> >
> > 1) Updated documentation, added new tests, and applied minor code
> > adjustments based on prior review comments.
> > 2) Tweaked the initialization of waitReplayLSNState so that
> > non-backend processes can call wait for replay.
> >
> > Started a new thread [1] and attached a patch addressing the polling
> > issue in the function
> > read_local_xlog_page_guts built on the infra of patch v8.
> >
> > [1] https://www.postgresql.org/message-id/CABPTF7Vr99gZ5GM_ZYbYnd9MMnoVW3pukBEviVoHKRvJW-dE3g@mail.gmail.com
> >
> > Feedbacks welcome.
>
> Thank you for your reviewing and revising this patch.
>
> I see you've integrated most of your points expressed in [1].  I went
> though them and I've integrated the rest of them.  Except this one.
>
> > 11) The synopsis might read more clearly as:
> > - WAIT FOR LSN '<lsn>' [ TIMEOUT <milliseconds | 'duration-with-units'> ] [ NO_THROW ]
>
> I didn't find examples on how we do the similar things on other places
> of docs.  This is why I decided to leave this place as it currently
> is.

+1. I re-check other commands with similar parameter patterns, and
they follow the approach in v9.

>
> Also, I found some mess up with typedefs.list.  I've returned the
> changes to typdefs.list back and re-indented the sources.

 Thanks for catching and fixing that.

> I'd like to ask your opinion of the way this feature is implemented in
> terms of grammar: generic parsing implemented in gram.y and the rest
> is done in wait.c.  I think this approach should minimize additional
> keywords and states for parsing code.  This comes at the price of more
> complex code in wait.c, but I think this is a fair price.

It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
PUBLICATION - all use minimal grammar rules that produce generic
option lists, with the actual interpretation done in their respective
implementation files. The moderate complexity in wait.c seems
acceptable.

Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

15 сентября 2025 г., 21:59:42

Hi, Xuneng!

On Sun, Sep 14, 2025 at 4:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> On Sun, Sep 14, 2025 at 3:31 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Wed, Aug 27, 2025 at 6:54 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > I did a rebase for the patch to v8 and incorporated a few changes:
> > >
> > > 1) Updated documentation, added new tests, and applied minor code
> > > adjustments based on prior review comments.
> > > 2) Tweaked the initialization of waitReplayLSNState so that
> > > non-backend processes can call wait for replay.
> > >
> > > Started a new thread [1] and attached a patch addressing the polling
> > > issue in the function
> > > read_local_xlog_page_guts built on the infra of patch v8.
> > >
> > > [1] https://www.postgresql.org/message-id/CABPTF7Vr99gZ5GM_ZYbYnd9MMnoVW3pukBEviVoHKRvJW-dE3g@mail.gmail.com
> > >
> > > Feedbacks welcome.
> >
> > Thank you for your reviewing and revising this patch.
> >
> > I see you've integrated most of your points expressed in [1].  I went
> > though them and I've integrated the rest of them.  Except this one.
> >
> > > 11) The synopsis might read more clearly as:
> > > - WAIT FOR LSN '<lsn>' [ TIMEOUT <milliseconds | 'duration-with-units'> ] [ NO_THROW ]
> >
> > I didn't find examples on how we do the similar things on other places
> > of docs.  This is why I decided to leave this place as it currently
> > is.
>
> +1. I re-check other commands with similar parameter patterns, and
> they follow the approach in v9.
>
> >
> > Also, I found some mess up with typedefs.list.  I've returned the
> > changes to typdefs.list back and re-indented the sources.
>
>  Thanks for catching and fixing that.
>
> > I'd like to ask your opinion of the way this feature is implemented in
> > terms of grammar: generic parsing implemented in gram.y and the rest
> > is done in wait.c.  I think this approach should minimize additional
> > keywords and states for parsing code.  This comes at the price of more
> > complex code in wait.c, but I think this is a fair price.
>
> It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> PUBLICATION - all use minimal grammar rules that produce generic
> option lists, with the actual interpretation done in their respective
> implementation files. The moderate complexity in wait.c seems
> acceptable.

The attached revision of patch contains fix of the typo in the comment
you reported off-list.

------
Regards,
Alexander Korotkov
Supabase

Вложения

v10-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Álvaro Herrera

Дата:

15 сентября 2025 г., 23:24:05

On 2025-Sep-15, Alexander Korotkov wrote:

> > It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> > PUBLICATION - all use minimal grammar rules that produce generic
> > option lists, with the actual interpretation done in their respective
> > implementation files. The moderate complexity in wait.c seems
> > acceptable.

Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
lists of DefElem (a name optionally followed by a value) instead of
individual scattered elements that must later be matched up.  Why not
use utility_option_list instead and then loop on the list of DefElems?
It'd be a lot simpler.

Also, we've found that failing to surround the options by parens leads
to pain down the road, so maybe add that.  Given that the LSN seems to
be mandatory, maybe make it something like

WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]

This requires that you make LSN a keyword, albeit unreserved.  Or you
could make it
WAIT FOR Ident [the rest]
and then ensure in C that the identifier matches the word LSN, such as
we do for "permissive" and "restrictive" in
RowSecurityDefaultPermissive.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

26 сентября 2025 г., 14:22:42

Hi Álvaro,

Thanks for your review.

On Tue, Sep 16, 2025 at 4:24 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Sep-15, Alexander Korotkov wrote:
>
> > > It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> > > PUBLICATION - all use minimal grammar rules that produce generic
> > > option lists, with the actual interpretation done in their respective
> > > implementation files. The moderate complexity in wait.c seems
> > > acceptable.
>
> Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
> lists of DefElem (a name optionally followed by a value) instead of
> individual scattered elements that must later be matched up.  Why not
> use utility_option_list instead and then loop on the list of DefElems?
> It'd be a lot simpler.

I took a look at commands like VACUUM and EXPLAIN and they do follow
this pattern. v11 will make use of utility_option_list.

> Also, we've found that failing to surround the options by parens leads
> to pain down the road, so maybe add that.  Given that the LSN seems to
> be mandatory, maybe make it something like
>
> WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]
>
> This requires that you make LSN a keyword, albeit unreserved.  Or you
> could make it
> WAIT FOR Ident [the rest]
> and then ensure in C that the identifier matches the word LSN, such as
> we do for "permissive" and "restrictive" in
> RowSecurityDefaultPermissive.

Shall make LSN an unreserved keyword as well.

Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

28 сентября 2025 г., 12:02:43

Hi,

On Fri, Sep 26, 2025 at 7:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi Álvaro,
>
> Thanks for your review.
>
> On Tue, Sep 16, 2025 at 4:24 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> >
> > On 2025-Sep-15, Alexander Korotkov wrote:
> >
> > > > It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> > > > PUBLICATION - all use minimal grammar rules that produce generic
> > > > option lists, with the actual interpretation done in their respective
> > > > implementation files. The moderate complexity in wait.c seems
> > > > acceptable.
> >
> > Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
> > lists of DefElem (a name optionally followed by a value) instead of
> > individual scattered elements that must later be matched up.  Why not
> > use utility_option_list instead and then loop on the list of DefElems?
> > It'd be a lot simpler.
>
> I took a look at commands like VACUUM and EXPLAIN and they do follow
> this pattern. v11 will make use of utility_option_list.
>
> > Also, we've found that failing to surround the options by parens leads
> > to pain down the road, so maybe add that.  Given that the LSN seems to
> > be mandatory, maybe make it something like
> >
> > WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]
> >
> > This requires that you make LSN a keyword, albeit unreserved.  Or you
> > could make it
> > WAIT FOR Ident [the rest]
> > and then ensure in C that the identifier matches the word LSN, such as
> > we do for "permissive" and "restrictive" in
> > RowSecurityDefaultPermissive.
>
> Shall make LSN an unreserved keyword as well.

Here's the updated v11.  Many thanks Jian for off-list discussions and review.

Best,
Xuneng

Вложения

v11-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

04 октября 2025 г., 04:35:32

Hi,

On Sun, Sep 28, 2025 at 5:02 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Fri, Sep 26, 2025 at 7:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Álvaro,
> >
> > Thanks for your review.
> >
> > On Tue, Sep 16, 2025 at 4:24 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > >
> > > On 2025-Sep-15, Alexander Korotkov wrote:
> > >
> > > > > It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> > > > > PUBLICATION - all use minimal grammar rules that produce generic
> > > > > option lists, with the actual interpretation done in their respective
> > > > > implementation files. The moderate complexity in wait.c seems
> > > > > acceptable.
> > >
> > > Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
> > > lists of DefElem (a name optionally followed by a value) instead of
> > > individual scattered elements that must later be matched up.  Why not
> > > use utility_option_list instead and then loop on the list of DefElems?
> > > It'd be a lot simpler.
> >
> > I took a look at commands like VACUUM and EXPLAIN and they do follow
> > this pattern. v11 will make use of utility_option_list.
> >
> > > Also, we've found that failing to surround the options by parens leads
> > > to pain down the road, so maybe add that.  Given that the LSN seems to
> > > be mandatory, maybe make it something like
> > >
> > > WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]
> > >
> > > This requires that you make LSN a keyword, albeit unreserved.  Or you
> > > could make it
> > > WAIT FOR Ident [the rest]
> > > and then ensure in C that the identifier matches the word LSN, such as
> > > we do for "permissive" and "restrictive" in
> > > RowSecurityDefaultPermissive.
> >
> > Shall make LSN an unreserved keyword as well.
>
> Here's the updated v11.  Many thanks Jian for off-list discussions and review.

v12 removed unused
+WaitStmt
+WaitStmtParam in pgindent/typedefs.list.

Best,
Xuneng

Вложения

v12-0001-Implement-WAIT-FOR-command.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

14 октября 2025 г., 16:03:30

Hi,

On Sat, Oct 4, 2025 at 9:35 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Sun, Sep 28, 2025 at 5:02 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Fri, Sep 26, 2025 at 7:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi Álvaro,
> > >
> > > Thanks for your review.
> > >
> > > On Tue, Sep 16, 2025 at 4:24 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > > >
> > > > On 2025-Sep-15, Alexander Korotkov wrote:
> > > >
> > > > > > It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> > > > > > PUBLICATION - all use minimal grammar rules that produce generic
> > > > > > option lists, with the actual interpretation done in their respective
> > > > > > implementation files. The moderate complexity in wait.c seems
> > > > > > acceptable.
> > > >
> > > > Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
> > > > lists of DefElem (a name optionally followed by a value) instead of
> > > > individual scattered elements that must later be matched up.  Why not
> > > > use utility_option_list instead and then loop on the list of DefElems?
> > > > It'd be a lot simpler.
> > >
> > > I took a look at commands like VACUUM and EXPLAIN and they do follow
> > > this pattern. v11 will make use of utility_option_list.
> > >
> > > > Also, we've found that failing to surround the options by parens leads
> > > > to pain down the road, so maybe add that.  Given that the LSN seems to
> > > > be mandatory, maybe make it something like
> > > >
> > > > WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]
> > > >
> > > > This requires that you make LSN a keyword, albeit unreserved.  Or you
> > > > could make it
> > > > WAIT FOR Ident [the rest]
> > > > and then ensure in C that the identifier matches the word LSN, such as
> > > > we do for "permissive" and "restrictive" in
> > > > RowSecurityDefaultPermissive.
> > >
> > > Shall make LSN an unreserved keyword as well.
> >
> > Here's the updated v11.  Many thanks Jian for off-list discussions and review.
>
> v12 removed unused
> +WaitStmt
> +WaitStmtParam in pgindent/typedefs.list.
>

Hi, I’ve split the patch into multiple patch sets for easier review,
per Michael’s advice [1].

[1] https://www.postgresql.org/message-id/aOMsv9TszlB1n-W7%40paquier.xyz

Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

15 октября 2025 г., 03:23:09

Hi,

On Tue, Oct 14, 2025 at 9:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Sat, Oct 4, 2025 at 9:35 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Sun, Sep 28, 2025 at 5:02 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Sep 26, 2025 at 7:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi Álvaro,
> > > >
> > > > Thanks for your review.
> > > >
> > > > On Tue, Sep 16, 2025 at 4:24 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > > > >
> > > > > On 2025-Sep-15, Alexander Korotkov wrote:
> > > > >
> > > > > > > It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> > > > > > > PUBLICATION - all use minimal grammar rules that produce generic
> > > > > > > option lists, with the actual interpretation done in their respective
> > > > > > > implementation files. The moderate complexity in wait.c seems
> > > > > > > acceptable.
> > > > >
> > > > > Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
> > > > > lists of DefElem (a name optionally followed by a value) instead of
> > > > > individual scattered elements that must later be matched up.  Why not
> > > > > use utility_option_list instead and then loop on the list of DefElems?
> > > > > It'd be a lot simpler.
> > > >
> > > > I took a look at commands like VACUUM and EXPLAIN and they do follow
> > > > this pattern. v11 will make use of utility_option_list.
> > > >
> > > > > Also, we've found that failing to surround the options by parens leads
> > > > > to pain down the road, so maybe add that.  Given that the LSN seems to
> > > > > be mandatory, maybe make it something like
> > > > >
> > > > > WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]
> > > > >
> > > > > This requires that you make LSN a keyword, albeit unreserved.  Or you
> > > > > could make it
> > > > > WAIT FOR Ident [the rest]
> > > > > and then ensure in C that the identifier matches the word LSN, such as
> > > > > we do for "permissive" and "restrictive" in
> > > > > RowSecurityDefaultPermissive.
> > > >
> > > > Shall make LSN an unreserved keyword as well.
> > >
> > > Here's the updated v11.  Many thanks Jian for off-list discussions and review.
> >
> > v12 removed unused
> > +WaitStmt
> > +WaitStmtParam in pgindent/typedefs.list.
> >
>
> Hi, I’ve split the patch into multiple patch sets for easier review,
> per Michael’s advice [1].
>
> [1] https://www.postgresql.org/message-id/aOMsv9TszlB1n-W7%40paquier.xyz
>

Patch 2 in v13 is corrupted and patch 3 has an error. Sorry for the
noise. Here's v14.

Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

15 октября 2025 г., 11:40:03

Hi,

On Wed, Oct 15, 2025 at 8:23 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Tue, Oct 14, 2025 at 9:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Sat, Oct 4, 2025 at 9:35 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Sun, Sep 28, 2025 at 5:02 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Fri, Sep 26, 2025 at 7:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi Álvaro,
> > > > >
> > > > > Thanks for your review.
> > > > >
> > > > > On Tue, Sep 16, 2025 at 4:24 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > > > > >
> > > > > > On 2025-Sep-15, Alexander Korotkov wrote:
> > > > > >
> > > > > > > > It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> > > > > > > > PUBLICATION - all use minimal grammar rules that produce generic
> > > > > > > > option lists, with the actual interpretation done in their respective
> > > > > > > > implementation files. The moderate complexity in wait.c seems
> > > > > > > > acceptable.
> > > > > >
> > > > > > Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
> > > > > > lists of DefElem (a name optionally followed by a value) instead of
> > > > > > individual scattered elements that must later be matched up.  Why not
> > > > > > use utility_option_list instead and then loop on the list of DefElems?
> > > > > > It'd be a lot simpler.
> > > > >
> > > > > I took a look at commands like VACUUM and EXPLAIN and they do follow
> > > > > this pattern. v11 will make use of utility_option_list.
> > > > >
> > > > > > Also, we've found that failing to surround the options by parens leads
> > > > > > to pain down the road, so maybe add that.  Given that the LSN seems to
> > > > > > be mandatory, maybe make it something like
> > > > > >
> > > > > > WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]
> > > > > >
> > > > > > This requires that you make LSN a keyword, albeit unreserved.  Or you
> > > > > > could make it
> > > > > > WAIT FOR Ident [the rest]
> > > > > > and then ensure in C that the identifier matches the word LSN, such as
> > > > > > we do for "permissive" and "restrictive" in
> > > > > > RowSecurityDefaultPermissive.
> > > > >
> > > > > Shall make LSN an unreserved keyword as well.
> > > >
> > > > Here's the updated v11.  Many thanks Jian for off-list discussions and review.
> > >
> > > v12 removed unused
> > > +WaitStmt
> > > +WaitStmtParam in pgindent/typedefs.list.
> > >
> >
> > Hi, I’ve split the patch into multiple patch sets for easier review,
> > per Michael’s advice [1].
> >
> > [1] https://www.postgresql.org/message-id/aOMsv9TszlB1n-W7%40paquier.xyz
> >
>
> Patch 2 in v13 is corrupted and patch 3 has an error. Sorry for the
> noise. Here's v14.
>

Made minor changes to #include of xlogwait.h in patch2 to calm CF-bots down.

Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Álvaro Herrera

Дата:

15 октября 2025 г., 11:51:42

I didn't review the patch other than look at the grammar, but I disagree
with using opt_with in it.  I think WITH should be a mandatory word, or
just not be there at all.  The current formulation lets you do one of:

1. WAIT FOR LSN '123/456' WITH (opt = val);
2. WAIT FOR LSN '123/456' (opt = val);
3. WAIT FOR LSN '123/456';

and I don't see why you need two ways to specify an option list.

So one option is to remove opt_wait_with_clause and just use
opt_utility_option_list, which would remove the WITH keyword from there
(ie. only keep 2 and 3 from the above list).  But I think that's worse:
just look at the REPACK grammar[1], where we have to have additional
productions for the optional parenthesized option list.

So why not do just

+opt_wait_with_clause:
+           WITH '(' utility_option_list ')'        { $$ = $3; }
+           | /*EMPTY*/                             { $$ = NIL; }
+           ;

which keeps options 1 and 3 of the list above.

Note: you don't need to worry about WITH_LA, because that's only going
to show up when the user writes WITH TIME or WITH ORDINALITY (see
parser.c), and that's a syntax error anyway.


[1] https://postgr.es/m/202510101352.vvp4p3p2dblu@alvherre.pgsql

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La virtud es el justo medio entre dos defectos" (Aristóteles)

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

15 октября 2025 г., 15:48:29

Hi,

Thank you for the grammar review and the clear recommendation.

On Wed, Oct 15, 2025 at 4:51 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> I didn't review the patch other than look at the grammar, but I disagree
> with using opt_with in it.  I think WITH should be a mandatory word, or
> just not be there at all.  The current formulation lets you do one of:
>
> 1. WAIT FOR LSN '123/456' WITH (opt = val);
> 2. WAIT FOR LSN '123/456' (opt = val);
> 3. WAIT FOR LSN '123/456';
>
> and I don't see why you need two ways to specify an option list.

I agree with this as unnecessary choices are confusing.

>
> So one option is to remove opt_wait_with_clause and just use
> opt_utility_option_list, which would remove the WITH keyword from there
> (ie. only keep 2 and 3 from the above list).  But I think that's worse:
> just look at the REPACK grammar[1], where we have to have additional
> productions for the optional parenthesized option list.
>
>
>
> So why not do just
>
> +opt_wait_with_clause:
> +           WITH '(' utility_option_list ')'        { $$ = $3; }
> +           | /*EMPTY*/                             { $$ = NIL; }
> +           ;
>
> which keeps options 1 and 3 of the list above.

Your suggested approach of making WITH mandatory when options are
present looks better.
I've implemented the change as you recommended. Please see patch 3 in v16.

>
>
>
> Note: you don't need to worry about WITH_LA, because that's only going
> to show up when the user writes WITH TIME or WITH ORDINALITY (see
> parser.c), and that's a syntax error anyway.
>

Yeah, we require '(' immediately after WITH in our grammar, the
lookahead mechanism will keep it as regular WITH, and any attempt to
write "WITH TIME" or "WITH ORDINALITY" would be a syntax error anyway,
which is expected.

Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

16 октября 2025 г., 10:11:58

Hi,

On Wed, Oct 15, 2025 at 8:48 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> Thank you for the grammar review and the clear recommendation.
>
> On Wed, Oct 15, 2025 at 4:51 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> >
> > I didn't review the patch other than look at the grammar, but I disagree
> > with using opt_with in it.  I think WITH should be a mandatory word, or
> > just not be there at all.  The current formulation lets you do one of:
> >
> > 1. WAIT FOR LSN '123/456' WITH (opt = val);
> > 2. WAIT FOR LSN '123/456' (opt = val);
> > 3. WAIT FOR LSN '123/456';
> >
> > and I don't see why you need two ways to specify an option list.
>
> I agree with this as unnecessary choices are confusing.
>
> >
> > So one option is to remove opt_wait_with_clause and just use
> > opt_utility_option_list, which would remove the WITH keyword from there
> > (ie. only keep 2 and 3 from the above list).  But I think that's worse:
> > just look at the REPACK grammar[1], where we have to have additional
> > productions for the optional parenthesized option list.
> >
> >
> >
> > So why not do just
> >
> > +opt_wait_with_clause:
> > +           WITH '(' utility_option_list ')'        { $$ = $3; }
> > +           | /*EMPTY*/                             { $$ = NIL; }
> > +           ;
> >
> > which keeps options 1 and 3 of the list above.
>
> Your suggested approach of making WITH mandatory when options are
> present looks better.
> I've implemented the change as you recommended. Please see patch 3 in v16.
>
> >
> >
> >
> > Note: you don't need to worry about WITH_LA, because that's only going
> > to show up when the user writes WITH TIME or WITH ORDINALITY (see
> > parser.c), and that's a syntax error anyway.
> >
>
> Yeah, we require '(' immediately after WITH in our grammar, the
> lookahead mechanism will keep it as regular WITH, and any attempt to
> write "WITH TIME" or "WITH ORDINALITY" would be a syntax error anyway,
> which is expected.
>

The filename of patch 1 is incorrect due to coping. Just correct it.

Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

23 октября 2025 г., 13:46:27

Hi!

In Thu, Oct 16, 2025 at 10:12 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Wed, Oct 15, 2025 at 8:48 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > Thank you for the grammar review and the clear recommendation.
> >
> > On Wed, Oct 15, 2025 at 4:51 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > >
> > > I didn't review the patch other than look at the grammar, but I disagree
> > > with using opt_with in it.  I think WITH should be a mandatory word, or
> > > just not be there at all.  The current formulation lets you do one of:
> > >
> > > 1. WAIT FOR LSN '123/456' WITH (opt = val);
> > > 2. WAIT FOR LSN '123/456' (opt = val);
> > > 3. WAIT FOR LSN '123/456';
> > >
> > > and I don't see why you need two ways to specify an option list.
> >
> > I agree with this as unnecessary choices are confusing.
> >
> > >
> > > So one option is to remove opt_wait_with_clause and just use
> > > opt_utility_option_list, which would remove the WITH keyword from there
> > > (ie. only keep 2 and 3 from the above list).  But I think that's worse:
> > > just look at the REPACK grammar[1], where we have to have additional
> > > productions for the optional parenthesized option list.
> > >
> > >
> > >
> > > So why not do just
> > >
> > > +opt_wait_with_clause:
> > > +           WITH '(' utility_option_list ')'        { $$ = $3; }
> > > +           | /*EMPTY*/                             { $$ = NIL; }
> > > +           ;
> > >
> > > which keeps options 1 and 3 of the list above.
> >
> > Your suggested approach of making WITH mandatory when options are
> > present looks better.
> > I've implemented the change as you recommended. Please see patch 3 in v16.
> >
> > >
> > >
> > >
> > > Note: you don't need to worry about WITH_LA, because that's only going
> > > to show up when the user writes WITH TIME or WITH ORDINALITY (see
> > > parser.c), and that's a syntax error anyway.
> > >
> >
> > Yeah, we require '(' immediately after WITH in our grammar, the
> > lookahead mechanism will keep it as regular WITH, and any attempt to
> > write "WITH TIME" or "WITH ORDINALITY" would be a syntax error anyway,
> > which is expected.
> >
>
> The filename of patch 1 is incorrect due to coping. Just correct it.

Thank you for rebasing the patch.

I've revised it.  The significant changes has been made to 0002, where
I reduced the code duplication.  Also, I run pgindent and pgperltidy
and made other small improvements.
Please, check.

------
Regards,
Alexander Korotkov
Supabase

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

23 октября 2025 г., 15:58:46

Hi,

On Thu, Oct 23, 2025 at 6:46 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi!
>
> In Thu, Oct 16, 2025 at 10:12 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Wed, Oct 15, 2025 at 8:48 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Thank you for the grammar review and the clear recommendation.
> > >
> > > On Wed, Oct 15, 2025 at 4:51 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > > >
> > > > I didn't review the patch other than look at the grammar, but I disagree
> > > > with using opt_with in it.  I think WITH should be a mandatory word, or
> > > > just not be there at all.  The current formulation lets you do one of:
> > > >
> > > > 1. WAIT FOR LSN '123/456' WITH (opt = val);
> > > > 2. WAIT FOR LSN '123/456' (opt = val);
> > > > 3. WAIT FOR LSN '123/456';
> > > >
> > > > and I don't see why you need two ways to specify an option list.
> > >
> > > I agree with this as unnecessary choices are confusing.
> > >
> > > >
> > > > So one option is to remove opt_wait_with_clause and just use
> > > > opt_utility_option_list, which would remove the WITH keyword from there
> > > > (ie. only keep 2 and 3 from the above list).  But I think that's worse:
> > > > just look at the REPACK grammar[1], where we have to have additional
> > > > productions for the optional parenthesized option list.
> > > >
> > > >
> > > >
> > > > So why not do just
> > > >
> > > > +opt_wait_with_clause:
> > > > +           WITH '(' utility_option_list ')'        { $$ = $3; }
> > > > +           | /*EMPTY*/                             { $$ = NIL; }
> > > > +           ;
> > > >
> > > > which keeps options 1 and 3 of the list above.
> > >
> > > Your suggested approach of making WITH mandatory when options are
> > > present looks better.
> > > I've implemented the change as you recommended. Please see patch 3 in v16.
> > >
> > > >
> > > >
> > > >
> > > > Note: you don't need to worry about WITH_LA, because that's only going
> > > > to show up when the user writes WITH TIME or WITH ORDINALITY (see
> > > > parser.c), and that's a syntax error anyway.
> > > >
> > >
> > > Yeah, we require '(' immediately after WITH in our grammar, the
> > > lookahead mechanism will keep it as regular WITH, and any attempt to
> > > write "WITH TIME" or "WITH ORDINALITY" would be a syntax error anyway,
> > > which is expected.
> > >
> >
> > The filename of patch 1 is incorrect due to coping. Just correct it.
>
> Thank you for rebasing the patch.
>
> I've revised it.  The significant changes has been made to 0002, where
> I reduced the code duplication.  Also, I run pgindent and pgperltidy
> and made other small improvements.
> Please, check.

Thanks for updating the patch set!
Patch 2 looks more elegant after the revision. I’ll review them soon.

Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

02 ноября 2025 г., 09:24:48

Hi, Alexander!

On Thu, Oct 23, 2025 at 8:58 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Thu, Oct 23, 2025 at 6:46 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Hi!
> >
> > In Thu, Oct 16, 2025 at 10:12 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > On Wed, Oct 15, 2025 at 8:48 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Thank you for the grammar review and the clear recommendation.
> > > >
> > > > On Wed, Oct 15, 2025 at 4:51 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > > > >
> > > > > I didn't review the patch other than look at the grammar, but I disagree
> > > > > with using opt_with in it.  I think WITH should be a mandatory word, or
> > > > > just not be there at all.  The current formulation lets you do one of:
> > > > >
> > > > > 1. WAIT FOR LSN '123/456' WITH (opt = val);
> > > > > 2. WAIT FOR LSN '123/456' (opt = val);
> > > > > 3. WAIT FOR LSN '123/456';
> > > > >
> > > > > and I don't see why you need two ways to specify an option list.
> > > >
> > > > I agree with this as unnecessary choices are confusing.
> > > >
> > > > >
> > > > > So one option is to remove opt_wait_with_clause and just use
> > > > > opt_utility_option_list, which would remove the WITH keyword from there
> > > > > (ie. only keep 2 and 3 from the above list).  But I think that's worse:
> > > > > just look at the REPACK grammar[1], where we have to have additional
> > > > > productions for the optional parenthesized option list.
> > > > >
> > > > >
> > > > >
> > > > > So why not do just
> > > > >
> > > > > +opt_wait_with_clause:
> > > > > +           WITH '(' utility_option_list ')'        { $$ = $3; }
> > > > > +           | /*EMPTY*/                             { $$ = NIL; }
> > > > > +           ;
> > > > >
> > > > > which keeps options 1 and 3 of the list above.
> > > >
> > > > Your suggested approach of making WITH mandatory when options are
> > > > present looks better.
> > > > I've implemented the change as you recommended. Please see patch 3 in v16.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > Note: you don't need to worry about WITH_LA, because that's only going
> > > > > to show up when the user writes WITH TIME or WITH ORDINALITY (see
> > > > > parser.c), and that's a syntax error anyway.
> > > > >
> > > >
> > > > Yeah, we require '(' immediately after WITH in our grammar, the
> > > > lookahead mechanism will keep it as regular WITH, and any attempt to
> > > > write "WITH TIME" or "WITH ORDINALITY" would be a syntax error anyway,
> > > > which is expected.
> > > >
> > >
> > > The filename of patch 1 is incorrect due to coping. Just correct it.
> >
> > Thank you for rebasing the patch.
> >
> > I've revised it.  The significant changes has been made to 0002, where
> > I reduced the code duplication.  Also, I run pgindent and pgperltidy
> > and made other small improvements.
> > Please, check.
>
> Thanks for updating the patch set!
> Patch 2 looks more elegant after the revision. I’ll review them soon.

I’ve made a few minor updates to the comments and docs in patches 2
and 3. The patch set LGTM now.

Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

03 ноября 2025 г., 05:20:28

Hi,

On Sun, Nov 2, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi, Alexander!
>
> On Thu, Oct 23, 2025 at 8:58 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Oct 23, 2025 at 6:46 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > Hi!
> > >
> > > In Thu, Oct 16, 2025 at 10:12 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > On Wed, Oct 15, 2025 at 8:48 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Thank you for the grammar review and the clear recommendation.
> > > > >
> > > > > On Wed, Oct 15, 2025 at 4:51 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > > > > >
> > > > > > I didn't review the patch other than look at the grammar, but I disagree
> > > > > > with using opt_with in it.  I think WITH should be a mandatory word, or
> > > > > > just not be there at all.  The current formulation lets you do one of:
> > > > > >
> > > > > > 1. WAIT FOR LSN '123/456' WITH (opt = val);
> > > > > > 2. WAIT FOR LSN '123/456' (opt = val);
> > > > > > 3. WAIT FOR LSN '123/456';
> > > > > >
> > > > > > and I don't see why you need two ways to specify an option list.
> > > > >
> > > > > I agree with this as unnecessary choices are confusing.
> > > > >
> > > > > >
> > > > > > So one option is to remove opt_wait_with_clause and just use
> > > > > > opt_utility_option_list, which would remove the WITH keyword from there
> > > > > > (ie. only keep 2 and 3 from the above list).  But I think that's worse:
> > > > > > just look at the REPACK grammar[1], where we have to have additional
> > > > > > productions for the optional parenthesized option list.
> > > > > >
> > > > > >
> > > > > >
> > > > > > So why not do just
> > > > > >
> > > > > > +opt_wait_with_clause:
> > > > > > +           WITH '(' utility_option_list ')'        { $$ = $3; }
> > > > > > +           | /*EMPTY*/                             { $$ = NIL; }
> > > > > > +           ;
> > > > > >
> > > > > > which keeps options 1 and 3 of the list above.
> > > > >
> > > > > Your suggested approach of making WITH mandatory when options are
> > > > > present looks better.
> > > > > I've implemented the change as you recommended. Please see patch 3 in v16.
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Note: you don't need to worry about WITH_LA, because that's only going
> > > > > > to show up when the user writes WITH TIME or WITH ORDINALITY (see
> > > > > > parser.c), and that's a syntax error anyway.
> > > > > >
> > > > >
> > > > > Yeah, we require '(' immediately after WITH in our grammar, the
> > > > > lookahead mechanism will keep it as regular WITH, and any attempt to
> > > > > write "WITH TIME" or "WITH ORDINALITY" would be a syntax error anyway,
> > > > > which is expected.
> > > > >
> > > >
> > > > The filename of patch 1 is incorrect due to coping. Just correct it.
> > >
> > > Thank you for rebasing the patch.
> > >
> > > I've revised it.  The significant changes has been made to 0002, where
> > > I reduced the code duplication.  Also, I run pgindent and pgperltidy
> > > and made other small improvements.
> > > Please, check.
> >
> > Thanks for updating the patch set!
> > Patch 2 looks more elegant after the revision. I’ll review them soon.
>
> I’ve made a few minor updates to the comments and docs in patches 2
> and 3. The patch set LGTM now.

Fix an minor issue in v18: WaitStmt was mistakenly added to
pgindent/typedefs.list in patch 2, but it should belong to patch 3.

Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

03 ноября 2025 г., 14:46:40

Hello, Xuneng!

On Mon, Nov 3, 2025 at 4:20 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Sun, Nov 2, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Thu, Oct 23, 2025 at 8:58 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > I’ve made a few minor updates to the comments and docs in patches 2
> > and 3. The patch set LGTM now.
>
> Fix an minor issue in v18: WaitStmt was mistakenly added to
> pgindent/typedefs.list in patch 2, but it should belong to patch 3.

Thank you.  I also made some minor changes to 0002 renaming
"operation" => "lsnType".

I'd like to give this subject another chance for pg19.  I'm going to
push this if no objections.

------
Regards,
Alexander Korotkov
Supabase

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Álvaro Herrera

Дата:

03 ноября 2025 г., 18:06:58

On 2025-Nov-03, Alexander Korotkov wrote:

> I'd like to give this subject another chance for pg19.  I'm going to
> push this if no objections.

Sure.  I don't understand why patches 0002 and 0003 are separate though.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Re: Implement waiting for wal lsn replay: reloaded

От

Andres Freund

Дата:

03 ноября 2025 г., 18:13:02

Hi,

On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> On 2025-Nov-03, Alexander Korotkov wrote:
> 
> > I'd like to give this subject another chance for pg19.  I'm going to
> > push this if no objections.
> 
> Sure.  I don't understand why patches 0002 and 0003 are separate though.

FWIW, I appreciate such splits. Even if the functionality isn't usable
independently, it's still different type of code that's affected. And the
patches are each big enough to make that worthwhile for easier review.

One thing that'd be nice to do once we have WAIT FOR is to make the common
case of wait_for_catchup() use this facility, instead of polling...

Greetings,

Andres Freund

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

05 ноября 2025 г., 12:51:18

Hi!

On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > On 2025-Nov-03, Alexander Korotkov wrote:
> >
> > > I'd like to give this subject another chance for pg19.  I'm going to
> > > push this if no objections.
> >
> > Sure.  I don't understand why patches 0002 and 0003 are separate though.
>
> FWIW, I appreciate such splits. Even if the functionality isn't usable
> independently, it's still different type of code that's affected. And the
> patches are each big enough to make that worthwhile for easier review.

Thank you for the feedback, pushed.

> One thing that'd be nice to do once we have WAIT FOR is to make the common
> case of wait_for_catchup() use this facility, instead of polling...

The draft patch for that is attached.  WAIT FOR doesn't handle all the
possible use cases of wait_for_catchup(), but I've added usage when
it's appropriate.

------
Regards,
Alexander Korotkov
Supabase

Вложения

v1-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

05 ноября 2025 г., 17:03:25

Hi,

On Wed, Nov 5, 2025 at 5:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi!
>
> On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > On 2025-Nov-03, Alexander Korotkov wrote:
> > >
> > > > I'd like to give this subject another chance for pg19.  I'm going to
> > > > push this if no objections.
> > >
> > > Sure.  I don't understand why patches 0002 and 0003 are separate though.
> >
> > FWIW, I appreciate such splits. Even if the functionality isn't usable
> > independently, it's still different type of code that's affected. And the
> > patches are each big enough to make that worthwhile for easier review.
>
> Thank you for the feedback, pushed.

Thanks for pushing them!

> > One thing that'd be nice to do once we have WAIT FOR is to make the common
> > case of wait_for_catchup() use this facility, instead of polling...
>
> The draft patch for that is attached.  WAIT FOR doesn't handle all the
> possible use cases of wait_for_catchup(), but I've added usage when
> it's appropriate.

Interesting, could this approach be extended to the flush and other
modes as well? I might need to spend some time to understand it before
I can provide a meaningful review.

Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

08 ноября 2025 г., 01:02:36

On Wed, Nov 5, 2025 at 4:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> On Wed, Nov 5, 2025 at 5:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > > On 2025-Nov-03, Alexander Korotkov wrote:
> > > >
> > > > > I'd like to give this subject another chance for pg19.  I'm going to
> > > > > push this if no objections.
> > > >
> > > > Sure.  I don't understand why patches 0002 and 0003 are separate though.
> > >
> > > FWIW, I appreciate such splits. Even if the functionality isn't usable
> > > independently, it's still different type of code that's affected. And the
> > > patches are each big enough to make that worthwhile for easier review.
> >
> > Thank you for the feedback, pushed.
>
> Thanks for pushing them!
>
> > > One thing that'd be nice to do once we have WAIT FOR is to make the common
> > > case of wait_for_catchup() use this facility, instead of polling...
> >
> > The draft patch for that is attached.  WAIT FOR doesn't handle all the
> > possible use cases of wait_for_catchup(), but I've added usage when
> > it's appropriate.
>
> Interesting, could this approach be extended to the flush and other
> modes as well? I might need to spend some time to understand it before
> I can provide a meaningful review.

I think we might end up extending WaitLSNType enum.  However, I hate
inHeap and heapNode arrays growing in WaitLSNProcInfo as they are
allocated per process.  I found that we could optimize WaitLSNProcInfo
struct turning them into simple variables because a single process can
wait only for a single LSN at a time.  Please, check the attached
patch.

------
Regards,
Alexander Korotkov
Supabase

Вложения

v1-0001-Optimize-shared-memory-usage-for-WaitLSNProcInfo.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

12 ноября 2025 г., 10:19:53

Hi,

On Wed, Nov 5, 2025 at 5:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi!
>
> On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > On 2025-Nov-03, Alexander Korotkov wrote:
> > >
> > > > I'd like to give this subject another chance for pg19.  I'm going to
> > > > push this if no objections.
> > >
> > > Sure.  I don't understand why patches 0002 and 0003 are separate though.
> >
> > FWIW, I appreciate such splits. Even if the functionality isn't usable
> > independently, it's still different type of code that's affected. And the
> > patches are each big enough to make that worthwhile for easier review.
>
> Thank you for the feedback, pushed.
>
> > One thing that'd be nice to do once we have WAIT FOR is to make the common
> > case of wait_for_catchup() use this facility, instead of polling...
>
> The draft patch for that is attached.  WAIT FOR doesn't handle all the
> possible use cases of wait_for_catchup(), but I've added usage when
> it's appropriate.
>
> ------
> Regards,
> Alexander Korotkov
> Supabase

I tested the patch using make check-world, and it worked well. I also
made a few adjustments:

- Added an unconditional chomp($isrecovery) after querying
pg_is_in_recovery() to prevent newline mismatches when $target_lsn is
accidently defined.
- Added chomp($output) to normalize the result from WAIT FOR LSN
before comparison.

At the moment, the WAIT FOR LSN command supports only the replay mode.
If we intend to extend its functionality more broadly, one option is
to add a mode option or something similar. Are users expected to wait
for flush(or others) completion in such cases? If not, and the TAP
test is the only intended use, this approach might be a bit of an
overkill.

--
Best,
Xuneng

Вложения

v2-0001-Use-WAIT-FOR-LSN-in.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Tomas Vondra

Дата:

13 ноября 2025 г., 23:32:31

On 11/5/25 10:51, Alexander Korotkov wrote:
> Hi!
> 
> On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
>> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
>>> On 2025-Nov-03, Alexander Korotkov wrote:
>>>
>>>> I'd like to give this subject another chance for pg19.  I'm going to
>>>> push this if no objections.
>>>
>>> Sure.  I don't understand why patches 0002 and 0003 are separate though.
>>
>> FWIW, I appreciate such splits. Even if the functionality isn't usable
>> independently, it's still different type of code that's affected. And the
>> patches are each big enough to make that worthwhile for easier review.
> 
> Thank you for the feedback, pushed.
> 

Hi,

The new TAP test 049_wait_for_lsn.pl introduced by this commit, because
it takes a long time - about 65 seconds on my laptop. That's about 25%
of the whole src/test/recovery, more than any other test.

And most of the time there's nothing happening - these are the two log
messages showing the 60-second wait:

2025-11-13 21:12:39.949 CET checkpointer[562597] LOG:  checkpoint
complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s)
added, 0 removed, 2 recycled; write=0.906 s, sync=0.001 s, total=0.907
s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB,
estimate=32768 kB; lsn=0/040000B8, redo lsn=0/04000060

2025-11-13 21:13:38.994 CET client backend[562727] 049_wait_for_lsn.pl
ERROR:  recovery is not in progress

So there's a checkpoint, 60 seconds of nothing, and then a failure. I
haven't looked into why it waits for 1 minute exactly, but adding 60
seconds to check-world is somewhat annoying.

While at it, I noticed a couple comments refer to WaitForLSNReplay, but
but I think that got renamed simply to WaitForLSN.

regards

-- 
Tomas Vondra

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

14 ноября 2025 г., 04:49:51

Hi Tomas,

On Fri, Nov 14, 2025 at 4:32 AM Tomas Vondra <tomas@vondra.me> wrote:
>
> On 11/5/25 10:51, Alexander Korotkov wrote:
> > Hi!
> >
> > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> >> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> >>> On 2025-Nov-03, Alexander Korotkov wrote:
> >>>
> >>>> I'd like to give this subject another chance for pg19.  I'm going to
> >>>> push this if no objections.
> >>>
> >>> Sure.  I don't understand why patches 0002 and 0003 are separate though.
> >>
> >> FWIW, I appreciate such splits. Even if the functionality isn't usable
> >> independently, it's still different type of code that's affected. And the
> >> patches are each big enough to make that worthwhile for easier review.
> >
> > Thank you for the feedback, pushed.
> >
>
> Hi,
>
> The new TAP test 049_wait_for_lsn.pl introduced by this commit, because
> it takes a long time - about 65 seconds on my laptop. That's about 25%
> of the whole src/test/recovery, more than any other test.
>
> And most of the time there's nothing happening - these are the two log
> messages showing the 60-second wait:
>
> 2025-11-13 21:12:39.949 CET checkpointer[562597] LOG:  checkpoint
> complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s)
> added, 0 removed, 2 recycled; write=0.906 s, sync=0.001 s, total=0.907
> s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB,
> estimate=32768 kB; lsn=0/040000B8, redo lsn=0/04000060
>
> 2025-11-13 21:13:38.994 CET client backend[562727] 049_wait_for_lsn.pl
> ERROR:  recovery is not in progress
>
> So there's a checkpoint, 60 seconds of nothing, and then a failure. I
> haven't looked into why it waits for 1 minute exactly, but adding 60
> seconds to check-world is somewhat annoying.

Thanks for looking into this!

I did a quick analysis for this prolonged waiting:

In WaitLSNWakeup() (xlogwait.c:267), the fast-path check incorrectly
handled InvalidXLogRecPtr:
/* Fast path check */
if (pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
    return;  // Issue: Returns early when currentLSN = 0

When currentLSN = InvalidXLogRecPtr (0), meaning "wake all waiters",
the check compared:
- minWaitedLSN (e.g., 0x570CC048) > 0 → TRUE
- Result: function returned early without waking anyone

When It Happened
During standby promotion, xlog.c:6246 calls:

WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, InvalidXLogRecPtr);

This should wake all LSN waiters, but the bug prevented it. WAIT FOR
LSN commands could wait indefinitely. Test 049_wait_for_lsn.pl took 68
seconds instead of ~9 seconds.

if the above analysis is sound, the fix could be like:

Proposed fix:
Added a validity check before the comparison:
/*
 * Fast path check.  Skip if currentLSN is InvalidXLogRecPtr, which means
 * "wake all waiters" (e.g., during promotion when recovery ends).
 */
if (XLogRecPtrIsValid(currentLSN) &&
    pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
    return;

Result:
Test time: 68s → 9s
WAIT FOR LSN exits immediately on promotion (62ms vs 60s)

> While at it, I noticed a couple comments refer to WaitForLSNReplay, but
> but I think that got renamed simply to WaitForLSN.

Please check the attached patch for replacing them.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

15 ноября 2025 г., 13:29:25

Hi, Xuneng!

On Fri, Nov 14, 2025 at 3:50 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> On Fri, Nov 14, 2025 at 4:32 AM Tomas Vondra <tomas@vondra.me> wrote:
> >
> > On 11/5/25 10:51, Alexander Korotkov wrote:
> > > Hi!
> > >
> > > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > >> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > >>> On 2025-Nov-03, Alexander Korotkov wrote:
> > >>>
> > >>>> I'd like to give this subject another chance for pg19.  I'm going to
> > >>>> push this if no objections.
> > >>>
> > >>> Sure.  I don't understand why patches 0002 and 0003 are separate though.
> > >>
> > >> FWIW, I appreciate such splits. Even if the functionality isn't usable
> > >> independently, it's still different type of code that's affected. And the
> > >> patches are each big enough to make that worthwhile for easier review.
> > >
> > > Thank you for the feedback, pushed.
> > >
> >
> > Hi,
> >
> > The new TAP test 049_wait_for_lsn.pl introduced by this commit, because
> > it takes a long time - about 65 seconds on my laptop. That's about 25%
> > of the whole src/test/recovery, more than any other test.
> >
> > And most of the time there's nothing happening - these are the two log
> > messages showing the 60-second wait:
> >
> > 2025-11-13 21:12:39.949 CET checkpointer[562597] LOG:  checkpoint
> > complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s)
> > added, 0 removed, 2 recycled; write=0.906 s, sync=0.001 s, total=0.907
> > s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB,
> > estimate=32768 kB; lsn=0/040000B8, redo lsn=0/04000060
> >
> > 2025-11-13 21:13:38.994 CET client backend[562727] 049_wait_for_lsn.pl
> > ERROR:  recovery is not in progress
> >
> > So there's a checkpoint, 60 seconds of nothing, and then a failure. I
> > haven't looked into why it waits for 1 minute exactly, but adding 60
> > seconds to check-world is somewhat annoying.
>
> Thanks for looking into this!
>
> I did a quick analysis for this prolonged waiting:
>
> In WaitLSNWakeup() (xlogwait.c:267), the fast-path check incorrectly
> handled InvalidXLogRecPtr:
> /* Fast path check */
> if (pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
>     return;  // Issue: Returns early when currentLSN = 0
>
> When currentLSN = InvalidXLogRecPtr (0), meaning "wake all waiters",
> the check compared:
> - minWaitedLSN (e.g., 0x570CC048) > 0 → TRUE
> - Result: function returned early without waking anyone
>
> When It Happened
> During standby promotion, xlog.c:6246 calls:
>
> WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, InvalidXLogRecPtr);
>
> This should wake all LSN waiters, but the bug prevented it. WAIT FOR
> LSN commands could wait indefinitely. Test 049_wait_for_lsn.pl took 68
> seconds instead of ~9 seconds.
>
> if the above analysis is sound, the fix could be like:
>
> Proposed fix:
> Added a validity check before the comparison:
> /*
>  * Fast path check.  Skip if currentLSN is InvalidXLogRecPtr, which means
>  * "wake all waiters" (e.g., during promotion when recovery ends).
>  */
> if (XLogRecPtrIsValid(currentLSN) &&
>     pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
>     return;
>
> Result:
> Test time: 68s → 9s
> WAIT FOR LSN exits immediately on promotion (62ms vs 60s)
>
> > While at it, I noticed a couple comments refer to WaitForLSNReplay, but
> > but I think that got renamed simply to WaitForLSN.
>
> Please check the attached patch for replacing them.

Thank you so much for your patches!
Pushed with minor corrections.

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

16 ноября 2025 г., 15:08:58

On Sat, Nov 8, 2025 at 12:02 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Wed, Nov 5, 2025 at 4:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Wed, Nov 5, 2025 at 5:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > > On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > > > On 2025-Nov-03, Alexander Korotkov wrote:
> > > > >
> > > > > > I'd like to give this subject another chance for pg19.  I'm going to
> > > > > > push this if no objections.
> > > > >
> > > > > Sure.  I don't understand why patches 0002 and 0003 are separate though.
> > > >
> > > > FWIW, I appreciate such splits. Even if the functionality isn't usable
> > > > independently, it's still different type of code that's affected. And the
> > > > patches are each big enough to make that worthwhile for easier review.
> > >
> > > Thank you for the feedback, pushed.
> >
> > Thanks for pushing them!
> >
> > > > One thing that'd be nice to do once we have WAIT FOR is to make the common
> > > > case of wait_for_catchup() use this facility, instead of polling...
> > >
> > > The draft patch for that is attached.  WAIT FOR doesn't handle all the
> > > possible use cases of wait_for_catchup(), but I've added usage when
> > > it's appropriate.
> >
> > Interesting, could this approach be extended to the flush and other
> > modes as well? I might need to spend some time to understand it before
> > I can provide a meaningful review.
>
> I think we might end up extending WaitLSNType enum.  However, I hate
> inHeap and heapNode arrays growing in WaitLSNProcInfo as they are
> allocated per process.  I found that we could optimize WaitLSNProcInfo
> struct turning them into simple variables because a single process can
> wait only for a single LSN at a time.  Please, check the attached
> patch.

Here is the updated patch integrating minor corrections provided by
Xuneng Zhou off-list.  I'm going to push this if no objections.

------
Regards,
Alexander Korotkov
Supabase

Вложения

v3-0001-Optimize-shared-memory-usage-for-WaitLSNProcInfo.patch

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

16 ноября 2025 г., 15:36:52

On Wed, Nov 12, 2025 at 9:20 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Wed, Nov 5, 2025 at 5:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > > On 2025-Nov-03, Alexander Korotkov wrote:
> > > >
> > > > > I'd like to give this subject another chance for pg19.  I'm going to
> > > > > push this if no objections.
> > > >
> > > > Sure.  I don't understand why patches 0002 and 0003 are separate though.
> > >
> > > FWIW, I appreciate such splits. Even if the functionality isn't usable
> > > independently, it's still different type of code that's affected. And the
> > > patches are each big enough to make that worthwhile for easier review.
> >
> > Thank you for the feedback, pushed.
> >
> > > One thing that'd be nice to do once we have WAIT FOR is to make the common
> > > case of wait_for_catchup() use this facility, instead of polling...
> >
> > The draft patch for that is attached.  WAIT FOR doesn't handle all the
> > possible use cases of wait_for_catchup(), but I've added usage when
> > it's appropriate.
>
> I tested the patch using make check-world, and it worked well. I also
> made a few adjustments:
>
> - Added an unconditional chomp($isrecovery) after querying
> pg_is_in_recovery() to prevent newline mismatches when $target_lsn is
> accidently defined.
> - Added chomp($output) to normalize the result from WAIT FOR LSN
> before comparison.
>
> At the moment, the WAIT FOR LSN command supports only the replay mode.
> If we intend to extend its functionality more broadly, one option is
> to add a mode option or something similar. Are users expected to wait
> for flush(or others) completion in such cases? If not, and the TAP
> test is the only intended use, this approach might be a bit of an
> overkill.

I would say that adding mode parameter seems to be a pretty natural
extension of what we have at the moment.  I can imagine some
clustering solution can use it to wait for certain transaction to be
flushed at the replica (without delaying the commit at the primary).

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

16 ноября 2025 г., 16:25:39

Hi Alexander,

On Sat, Nov 15, 2025 at 6:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi, Xuneng!
>
> On Fri, Nov 14, 2025 at 3:50 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > On Fri, Nov 14, 2025 at 4:32 AM Tomas Vondra <tomas@vondra.me> wrote:
> > >
> > > On 11/5/25 10:51, Alexander Korotkov wrote:
> > > > Hi!
> > > >
> > > > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > >> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > >>> On 2025-Nov-03, Alexander Korotkov wrote:
> > > >>>
> > > >>>> I'd like to give this subject another chance for pg19.  I'm going to
> > > >>>> push this if no objections.
> > > >>>
> > > >>> Sure.  I don't understand why patches 0002 and 0003 are separate though.
> > > >>
> > > >> FWIW, I appreciate such splits. Even if the functionality isn't usable
> > > >> independently, it's still different type of code that's affected. And the
> > > >> patches are each big enough to make that worthwhile for easier review.
> > > >
> > > > Thank you for the feedback, pushed.
> > > >
> > >
> > > Hi,
> > >
> > > The new TAP test 049_wait_for_lsn.pl introduced by this commit, because
> > > it takes a long time - about 65 seconds on my laptop. That's about 25%
> > > of the whole src/test/recovery, more than any other test.
> > >
> > > And most of the time there's nothing happening - these are the two log
> > > messages showing the 60-second wait:
> > >
> > > 2025-11-13 21:12:39.949 CET checkpointer[562597] LOG:  checkpoint
> > > complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s)
> > > added, 0 removed, 2 recycled; write=0.906 s, sync=0.001 s, total=0.907
> > > s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB,
> > > estimate=32768 kB; lsn=0/040000B8, redo lsn=0/04000060
> > >
> > > 2025-11-13 21:13:38.994 CET client backend[562727] 049_wait_for_lsn.pl
> > > ERROR:  recovery is not in progress
> > >
> > > So there's a checkpoint, 60 seconds of nothing, and then a failure. I
> > > haven't looked into why it waits for 1 minute exactly, but adding 60
> > > seconds to check-world is somewhat annoying.
> >
> > Thanks for looking into this!
> >
> > I did a quick analysis for this prolonged waiting:
> >
> > In WaitLSNWakeup() (xlogwait.c:267), the fast-path check incorrectly
> > handled InvalidXLogRecPtr:
> > /* Fast path check */
> > if (pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
> >     return;  // Issue: Returns early when currentLSN = 0
> >
> > When currentLSN = InvalidXLogRecPtr (0), meaning "wake all waiters",
> > the check compared:
> > - minWaitedLSN (e.g., 0x570CC048) > 0 → TRUE
> > - Result: function returned early without waking anyone
> >
> > When It Happened
> > During standby promotion, xlog.c:6246 calls:
> >
> > WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, InvalidXLogRecPtr);
> >
> > This should wake all LSN waiters, but the bug prevented it. WAIT FOR
> > LSN commands could wait indefinitely. Test 049_wait_for_lsn.pl took 68
> > seconds instead of ~9 seconds.
> >
> > if the above analysis is sound, the fix could be like:
> >
> > Proposed fix:
> > Added a validity check before the comparison:
> > /*
> >  * Fast path check.  Skip if currentLSN is InvalidXLogRecPtr, which means
> >  * "wake all waiters" (e.g., during promotion when recovery ends).
> >  */
> > if (XLogRecPtrIsValid(currentLSN) &&
> >     pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
> >     return;
> >
> > Result:
> > Test time: 68s → 9s
> > WAIT FOR LSN exits immediately on promotion (62ms vs 60s)
> >
> > > While at it, I noticed a couple comments refer to WaitForLSNReplay, but
> > > but I think that got renamed simply to WaitForLSN.
> >
> > Please check the attached patch for replacing them.
>
> Thank you so much for your patches!
> Pushed with minor corrections.

Thanks for pushing! It appears I should be running pgindent more regularly :).


--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

16 ноября 2025 г., 16:30:08

Hi!

On Sun, Nov 16, 2025 at 8:09 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Sat, Nov 8, 2025 at 12:02 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Wed, Nov 5, 2025 at 4:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > On Wed, Nov 5, 2025 at 5:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > > > > On 2025-Nov-03, Alexander Korotkov wrote:
> > > > > >
> > > > > > > I'd like to give this subject another chance for pg19.  I'm going to
> > > > > > > push this if no objections.
> > > > > >
> > > > > > Sure.  I don't understand why patches 0002 and 0003 are separate though.
> > > > >
> > > > > FWIW, I appreciate such splits. Even if the functionality isn't usable
> > > > > independently, it's still different type of code that's affected. And the
> > > > > patches are each big enough to make that worthwhile for easier review.
> > > >
> > > > Thank you for the feedback, pushed.
> > >
> > > Thanks for pushing them!
> > >
> > > > > One thing that'd be nice to do once we have WAIT FOR is to make the common
> > > > > case of wait_for_catchup() use this facility, instead of polling...
> > > >
> > > > The draft patch for that is attached.  WAIT FOR doesn't handle all the
> > > > possible use cases of wait_for_catchup(), but I've added usage when
> > > > it's appropriate.
> > >
> > > Interesting, could this approach be extended to the flush and other
> > > modes as well? I might need to spend some time to understand it before
> > > I can provide a meaningful review.
> >
> > I think we might end up extending WaitLSNType enum.  However, I hate
> > inHeap and heapNode arrays growing in WaitLSNProcInfo as they are
> > allocated per process.  I found that we could optimize WaitLSNProcInfo
> > struct turning them into simple variables because a single process can
> > wait only for a single LSN at a time.  Please, check the attached
> > patch.
>
> Here is the updated patch integrating minor corrections provided by
> Xuneng Zhou off-list.  I'm going to push this if no objections.
>
> ------
> Regards,
> Alexander Korotkov
> Supabase

LGTM. Thanks.

--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

16 ноября 2025 г., 17:01:04

Hi!

On Sun, Nov 16, 2025 at 8:37 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 9:20 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Wed, Nov 5, 2025 at 5:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > > On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > > > On 2025-Nov-03, Alexander Korotkov wrote:
> > > > >
> > > > > > I'd like to give this subject another chance for pg19.  I'm going to
> > > > > > push this if no objections.
> > > > >
> > > > > Sure.  I don't understand why patches 0002 and 0003 are separate though.
> > > >
> > > > FWIW, I appreciate such splits. Even if the functionality isn't usable
> > > > independently, it's still different type of code that's affected. And the
> > > > patches are each big enough to make that worthwhile for easier review.
> > >
> > > Thank you for the feedback, pushed.
> > >
> > > > One thing that'd be nice to do once we have WAIT FOR is to make the common
> > > > case of wait_for_catchup() use this facility, instead of polling...
> > >
> > > The draft patch for that is attached.  WAIT FOR doesn't handle all the
> > > possible use cases of wait_for_catchup(), but I've added usage when
> > > it's appropriate.
> >
> > I tested the patch using make check-world, and it worked well. I also
> > made a few adjustments:
> >
> > - Added an unconditional chomp($isrecovery) after querying
> > pg_is_in_recovery() to prevent newline mismatches when $target_lsn is
> > accidently defined.
> > - Added chomp($output) to normalize the result from WAIT FOR LSN
> > before comparison.
> >
> > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > If we intend to extend its functionality more broadly, one option is
> > to add a mode option or something similar. Are users expected to wait
> > for flush(or others) completion in such cases? If not, and the TAP
> > test is the only intended use, this approach might be a bit of an
> > overkill.
>
> I would say that adding mode parameter seems to be a pretty natural
> extension of what we have at the moment.  I can imagine some
> clustering solution can use it to wait for certain transaction to be
> flushed at the replica (without delaying the commit at the primary).
>
> ------
> Regards,
> Alexander Korotkov
> Supabase

Makes sense. I'll play with it and try to prepare a follow-up patch.

--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

16 ноября 2025 г., 18:20:25

On Sun, Nov 16, 2025 at 3:25 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Sat, Nov 15, 2025 at 6:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Thank you so much for your patches!
> > Pushed with minor corrections.
>
> Thanks for pushing! It appears I should be running pgindent more regularly :).

Thank you.  pgindent is not a problem for me, cause I anyway run it
every time before pushing a patch.  But yes, if you make it a habit to
run pgindent every time before publishing a patch, it would become
cleaner.

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

20 ноября 2025 г., 15:43:06

Hi Alexander, Hackers,

On Sun, Nov 16, 2025 at 10:01 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi!
>
> On Sun, Nov 16, 2025 at 8:37 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Wed, Nov 12, 2025 at 9:20 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > On Wed, Nov 5, 2025 at 5:51 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote:
> > > > > > On 2025-Nov-03, Alexander Korotkov wrote:
> > > > > >
> > > > > > > I'd like to give this subject another chance for pg19.  I'm going to
> > > > > > > push this if no objections.
> > > > > >
> > > > > > Sure.  I don't understand why patches 0002 and 0003 are separate though.
> > > > >
> > > > > FWIW, I appreciate such splits. Even if the functionality isn't usable
> > > > > independently, it's still different type of code that's affected. And the
> > > > > patches are each big enough to make that worthwhile for easier review.
> > > >
> > > > Thank you for the feedback, pushed.
> > > >
> > > > > One thing that'd be nice to do once we have WAIT FOR is to make the common
> > > > > case of wait_for_catchup() use this facility, instead of polling...
> > > >
> > > > The draft patch for that is attached.  WAIT FOR doesn't handle all the
> > > > possible use cases of wait_for_catchup(), but I've added usage when
> > > > it's appropriate.
> > >
> > > I tested the patch using make check-world, and it worked well. I also
> > > made a few adjustments:
> > >
> > > - Added an unconditional chomp($isrecovery) after querying
> > > pg_is_in_recovery() to prevent newline mismatches when $target_lsn is
> > > accidently defined.
> > > - Added chomp($output) to normalize the result from WAIT FOR LSN
> > > before comparison.
> > >
> > > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > > If we intend to extend its functionality more broadly, one option is
> > > to add a mode option or something similar. Are users expected to wait
> > > for flush(or others) completion in such cases? If not, and the TAP
> > > test is the only intended use, this approach might be a bit of an
> > > overkill.
> >
> > I would say that adding mode parameter seems to be a pretty natural
> > extension of what we have at the moment.  I can imagine some
> > clustering solution can use it to wait for certain transaction to be
> > flushed at the replica (without delaying the commit at the primary).
> >
> > ------
> > Regards,
> > Alexander Korotkov
> > Supabase
>
> Makes sense. I'll play with it and try to prepare a follow-up patch.
>
> --
> Best,
> Xuneng

In terms of extending the functionality of the command, I see two
possible approaches here. One is to keep mode as a mandatory keyword,
and the other is to introduce it as an option in the WITH clause.

Syntax Option A: Mode in the WITH Clause

WAIT FOR LSN '0/12345' WITH (mode = 'replay');
WAIT FOR LSN '0/12345' WITH (mode = 'flush');
WAIT FOR LSN '0/12345' WITH (mode = 'write');

With this option, we can keep "replay" as the default mode. That means
existing TAP tests won’t need to be refactored unless they explicitly
want a different mode.

Syntax Option B: Mode as Part of the Main Command

WAIT FOR LSN '0/12345' MODE 'replay';
WAIT FOR LSN '0/12345' MODE 'flush';
WAIT FOR LSN '0/12345' MODE 'write';

Or a more concise variant using keywords:

WAIT FOR LSN '0/12345' REPLAY;
WAIT FOR LSN '0/12345' FLUSH;
WAIT FOR LSN '0/12345' WRITE;

This option produces a cleaner syntax if the intent is simply to wait
for a particular LSN type, without specifying additional options like
timeout or no_throw.

I don’t have a clear preference among them. I’d be interested to hear
what you or others think is the better direction.


--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

25 ноября 2025 г., 14:51:19

Hi!

> > > > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > > > If we intend to extend its functionality more broadly, one option is
> > > > to add a mode option or something similar. Are users expected to wait
> > > > for flush(or others) completion in such cases? If not, and the TAP
> > > > test is the only intended use, this approach might be a bit of an
> > > > overkill.
> > >
> > > I would say that adding mode parameter seems to be a pretty natural
> > > extension of what we have at the moment.  I can imagine some
> > > clustering solution can use it to wait for certain transaction to be
> > > flushed at the replica (without delaying the commit at the primary).
> > >
> > > ------
> > > Regards,
> > > Alexander Korotkov
> > > Supabase
> >
> > Makes sense. I'll play with it and try to prepare a follow-up patch.
> >
> > --
> > Best,
> > Xuneng
>
> In terms of extending the functionality of the command, I see two
> possible approaches here. One is to keep mode as a mandatory keyword,
> and the other is to introduce it as an option in the WITH clause.
>
> Syntax Option A: Mode in the WITH Clause
>
> WAIT FOR LSN '0/12345' WITH (mode = 'replay');
> WAIT FOR LSN '0/12345' WITH (mode = 'flush');
> WAIT FOR LSN '0/12345' WITH (mode = 'write');
>
> With this option, we can keep "replay" as the default mode. That means
> existing TAP tests won’t need to be refactored unless they explicitly
> want a different mode.
>
> Syntax Option B: Mode as Part of the Main Command
>
> WAIT FOR LSN '0/12345' MODE 'replay';
> WAIT FOR LSN '0/12345' MODE 'flush';
> WAIT FOR LSN '0/12345' MODE 'write';
>
> Or a more concise variant using keywords:
>
> WAIT FOR LSN '0/12345' REPLAY;
> WAIT FOR LSN '0/12345' FLUSH;
> WAIT FOR LSN '0/12345' WRITE;
>
> This option produces a cleaner syntax if the intent is simply to wait
> for a particular LSN type, without specifying additional options like
> timeout or no_throw.
>
> I don’t have a clear preference among them. I’d be interested to hear
> what you or others think is the better direction.
>

I've implemented a patch that adds MODE support to WAIT FOR LSN

The new grammar looks like:

——
WAIT FOR LSN '<lsn>' [MODE { REPLAY | WRITE | FLUSH }] [WITH (...)]
——

Two modes added: flush and write

Design decisions:

1. MODE as a separate keyword (not in WITH clause) - This follows the
pattern used by LOCK command. It also makes the common case more
concise.

2. REPLAY as the default - When MODE is not specified, it defaults to REPLAY.

3. Keywords rather than strings - Using `MODE WRITE` rather than `MODE 'write'`

The patch set includes:
-------
0001 - Extend xlogwait infrastructure with write and flush wait types

Adds WAIT_LSN_TYPE_WRITE and WAIT_LSN_TYPE_FLUSH to WaitLSNType enum,
along with corresponding wait events and pairing heaps. Introduces
GetCurrentLSNForWaitType() to retrieve the appropriate LSN based on
wait type, and adds wakeup calls in walreceiver for write/flush
events.

-------
0002 - Add pg_last_wal_write_lsn() SQL function

Adds a new SQL function that returns the current WAL write position on
a standby using GetWalRcvWriteRecPtr(). This complements existing
pg_last_wal_receive_lsn() (flush) and pg_last_wal_replay_lsn()
functions, enabling verification of WAIT FOR LSN MODE WRITE in TAP
tests.

-------
0003 - Add MODE parameter to WAIT FOR LSN command

Extends the parser and executor to support the optional MODE
parameter. Updates documentation with new syntax and mode
descriptions. Adds TAP tests covering all three modes including
mixed-mode concurrent waiters.

-------
0004 - Add tab completion for WAIT FOR LSN MODE parameter

Adds psql tab completion support: completes MODE after LSN value,
completes REPLAY/WRITE/FLUSH after MODE keyword, and completes WITH
after mode selection.

-------
0005 - Use WAIT FOR LSN in PostgreSQL::Test::Cluster::wait_for_catchup()

Replaces polling-based wait_for_catchup() with WAIT FOR LSN when the
target is a standby in recovery, improving test efficiency by avoiding
repeated queries.

The WRITE and FLUSH modes enable scenarios where applications need to
ensure WAL has been received or persisted on the standby without
waiting for replay to complete.

Feedback welcome.


--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

01 декабря 2025 г., 07:33:27

Hi hackers,

On Tue, Nov 25, 2025 at 7:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi!
>
> > > > > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > > > > If we intend to extend its functionality more broadly, one option is
> > > > > to add a mode option or something similar. Are users expected to wait
> > > > > for flush(or others) completion in such cases? If not, and the TAP
> > > > > test is the only intended use, this approach might be a bit of an
> > > > > overkill.
> > > >
> > > > I would say that adding mode parameter seems to be a pretty natural
> > > > extension of what we have at the moment.  I can imagine some
> > > > clustering solution can use it to wait for certain transaction to be
> > > > flushed at the replica (without delaying the commit at the primary).
> > > >
> > > > ------
> > > > Regards,
> > > > Alexander Korotkov
> > > > Supabase
> > >
> > > Makes sense. I'll play with it and try to prepare a follow-up patch.
> > >
> > > --
> > > Best,
> > > Xuneng
> >
> > In terms of extending the functionality of the command, I see two
> > possible approaches here. One is to keep mode as a mandatory keyword,
> > and the other is to introduce it as an option in the WITH clause.
> >
> > Syntax Option A: Mode in the WITH Clause
> >
> > WAIT FOR LSN '0/12345' WITH (mode = 'replay');
> > WAIT FOR LSN '0/12345' WITH (mode = 'flush');
> > WAIT FOR LSN '0/12345' WITH (mode = 'write');
> >
> > With this option, we can keep "replay" as the default mode. That means
> > existing TAP tests won’t need to be refactored unless they explicitly
> > want a different mode.
> >
> > Syntax Option B: Mode as Part of the Main Command
> >
> > WAIT FOR LSN '0/12345' MODE 'replay';
> > WAIT FOR LSN '0/12345' MODE 'flush';
> > WAIT FOR LSN '0/12345' MODE 'write';
> >
> > Or a more concise variant using keywords:
> >
> > WAIT FOR LSN '0/12345' REPLAY;
> > WAIT FOR LSN '0/12345' FLUSH;
> > WAIT FOR LSN '0/12345' WRITE;
> >
> > This option produces a cleaner syntax if the intent is simply to wait
> > for a particular LSN type, without specifying additional options like
> > timeout or no_throw.
> >
> > I don’t have a clear preference among them. I’d be interested to hear
> > what you or others think is the better direction.
> >
>
> I've implemented a patch that adds MODE support to WAIT FOR LSN
>
> The new grammar looks like:
>
> ——
> WAIT FOR LSN '<lsn>' [MODE { REPLAY | WRITE | FLUSH }] [WITH (...)]
> ——
>
> Two modes added: flush and write
>
> Design decisions:
>
> 1. MODE as a separate keyword (not in WITH clause) - This follows the
> pattern used by LOCK command. It also makes the common case more
> concise.
>
> 2. REPLAY as the default - When MODE is not specified, it defaults to REPLAY.
>
> 3. Keywords rather than strings - Using `MODE WRITE` rather than `MODE 'write'`
>
> The patch set includes:
> -------
> 0001 - Extend xlogwait infrastructure with write and flush wait types
>
> Adds WAIT_LSN_TYPE_WRITE and WAIT_LSN_TYPE_FLUSH to WaitLSNType enum,
> along with corresponding wait events and pairing heaps. Introduces
> GetCurrentLSNForWaitType() to retrieve the appropriate LSN based on
> wait type, and adds wakeup calls in walreceiver for write/flush
> events.
>
> -------
> 0002 - Add pg_last_wal_write_lsn() SQL function
>
> Adds a new SQL function that returns the current WAL write position on
> a standby using GetWalRcvWriteRecPtr(). This complements existing
> pg_last_wal_receive_lsn() (flush) and pg_last_wal_replay_lsn()
> functions, enabling verification of WAIT FOR LSN MODE WRITE in TAP
> tests.
>
> -------
> 0003 - Add MODE parameter to WAIT FOR LSN command
>
> Extends the parser and executor to support the optional MODE
> parameter. Updates documentation with new syntax and mode
> descriptions. Adds TAP tests covering all three modes including
> mixed-mode concurrent waiters.
>
> -------
> 0004 - Add tab completion for WAIT FOR LSN MODE parameter
>
> Adds psql tab completion support: completes MODE after LSN value,
> completes REPLAY/WRITE/FLUSH after MODE keyword, and completes WITH
> after mode selection.
>
> -------
> 0005 - Use WAIT FOR LSN in PostgreSQL::Test::Cluster::wait_for_catchup()
>
> Replaces polling-based wait_for_catchup() with WAIT FOR LSN when the
> target is a standby in recovery, improving test efficiency by avoiding
> repeated queries.
>
> The WRITE and FLUSH modes enable scenarios where applications need to
> ensure WAL has been received or persisted on the standby without
> waiting for replay to complete.
>
> Feedback welcome.
>

Here is the updated v2 patch set. Most of the updates are in patch 3.

Changes from v1:

Patch 1 (Extend wait types in xlogwait infra)
- Renamed enum values for consistency (WAIT_LSN_TYPE_REPLAY →
WAIT_LSN_TYPE_REPLAY_STANDBY, etc.)

Patch 2 (pg_last_wal_write_lsn):
- Clarified documentation and comment
- Improved pg_proc.dat description

Patch 3 (MODE parameter):
- Replaced direct cast with explicit switch statement for WaitLSNMode
→ WaitLSNType conversion
- Improved FLUSH/WRITE mode documentation with verification function references
- TAP tests (7b, 7c, 7d): Added walreceiver control for concurrency,
explicit blocking verification via poll_query_until, and log-based
completion verification via wait_for_log
- Fix the timing issue in wait for all three sessions to get the
errors after promotion of tap test 8.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

02 декабря 2025 г., 06:08:01

Hi,

On Mon, Dec 1, 2025 at 12:33 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi hackers,
>
> On Tue, Nov 25, 2025 at 7:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi!
> >
> > > > > > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > > > > > If we intend to extend its functionality more broadly, one option is
> > > > > > to add a mode option or something similar. Are users expected to wait
> > > > > > for flush(or others) completion in such cases? If not, and the TAP
> > > > > > test is the only intended use, this approach might be a bit of an
> > > > > > overkill.
> > > > >
> > > > > I would say that adding mode parameter seems to be a pretty natural
> > > > > extension of what we have at the moment.  I can imagine some
> > > > > clustering solution can use it to wait for certain transaction to be
> > > > > flushed at the replica (without delaying the commit at the primary).
> > > > >
> > > > > ------
> > > > > Regards,
> > > > > Alexander Korotkov
> > > > > Supabase
> > > >
> > > > Makes sense. I'll play with it and try to prepare a follow-up patch.
> > > >
> > > > --
> > > > Best,
> > > > Xuneng
> > >
> > > In terms of extending the functionality of the command, I see two
> > > possible approaches here. One is to keep mode as a mandatory keyword,
> > > and the other is to introduce it as an option in the WITH clause.
> > >
> > > Syntax Option A: Mode in the WITH Clause
> > >
> > > WAIT FOR LSN '0/12345' WITH (mode = 'replay');
> > > WAIT FOR LSN '0/12345' WITH (mode = 'flush');
> > > WAIT FOR LSN '0/12345' WITH (mode = 'write');
> > >
> > > With this option, we can keep "replay" as the default mode. That means
> > > existing TAP tests won’t need to be refactored unless they explicitly
> > > want a different mode.
> > >
> > > Syntax Option B: Mode as Part of the Main Command
> > >
> > > WAIT FOR LSN '0/12345' MODE 'replay';
> > > WAIT FOR LSN '0/12345' MODE 'flush';
> > > WAIT FOR LSN '0/12345' MODE 'write';
> > >
> > > Or a more concise variant using keywords:
> > >
> > > WAIT FOR LSN '0/12345' REPLAY;
> > > WAIT FOR LSN '0/12345' FLUSH;
> > > WAIT FOR LSN '0/12345' WRITE;
> > >
> > > This option produces a cleaner syntax if the intent is simply to wait
> > > for a particular LSN type, without specifying additional options like
> > > timeout or no_throw.
> > >
> > > I don’t have a clear preference among them. I’d be interested to hear
> > > what you or others think is the better direction.
> > >
> >
> > I've implemented a patch that adds MODE support to WAIT FOR LSN
> >
> > The new grammar looks like:
> >
> > ——
> > WAIT FOR LSN '<lsn>' [MODE { REPLAY | WRITE | FLUSH }] [WITH (...)]
> > ——
> >
> > Two modes added: flush and write
> >
> > Design decisions:
> >
> > 1. MODE as a separate keyword (not in WITH clause) - This follows the
> > pattern used by LOCK command. It also makes the common case more
> > concise.
> >
> > 2. REPLAY as the default - When MODE is not specified, it defaults to REPLAY.
> >
> > 3. Keywords rather than strings - Using `MODE WRITE` rather than `MODE 'write'`
> >
> > The patch set includes:
> > -------
> > 0001 - Extend xlogwait infrastructure with write and flush wait types
> >
> > Adds WAIT_LSN_TYPE_WRITE and WAIT_LSN_TYPE_FLUSH to WaitLSNType enum,
> > along with corresponding wait events and pairing heaps. Introduces
> > GetCurrentLSNForWaitType() to retrieve the appropriate LSN based on
> > wait type, and adds wakeup calls in walreceiver for write/flush
> > events.
> >
> > -------
> > 0002 - Add pg_last_wal_write_lsn() SQL function
> >
> > Adds a new SQL function that returns the current WAL write position on
> > a standby using GetWalRcvWriteRecPtr(). This complements existing
> > pg_last_wal_receive_lsn() (flush) and pg_last_wal_replay_lsn()
> > functions, enabling verification of WAIT FOR LSN MODE WRITE in TAP
> > tests.
> >
> > -------
> > 0003 - Add MODE parameter to WAIT FOR LSN command
> >
> > Extends the parser and executor to support the optional MODE
> > parameter. Updates documentation with new syntax and mode
> > descriptions. Adds TAP tests covering all three modes including
> > mixed-mode concurrent waiters.
> >
> > -------
> > 0004 - Add tab completion for WAIT FOR LSN MODE parameter
> >
> > Adds psql tab completion support: completes MODE after LSN value,
> > completes REPLAY/WRITE/FLUSH after MODE keyword, and completes WITH
> > after mode selection.
> >
> > -------
> > 0005 - Use WAIT FOR LSN in PostgreSQL::Test::Cluster::wait_for_catchup()
> >
> > Replaces polling-based wait_for_catchup() with WAIT FOR LSN when the
> > target is a standby in recovery, improving test efficiency by avoiding
> > repeated queries.
> >
> > The WRITE and FLUSH modes enable scenarios where applications need to
> > ensure WAL has been received or persisted on the standby without
> > waiting for replay to complete.
> >
> > Feedback welcome.
> >
>
> Here is the updated v2 patch set. Most of the updates are in patch 3.
>
> Changes from v1:
>
> Patch 1 (Extend wait types in xlogwait infra)
> - Renamed enum values for consistency (WAIT_LSN_TYPE_REPLAY →
> WAIT_LSN_TYPE_REPLAY_STANDBY, etc.)
>
> Patch 2 (pg_last_wal_write_lsn):
> - Clarified documentation and comment
> - Improved pg_proc.dat description
>
> Patch 3 (MODE parameter):
> - Replaced direct cast with explicit switch statement for WaitLSNMode
> → WaitLSNType conversion
> - Improved FLUSH/WRITE mode documentation with verification function references
> - TAP tests (7b, 7c, 7d): Added walreceiver control for concurrency,
> explicit blocking verification via poll_query_until, and log-based
> completion verification via wait_for_log
> - Fix the timing issue in wait for all three sessions to get the
> errors after promotion of tap test 8.
>
> --
> Best,
> Xuneng

Here is the updated v3. The changes are made to patch 3:

- Refactor duplicated TAP test code by extracting helper routines for
starting and stopping walreceiver.
- Increase the number of concurrent WRITE and FLUSH waiters in tests
7b and 7c from three to five, matching the number in test 7a.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

02 декабря 2025 г., 13:10:02

Hi,

On Tue, Dec 2, 2025 at 11:08 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Mon, Dec 1, 2025 at 12:33 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi hackers,
> >
> > On Tue, Nov 25, 2025 at 7:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi!
> > >
> > > > > > > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > > > > > > If we intend to extend its functionality more broadly, one option is
> > > > > > > to add a mode option or something similar. Are users expected to wait
> > > > > > > for flush(or others) completion in such cases? If not, and the TAP
> > > > > > > test is the only intended use, this approach might be a bit of an
> > > > > > > overkill.
> > > > > >
> > > > > > I would say that adding mode parameter seems to be a pretty natural
> > > > > > extension of what we have at the moment.  I can imagine some
> > > > > > clustering solution can use it to wait for certain transaction to be
> > > > > > flushed at the replica (without delaying the commit at the primary).
> > > > > >
> > > > > > ------
> > > > > > Regards,
> > > > > > Alexander Korotkov
> > > > > > Supabase
> > > > >
> > > > > Makes sense. I'll play with it and try to prepare a follow-up patch.
> > > > >
> > > > > --
> > > > > Best,
> > > > > Xuneng
> > > >
> > > > In terms of extending the functionality of the command, I see two
> > > > possible approaches here. One is to keep mode as a mandatory keyword,
> > > > and the other is to introduce it as an option in the WITH clause.
> > > >
> > > > Syntax Option A: Mode in the WITH Clause
> > > >
> > > > WAIT FOR LSN '0/12345' WITH (mode = 'replay');
> > > > WAIT FOR LSN '0/12345' WITH (mode = 'flush');
> > > > WAIT FOR LSN '0/12345' WITH (mode = 'write');
> > > >
> > > > With this option, we can keep "replay" as the default mode. That means
> > > > existing TAP tests won’t need to be refactored unless they explicitly
> > > > want a different mode.
> > > >
> > > > Syntax Option B: Mode as Part of the Main Command
> > > >
> > > > WAIT FOR LSN '0/12345' MODE 'replay';
> > > > WAIT FOR LSN '0/12345' MODE 'flush';
> > > > WAIT FOR LSN '0/12345' MODE 'write';
> > > >
> > > > Or a more concise variant using keywords:
> > > >
> > > > WAIT FOR LSN '0/12345' REPLAY;
> > > > WAIT FOR LSN '0/12345' FLUSH;
> > > > WAIT FOR LSN '0/12345' WRITE;
> > > >
> > > > This option produces a cleaner syntax if the intent is simply to wait
> > > > for a particular LSN type, without specifying additional options like
> > > > timeout or no_throw.
> > > >
> > > > I don’t have a clear preference among them. I’d be interested to hear
> > > > what you or others think is the better direction.
> > > >
> > >
> > > I've implemented a patch that adds MODE support to WAIT FOR LSN
> > >
> > > The new grammar looks like:
> > >
> > > ——
> > > WAIT FOR LSN '<lsn>' [MODE { REPLAY | WRITE | FLUSH }] [WITH (...)]
> > > ——
> > >
> > > Two modes added: flush and write
> > >
> > > Design decisions:
> > >
> > > 1. MODE as a separate keyword (not in WITH clause) - This follows the
> > > pattern used by LOCK command. It also makes the common case more
> > > concise.
> > >
> > > 2. REPLAY as the default - When MODE is not specified, it defaults to REPLAY.
> > >
> > > 3. Keywords rather than strings - Using `MODE WRITE` rather than `MODE 'write'`
> > >
> > > The patch set includes:
> > > -------
> > > 0001 - Extend xlogwait infrastructure with write and flush wait types
> > >
> > > Adds WAIT_LSN_TYPE_WRITE and WAIT_LSN_TYPE_FLUSH to WaitLSNType enum,
> > > along with corresponding wait events and pairing heaps. Introduces
> > > GetCurrentLSNForWaitType() to retrieve the appropriate LSN based on
> > > wait type, and adds wakeup calls in walreceiver for write/flush
> > > events.
> > >
> > > -------
> > > 0002 - Add pg_last_wal_write_lsn() SQL function
> > >
> > > Adds a new SQL function that returns the current WAL write position on
> > > a standby using GetWalRcvWriteRecPtr(). This complements existing
> > > pg_last_wal_receive_lsn() (flush) and pg_last_wal_replay_lsn()
> > > functions, enabling verification of WAIT FOR LSN MODE WRITE in TAP
> > > tests.
> > >
> > > -------
> > > 0003 - Add MODE parameter to WAIT FOR LSN command
> > >
> > > Extends the parser and executor to support the optional MODE
> > > parameter. Updates documentation with new syntax and mode
> > > descriptions. Adds TAP tests covering all three modes including
> > > mixed-mode concurrent waiters.
> > >
> > > -------
> > > 0004 - Add tab completion for WAIT FOR LSN MODE parameter
> > >
> > > Adds psql tab completion support: completes MODE after LSN value,
> > > completes REPLAY/WRITE/FLUSH after MODE keyword, and completes WITH
> > > after mode selection.
> > >
> > > -------
> > > 0005 - Use WAIT FOR LSN in PostgreSQL::Test::Cluster::wait_for_catchup()
> > >
> > > Replaces polling-based wait_for_catchup() with WAIT FOR LSN when the
> > > target is a standby in recovery, improving test efficiency by avoiding
> > > repeated queries.
> > >
> > > The WRITE and FLUSH modes enable scenarios where applications need to
> > > ensure WAL has been received or persisted on the standby without
> > > waiting for replay to complete.
> > >
> > > Feedback welcome.
> > >
> >
> > Here is the updated v2 patch set. Most of the updates are in patch 3.
> >
> > Changes from v1:
> >
> > Patch 1 (Extend wait types in xlogwait infra)
> > - Renamed enum values for consistency (WAIT_LSN_TYPE_REPLAY →
> > WAIT_LSN_TYPE_REPLAY_STANDBY, etc.)
> >
> > Patch 2 (pg_last_wal_write_lsn):
> > - Clarified documentation and comment
> > - Improved pg_proc.dat description
> >
> > Patch 3 (MODE parameter):
> > - Replaced direct cast with explicit switch statement for WaitLSNMode
> > → WaitLSNType conversion
> > - Improved FLUSH/WRITE mode documentation with verification function references
> > - TAP tests (7b, 7c, 7d): Added walreceiver control for concurrency,
> > explicit blocking verification via poll_query_until, and log-based
> > completion verification via wait_for_log
> > - Fix the timing issue in wait for all three sessions to get the
> > errors after promotion of tap test 8.
> >
> > --
> > Best,
> > Xuneng
>
> Here is the updated v3. The changes are made to patch 3:
>
> - Refactor duplicated TAP test code by extracting helper routines for
> starting and stopping walreceiver.
> - Increase the number of concurrent WRITE and FLUSH waiters in tests
> 7b and 7c from three to five, matching the number in test 7a.
>
> --
> Best,
> Xuneng

Just realized that patch 2 in prior emails could be dropped for
simplicity. Since the write LSN can be retrieved directly from
pg_stat_wal_receiver, the TAP test in patch 3 does not require a
separate SQL function for this purpose alone.


--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

16 декабря 2025 г., 06:28:51

Hi,

On Tue, Dec 2, 2025 at 6:10 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Tue, Dec 2, 2025 at 11:08 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Mon, Dec 1, 2025 at 12:33 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi hackers,
> > >
> > > On Tue, Nov 25, 2025 at 7:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi!
> > > >
> > > > > > > > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > > > > > > > If we intend to extend its functionality more broadly, one option is
> > > > > > > > to add a mode option or something similar. Are users expected to wait
> > > > > > > > for flush(or others) completion in such cases? If not, and the TAP
> > > > > > > > test is the only intended use, this approach might be a bit of an
> > > > > > > > overkill.
> > > > > > >
> > > > > > > I would say that adding mode parameter seems to be a pretty natural
> > > > > > > extension of what we have at the moment.  I can imagine some
> > > > > > > clustering solution can use it to wait for certain transaction to be
> > > > > > > flushed at the replica (without delaying the commit at the primary).
> > > > > > >
> > > > > > > ------
> > > > > > > Regards,
> > > > > > > Alexander Korotkov
> > > > > > > Supabase
> > > > > >
> > > > > > Makes sense. I'll play with it and try to prepare a follow-up patch.
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Xuneng
> > > > >
> > > > > In terms of extending the functionality of the command, I see two
> > > > > possible approaches here. One is to keep mode as a mandatory keyword,
> > > > > and the other is to introduce it as an option in the WITH clause.
> > > > >
> > > > > Syntax Option A: Mode in the WITH Clause
> > > > >
> > > > > WAIT FOR LSN '0/12345' WITH (mode = 'replay');
> > > > > WAIT FOR LSN '0/12345' WITH (mode = 'flush');
> > > > > WAIT FOR LSN '0/12345' WITH (mode = 'write');
> > > > >
> > > > > With this option, we can keep "replay" as the default mode. That means
> > > > > existing TAP tests won’t need to be refactored unless they explicitly
> > > > > want a different mode.
> > > > >
> > > > > Syntax Option B: Mode as Part of the Main Command
> > > > >
> > > > > WAIT FOR LSN '0/12345' MODE 'replay';
> > > > > WAIT FOR LSN '0/12345' MODE 'flush';
> > > > > WAIT FOR LSN '0/12345' MODE 'write';
> > > > >
> > > > > Or a more concise variant using keywords:
> > > > >
> > > > > WAIT FOR LSN '0/12345' REPLAY;
> > > > > WAIT FOR LSN '0/12345' FLUSH;
> > > > > WAIT FOR LSN '0/12345' WRITE;
> > > > >
> > > > > This option produces a cleaner syntax if the intent is simply to wait
> > > > > for a particular LSN type, without specifying additional options like
> > > > > timeout or no_throw.
> > > > >
> > > > > I don’t have a clear preference among them. I’d be interested to hear
> > > > > what you or others think is the better direction.
> > > > >
> > > >
> > > > I've implemented a patch that adds MODE support to WAIT FOR LSN
> > > >
> > > > The new grammar looks like:
> > > >
> > > > ——
> > > > WAIT FOR LSN '<lsn>' [MODE { REPLAY | WRITE | FLUSH }] [WITH (...)]
> > > > ——
> > > >
> > > > Two modes added: flush and write
> > > >
> > > > Design decisions:
> > > >
> > > > 1. MODE as a separate keyword (not in WITH clause) - This follows the
> > > > pattern used by LOCK command. It also makes the common case more
> > > > concise.
> > > >
> > > > 2. REPLAY as the default - When MODE is not specified, it defaults to REPLAY.
> > > >
> > > > 3. Keywords rather than strings - Using `MODE WRITE` rather than `MODE 'write'`
> > > >
> > > > The patch set includes:
> > > > -------
> > > > 0001 - Extend xlogwait infrastructure with write and flush wait types
> > > >
> > > > Adds WAIT_LSN_TYPE_WRITE and WAIT_LSN_TYPE_FLUSH to WaitLSNType enum,
> > > > along with corresponding wait events and pairing heaps. Introduces
> > > > GetCurrentLSNForWaitType() to retrieve the appropriate LSN based on
> > > > wait type, and adds wakeup calls in walreceiver for write/flush
> > > > events.
> > > >
> > > > -------
> > > > 0002 - Add pg_last_wal_write_lsn() SQL function
> > > >
> > > > Adds a new SQL function that returns the current WAL write position on
> > > > a standby using GetWalRcvWriteRecPtr(). This complements existing
> > > > pg_last_wal_receive_lsn() (flush) and pg_last_wal_replay_lsn()
> > > > functions, enabling verification of WAIT FOR LSN MODE WRITE in TAP
> > > > tests.
> > > >
> > > > -------
> > > > 0003 - Add MODE parameter to WAIT FOR LSN command
> > > >
> > > > Extends the parser and executor to support the optional MODE
> > > > parameter. Updates documentation with new syntax and mode
> > > > descriptions. Adds TAP tests covering all three modes including
> > > > mixed-mode concurrent waiters.
> > > >
> > > > -------
> > > > 0004 - Add tab completion for WAIT FOR LSN MODE parameter
> > > >
> > > > Adds psql tab completion support: completes MODE after LSN value,
> > > > completes REPLAY/WRITE/FLUSH after MODE keyword, and completes WITH
> > > > after mode selection.
> > > >
> > > > -------
> > > > 0005 - Use WAIT FOR LSN in PostgreSQL::Test::Cluster::wait_for_catchup()
> > > >
> > > > Replaces polling-based wait_for_catchup() with WAIT FOR LSN when the
> > > > target is a standby in recovery, improving test efficiency by avoiding
> > > > repeated queries.
> > > >
> > > > The WRITE and FLUSH modes enable scenarios where applications need to
> > > > ensure WAL has been received or persisted on the standby without
> > > > waiting for replay to complete.
> > > >
> > > > Feedback welcome.
> > > >
> > >
> > > Here is the updated v2 patch set. Most of the updates are in patch 3.
> > >
> > > Changes from v1:
> > >
> > > Patch 1 (Extend wait types in xlogwait infra)
> > > - Renamed enum values for consistency (WAIT_LSN_TYPE_REPLAY →
> > > WAIT_LSN_TYPE_REPLAY_STANDBY, etc.)
> > >
> > > Patch 2 (pg_last_wal_write_lsn):
> > > - Clarified documentation and comment
> > > - Improved pg_proc.dat description
> > >
> > > Patch 3 (MODE parameter):
> > > - Replaced direct cast with explicit switch statement for WaitLSNMode
> > > → WaitLSNType conversion
> > > - Improved FLUSH/WRITE mode documentation with verification function references
> > > - TAP tests (7b, 7c, 7d): Added walreceiver control for concurrency,
> > > explicit blocking verification via poll_query_until, and log-based
> > > completion verification via wait_for_log
> > > - Fix the timing issue in wait for all three sessions to get the
> > > errors after promotion of tap test 8.
> > >
> > > --
> > > Best,
> > > Xuneng
> >
> > Here is the updated v3. The changes are made to patch 3:
> >
> > - Refactor duplicated TAP test code by extracting helper routines for
> > starting and stopping walreceiver.
> > - Increase the number of concurrent WRITE and FLUSH waiters in tests
> > 7b and 7c from three to five, matching the number in test 7a.
> >
> > --
> > Best,
> > Xuneng
>
> Just realized that patch 2 in prior emails could be dropped for
> simplicity. Since the write LSN can be retrieved directly from
> pg_stat_wal_receiver, the TAP test in patch 3 does not require a
> separate SQL function for this purpose alone.
>

Just rebase with minor changes to the wait-lsn types.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

16 декабря 2025 г., 07:46:27

Hi,

On Tue, Dec 16, 2025 at 11:28 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Tue, Dec 2, 2025 at 6:10 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Dec 2, 2025 at 11:08 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, Dec 1, 2025 at 12:33 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi hackers,
> > > >
> > > > On Tue, Nov 25, 2025 at 7:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi!
> > > > >
> > > > > > > > > At the moment, the WAIT FOR LSN command supports only the replay mode.
> > > > > > > > > If we intend to extend its functionality more broadly, one option is
> > > > > > > > > to add a mode option or something similar. Are users expected to wait
> > > > > > > > > for flush(or others) completion in such cases? If not, and the TAP
> > > > > > > > > test is the only intended use, this approach might be a bit of an
> > > > > > > > > overkill.
> > > > > > > >
> > > > > > > > I would say that adding mode parameter seems to be a pretty natural
> > > > > > > > extension of what we have at the moment.  I can imagine some
> > > > > > > > clustering solution can use it to wait for certain transaction to be
> > > > > > > > flushed at the replica (without delaying the commit at the primary).
> > > > > > > >
> > > > > > > > ------
> > > > > > > > Regards,
> > > > > > > > Alexander Korotkov
> > > > > > > > Supabase
> > > > > > >
> > > > > > > Makes sense. I'll play with it and try to prepare a follow-up patch.
> > > > > > >
> > > > > > > --
> > > > > > > Best,
> > > > > > > Xuneng
> > > > > >
> > > > > > In terms of extending the functionality of the command, I see two
> > > > > > possible approaches here. One is to keep mode as a mandatory keyword,
> > > > > > and the other is to introduce it as an option in the WITH clause.
> > > > > >
> > > > > > Syntax Option A: Mode in the WITH Clause
> > > > > >
> > > > > > WAIT FOR LSN '0/12345' WITH (mode = 'replay');
> > > > > > WAIT FOR LSN '0/12345' WITH (mode = 'flush');
> > > > > > WAIT FOR LSN '0/12345' WITH (mode = 'write');
> > > > > >
> > > > > > With this option, we can keep "replay" as the default mode. That means
> > > > > > existing TAP tests won’t need to be refactored unless they explicitly
> > > > > > want a different mode.
> > > > > >
> > > > > > Syntax Option B: Mode as Part of the Main Command
> > > > > >
> > > > > > WAIT FOR LSN '0/12345' MODE 'replay';
> > > > > > WAIT FOR LSN '0/12345' MODE 'flush';
> > > > > > WAIT FOR LSN '0/12345' MODE 'write';
> > > > > >
> > > > > > Or a more concise variant using keywords:
> > > > > >
> > > > > > WAIT FOR LSN '0/12345' REPLAY;
> > > > > > WAIT FOR LSN '0/12345' FLUSH;
> > > > > > WAIT FOR LSN '0/12345' WRITE;
> > > > > >
> > > > > > This option produces a cleaner syntax if the intent is simply to wait
> > > > > > for a particular LSN type, without specifying additional options like
> > > > > > timeout or no_throw.
> > > > > >
> > > > > > I don’t have a clear preference among them. I’d be interested to hear
> > > > > > what you or others think is the better direction.
> > > > > >
> > > > >
> > > > > I've implemented a patch that adds MODE support to WAIT FOR LSN
> > > > >
> > > > > The new grammar looks like:
> > > > >
> > > > > ——
> > > > > WAIT FOR LSN '<lsn>' [MODE { REPLAY | WRITE | FLUSH }] [WITH (...)]
> > > > > ——
> > > > >
> > > > > Two modes added: flush and write
> > > > >
> > > > > Design decisions:
> > > > >
> > > > > 1. MODE as a separate keyword (not in WITH clause) - This follows the
> > > > > pattern used by LOCK command. It also makes the common case more
> > > > > concise.
> > > > >
> > > > > 2. REPLAY as the default - When MODE is not specified, it defaults to REPLAY.
> > > > >
> > > > > 3. Keywords rather than strings - Using `MODE WRITE` rather than `MODE 'write'`
> > > > >
> > > > > The patch set includes:
> > > > > -------
> > > > > 0001 - Extend xlogwait infrastructure with write and flush wait types
> > > > >
> > > > > Adds WAIT_LSN_TYPE_WRITE and WAIT_LSN_TYPE_FLUSH to WaitLSNType enum,
> > > > > along with corresponding wait events and pairing heaps. Introduces
> > > > > GetCurrentLSNForWaitType() to retrieve the appropriate LSN based on
> > > > > wait type, and adds wakeup calls in walreceiver for write/flush
> > > > > events.
> > > > >
> > > > > -------
> > > > > 0002 - Add pg_last_wal_write_lsn() SQL function
> > > > >
> > > > > Adds a new SQL function that returns the current WAL write position on
> > > > > a standby using GetWalRcvWriteRecPtr(). This complements existing
> > > > > pg_last_wal_receive_lsn() (flush) and pg_last_wal_replay_lsn()
> > > > > functions, enabling verification of WAIT FOR LSN MODE WRITE in TAP
> > > > > tests.
> > > > >
> > > > > -------
> > > > > 0003 - Add MODE parameter to WAIT FOR LSN command
> > > > >
> > > > > Extends the parser and executor to support the optional MODE
> > > > > parameter. Updates documentation with new syntax and mode
> > > > > descriptions. Adds TAP tests covering all three modes including
> > > > > mixed-mode concurrent waiters.
> > > > >
> > > > > -------
> > > > > 0004 - Add tab completion for WAIT FOR LSN MODE parameter
> > > > >
> > > > > Adds psql tab completion support: completes MODE after LSN value,
> > > > > completes REPLAY/WRITE/FLUSH after MODE keyword, and completes WITH
> > > > > after mode selection.
> > > > >
> > > > > -------
> > > > > 0005 - Use WAIT FOR LSN in PostgreSQL::Test::Cluster::wait_for_catchup()
> > > > >
> > > > > Replaces polling-based wait_for_catchup() with WAIT FOR LSN when the
> > > > > target is a standby in recovery, improving test efficiency by avoiding
> > > > > repeated queries.
> > > > >
> > > > > The WRITE and FLUSH modes enable scenarios where applications need to
> > > > > ensure WAL has been received or persisted on the standby without
> > > > > waiting for replay to complete.
> > > > >
> > > > > Feedback welcome.
> > > > >
> > > >
> > > > Here is the updated v2 patch set. Most of the updates are in patch 3.
> > > >
> > > > Changes from v1:
> > > >
> > > > Patch 1 (Extend wait types in xlogwait infra)
> > > > - Renamed enum values for consistency (WAIT_LSN_TYPE_REPLAY →
> > > > WAIT_LSN_TYPE_REPLAY_STANDBY, etc.)
> > > >
> > > > Patch 2 (pg_last_wal_write_lsn):
> > > > - Clarified documentation and comment
> > > > - Improved pg_proc.dat description
> > > >
> > > > Patch 3 (MODE parameter):
> > > > - Replaced direct cast with explicit switch statement for WaitLSNMode
> > > > → WaitLSNType conversion
> > > > - Improved FLUSH/WRITE mode documentation with verification function references
> > > > - TAP tests (7b, 7c, 7d): Added walreceiver control for concurrency,
> > > > explicit blocking verification via poll_query_until, and log-based
> > > > completion verification via wait_for_log
> > > > - Fix the timing issue in wait for all three sessions to get the
> > > > errors after promotion of tap test 8.
> > > >
> > > > --
> > > > Best,
> > > > Xuneng
> > >
> > > Here is the updated v3. The changes are made to patch 3:
> > >
> > > - Refactor duplicated TAP test code by extracting helper routines for
> > > starting and stopping walreceiver.
> > > - Increase the number of concurrent WRITE and FLUSH waiters in tests
> > > 7b and 7c from three to five, matching the number in test 7a.
> > >
> > > --
> > > Best,
> > > Xuneng
> >
> > Just realized that patch 2 in prior emails could be dropped for
> > simplicity. Since the write LSN can be retrieved directly from
> > pg_stat_wal_receiver, the TAP test in patch 3 does not require a
> > separate SQL function for this purpose alone.
> >
>
> Just rebase with minor changes to the wait-lsn types.
>

Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
statement in v5 patch 1.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Chao Li

Дата:

16 декабря 2025 г., 08:48:37


> On Oct 4, 2025, at 09:35, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Sun, Sep 28, 2025 at 5:02 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>>
>> Hi,
>>
>> On Fri, Sep 26, 2025 at 7:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>>>
>>> Hi Álvaro,
>>>
>>> Thanks for your review.
>>>
>>> On Tue, Sep 16, 2025 at 4:24 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>>>>
>>>> On 2025-Sep-15, Alexander Korotkov wrote:
>>>>
>>>>>> It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
>>>>>> PUBLICATION - all use minimal grammar rules that produce generic
>>>>>> option lists, with the actual interpretation done in their respective
>>>>>> implementation files. The moderate complexity in wait.c seems
>>>>>> acceptable.
>>>>
>>>> Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
>>>> lists of DefElem (a name optionally followed by a value) instead of
>>>> individual scattered elements that must later be matched up.  Why not
>>>> use utility_option_list instead and then loop on the list of DefElems?
>>>> It'd be a lot simpler.
>>>
>>> I took a look at commands like VACUUM and EXPLAIN and they do follow
>>> this pattern. v11 will make use of utility_option_list.
>>>
>>>> Also, we've found that failing to surround the options by parens leads
>>>> to pain down the road, so maybe add that.  Given that the LSN seems to
>>>> be mandatory, maybe make it something like
>>>>
>>>> WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]
>>>>
>>>> This requires that you make LSN a keyword, albeit unreserved.  Or you
>>>> could make it
>>>> WAIT FOR Ident [the rest]
>>>> and then ensure in C that the identifier matches the word LSN, such as
>>>> we do for "permissive" and "restrictive" in
>>>> RowSecurityDefaultPermissive.
>>>
>>> Shall make LSN an unreserved keyword as well.
>>
>> Here's the updated v11.  Many thanks Jian for off-list discussions and review.
>
> v12 removed unused
> +WaitStmt
> +WaitStmtParam in pgindent/typedefs.list.
>
> Best,
> Xuneng
> <v12-0001-Implement-WAIT-FOR-command.patch>

I just tried to review v12 but failed to “git am”. Can you please rebase the change?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

16 декабря 2025 г., 09:42:00

Hi,

On Tue, Dec 16, 2025 at 1:49 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> > On Oct 4, 2025, at 09:35, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Sun, Sep 28, 2025 at 5:02 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Sep 26, 2025 at 7:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >>>
> >>> Hi Álvaro,
> >>>
> >>> Thanks for your review.
> >>>
> >>> On Tue, Sep 16, 2025 at 4:24 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> >>>>
> >>>> On 2025-Sep-15, Alexander Korotkov wrote:
> >>>>
> >>>>>> It's LGTM. The same pattern is observed in VACUUM, EXPLAIN, and CREATE
> >>>>>> PUBLICATION - all use minimal grammar rules that produce generic
> >>>>>> option lists, with the actual interpretation done in their respective
> >>>>>> implementation files. The moderate complexity in wait.c seems
> >>>>>> acceptable.
> >>>>
> >>>> Actually I find the code in ExecWaitStmt pretty unusual.  We tend to use
> >>>> lists of DefElem (a name optionally followed by a value) instead of
> >>>> individual scattered elements that must later be matched up.  Why not
> >>>> use utility_option_list instead and then loop on the list of DefElems?
> >>>> It'd be a lot simpler.
> >>>
> >>> I took a look at commands like VACUUM and EXPLAIN and they do follow
> >>> this pattern. v11 will make use of utility_option_list.
> >>>
> >>>> Also, we've found that failing to surround the options by parens leads
> >>>> to pain down the road, so maybe add that.  Given that the LSN seems to
> >>>> be mandatory, maybe make it something like
> >>>>
> >>>> WAIT FOR LSN 'xy/zzy' [ WITH ( utility_option_list ) ]
> >>>>
> >>>> This requires that you make LSN a keyword, albeit unreserved.  Or you
> >>>> could make it
> >>>> WAIT FOR Ident [the rest]
> >>>> and then ensure in C that the identifier matches the word LSN, such as
> >>>> we do for "permissive" and "restrictive" in
> >>>> RowSecurityDefaultPermissive.
> >>>
> >>> Shall make LSN an unreserved keyword as well.
> >>
> >> Here's the updated v11.  Many thanks Jian for off-list discussions and review.
> >
> > v12 removed unused
> > +WaitStmt
> > +WaitStmtParam in pgindent/typedefs.list.
> >
> > Best,
> > Xuneng
> > <v12-0001-Implement-WAIT-FOR-command.patch>
>
> I just tried to review v12 but failed to “git am”. Can you please rebase the change?
>

Thanks for looking into this.

That series of patches implementing the WAIT FOR REPLAY command was
applied last month (8af3ae0d , 447aae13, 3b4e53a0, a1f7f91b) in its
version 20. The current v6 patch set [1] [2] primarily extends the
WAIT FOR functionality to support waiting for flush and write LSNs on
a replica by adding a MODE parameter [3]. This made me wonder whether
it would be more appropriate to start a new thread for the extension,
though it is still part of the same WAIT FOR command.

[1] https://commitfest.postgresql.org/patch/6265/
[2] https://www.postgresql.org/message-id/CABPTF7XKti620ZAOXPGuhSZxCKyaV_9stq7ruhnuxvshUxCeRQ@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAPpHfdt4b0wBC4+Oopp_eFQnNjDvxwQLrQ1r4GMJfCY0XWP0dA@mail.gmail.com

--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

18 декабря 2025 г., 13:38:43

Hi, Xuneng!

On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> statement in v5 patch 1.

Thank you for your work on this patchset.  Generally, it looks like
good and quite straightforward extension of the current functionality.
But this patch adds 4 new unreserved keywords to our grammar.  Do you
think we can put mode into with options clause?

------
Regards,
Alexander Korotkov

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

18 декабря 2025 г., 15:24:38

Hi Alexander,

On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi, Xuneng!
>
> On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> > statement in v5 patch 1.
>
> Thank you for your work on this patchset.  Generally, it looks like
> good and quite straightforward extension of the current functionality.
> But this patch adds 4 new unreserved keywords to our grammar.  Do you
> think we can put mode into with options clause?
>

Thanks for pointing this out. Yeah, 4 unreserved keywords add
complexity to the parser and it may not be worthwhile since replay is
expected to be the common use scenario. Maybe we can do something like
this:

-- Default (REPLAY mode)
WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');

-- Explicit REPLAY mode
WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');

-- WRITE mode
WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');

If no mode is set explicitly in the options clause, it defaults to
replay. I'll update the patch per your suggestion.

--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

18 декабря 2025 г., 15:25:43

On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Hi, Xuneng!
> >
> > On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> > > statement in v5 patch 1.
> >
> > Thank you for your work on this patchset.  Generally, it looks like
> > good and quite straightforward extension of the current functionality.
> > But this patch adds 4 new unreserved keywords to our grammar.  Do you
> > think we can put mode into with options clause?
> >
>
> Thanks for pointing this out. Yeah, 4 unreserved keywords add
> complexity to the parser and it may not be worthwhile since replay is
> expected to be the common use scenario. Maybe we can do something like
> this:
>
> -- Default (REPLAY mode)
> WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
>
> -- Explicit REPLAY mode
> WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
>
> -- WRITE mode
> WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
>
> If no mode is set explicitly in the options clause, it defaults to
> replay. I'll update the patch per your suggestion.

This is exactly what I meant.  Please, go ahead.

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

19 декабря 2025 г., 05:49:46

Hi,

On Thu, Dec 18, 2025 at 8:25 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > Hi, Xuneng!
> > >
> > > On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> > > > statement in v5 patch 1.
> > >
> > > Thank you for your work on this patchset.  Generally, it looks like
> > > good and quite straightforward extension of the current functionality.
> > > But this patch adds 4 new unreserved keywords to our grammar.  Do you
> > > think we can put mode into with options clause?
> > >
> >
> > Thanks for pointing this out. Yeah, 4 unreserved keywords add
> > complexity to the parser and it may not be worthwhile since replay is
> > expected to be the common use scenario. Maybe we can do something like
> > this:
> >
> > -- Default (REPLAY mode)
> > WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
> >
> > -- Explicit REPLAY mode
> > WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
> >
> > -- WRITE mode
> > WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
> >
> > If no mode is set explicitly in the options clause, it defaults to
> > replay. I'll update the patch per your suggestion.
>
> This is exactly what I meant.  Please, go ahead.
>

Here is the updated patch set (v7). Please check.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

21 декабря 2025 г., 00:09:43

On Fri, Dec 19, 2025 at 4:50 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Thu, Dec 18, 2025 at 8:25 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > >
> > > > Hi, Xuneng!
> > > >
> > > > On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > > Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> > > > > statement in v5 patch 1.
> > > >
> > > > Thank you for your work on this patchset.  Generally, it looks like
> > > > good and quite straightforward extension of the current functionality.
> > > > But this patch adds 4 new unreserved keywords to our grammar.  Do you
> > > > think we can put mode into with options clause?
> > > >
> > >
> > > Thanks for pointing this out. Yeah, 4 unreserved keywords add
> > > complexity to the parser and it may not be worthwhile since replay is
> > > expected to be the common use scenario. Maybe we can do something like
> > > this:
> > >
> > > -- Default (REPLAY mode)
> > > WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
> > >
> > > -- Explicit REPLAY mode
> > > WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
> > >
> > > -- WRITE mode
> > > WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
> > >
> > > If no mode is set explicitly in the options clause, it defaults to
> > > replay. I'll update the patch per your suggestion.
> >
> > This is exactly what I meant.  Please, go ahead.
> >
>
> Here is the updated patch set (v7). Please check.

I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
mode parameter.  Should we allow this?

If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
principle, we could encode both as just 'flush' mode, and detect which
WaitLSNType to pick by checking if recovery is in progress.  However,
how should we then react to unreached flush location after standby
promotion (technically it could be still reached but on the different
timeline)?

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

21 декабря 2025 г., 07:37:18

Hi Alexander,

Thanks for your feedback!

> I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> mode parameter.  Should we allow this?

I think this constraint could be relaxed if needed. I was previously
unsure about the use cases.

> If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
> separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
> principle, we could encode both as just 'flush' mode, and detect which
> WaitLSNType to pick by checking if recovery is in progress.  However,
> how should we then react to unreached flush location after standby
> promotion (technically it could be still reached but on the different
> timeline)?
>

Technically, we can use 'flush' mode to specify WAIT FOR behavior in
both primary and replica. Currently, wait for commands error out if
promotion occurs since: either the requested LSN type does not exist
on the primary, or we do not yet have the infrastructure to support
continuing the wait. If we allow waiting for flush on the primary as a
user-visible command and the wake-up calls for flush in primary are
introduced, the question becomes whether we should still abort the
wait on promotion, or continue waiting—as you noted—given that the
target LSN might still be reached, albeit on a different timeline. The
question behind this might be: do users care and should be aware of
the state change of the server while waiting? If they do, then we
better stop the waiting and report the error. In this case, I am
inclined to to break the unified flush mode to something like
primary_flush/standby_flush mode and
WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.

--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

22 декабря 2025 г., 10:56:59

Hi,

On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi Alexander,
>
> Thanks for your feedback!
>
> > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> > mode parameter.  Should we allow this?
>
> I think this constraint could be relaxed if needed. I was previously
> unsure about the use cases.

Flush mode on the primary seems useful when synchronous_commit is set
to off [1]. In that mode, a transaction in primary may return success
before its WAL is durably flushed to disk, trading durability for
lower latency. A “wait for primary flush” operation provides an
explicit durability barrier for cases where applications or tools
occasionally need stronger guarantees.

[1] https://postgresqlco.nf/doc/en/param/synchronous_commit/

> > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
> > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
> > principle, we could encode both as just 'flush' mode, and detect which
> > WaitLSNType to pick by checking if recovery is in progress.  However,
> > how should we then react to unreached flush location after standby
> > promotion (technically it could be still reached but on the different
> > timeline)?
> >
>
> Technically, we can use 'flush' mode to specify WAIT FOR behavior in
> both primary and replica. Currently, wait for commands error out if
> promotion occurs since: either the requested LSN type does not exist
> on the primary, or we do not yet have the infrastructure to support
> continuing the wait. If we allow waiting for flush on the primary as a
> user-visible command and the wake-up calls for flush in primary are
> introduced, the question becomes whether we should still abort the
> wait on promotion, or continue waiting—as you noted—given that the
> target LSN might still be reached, albeit on a different timeline. The
> question behind this might be: do users care and should be aware of
> the state change of the server while waiting? If they do, then we
> better stop the waiting and report the error. In this case, I am
> inclined to to break the unified flush mode to something like
> primary_flush/standby_flush mode and
> WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
>

After further consideration, it also seems reasonable to use a single,
unified flush mode that works on both primary and standby servers,
provided its semantics are clearly documented to avoid the potential
confusion on failure. I don’t have a strong preference between these
two and would be interested in your thoughts.

If a standby is promoted while a session is waiting, the command
better abort and return an error (or report “not in recovery” when
using NO_THROW). At that point, the meaning of the LSN being waited
for may have changed due to the timeline switch and the transition
from standby to primary. An LSN such as 0/5000000 on TLI 2 can
represent entirely different WAL content from 0/5000000 on TLI 1.
Allowing the wait to silently continue across promotion risks giving
users a false sense of safety—for example, interpreting “wait
completed” as “the original data is now durable,” which would no
longer be true.

--
Best,
Xuneng

Вложения

synchronous_commit.png

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

25 декабря 2025 г., 14:13:36

Hi, Xuneng!

On Mon, Dec 22, 2025 at 9:57 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Alexander,
> >
> > Thanks for your feedback!
> >
> > > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> > > mode parameter.  Should we allow this?
> >
> > I think this constraint could be relaxed if needed. I was previously
> > unsure about the use cases.
>
> Flush mode on the primary seems useful when synchronous_commit is set
> to off [1]. In that mode, a transaction in primary may return success
> before its WAL is durably flushed to disk, trading durability for
> lower latency. A “wait for primary flush” operation provides an
> explicit durability barrier for cases where applications or tools
> occasionally need stronger guarantees.
>
> [1] https://postgresqlco.nf/doc/en/param/synchronous_commit/
>
> > > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
> > > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
> > > principle, we could encode both as just 'flush' mode, and detect which
> > > WaitLSNType to pick by checking if recovery is in progress.  However,
> > > how should we then react to unreached flush location after standby
> > > promotion (technically it could be still reached but on the different
> > > timeline)?
> > >
> >
> > Technically, we can use 'flush' mode to specify WAIT FOR behavior in
> > both primary and replica. Currently, wait for commands error out if
> > promotion occurs since: either the requested LSN type does not exist
> > on the primary, or we do not yet have the infrastructure to support
> > continuing the wait. If we allow waiting for flush on the primary as a
> > user-visible command and the wake-up calls for flush in primary are
> > introduced, the question becomes whether we should still abort the
> > wait on promotion, or continue waiting—as you noted—given that the
> > target LSN might still be reached, albeit on a different timeline. The
> > question behind this might be: do users care and should be aware of
> > the state change of the server while waiting? If they do, then we
> > better stop the waiting and report the error. In this case, I am
> > inclined to to break the unified flush mode to something like
> > primary_flush/standby_flush mode and
> > WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
> >
>
> After further consideration, it also seems reasonable to use a single,
> unified flush mode that works on both primary and standby servers,
> provided its semantics are clearly documented to avoid the potential
> confusion on failure. I don’t have a strong preference between these
> two and would be interested in your thoughts.
>
> If a standby is promoted while a session is waiting, the command
> better abort and return an error (or report “not in recovery” when
> using NO_THROW). At that point, the meaning of the LSN being waited
> for may have changed due to the timeline switch and the transition
> from standby to primary. An LSN such as 0/5000000 on TLI 2 can
> represent entirely different WAL content from 0/5000000 on TLI 1.
> Allowing the wait to silently continue across promotion risks giving
> users a false sense of safety—for example, interpreting “wait
> completed” as “the original data is now durable,” which would no
> longer be true.

Agree, but there is still risk that promotion happens after user send
the query but before we started to wait.  In this case we will still
silently start to wait on primary, while user probably meant to wait
on replica.  Probably it would be safer to have separate user-visible
modes for waiting on primary and on replica?

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

25 декабря 2025 г., 15:52:34

Hi Alexander,

On Thu, Dec 25, 2025 at 7:13 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi, Xuneng!
>
> On Mon, Dec 22, 2025 at 9:57 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi Alexander,
> > >
> > > Thanks for your feedback!
> > >
> > > > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> > > > mode parameter.  Should we allow this?
> > >
> > > I think this constraint could be relaxed if needed. I was previously
> > > unsure about the use cases.
> >
> > Flush mode on the primary seems useful when synchronous_commit is set
> > to off [1]. In that mode, a transaction in primary may return success
> > before its WAL is durably flushed to disk, trading durability for
> > lower latency. A “wait for primary flush” operation provides an
> > explicit durability barrier for cases where applications or tools
> > occasionally need stronger guarantees.
> >
> > [1] https://postgresqlco.nf/doc/en/param/synchronous_commit/
> >
> > > > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
> > > > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
> > > > principle, we could encode both as just 'flush' mode, and detect which
> > > > WaitLSNType to pick by checking if recovery is in progress.  However,
> > > > how should we then react to unreached flush location after standby
> > > > promotion (technically it could be still reached but on the different
> > > > timeline)?
> > > >
> > >
> > > Technically, we can use 'flush' mode to specify WAIT FOR behavior in
> > > both primary and replica. Currently, wait for commands error out if
> > > promotion occurs since: either the requested LSN type does not exist
> > > on the primary, or we do not yet have the infrastructure to support
> > > continuing the wait. If we allow waiting for flush on the primary as a
> > > user-visible command and the wake-up calls for flush in primary are
> > > introduced, the question becomes whether we should still abort the
> > > wait on promotion, or continue waiting—as you noted—given that the
> > > target LSN might still be reached, albeit on a different timeline. The
> > > question behind this might be: do users care and should be aware of
> > > the state change of the server while waiting? If they do, then we
> > > better stop the waiting and report the error. In this case, I am
> > > inclined to to break the unified flush mode to something like
> > > primary_flush/standby_flush mode and
> > > WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
> > >
> >
> > After further consideration, it also seems reasonable to use a single,
> > unified flush mode that works on both primary and standby servers,
> > provided its semantics are clearly documented to avoid the potential
> > confusion on failure. I don’t have a strong preference between these
> > two and would be interested in your thoughts.
> >
> > If a standby is promoted while a session is waiting, the command
> > better abort and return an error (or report “not in recovery” when
> > using NO_THROW). At that point, the meaning of the LSN being waited
> > for may have changed due to the timeline switch and the transition
> > from standby to primary. An LSN such as 0/5000000 on TLI 2 can
> > represent entirely different WAL content from 0/5000000 on TLI 1.
> > Allowing the wait to silently continue across promotion risks giving
> > users a false sense of safety—for example, interpreting “wait
> > completed” as “the original data is now durable,” which would no
> > longer be true.
>
> Agree, but there is still risk that promotion happens after user send
> the query but before we started to wait.  In this case we will still
> silently start to wait on primary, while user probably meant to wait
> on replica.  Probably it would be safer to have separate user-visible
> modes for waiting on primary and on replica?
>

Thanks for your thoughts. You're right about the race condition. If
promotion happens between query submission and execution, a unified
'flush' mode could silently switch semantics without the user knowing.
Separate modes like 'standby_flush' and 'primary_flush' would make
user intent explicit and catch this case with an error, which is
safer. Do these two terms look reasonable to you, or would you suggest
better names? If they look ok, I plan to update the implementation to
use these two modes.


--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

25 декабря 2025 г., 19:34:02

On Thu, Dec 25, 2025 at 2:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Thu, Dec 25, 2025 at 7:13 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Hi, Xuneng!
> >
> > On Mon, Dec 22, 2025 at 9:57 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi Alexander,
> > > >
> > > > Thanks for your feedback!
> > > >
> > > > > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> > > > > mode parameter.  Should we allow this?
> > > >
> > > > I think this constraint could be relaxed if needed. I was previously
> > > > unsure about the use cases.
> > >
> > > Flush mode on the primary seems useful when synchronous_commit is set
> > > to off [1]. In that mode, a transaction in primary may return success
> > > before its WAL is durably flushed to disk, trading durability for
> > > lower latency. A “wait for primary flush” operation provides an
> > > explicit durability barrier for cases where applications or tools
> > > occasionally need stronger guarantees.
> > >
> > > [1] https://postgresqlco.nf/doc/en/param/synchronous_commit/
> > >
> > > > > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
> > > > > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
> > > > > principle, we could encode both as just 'flush' mode, and detect which
> > > > > WaitLSNType to pick by checking if recovery is in progress.  However,
> > > > > how should we then react to unreached flush location after standby
> > > > > promotion (technically it could be still reached but on the different
> > > > > timeline)?
> > > > >
> > > >
> > > > Technically, we can use 'flush' mode to specify WAIT FOR behavior in
> > > > both primary and replica. Currently, wait for commands error out if
> > > > promotion occurs since: either the requested LSN type does not exist
> > > > on the primary, or we do not yet have the infrastructure to support
> > > > continuing the wait. If we allow waiting for flush on the primary as a
> > > > user-visible command and the wake-up calls for flush in primary are
> > > > introduced, the question becomes whether we should still abort the
> > > > wait on promotion, or continue waiting—as you noted—given that the
> > > > target LSN might still be reached, albeit on a different timeline. The
> > > > question behind this might be: do users care and should be aware of
> > > > the state change of the server while waiting? If they do, then we
> > > > better stop the waiting and report the error. In this case, I am
> > > > inclined to to break the unified flush mode to something like
> > > > primary_flush/standby_flush mode and
> > > > WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
> > > >
> > >
> > > After further consideration, it also seems reasonable to use a single,
> > > unified flush mode that works on both primary and standby servers,
> > > provided its semantics are clearly documented to avoid the potential
> > > confusion on failure. I don’t have a strong preference between these
> > > two and would be interested in your thoughts.
> > >
> > > If a standby is promoted while a session is waiting, the command
> > > better abort and return an error (or report “not in recovery” when
> > > using NO_THROW). At that point, the meaning of the LSN being waited
> > > for may have changed due to the timeline switch and the transition
> > > from standby to primary. An LSN such as 0/5000000 on TLI 2 can
> > > represent entirely different WAL content from 0/5000000 on TLI 1.
> > > Allowing the wait to silently continue across promotion risks giving
> > > users a false sense of safety—for example, interpreting “wait
> > > completed” as “the original data is now durable,” which would no
> > > longer be true.
> >
> > Agree, but there is still risk that promotion happens after user send
> > the query but before we started to wait.  In this case we will still
> > silently start to wait on primary, while user probably meant to wait
> > on replica.  Probably it would be safer to have separate user-visible
> > modes for waiting on primary and on replica?
> >
>
> Thanks for your thoughts. You're right about the race condition. If
> promotion happens between query submission and execution, a unified
> 'flush' mode could silently switch semantics without the user knowing.
> Separate modes like 'standby_flush' and 'primary_flush' would make
> user intent explicit and catch this case with an error, which is
> safer. Do these two terms look reasonable to you, or would you suggest
> better names? If they look ok, I plan to update the implementation to
> use these two modes.

Thank you, Xuneng.  'standby_flush' and 'primary_flush' look good for
me.  Please, go ahead.  I think we should name other modes
'standby_write' and 'standby_replay' for the sake of unity.

------
Regards,
Alexander Korotkov
Supabase

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

26 декабря 2025 г., 03:31:26

Hi,

On Fri, Dec 26, 2025 at 12:34 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
>
> On Thu, Dec 25, 2025 at 2:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Thu, Dec 25, 2025 at 7:13 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > Hi, Xuneng!
> > >
> > > On Mon, Dec 22, 2025 at 9:57 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi Alexander,
> > > > >
> > > > > Thanks for your feedback!
> > > > >
> > > > > > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> > > > > > mode parameter.  Should we allow this?
> > > > >
> > > > > I think this constraint could be relaxed if needed. I was previously
> > > > > unsure about the use cases.
> > > >
> > > > Flush mode on the primary seems useful when synchronous_commit is set
> > > > to off [1]. In that mode, a transaction in primary may return success
> > > > before its WAL is durably flushed to disk, trading durability for
> > > > lower latency. A “wait for primary flush” operation provides an
> > > > explicit durability barrier for cases where applications or tools
> > > > occasionally need stronger guarantees.
> > > >
> > > > [1] https://postgresqlco.nf/doc/en/param/synchronous_commit/
> > > >
> > > > > > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
> > > > > > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
> > > > > > principle, we could encode both as just 'flush' mode, and detect which
> > > > > > WaitLSNType to pick by checking if recovery is in progress.  However,
> > > > > > how should we then react to unreached flush location after standby
> > > > > > promotion (technically it could be still reached but on the different
> > > > > > timeline)?
> > > > > >
> > > > >
> > > > > Technically, we can use 'flush' mode to specify WAIT FOR behavior in
> > > > > both primary and replica. Currently, wait for commands error out if
> > > > > promotion occurs since: either the requested LSN type does not exist
> > > > > on the primary, or we do not yet have the infrastructure to support
> > > > > continuing the wait. If we allow waiting for flush on the primary as a
> > > > > user-visible command and the wake-up calls for flush in primary are
> > > > > introduced, the question becomes whether we should still abort the
> > > > > wait on promotion, or continue waiting—as you noted—given that the
> > > > > target LSN might still be reached, albeit on a different timeline. The
> > > > > question behind this might be: do users care and should be aware of
> > > > > the state change of the server while waiting? If they do, then we
> > > > > better stop the waiting and report the error. In this case, I am
> > > > > inclined to to break the unified flush mode to something like
> > > > > primary_flush/standby_flush mode and
> > > > > WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
> > > > >
> > > >
> > > > After further consideration, it also seems reasonable to use a single,
> > > > unified flush mode that works on both primary and standby servers,
> > > > provided its semantics are clearly documented to avoid the potential
> > > > confusion on failure. I don’t have a strong preference between these
> > > > two and would be interested in your thoughts.
> > > >
> > > > If a standby is promoted while a session is waiting, the command
> > > > better abort and return an error (or report “not in recovery” when
> > > > using NO_THROW). At that point, the meaning of the LSN being waited
> > > > for may have changed due to the timeline switch and the transition
> > > > from standby to primary. An LSN such as 0/5000000 on TLI 2 can
> > > > represent entirely different WAL content from 0/5000000 on TLI 1.
> > > > Allowing the wait to silently continue across promotion risks giving
> > > > users a false sense of safety—for example, interpreting “wait
> > > > completed” as “the original data is now durable,” which would no
> > > > longer be true.
> > >
> > > Agree, but there is still risk that promotion happens after user send
> > > the query but before we started to wait.  In this case we will still
> > > silently start to wait on primary, while user probably meant to wait
> > > on replica.  Probably it would be safer to have separate user-visible
> > > modes for waiting on primary and on replica?
> > >
> >
> > Thanks for your thoughts. You're right about the race condition. If
> > promotion happens between query submission and execution, a unified
> > 'flush' mode could silently switch semantics without the user knowing.
> > Separate modes like 'standby_flush' and 'primary_flush' would make
> > user intent explicit and catch this case with an error, which is
> > safer. Do these two terms look reasonable to you, or would you suggest
> > better names? If they look ok, I plan to update the implementation to
> > use these two modes.
>
> Thank you, Xuneng.  'standby_flush' and 'primary_flush' look good for
> me.  Please, go ahead.  I think we should name other modes
> 'standby_write' and 'standby_replay' for the sake of unity.
>

Thanks. Yeah, renaming existing modes to  'standby_write' and
'standby_replay' also makes sense to me.

--
Best,
Xuneng

Re: Implement waiting for wal lsn replay: reloaded

От

Chao Li

Дата:

26 декабря 2025 г., 11:24:39


> On Dec 19, 2025, at 10:49, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Thu, Dec 18, 2025 at 8:25 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>>> On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>>>
>>>> Hi, Xuneng!
>>>>
>>>> On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>>>>> Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
>>>>> statement in v5 patch 1.
>>>>
>>>> Thank you for your work on this patchset.  Generally, it looks like
>>>> good and quite straightforward extension of the current functionality.
>>>> But this patch adds 4 new unreserved keywords to our grammar.  Do you
>>>> think we can put mode into with options clause?
>>>>
>>>
>>> Thanks for pointing this out. Yeah, 4 unreserved keywords add
>>> complexity to the parser and it may not be worthwhile since replay is
>>> expected to be the common use scenario. Maybe we can do something like
>>> this:
>>>
>>> -- Default (REPLAY mode)
>>> WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
>>>
>>> -- Explicit REPLAY mode
>>> WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
>>>
>>> -- WRITE mode
>>> WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
>>>
>>> If no mode is set explicitly in the options clause, it defaults to
>>> replay. I'll update the patch per your suggestion.
>>
>> This is exactly what I meant.  Please, go ahead.
>>
>
> Here is the updated patch set (v7). Please check.
>
> --
> Best,
> Xuneng
>
<v7-0001-Extend-xlogwait-infrastructure-with-write-and-flu.patch><v7-0004-Use-WAIT-FOR-LSN-in.patch><v7-0003-Add-tab-completion-for-WAIT-FOR-LSN-MODE-option.patch><v7-0002-Add-MODE-option-to-WAIT-FOR-LSN-command.patch>

Hi Xuneng,

A solid patch! Just a few small comments:

1 - 0001
```
+XLogRecPtr
+GetCurrentLSNForWaitType(WaitLSNType lsnType)
+{
+    switch (lsnType)
+    {
+        case WAIT_LSN_TYPE_STANDBY_REPLAY:
+            return GetXLogReplayRecPtr(NULL);
+
+        case WAIT_LSN_TYPE_STANDBY_WRITE:
+            return GetWalRcvWriteRecPtr();
+
+        case WAIT_LSN_TYPE_STANDBY_FLUSH:
+            return GetWalRcvFlushRecPtr(NULL, NULL);
+
+        case WAIT_LSN_TYPE_PRIMARY_FLUSH:
+            return GetFlushRecPtr(NULL);
+    }
+
+    elog(ERROR, "invalid LSN wait type: %d", lsnType);
+    pg_unreachable();
+}
```

As you add pg_unreachable() in the new function GetCurrentLSNForWaitType(), I’m thinking if we should just do an
Assert(),I saw every existing related function has done such an assert, for example addLSNWaiter(), it does “Assert(i
>=0 && i < WAIT_LSN_TYPE_COUNT);”. I guess we can just following the current mechanism to verify lsnType. So, for
GetCurrentLSNForWaitType(),we can just add a default clause and Assert(false). 

2 - 0002
```
+            else
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("unrecognized value for WAIT option \"%s\": \"%s\"",
+                                "MODE", mode_str),
```

I wonder why don’t we directly put MODE into the error message?

3 - 0002
```
         case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
             if (throw)
             {
+                const        WaitLSNTypeDesc *desc = &WaitLSNTypeDescs[lsnType];
+                XLogRecPtr    currentLSN = GetCurrentLSNForWaitType(lsnType);
+
                 if (PromoteIsTriggered())
                 {
                     ereport(ERROR,
                             errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                             errmsg("recovery is not in progress"),
-                            errdetail("Recovery ended before replaying target LSN %X/%08X; last replay LSN %X/%08X.",
+                            errdetail("Recovery ended before target LSN %X/%08X was %s; last %s LSN %X/%08X.",
                                       LSN_FORMAT_ARGS(lsn),
-                                      LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))));
+                                      desc->verb,
+                                      desc->noun,
+                                      LSN_FORMAT_ARGS(currentLSN)));
                 }
                 else
                     ereport(ERROR,
                             errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                             errmsg("recovery is not in progress"),
-                            errhint("Waiting for the replay LSN can only be executed during recovery."));
+                            errhint("Waiting for the %s LSN can only be executed during recovery.",
+                                    desc->noun));
             }
```

currentLSN is only used in the if clause, thus it can be defined inside the if clause.

3 - 0002
```
+    /*
+     * If we wrote an LSN that someone was waiting for then walk over the
+     * shared memory array and set latches to notify the waiters.
+     */
+    if (waitLSNState &&
+        (LogstreamResult.Write >=
+         pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
+        WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
```

Do we need to mention "walk over the shared memory array and set latches” in the comment? The logic belongs to
WaitLSNWakeup().What about if the wake up logic changes in future, then this comment would become stale. So I think we
onlyneed to mention “notify the waiters”. 


4 - 0003
```
+    /*
+     * Handle parenthesized option list.  This fires when we're in an
+     * unfinished parenthesized option list.  get_previous_words treats a
+     * completed parenthesized option list as one word, so the above test is
+     * correct.  mode takes a string value ('replay', 'write', 'flush'),
+     * timeout takes a string value, no_throw takes no value.
+     */
     else if (HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*") &&
              !HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*)"))
     {
-        /*
-         * This fires if we're in an unfinished parenthesized option list.
-         * get_previous_words treats a completed parenthesized option list as
-         * one word, so the above test is correct.
-         */
         if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
-            COMPLETE_WITH("timeout", "no_throw");
-
-        /*
-         * timeout takes a string value, no_throw takes no value. We don't
-         * offer completions for these values.
-         */
```

The new comment has lost the meaning of “We don’t offer completions for these values (timeout and no_throw)”, to be
explicit,I feel we can retain the sentence. 

5 - 0004
```
+    my $isrecovery =
+      $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+    chomp($isrecovery);
     croak "unknown mode $mode for 'wait_for_catchup', valid modes are "
       . join(', ', keys(%valid_modes))
       unless exists($valid_modes{$mode});
@@ -3347,9 +3350,6 @@ sub wait_for_catchup
     }
     if (!defined($target_lsn))
     {
-        my $isrecovery =
-          $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
-        chomp($isrecovery);
```

I wonder why pull up pg_is_in_recovery to an early place and unconditionally call it?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

26 декабря 2025 г., 19:15:05

Hi Chao,

Thanks a lot for your review!

On Fri, Dec 26, 2025 at 4:25 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> > On Dec 19, 2025, at 10:49, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Dec 18, 2025 at 8:25 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >>
> >> On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >>> On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >>>>
> >>>> Hi, Xuneng!
> >>>>
> >>>> On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >>>>> Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> >>>>> statement in v5 patch 1.
> >>>>
> >>>> Thank you for your work on this patchset.  Generally, it looks like
> >>>> good and quite straightforward extension of the current functionality.
> >>>> But this patch adds 4 new unreserved keywords to our grammar.  Do you
> >>>> think we can put mode into with options clause?
> >>>>
> >>>
> >>> Thanks for pointing this out. Yeah, 4 unreserved keywords add
> >>> complexity to the parser and it may not be worthwhile since replay is
> >>> expected to be the common use scenario. Maybe we can do something like
> >>> this:
> >>>
> >>> -- Default (REPLAY mode)
> >>> WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
> >>>
> >>> -- Explicit REPLAY mode
> >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
> >>>
> >>> -- WRITE mode
> >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
> >>>
> >>> If no mode is set explicitly in the options clause, it defaults to
> >>> replay. I'll update the patch per your suggestion.
> >>
> >> This is exactly what I meant.  Please, go ahead.
> >>
> >
> > Here is the updated patch set (v7). Please check.
> >
> > --
> > Best,
> > Xuneng
> >
<v7-0001-Extend-xlogwait-infrastructure-with-write-and-flu.patch><v7-0004-Use-WAIT-FOR-LSN-in.patch><v7-0003-Add-tab-completion-for-WAIT-FOR-LSN-MODE-option.patch><v7-0002-Add-MODE-option-to-WAIT-FOR-LSN-command.patch>
>
> Hi Xuneng,
>
> A solid patch! Just a few small comments:
>
> 1 - 0001
> ```
> +XLogRecPtr
> +GetCurrentLSNForWaitType(WaitLSNType lsnType)
> +{
> +       switch (lsnType)
> +       {
> +               case WAIT_LSN_TYPE_STANDBY_REPLAY:
> +                       return GetXLogReplayRecPtr(NULL);
> +
> +               case WAIT_LSN_TYPE_STANDBY_WRITE:
> +                       return GetWalRcvWriteRecPtr();
> +
> +               case WAIT_LSN_TYPE_STANDBY_FLUSH:
> +                       return GetWalRcvFlushRecPtr(NULL, NULL);
> +
> +               case WAIT_LSN_TYPE_PRIMARY_FLUSH:
> +                       return GetFlushRecPtr(NULL);
> +       }
> +
> +       elog(ERROR, "invalid LSN wait type: %d", lsnType);
> +       pg_unreachable();
> +}
> ```
>
> As you add pg_unreachable() in the new function GetCurrentLSNForWaitType(), I’m thinking if we should just do an
Assert(),I saw every existing related function has done such an assert, for example addLSNWaiter(), it does “Assert(i
>=0 && i < WAIT_LSN_TYPE_COUNT);”. I guess we can just following the current mechanism to verify lsnType. So, for
GetCurrentLSNForWaitType(),we can just add a default clause and Assert(false). 

My take is that Assert(false) alone might not be enough here, since
assertions vanish in non-assert builds. An unexpected lsnType is a
real bug even in production, so keeping a hard error plus
pg_unreachable() seems to be a safer pattern. It also acts as a
guardrail for future extensions — if new wait types are added without
updating this code, we’ll fail loudly rather than silently returning
an incorrect LSN. Assert(i >= 0 && i < WAIT_LSN_TYPE_COUNT) was added
to the top of the function.

> 2 - 0002
> ```
> +                       else
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                                errmsg("unrecognized value for WAIT option \"%s\": \"%s\"",
> +                                                               "MODE", mode_str),
> ```
>
> I wonder why don’t we directly put MODE into the error message?

Yeah, putting MODE into the error message is cleaner. It's done in v8.

> 3 - 0002
> ```
>                 case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
>                         if (throw)
>                         {
> +                               const           WaitLSNTypeDesc *desc = &WaitLSNTypeDescs[lsnType];
> +                               XLogRecPtr      currentLSN = GetCurrentLSNForWaitType(lsnType);
> +
>                                 if (PromoteIsTriggered())
>                                 {
>                                         ereport(ERROR,
>                                                         errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                                         errmsg("recovery is not in progress"),
> -                                                       errdetail("Recovery ended before replaying target LSN
%X/%08X;last replay LSN %X/%08X.", 
> +                                                       errdetail("Recovery ended before target LSN %X/%08X was %s;
last%s LSN %X/%08X.", 
>                                                                           LSN_FORMAT_ARGS(lsn),
> -
LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))));
> +                                                                         desc->verb,
> +                                                                         desc->noun,
> +                                                                         LSN_FORMAT_ARGS(currentLSN)));
>                                 }
>                                 else
>                                         ereport(ERROR,
>                                                         errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                                         errmsg("recovery is not in progress"),
> -                                                       errhint("Waiting for the replay LSN can only be executed
duringrecovery.")); 
> +                                                       errhint("Waiting for the %s LSN can only be executed during
recovery.",
> +                                                                       desc->noun));
>                         }
> ```
>
> currentLSN is only used in the if clause, thus it can be defined inside the if clause.

+ 1.

> 3 - 0002
> ```
> +       /*
> +        * If we wrote an LSN that someone was waiting for then walk over the
> +        * shared memory array and set latches to notify the waiters.
> +        */
> +       if (waitLSNState &&
> +               (LogstreamResult.Write >=
> +                pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> +               WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
> ```
>
> Do we need to mention "walk over the shared memory array and set latches” in the comment? The logic belongs to
WaitLSNWakeup().What about if the wake up logic changes in future, then this comment would become stale. So I think we
onlyneed to mention “notify the waiters”. 
>

It makes sense to me. They are incorporated into v8.

>
> 4 - 0003
> ```
> +       /*
> +        * Handle parenthesized option list.  This fires when we're in an
> +        * unfinished parenthesized option list.  get_previous_words treats a
> +        * completed parenthesized option list as one word, so the above test is
> +        * correct.  mode takes a string value ('replay', 'write', 'flush'),
> +        * timeout takes a string value, no_throw takes no value.
> +        */
>         else if (HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*") &&
>                          !HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*)"))
>         {
> -               /*
> -                * This fires if we're in an unfinished parenthesized option list.
> -                * get_previous_words treats a completed parenthesized option list as
> -                * one word, so the above test is correct.
> -                */
>                 if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
> -                       COMPLETE_WITH("timeout", "no_throw");
> -
> -               /*
> -                * timeout takes a string value, no_throw takes no value. We don't
> -                * offer completions for these values.
> -                */
> ```
>
> The new comment has lost the meaning of “We don’t offer completions for these values (timeout and no_throw)”, to be
explicit,I feel we can retain the sentence. 

 The sentence is retained.

> 5 - 0004
> ```
> +       my $isrecovery =
> +         $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> +       chomp($isrecovery);
>         croak "unknown mode $mode for 'wait_for_catchup', valid modes are "
>           . join(', ', keys(%valid_modes))
>           unless exists($valid_modes{$mode});
> @@ -3347,9 +3350,6 @@ sub wait_for_catchup
>         }
>         if (!defined($target_lsn))
>         {
> -               my $isrecovery =
> -                 $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> -               chomp($isrecovery);
> ```
>
> I wonder why pull up pg_is_in_recovery to an early place and unconditionally call it?
>

This seems unnecessary. I also realized that my earlier approach in
patch 4 may have been semantically incorrect — it could end up waiting
for the LSN to replay/write/flush on the node itself, rather than on
the downstream standby, which defeats the purpose of
wait_for_catchup(). Patch 4 attempts to address this by running WAIT
FOR LSN on the standby itself.

Support for primary-flush waiting and the refactoring of existing
modes have been also incorporated in v8 following Alexander’s
feedback. The major updates are in patches 2 and 4. Please check.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

30 декабря 2025 г., 05:12:28

Hi,

On Sat, Dec 27, 2025 at 12:15 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi Chao,
>
> Thanks a lot for your review!
>
> On Fri, Dec 26, 2025 at 4:25 PM Chao Li <li.evan.chao@gmail.com> wrote:
> >
> >
> >
> > > On Dec 19, 2025, at 10:49, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Dec 18, 2025 at 8:25 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >>
> > >> On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >>> On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >>>>
> > >>>> Hi, Xuneng!
> > >>>>
> > >>>> On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >>>>> Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> > >>>>> statement in v5 patch 1.
> > >>>>
> > >>>> Thank you for your work on this patchset.  Generally, it looks like
> > >>>> good and quite straightforward extension of the current functionality.
> > >>>> But this patch adds 4 new unreserved keywords to our grammar.  Do you
> > >>>> think we can put mode into with options clause?
> > >>>>
> > >>>
> > >>> Thanks for pointing this out. Yeah, 4 unreserved keywords add
> > >>> complexity to the parser and it may not be worthwhile since replay is
> > >>> expected to be the common use scenario. Maybe we can do something like
> > >>> this:
> > >>>
> > >>> -- Default (REPLAY mode)
> > >>> WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
> > >>>
> > >>> -- Explicit REPLAY mode
> > >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
> > >>>
> > >>> -- WRITE mode
> > >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
> > >>>
> > >>> If no mode is set explicitly in the options clause, it defaults to
> > >>> replay. I'll update the patch per your suggestion.
> > >>
> > >> This is exactly what I meant.  Please, go ahead.
> > >>
> > >
> > > Here is the updated patch set (v7). Please check.
> > >
> > > --
> > > Best,
> > > Xuneng
> > >
<v7-0001-Extend-xlogwait-infrastructure-with-write-and-flu.patch><v7-0004-Use-WAIT-FOR-LSN-in.patch><v7-0003-Add-tab-completion-for-WAIT-FOR-LSN-MODE-option.patch><v7-0002-Add-MODE-option-to-WAIT-FOR-LSN-command.patch>
> >
> > Hi Xuneng,
> >
> > A solid patch! Just a few small comments:
> >
> > 1 - 0001
> > ```
> > +XLogRecPtr
> > +GetCurrentLSNForWaitType(WaitLSNType lsnType)
> > +{
> > +       switch (lsnType)
> > +       {
> > +               case WAIT_LSN_TYPE_STANDBY_REPLAY:
> > +                       return GetXLogReplayRecPtr(NULL);
> > +
> > +               case WAIT_LSN_TYPE_STANDBY_WRITE:
> > +                       return GetWalRcvWriteRecPtr();
> > +
> > +               case WAIT_LSN_TYPE_STANDBY_FLUSH:
> > +                       return GetWalRcvFlushRecPtr(NULL, NULL);
> > +
> > +               case WAIT_LSN_TYPE_PRIMARY_FLUSH:
> > +                       return GetFlushRecPtr(NULL);
> > +       }
> > +
> > +       elog(ERROR, "invalid LSN wait type: %d", lsnType);
> > +       pg_unreachable();
> > +}
> > ```
> >
> > As you add pg_unreachable() in the new function GetCurrentLSNForWaitType(), I’m thinking if we should just do an
Assert(),I saw every existing related function has done such an assert, for example addLSNWaiter(), it does “Assert(i
>=0 && i < WAIT_LSN_TYPE_COUNT);”. I guess we can just following the current mechanism to verify lsnType. So, for
GetCurrentLSNForWaitType(),we can just add a default clause and Assert(false). 
>
> My take is that Assert(false) alone might not be enough here, since
> assertions vanish in non-assert builds. An unexpected lsnType is a
> real bug even in production, so keeping a hard error plus
> pg_unreachable() seems to be a safer pattern. It also acts as a
> guardrail for future extensions — if new wait types are added without
> updating this code, we’ll fail loudly rather than silently returning
> an incorrect LSN. Assert(i >= 0 && i < WAIT_LSN_TYPE_COUNT) was added
> to the top of the function.
>
> > 2 - 0002
> > ```
> > +                       else
> > +                               ereport(ERROR,
> > +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > +                                                errmsg("unrecognized value for WAIT option \"%s\": \"%s\"",
> > +                                                               "MODE", mode_str),
> > ```
> >
> > I wonder why don’t we directly put MODE into the error message?
>
> Yeah, putting MODE into the error message is cleaner. It's done in v8.
>
> > 3 - 0002
> > ```
> >                 case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
> >                         if (throw)
> >                         {
> > +                               const           WaitLSNTypeDesc *desc = &WaitLSNTypeDescs[lsnType];
> > +                               XLogRecPtr      currentLSN = GetCurrentLSNForWaitType(lsnType);
> > +
> >                                 if (PromoteIsTriggered())
> >                                 {
> >                                         ereport(ERROR,
> >                                                         errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >                                                         errmsg("recovery is not in progress"),
> > -                                                       errdetail("Recovery ended before replaying target LSN
%X/%08X;last replay LSN %X/%08X.", 
> > +                                                       errdetail("Recovery ended before target LSN %X/%08X was %s;
last%s LSN %X/%08X.", 
> >                                                                           LSN_FORMAT_ARGS(lsn),
> > -
LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))));
> > +                                                                         desc->verb,
> > +                                                                         desc->noun,
> > +                                                                         LSN_FORMAT_ARGS(currentLSN)));
> >                                 }
> >                                 else
> >                                         ereport(ERROR,
> >                                                         errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >                                                         errmsg("recovery is not in progress"),
> > -                                                       errhint("Waiting for the replay LSN can only be executed
duringrecovery.")); 
> > +                                                       errhint("Waiting for the %s LSN can only be executed during
recovery.",
> > +                                                                       desc->noun));
> >                         }
> > ```
> >
> > currentLSN is only used in the if clause, thus it can be defined inside the if clause.
>
> + 1.
>
> > 3 - 0002
> > ```
> > +       /*
> > +        * If we wrote an LSN that someone was waiting for then walk over the
> > +        * shared memory array and set latches to notify the waiters.
> > +        */
> > +       if (waitLSNState &&
> > +               (LogstreamResult.Write >=
> > +                pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> > +               WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
> > ```
> >
> > Do we need to mention "walk over the shared memory array and set latches” in the comment? The logic belongs to
WaitLSNWakeup().What about if the wake up logic changes in future, then this comment would become stale. So I think we
onlyneed to mention “notify the waiters”. 
> >
>
> It makes sense to me. They are incorporated into v8.
>
> >
> > 4 - 0003
> > ```
> > +       /*
> > +        * Handle parenthesized option list.  This fires when we're in an
> > +        * unfinished parenthesized option list.  get_previous_words treats a
> > +        * completed parenthesized option list as one word, so the above test is
> > +        * correct.  mode takes a string value ('replay', 'write', 'flush'),
> > +        * timeout takes a string value, no_throw takes no value.
> > +        */
> >         else if (HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*") &&
> >                          !HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*)"))
> >         {
> > -               /*
> > -                * This fires if we're in an unfinished parenthesized option list.
> > -                * get_previous_words treats a completed parenthesized option list as
> > -                * one word, so the above test is correct.
> > -                */
> >                 if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
> > -                       COMPLETE_WITH("timeout", "no_throw");
> > -
> > -               /*
> > -                * timeout takes a string value, no_throw takes no value. We don't
> > -                * offer completions for these values.
> > -                */
> > ```
> >
> > The new comment has lost the meaning of “We don’t offer completions for these values (timeout and no_throw)”, to be
explicit,I feel we can retain the sentence. 
>
>  The sentence is retained.
>
> > 5 - 0004
> > ```
> > +       my $isrecovery =
> > +         $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> > +       chomp($isrecovery);
> >         croak "unknown mode $mode for 'wait_for_catchup', valid modes are "
> >           . join(', ', keys(%valid_modes))
> >           unless exists($valid_modes{$mode});
> > @@ -3347,9 +3350,6 @@ sub wait_for_catchup
> >         }
> >         if (!defined($target_lsn))
> >         {
> > -               my $isrecovery =
> > -                 $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> > -               chomp($isrecovery);
> > ```
> >
> > I wonder why pull up pg_is_in_recovery to an early place and unconditionally call it?
> >
>
> This seems unnecessary. I also realized that my earlier approach in
> patch 4 may have been semantically incorrect — it could end up waiting
> for the LSN to replay/write/flush on the node itself, rather than on
> the downstream standby, which defeats the purpose of
> wait_for_catchup(). Patch 4 attempts to address this by running WAIT
> FOR LSN on the standby itself.
>
> Support for primary-flush waiting and the refactoring of existing
> modes have been also incorporated in v8 following Alexander’s
> feedback. The major updates are in patches 2 and 4. Please check.
>

Added WaitLSNTypeDesc to typedefs.list in v9 patch 2.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

30 декабря 2025 г., 05:42:09

Hi,

On Tue, Dec 30, 2025 at 10:12 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Sat, Dec 27, 2025 at 12:15 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Chao,
> >
> > Thanks a lot for your review!
> >
> > On Fri, Dec 26, 2025 at 4:25 PM Chao Li <li.evan.chao@gmail.com> wrote:
> > >
> > >
> > >
> > > > On Dec 19, 2025, at 10:49, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Dec 18, 2025 at 8:25 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > >>
> > > >> On Thu, Dec 18, 2025 at 2:24 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >>> On Thu, Dec 18, 2025 at 6:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > >>>>
> > > >>>> Hi, Xuneng!
> > > >>>>
> > > >>>> On Tue, Dec 16, 2025 at 6:46 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >>>>> Remove the erroneous WAIT_LSN_TYPE_COUNT case from the switch
> > > >>>>> statement in v5 patch 1.
> > > >>>>
> > > >>>> Thank you for your work on this patchset.  Generally, it looks like
> > > >>>> good and quite straightforward extension of the current functionality.
> > > >>>> But this patch adds 4 new unreserved keywords to our grammar.  Do you
> > > >>>> think we can put mode into with options clause?
> > > >>>>
> > > >>>
> > > >>> Thanks for pointing this out. Yeah, 4 unreserved keywords add
> > > >>> complexity to the parser and it may not be worthwhile since replay is
> > > >>> expected to be the common use scenario. Maybe we can do something like
> > > >>> this:
> > > >>>
> > > >>> -- Default (REPLAY mode)
> > > >>> WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '1s');
> > > >>>
> > > >>> -- Explicit REPLAY mode
> > > >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'replay', TIMEOUT '1s');
> > > >>>
> > > >>> -- WRITE mode
> > > >>> WAIT FOR LSN '0/306EE20' WITH (MODE 'write', TIMEOUT '1s');
> > > >>>
> > > >>> If no mode is set explicitly in the options clause, it defaults to
> > > >>> replay. I'll update the patch per your suggestion.
> > > >>
> > > >> This is exactly what I meant.  Please, go ahead.
> > > >>
> > > >
> > > > Here is the updated patch set (v7). Please check.
> > > >
> > > > --
> > > > Best,
> > > > Xuneng
> > > >
<v7-0001-Extend-xlogwait-infrastructure-with-write-and-flu.patch><v7-0004-Use-WAIT-FOR-LSN-in.patch><v7-0003-Add-tab-completion-for-WAIT-FOR-LSN-MODE-option.patch><v7-0002-Add-MODE-option-to-WAIT-FOR-LSN-command.patch>
> > >
> > > Hi Xuneng,
> > >
> > > A solid patch! Just a few small comments:
> > >
> > > 1 - 0001
> > > ```
> > > +XLogRecPtr
> > > +GetCurrentLSNForWaitType(WaitLSNType lsnType)
> > > +{
> > > +       switch (lsnType)
> > > +       {
> > > +               case WAIT_LSN_TYPE_STANDBY_REPLAY:
> > > +                       return GetXLogReplayRecPtr(NULL);
> > > +
> > > +               case WAIT_LSN_TYPE_STANDBY_WRITE:
> > > +                       return GetWalRcvWriteRecPtr();
> > > +
> > > +               case WAIT_LSN_TYPE_STANDBY_FLUSH:
> > > +                       return GetWalRcvFlushRecPtr(NULL, NULL);
> > > +
> > > +               case WAIT_LSN_TYPE_PRIMARY_FLUSH:
> > > +                       return GetFlushRecPtr(NULL);
> > > +       }
> > > +
> > > +       elog(ERROR, "invalid LSN wait type: %d", lsnType);
> > > +       pg_unreachable();
> > > +}
> > > ```
> > >
> > > As you add pg_unreachable() in the new function GetCurrentLSNForWaitType(), I’m thinking if we should just do an
Assert(),I saw every existing related function has done such an assert, for example addLSNWaiter(), it does “Assert(i
>=0 && i < WAIT_LSN_TYPE_COUNT);”. I guess we can just following the current mechanism to verify lsnType. So, for
GetCurrentLSNForWaitType(),we can just add a default clause and Assert(false). 
> >
> > My take is that Assert(false) alone might not be enough here, since
> > assertions vanish in non-assert builds. An unexpected lsnType is a
> > real bug even in production, so keeping a hard error plus
> > pg_unreachable() seems to be a safer pattern. It also acts as a
> > guardrail for future extensions — if new wait types are added without
> > updating this code, we’ll fail loudly rather than silently returning
> > an incorrect LSN. Assert(i >= 0 && i < WAIT_LSN_TYPE_COUNT) was added
> > to the top of the function.
> >
> > > 2 - 0002
> > > ```
> > > +                       else
> > > +                               ereport(ERROR,
> > > +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > > +                                                errmsg("unrecognized value for WAIT option \"%s\": \"%s\"",
> > > +                                                               "MODE", mode_str),
> > > ```
> > >
> > > I wonder why don’t we directly put MODE into the error message?
> >
> > Yeah, putting MODE into the error message is cleaner. It's done in v8.
> >
> > > 3 - 0002
> > > ```
> > >                 case WAIT_LSN_RESULT_NOT_IN_RECOVERY:
> > >                         if (throw)
> > >                         {
> > > +                               const           WaitLSNTypeDesc *desc = &WaitLSNTypeDescs[lsnType];
> > > +                               XLogRecPtr      currentLSN = GetCurrentLSNForWaitType(lsnType);
> > > +
> > >                                 if (PromoteIsTriggered())
> > >                                 {
> > >                                         ereport(ERROR,
> > >                                                         errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > >                                                         errmsg("recovery is not in progress"),
> > > -                                                       errdetail("Recovery ended before replaying target LSN
%X/%08X;last replay LSN %X/%08X.", 
> > > +                                                       errdetail("Recovery ended before target LSN %X/%08X was
%s;last %s LSN %X/%08X.", 
> > >                                                                           LSN_FORMAT_ARGS(lsn),
> > > -
LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))));
> > > +                                                                         desc->verb,
> > > +                                                                         desc->noun,
> > > +                                                                         LSN_FORMAT_ARGS(currentLSN)));
> > >                                 }
> > >                                 else
> > >                                         ereport(ERROR,
> > >                                                         errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > >                                                         errmsg("recovery is not in progress"),
> > > -                                                       errhint("Waiting for the replay LSN can only be executed
duringrecovery.")); 
> > > +                                                       errhint("Waiting for the %s LSN can only be executed
duringrecovery.", 
> > > +                                                                       desc->noun));
> > >                         }
> > > ```
> > >
> > > currentLSN is only used in the if clause, thus it can be defined inside the if clause.
> >
> > + 1.
> >
> > > 3 - 0002
> > > ```
> > > +       /*
> > > +        * If we wrote an LSN that someone was waiting for then walk over the
> > > +        * shared memory array and set latches to notify the waiters.
> > > +        */
> > > +       if (waitLSNState &&
> > > +               (LogstreamResult.Write >=
> > > +                pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> > > +               WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
> > > ```
> > >
> > > Do we need to mention "walk over the shared memory array and set latches” in the comment? The logic belongs to
WaitLSNWakeup().What about if the wake up logic changes in future, then this comment would become stale. So I think we
onlyneed to mention “notify the waiters”. 
> > >
> >
> > It makes sense to me. They are incorporated into v8.
> >
> > >
> > > 4 - 0003
> > > ```
> > > +       /*
> > > +        * Handle parenthesized option list.  This fires when we're in an
> > > +        * unfinished parenthesized option list.  get_previous_words treats a
> > > +        * completed parenthesized option list as one word, so the above test is
> > > +        * correct.  mode takes a string value ('replay', 'write', 'flush'),
> > > +        * timeout takes a string value, no_throw takes no value.
> > > +        */
> > >         else if (HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*") &&
> > >                          !HeadMatches("WAIT", "FOR", "LSN", MatchAny, "WITH", "(*)"))
> > >         {
> > > -               /*
> > > -                * This fires if we're in an unfinished parenthesized option list.
> > > -                * get_previous_words treats a completed parenthesized option list as
> > > -                * one word, so the above test is correct.
> > > -                */
> > >                 if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
> > > -                       COMPLETE_WITH("timeout", "no_throw");
> > > -
> > > -               /*
> > > -                * timeout takes a string value, no_throw takes no value. We don't
> > > -                * offer completions for these values.
> > > -                */
> > > ```
> > >
> > > The new comment has lost the meaning of “We don’t offer completions for these values (timeout and no_throw)”, to
beexplicit, I feel we can retain the sentence. 
> >
> >  The sentence is retained.
> >
> > > 5 - 0004
> > > ```
> > > +       my $isrecovery =
> > > +         $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> > > +       chomp($isrecovery);
> > >         croak "unknown mode $mode for 'wait_for_catchup', valid modes are "
> > >           . join(', ', keys(%valid_modes))
> > >           unless exists($valid_modes{$mode});
> > > @@ -3347,9 +3350,6 @@ sub wait_for_catchup
> > >         }
> > >         if (!defined($target_lsn))
> > >         {
> > > -               my $isrecovery =
> > > -                 $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> > > -               chomp($isrecovery);
> > > ```
> > >
> > > I wonder why pull up pg_is_in_recovery to an early place and unconditionally call it?
> > >
> >
> > This seems unnecessary. I also realized that my earlier approach in
> > patch 4 may have been semantically incorrect — it could end up waiting
> > for the LSN to replay/write/flush on the node itself, rather than on
> > the downstream standby, which defeats the purpose of
> > wait_for_catchup(). Patch 4 attempts to address this by running WAIT
> > FOR LSN on the standby itself.
> >
> > Support for primary-flush waiting and the refactoring of existing
> > modes have been also incorporated in v8 following Alexander’s
> > feedback. The major updates are in patches 2 and 4. Please check.
> >
>
> Added WaitLSNTypeDesc to typedefs.list in v9 patch 2.
>

Run pgindent using the updated typedefs.list.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Álvaro Herrera

Дата:

30 декабря 2025 г., 06:14:42

On 2025-Dec-27, Xuneng Zhou wrote:

> On Fri, Dec 26, 2025 at 4:25 PM Chao Li <li.evan.chao@gmail.com> wrote:

> > 2 - 0002
> > ```
> > +                       else
> > +                               ereport(ERROR,
> > +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > +                                                errmsg("unrecognized value for WAIT option \"%s\": \"%s\"",
> > +                                                               "MODE", mode_str),
> > ```
> >
> > I wonder why don’t we directly put MODE into the error message?
> 
> Yeah, putting MODE into the error message is cleaner. It's done in v8.

The reason not to do that (and also put WAIT in a separate string) is so
that the message is identicla to other messages and thus requires no
separate translation, specifically
  errmsg("unrecognized value for %s option \"%s\": \"%s\"", ...)

See commit 502e256f2262.  Please use that form.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).

Re: Implement waiting for wal lsn replay: reloaded

От

Chao Li

Дата:

30 декабря 2025 г., 06:24:37


> On Dec 30, 2025, at 11:14, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Dec-27, Xuneng Zhou wrote:
>
>> On Fri, Dec 26, 2025 at 4:25 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
>>> 2 - 0002
>>> ```
>>> +                       else
>>> +                               ereport(ERROR,
>>> +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> +                                                errmsg("unrecognized value for WAIT option \"%s\": \"%s\"",
>>> +                                                               "MODE", mode_str),
>>> ```
>>>
>>> I wonder why don’t we directly put MODE into the error message?
>>
>> Yeah, putting MODE into the error message is cleaner. It's done in v8.
>
> The reason not to do that (and also put WAIT in a separate string) is so
> that the message is identicla to other messages and thus requires no
> separate translation, specifically
>  errmsg("unrecognized value for %s option \"%s\": \"%s\"", ...)
>
> See commit 502e256f2262.  Please use that form.
>

To follow 502e256f2262, it should use “%s” for “WAIT” as well. I raised the comment because I saw “WAIT” is the format
strings,thus “MODE” can be there as well. 

So, we should do a similar change like:
```
-                                                errmsg("unrecognized value for EXPLAIN option \"%s\": \"%s\"",
-                                                               opt->defname, p),
+                                                errmsg("unrecognized value for %s option \"%s\": \"%s\"",
+                                                               "EXPLAIN", opt->defname, p),
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Implement waiting for wal lsn replay: reloaded

От

Xuneng Zhou

Дата:

30 декабря 2025 г., 09:19:27

Hi,

On Tue, Dec 30, 2025 at 11:25 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> > On Dec 30, 2025, at 11:14, Álvaro Herrera <alvherre@kurilemu.de> wrote:
> >
> > On 2025-Dec-27, Xuneng Zhou wrote:
> >
> >> On Fri, Dec 26, 2025 at 4:25 PM Chao Li <li.evan.chao@gmail.com> wrote:
> >
> >>> 2 - 0002
> >>> ```
> >>> +                       else
> >>> +                               ereport(ERROR,
> >>> +                                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >>> +                                                errmsg("unrecognized value for WAIT option \"%s\": \"%s\"",
> >>> +                                                               "MODE", mode_str),
> >>> ```
> >>>
> >>> I wonder why don’t we directly put MODE into the error message?
> >>
> >> Yeah, putting MODE into the error message is cleaner. It's done in v8.
> >
> > The reason not to do that (and also put WAIT in a separate string) is so
> > that the message is identicla to other messages and thus requires no
> > separate translation, specifically
> >  errmsg("unrecognized value for %s option \"%s\": \"%s\"", ...)
> >
> > See commit 502e256f2262.  Please use that form.
> >
>
> To follow 502e256f2262, it should use “%s” for “WAIT” as well. I raised the comment because I saw “WAIT” is the
formatstrings, thus “MODE” can be there as well. 
>
> So, we should do a similar change like:
> ```
> -                                                errmsg("unrecognized value for EXPLAIN option \"%s\": \"%s\"",
> -                                                               opt->defname, p),
> +                                                errmsg("unrecognized value for %s option \"%s\": \"%s\"",
> +                                                               "EXPLAIN", opt->defname, p),
> ```
>

Thanks for raising this and clarifying the rationale. I've made the
modification per your input.

--
Best,
Xuneng

Вложения

Re: Implement waiting for wal lsn replay: reloaded

От

Álvaro Herrera

Дата:

01 января, 20:16:05

In 0002 you have this kind of thing:

>                  ereport(ERROR,
>                          errcode(ERRCODE_QUERY_CANCELED),
> -                        errmsg("timed out while waiting for target LSN %X/%08X to be replayed; current replay LSN
%X/%08X",
> +                        errmsg("timed out while waiting for target LSN %X/%08X to be %s; current %s LSN %X/%08X",
>                                 LSN_FORMAT_ARGS(lsn),
> -                               LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))));
> +                               desc->verb,
> +                               desc->noun,
> +                               LSN_FORMAT_ARGS(currentLSN)));
> +            }


I'm afraid this technique doesn't work, for translatability reasons.
Your whole design of having a struct with ->verb and ->noun is not
workable (which is a pity, but you can't really fight this.) You need to
spell out the whole messages for each case, something like

if (lsntype == replay)
   ereport(ERROR,
           errcode(ERRCODE_QUERY_CANCELED),
           errmsg("timed out while waiting for target LSN %X/%08X to be replayed; current standby_replay LSN %X/%08X",
else if (lsntype == flush)
    ereport( ... )

and so on.  This means four separate messages for translation for each
message your patch is adding, which is IMO the correct approach.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"...  In accounting terms this makes perfect sense.  To rational humans, it
is insane.  Welcome to IBM."                           (Robert X. Cringely)
https://www.cringely.com/2015/06/03/autodesks-john-walker-explained-hp-and-ibm-in-1991/

Re: Implement waiting for wal lsn replay: reloaded

От

Alexander Korotkov

Дата:

02 января, 02:42:34

On Thu, Jan 1, 2026 at 7:16 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> In 0002 you have this kind of thing:
>
> >                               ereport(ERROR,
> >                                               errcode(ERRCODE_QUERY_CANCELED),
> > -                                             errmsg("timed out while waiting for target LSN %X/%08X to be
replayed;current replay LSN %X/%08X", 
> > +                                             errmsg("timed out while waiting for target LSN %X/%08X to be %s;
current%s LSN %X/%08X", 
> >                                                          LSN_FORMAT_ARGS(lsn),
> > -                                                        LSN_FORMAT_ARGS(GetXLogReplayRecPtr(NULL))));
> > +                                                        desc->verb,
> > +                                                        desc->noun,
> > +                                                        LSN_FORMAT_ARGS(currentLSN)));
> > +                     }
>
>
> I'm afraid this technique doesn't work, for translatability reasons.
> Your whole design of having a struct with ->verb and ->noun is not
> workable (which is a pity, but you can't really fight this.) You need to
> spell out the whole messages for each case, something like
>
> if (lsntype == replay)
>    ereport(ERROR,
>            errcode(ERRCODE_QUERY_CANCELED),
>            errmsg("timed out while waiting for target LSN %X/%08X to be replayed; current standby_replay LSN
%X/%08X",
> else if (lsntype == flush)
>     ereport( ... )
>
> and so on.  This means four separate messages for translation for each
> message your patch is adding, which is IMO the correct approach.

+1
Thank you for catching this, Alvaro.  Yes, I think we need to get rid
of WaitLSNTypeDesc.  It's nice idea, but we support too many languages
to have something like this.

------
Regards,
Alexander Korotkov
Supabase

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: Implement waiting for wal lsn replay: reloaded

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения