Proposal: Out-of-Order NOTIFY via GUC to Improve LISTEN/NOTIFY Throughput
Hi all,

There’s been some renewed attention on improving the performance of the LISTEN/NOTIFY system, which historically hasn’t scaled well under high notify frequency. Joel Jacobson recently shared some work on optimizing the LISTEN path [1], and I wanted to follow up with a proposal focused on the NOTIFY side.

One of the main bottlenecks in the current implementation is the global lock taken in `PreCommit_Notify`, which serializes all notifications. In many use cases (especially where NOTIFY is used for non-mission-critical caching or coordination), users may not care about strict notification ordering or delivery semantics in the event of a transaction rollback.

To explore this further, I’ve drafted a patch that introduces a new GUC: `publish_out_of_order_notifications`. When enabled, this skips the global lock in `PreCommit_Notify`, allowing notifications to be queued in parallel. This comes at the cost of possible out-of-order delivery and the potential for notifications to be delivered from rolled-back transactions.

For benchmarking, I used pgbench with a custom SQL script that sends a single NOTIFY message per transaction. The test was run using 8 connections and 2000 transactions per client. Here are the results on a MacBook Air (Apple M2 chip, 8 cores, 16 GB memory):

publish_out_of_order_notifications = off:

• Run 1: 158,190 TPS (latency: 0.051 ms)
• Run 2: 189,771 TPS (latency: 0.042 ms)
• Run 3: 189,401 TPS (latency: 0.042 ms)
• Run 4: 190,288 TPS (latency: 0.042 ms)
• Run 5: 185,001 TPS (latency: 0.043 ms)

publish_out_of_order_notifications = on:

• Run 1: 298,982 TPS (latency: 0.027 ms)
• Run 2: 345,162 TPS (latency: 0.023 ms)
• Run 3: 351,309 TPS (latency: 0.023 ms)
• Run 4: 333,035 TPS (latency: 0.024 ms)
• Run 5: 353,834 TPS (latency: 0.023 ms)

This shows roughly a 2x improvement in TPS in this basic benchmark. I believe this could serve as a useful knob for users who want performance over guarantees, and it may help guide future efforts to reduce contention in NOTIFY more generally. I also have some ideas for stricter-but-faster implementations that preserve ordering, but I wanted to start with a minimal and illustrative patch.

I'd appreciate thoughts on the direction and whether this seems worth pursuing further.

Relevant prior discussions:

[1] https://www.postgresql.org/message-id/flat/6899c044-4a82-49be-8117-e6f669765f7e%40app.fastmail.com
[2] https://www.postgresql.org/message-id/flat/CAM527d_s8coiXDA4xbJRyVOcNnnjnf%2BezPYpn214y3-5ixn75w%40mail.gmail.com

Thanks,
Rishu
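The script itself wasn't shown inline; a minimal one matching the description — a single NOTIFY per transaction on one shared channel (file and channel names assumed here; an equivalent script appears later in the thread) — would be:

% cat notify.sql
NOTIFY channel_common;

% pgbench -f notify.sql -c 8 -t 2000 -n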
Attachments
Rishu Bagga <rishu.postgres@gmail.com> writes:
> To explore this further, I’ve drafted a patch that introduces a new GUC:
> `publish_out_of_order_notifications`.

We have generally found that GUCs that change query semantics turn out to be very regrettable choices. They break applications that aren't expecting it, and the granularity of the effect is frequently not what you want either. In the case at hand, I fear that a USERSET GUC is particularly inappropriate, because what this proposal does is to break the notification order guarantees system-wide, even if only one issuer of notifications has it set.

> When enabled, this skips the global
> lock in `PreCommit_Notify`, allowing notifications to be queued in parallel.

How much does that really help, considering that we'll still serialize on the NotifyQueueLock? I think that you'd need some rather fundamental redesign to allow truly parallel queueing.

Stepping back a bit, my recollection is that "queue entries appear in commit order" is a rather fundamental assumption in async.c, which we rely on while dequeuing notifications. If that stops being true, I think you'd have cases where listening backends fail to collect available (committed) notifications because they appear in the queue beyond not-yet-committed notifications. Maybe the window in which a notification would remain uncommitted is short enough that we could avert our eyes from that problem, but I'm not convinced.

So I sympathize with concerns about how well the notification code scales, but I think you're going to have to do a ton more work than this to get to anything that would pass muster to get committed. In particular, I'd really want to see something that involves explicitly opting into out-of-order delivery on a per-NOTIFY basis, because anything else will break too many applications. The underlying queue mechanism is going to need a serious rethink, too. My guess is that we'd need to move to something involving multiple queues rather than just one, and I'm not very sure how that ought to work. (But perhaps queuing in-order notifications separately from not-guaranteed-in-order notifications would help? Or maybe the latter should be transmitted in some other way entirely.)

regards, tom lane
From: "Joel Jacobson"
On Fri, Jul 18, 2025, at 03:28, Tom Lane wrote:
> So I sympathize with concerns about how well the notification code
> scales, but I think you're going to have to do a ton more work than
> this to get to anything that would pass muster to get committed.
> In particular, I'd really want to see something that involves
> explicitly opting into out-of-order delivery on a per-NOTIFY basis,
> because anything else will break too many applications. The
> underlying queue mechanism is going to need a serious rethink, too.
> My guess is that we'd need to move to something involving multiple
> queues rather than just one, and I'm not very sure how that ought
> to work. (But perhaps queuing in-order notifications separately
> from not-guaranteed-in-order notifications would help? Or maybe
> the latter should be transmitted in some other way entirely.)

I agree opting into out-of-order delivery on a per-NOTIFY basis sounds like a great idea.

For all the existing users that rely on in-order delivery, and those who are not really sure they would dare to make the switch, do we want to try to do something to improve their use-cases?

It doesn't seem possible to do better than what we already do when all backends listen on the same channel. However, for cases when up to K backends listen on the same channel (multicast), my patch and benchmark in the other thread suggest we can achieve massive improvements in parallelization of both LISTEN/UNLISTEN and NOTIFY without introducing that much extra complexity.

Do we find this goal worth pursuing on its own?

Or should we only focus on exposing a new third parameter to pg_notify() to indicate out-of-order delivery?

/Joel
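For concreteness, one hypothetical shape for such a parameter (the signature, argument name, and default below are invented for illustration — no such argument exists in PostgreSQL today, and this statement would error on current servers):

-- Hypothetical third pg_notify() argument opting a single
-- notification out of ordering guarantees:
--   pg_notify(channel text, payload text, in_order boolean DEFAULT true)
SELECT pg_notify('cache_invalidation', 'key=42', false);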
From: "Joel Jacobson"
On Fri, Jul 18, 2025, at 02:49, Rishu Bagga wrote:
> Hi all,
>
> There’s been some renewed attention on improving the performance of the
> LISTEN/NOTIFY system, which historically hasn’t scaled well under high
> notify frequency. Joel Jacobson recently shared some work on optimizing
> the LISTEN path [1], and I wanted to follow up with a proposal focused on
> the NOTIFY side.

To clarify, my patch optimizes parallelizability of NOTIFY, without degrading parallelizability of LISTEN/UNLISTEN.

/Joel
From: "Joel Jacobson"
On Fri, Jul 18, 2025, at 08:46, Joel Jacobson wrote:
> However, for cases when up to K backends listen on the same channel (multicast),
> my patch and benchmark in the other thread suggest we can achieve massive
> improvements in parallelization of both LISTEN/UNLISTEN as well as NOTIFY
> without introducing that much extra complexity.
>
> Do we find this goal worth pursuing on its own?
>
> Or should we only focus on exposing a new third parameter to pg_notify()
> to indicate out-of-order delivery?

I guess the best would be if we could do both, i.e. improve existing use-cases as well as support out-of-order delivery per notification, within acceptable levels of increased complexity?

One thing I wonder though, that I haven't yet benchmarked, is whether even more parallelization than my in-order optimizations of NOTIFY already achieve would actually significantly improve real workloads, where you do some actual DML in the same txn that sends the NOTIFY. My in-order optimizations now scale to 3000 tps at 1000 connections. I wonder if PostgreSQL really can push that much DML tps, or if the serialization effect of LISTEN/NOTIFY would be marginal compared to other serialization caused by the DML.

/Joel
From: "Joel Jacobson"
On Fri, Jul 18, 2025, at 10:15, Joel Jacobson wrote:
> On Fri, Jul 18, 2025, at 02:49, Rishu Bagga wrote:
>> Hi all,
>>
>> There’s been some renewed attention on improving the performance of the
>> LISTEN/NOTIFY system, which historically hasn’t scaled well under high
>> notify frequency. Joel Jacobson recently shared some work on optimizing
>> the LISTEN path [1], and I wanted to follow up with a proposal focused on
>> the NOTIFY side.
>
> To clarify, my patch optimizes parallelizability of NOTIFY, without
> degrading parallelizability of LISTEN/UNLISTEN.

I realize my above clarification is not technically true; what I meant to say is:

My patch improves NOTIFY TPS when many backends are listening on multiple channels by eliminating unnecessary syscall wake‑ups, but it doesn't increase the internal parallelism of the NOTIFY queue itself.

/Joel
"Joel Jacobson" <joel@compiler.org> writes: > My patch improves NOTIFY TPS when many backends are listening on multiple > channels by eliminating unnecessary syscall wake‑ups, but it doesn't increase > the internal parallelism of the NOTIFY queue itself. After thinking about this for awhile, I have a rough idea of something we could do to improve parallelism of NOTIFY. As a bonus, this'd allow processes on hot standby servers to receive NOTIFYs from processes on the primary, which is a feature many have asked for. The core thought here was to steal some implementation ideas from two-phase commit. I initially thought maybe we could remove the SLRU queue entirely, and maybe we can still find a way to do that, but in this sketch it's still there with substantially reduced traffic. The idea basically is to use the WAL log rather than SLRU as transport for notify messages. 1. In PreCommit_Notify(), gather up all the notifications this transaction has emitted, and write them into a WAL log message. Remember the LSN of this message. (I think this part should be parallelizable, because of work that's previously been done to allow parallel writes to WAL.) 2. When writing the transaction's commit WAL log entry, include the LSN of the previous notify-data entry. 3. Concurrently with writing the commit entry, send a message to the notify SLRU queue. This would be a small fixed-size message with the transaction's XID, database ID, and the LSN of the notify-data WAL entry. (The DBID is there to let listeners quickly ignore traffic from senders in other DBs.) 4. Signal listening backends to check the queue, as we do now. 5. Listeners read the SLRU queue and then, if in same database, pull the notify data out of the WAL log. (I believe we already have enough infrastructure to make that cheap, because 2-phase commit does it too.) In the simplest implementation of this idea, step 3 would still require a global lock, to ensure that SLRU entries are made in commit order. However, that lock only needs to be held for the duration of step 3, which is much shorter than what happens now. A more advanced idea could be to send the SLRU message in step 1, as soon as we've pushed out the notify-data message. In this approach, listening backends would become responsible for figuring out whether senders have committed yet and processing the messages in correct commit order. This is quite handwavy yet because I don't have a clear idea of how they'd do that reliably, but maybe it's possible. In a hot standby server, the WAL replay process would simply have to send the proper SLRU message and issue signals when it sees a commit message containing a notify-data LSN. (One small detail to be worked out is who's responsible for truncating the notify SLRU in a hot standby server. In current usage the sending backends do it, but there won't be any in hot standby, and there aren't necessarily any listeners either.) An area that needs a bit of thought is how to ensure that we don't truncate away WAL that contains still-unread notify messages. We have mechanisms already to prevent too-soon truncation of WAL, so I doubt there's anything too difficult here. (Also note that we have an existing unsolved problem of preventing CLOG truncation while the notify SLRU still contains references to some old XID. Perhaps that could be dealt with at the same time.) This isn't something I'm likely to work on anytime soon, but perhaps someone else would like to push on these ideas. regards, tom lane
From: "Joel Jacobson"
On Fri, Jul 18, 2025, at 19:06, Tom Lane wrote:
> "Joel Jacobson" <joel@compiler.org> writes:
>> My patch improves NOTIFY TPS when many backends are listening on multiple
>> channels by eliminating unnecessary syscall wake‑ups, but it doesn't increase
>> the internal parallelism of the NOTIFY queue itself.
>
> After thinking about this for awhile, I have a rough idea of
> something we could do to improve parallelism of NOTIFY.
> As a bonus, this'd allow processes on hot standby servers to
> receive NOTIFYs from processes on the primary, which is a
> feature many have asked for.

LISTEN/NOTIFY on standbys sounds interesting on its own.

However, I don't think reducing the time we hold the exclusive lock would do anything at all to help the users who have been reporting problems with LISTEN/NOTIFY. I'll explain why I think so.

I assume Rishu in his original post, with "renewed attention", was referring to the post "Postgres LISTEN/NOTIFY does not scale" [1] that was on the front page of Hacker News with 319 comments [2].

I think the "still waiting for AccessExclusiveLock" they saw in the logs is probably just a *symptom* but not the *cause* of their problems.

Unfortunately, the author of [1] jumped to the conclusion that the global lock was the problem. I'm quite sure it is probably not, because:

We know for sure that current users do LISTEN and NOTIFY in the same database. And there is no point in doing NOTIFY unless you also do LISTEN.

Their plots show a y-axis with a few hundred "active sessions". If we assume at least ~100 of them would be listening backends, that would explain their problems, due to the syscall thundering-herd wake-up bomb that each NOTIFY currently causes.

So instead of saying "Postgres LISTEN/NOTIFY does not scale", like in the article [1], I think it would be much more fair and meaningful to say "Postgres LISTEN/NOTIFY does not scale with the number of listening backends".

All my benchmarks support this hypothesis. I've already posted a lot of them, but can of course provide more specific additional benchmarks if desired.

/Joel

[1] https://www.recall.ai/blog/postgres-listen-notify-does-not-scale
[2] https://news.ycombinator.com/item?id=44490510
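The wake-up cost being described comes from the post-commit signaling step; roughly (a simplified paraphrase of SignalBackends() in async.c, not verbatim source):

/*
 * Simplified paraphrase of the current post-commit signaling:
 * a committing notifier signals every listening backend that may
 * still need to read the queue.
 *
 *     for each listening backend b:
 *         if (b may have queue entries left to read)
 *             SendProcSignal(b->pid, PROCSIG_NOTIFY_INTERRUPT, ...);
 *             // ultimately a kill(pid, SIGUSR1) syscall
 *
 * So with ~100 listeners, every NOTIFY-ing commit can issue on the
 * order of 100 syscalls, regardless of which channels they listen on.
 */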
> "Joel Jacobson" <joel@compiler.org> writes: > > Unfortunately, the author of [1] jumped to conclusion and assumed > the global lock was the problem. I'm quite sure it is probably not, > because: > > We know for sure, that current users do LISTEN and NOTIFY > in the same database. And there is no point in doing NOTIFY > unless you also do LISTEN. I agree that, in practice, there is likely no use of NOTIFY without LISTEN. That being said, I disagree that the global lock is not at least one of the bottlenecks from a performance perspective. The global lock is not released until after the call to RecordTransactionCommit and the reset of CommitTransaction. So if there are multiple transactions sending notifications, each must become durable before the next can proceed, which introduces a significant bottleneck. This becomes especially expensive in environments where compute and storage are separated, such as certain cloud-based variants of Postgres. > Tom Lane <tgl@sss.pgh.pa.us> writes: > > 1. In PreCommit_Notify(), gather up all the notifications this > transaction has emitted, and write them into a WAL log message. > Remember the LSN of this message. (I think this part should be > parallelizable, because of work that's previously been done to > allow parallel writes to WAL.) > > 2. When writing the transaction's commit WAL log entry, include > the LSN of the previous notify-data entry. Thanks for the input — this is a compelling idea. I'll work on implementing a proof of concept. Thanks, Rishu
From: "Joel Jacobson"
On Mon, Jul 21, 2025, at 00:06, Rishu Bagga wrote:
>> "Joel Jacobson" <joel@compiler.org> writes:
>>
>> Unfortunately, the author of [1] jumped to the conclusion that
>> the global lock was the problem. I'm quite sure it is probably not,
>> because:
>>
>> We know for sure that current users do LISTEN and NOTIFY
>> in the same database. And there is no point in doing NOTIFY
>> unless you also do LISTEN.
>
> I agree that, in practice, there is likely no use of NOTIFY without
> LISTEN. That being said, I disagree that the global lock is not at
> least one of the bottlenecks from a performance perspective.
>
> The global lock is not released until after the call to
> RecordTransactionCommit and the rest of CommitTransaction. So if
> there are multiple transactions sending notifications, each must
> become durable before the next can proceed, which introduces a
> significant bottleneck.
>
> This becomes especially expensive in environments where compute and
> storage are separated, such as certain cloud-based variants of Postgres.

Can you please share the benchmark script and pgbench commands from your initial post?

When you skipped the global lock, TPS went from max 190k to max 350k. Based on how high these TPS numbers are, I would guess you're not doing LISTEN in your benchmark, but only NOTIFY?

Since there is no point in just doing NOTIFY if nobody is LISTENing, a realistic benchmark would also need to do LISTEN. What you will then see is that TPS will be severely impacted, and the gains from removing the global exclusive lock will drown in the huge cost of all the syscalls.

On my end, I will work on reducing the expensive syscalls, which I believe are the current bottleneck. Once that's fixed though, the next bottleneck might be the global exclusive lock, so I think it's great you're working on improving that as well.

/Joel
From: "Joel Jacobson"
On Mon, Jul 21, 2025, at 08:16, Joel Jacobson wrote:
> Since there is no point in just doing NOTIFY if nobody is LISTENing,
> a realistic benchmark would also need to do LISTEN.
> What you will then see is that TPS will be severely impacted,
> and the gains from removing the global exclusive lock will
> drown in the huge cost of all the syscalls.

I thought I'd better put my money where my mouth is, and decided to try to replicate Rishu's benchmark results and, in addition, benchmark my own hypothesis above: that when doing both LISTEN+NOTIFY rather than just NOTIFY, the 2x improvement would be heavily reduced.

The benchmarks I've run below confirm this hypothesis. The observed 2x improvement is reduced to a ~14% improvement when doing both LISTEN+NOTIFY.

Benchmark from original post:

> Here are the results on a MacBook Air (Apple M2 chip, 8 cores, 16 GB memory):
>
> publish_out_of_order_notifications = off:
>
> • Run 1: 158,190 TPS (latency: 0.051 ms)
> • Run 2: 189,771 TPS (latency: 0.042 ms)
> • Run 3: 189,401 TPS (latency: 0.042 ms)
> • Run 4: 190,288 TPS (latency: 0.042 ms)
> • Run 5: 185,001 TPS (latency: 0.043 ms)
>
> publish_out_of_order_notifications = on:
>
> • Run 1: 298,982 TPS (latency: 0.027 ms)
> • Run 2: 345,162 TPS (latency: 0.023 ms)
> • Run 3: 351,309 TPS (latency: 0.023 ms)
> • Run 4: 333,035 TPS (latency: 0.024 ms)
> • Run 5: 353,834 TPS (latency: 0.023 ms)
>
> This shows roughly a 2x improvement in TPS in this basic benchmark.

# Benchmarks on my MacBook Pro (Apple M3 Max, 16 cores, 128 GB memory)

## NOTIFY only

~160k TPS master (HEAD)
~340k TPS (0001-allow-out-of-order-notifications.patch)

=> ~125% improvement

### master (HEAD)

% cat notify_common.sql
NOTIFY channel_common;

% for n in `seq 1 5` ; do pgbench -f notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.052 ms
tps = 154326.941626 (without initial connection time)
latency average = 0.049 ms
tps = 162334.368215 (without initial connection time)
latency average = 0.050 ms
tps = 160703.883008 (without initial connection time)
latency average = 0.048 ms
tps = 165296.086615 (without initial connection time)
latency average = 0.049 ms
tps = 163706.310878 (without initial connection time)

### 0001-allow-out-of-order-notifications.patch

% for n in `seq 1 5` ; do PGOPTIONS='-c publish_notifications_out_of_order=true' pgbench -f notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.026 ms
tps = 310149.647205 (without initial connection time)
latency average = 0.021 ms
tps = 380427.029340 (without initial connection time)
latency average = 0.025 ms
tps = 320108.837005 (without initial connection time)
latency average = 0.024 ms
tps = 333500.083375 (without initial connection time)
latency average = 0.022 ms
tps = 357965.859006 (without initial connection time)

## LISTEN+NOTIFY

~73k TPS master (HEAD)
~83k TPS (0001-allow-out-of-order-notifications.patch)

=> ~14% improvement

### master (HEAD)

% cat listen_notify_common.sql
LISTEN channel_common;
NOTIFY channel_common;

% for n in `seq 1 5` ; do pgbench -f listen_notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.112 ms
tps = 71677.201722 (without initial connection time)
latency average = 0.109 ms
tps = 73228.220325 (without initial connection time)
latency average = 0.109 ms
tps = 73310.423826 (without initial connection time)
latency average = 0.108 ms
tps = 73995.625009 (without initial connection time)
latency average = 0.113 ms
tps = 70970.431944 (without initial connection time)

### 0001-allow-out-of-order-notifications.patch

% for n in `seq 1 5` ; do PGOPTIONS='-c publish_notifications_out_of_order=true' pgbench -f listen_notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.098 ms
tps = 81620.992919 (without initial connection time)
latency average = 0.095 ms
tps = 84173.755675 (without initial connection time)
latency average = 0.096 ms
tps = 83634.329802 (without initial connection time)
latency average = 0.095 ms
tps = 84311.700356 (without initial connection time)
latency average = 0.096 ms
tps = 83340.278357 (without initial connection time)

For a normal PostgreSQL installation with the CPU and storage on the same physical machine, I think the results above clearly demonstrate that the global exclusive lock is at least not the bottleneck; I strongly believe that is instead the flood of unnecessary kill(pid, SIGUSR1) syscalls.

If anyone has access to a cloud environment with compute and storage separated, as suggested by Rishu, it would be interesting to see what benchmark results you get.

/Joel
From: "Joel Jacobson"
On Tue, Jul 22, 2025, at 14:48, Joel Jacobson wrote:
> Benchmark from original post:
...
> For a normal PostgreSQL installation with the CPU and storage on the same
> physical machine, I think the results above clearly demonstrate that the
> global exclusive lock is at least not the bottleneck; I strongly believe
> that is instead the flood of unnecessary kill(pid, SIGUSR1) syscalls.

I was wrong here. This is much more complex than I initially thought.

After some additional benchmarking and analyzing perf results, I realize the bottleneck depends on the workload, and is either the kill() syscalls *or* the heavyweight lock.

Here is one scenario where the heavyweight lock actually *is* the bottleneck:

1 session does LISTEN

% pgbench -f notify.sql -c 1000 -j 8 -T 60 -n

Simply commenting out the heavyweight lock gives a dramatic difference:

tps = 7679 (with heavyweight lock; in commit order)
tps = 95430 (without heavyweight lock; not in commit order)

My conclusion so far is that we would greatly benefit both from reducing/eliminating kill() syscalls, as well as from finding ways to avoid the heavyweight lock while preserving commit order.

/Joel
On Fri, Jul 18, 2025 at 10:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After thinking about this for awhile, I have a rough idea of
> something we could do to improve parallelism of NOTIFY.
> As a bonus, this'd allow processes on hot standby servers to
> receive NOTIFYs from processes on the primary, which is a
> feature many have asked for.
>
> The core thought here was to steal some implementation ideas
> from two-phase commit. I initially thought maybe we could
> remove the SLRU queue entirely, and maybe we can still find
> a way to do that, but in this sketch it's still there with
> substantially reduced traffic.
>
> The idea basically is to use the WAL log rather than SLRU
> as transport for notify messages.
>
> 1. In PreCommit_Notify(), gather up all the notifications this
> transaction has emitted, and write them into a WAL log message.
> Remember the LSN of this message. (I think this part should be
> parallelizable, because of work that's previously been done to
> allow parallel writes to WAL.)
>
> 2. When writing the transaction's commit WAL log entry, include
> the LSN of the previous notify-data entry.
>
> 3. Concurrently with writing the commit entry, send a message
> to the notify SLRU queue. This would be a small fixed-size
> message with the transaction's XID, database ID, and the LSN
> of the notify-data WAL entry. (The DBID is there to let
> listeners quickly ignore traffic from senders in other DBs.)
>
> 4. Signal listening backends to check the queue, as we do now.
>
> 5. Listeners read the SLRU queue and then, if in same database,
> pull the notify data out of the WAL log. (I believe we already
> have enough infrastructure to make that cheap, because 2-phase
> commit does it too.)
>
> In the simplest implementation of this idea, step 3 would still
> require a global lock, to ensure that SLRU entries are made in
> commit order. However, that lock only needs to be held for the
> duration of step 3, which is much shorter than what happens now.

Attached is an initial patch that implements this idea.

There is still some work to be done around how to handle truncation / vacuum for the new approach, and testing replication of notifications onto a reader instance.

That being said, I ran some basic benchmarking to stress concurrent notifications.

With the following SQL script, I ran
pgbench -T 100 -c 100 -j 8 -f pgbench_transaction_notify.sql -d postgres

BEGIN;
INSERT INTO test VALUES(1);
NOTIFY benchmark_channel, 'transaction_completed';
COMMIT;

With the patch, 3 runs showed the following TPS:

tps = 66372.705917
tps = 63445.909465
tps = 64412.544339

Without the patch, we got the following TPS:

tps = 30212.390982
tps = 30908.865812
tps = 29191.388601

So, there is about a 2x increase in TPS at 100 connections, which establishes some promise in the approach.

Additionally, this would help solve the issue being discussed in a separate thread [1], where listeners currently rely on the transaction log to verify that a transaction they read has indeed committed, but it is possible that that portion of the transaction log has been truncated by vacuum.

Would appreciate any thoughts on the direction of this patch.

Thanks,
Rishu

[1] https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com
Attachments
From: "Joel Jacobson"
On Fri, Sep 5, 2025, at 00:53, Rishu Bagga wrote:
> With the following SQL script, I ran
> pgbench -T 100 -c 100 -j 8 -f pgbench_transaction_notify.sql -d postgres
>
> BEGIN;
> INSERT INTO test VALUES(1);
> NOTIFY benchmark_channel, 'transaction_completed';
> COMMIT;

Thanks for working on this. I haven't looked at the code yet, but have some questions on the benchmark.

What's the definition of the test table?

> With the patch, 3 runs showed the following TPS:
>
> tps = 66372.705917
> tps = 63445.909465
> tps = 64412.544339
>
> Without the patch, we got the following TPS:
>
> tps = 30212.390982
> tps = 30908.865812
> tps = 29191.388601
>
> So, there is about a 2x increase in TPS at 100 connections, which establishes
> some promise in the approach.

In your benchmark, how many backends were doing `LISTEN benchmark_channel;`?

It would be interesting if you could run the benchmark and vary the number of listeners, e.g. with 1, 10, 100 listeners, to understand the effect of this patch for different types of workloads.

> Additionally, this would help solve the issue being discussed in a
> separate thread [1], where listeners currently rely on the transaction
> log to verify that a transaction they read has indeed committed, but it
> is possible that that portion of the transaction log has been truncated
> by vacuum.

Nice!

/Joel
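One simple way to hold N idle listeners open for such a run (a sketch only; the channel name and database are assumed from the benchmark above):

% cat listeners.sh
#!/bin/sh
# Hold N listening backends open while the notify benchmark runs.
# The periodic SELECT gives each backend a command boundary at which
# it drains pending notifications, so the queue tail can advance.
N=${1:-10}
for i in $(seq 1 "$N"); do
  ( echo "LISTEN benchmark_channel;"
    while true; do echo "SELECT 1;"; sleep 1; done
  ) | psql -q -d postgres >/dev/null &
done
wait   # interrupt with Ctrl-C (or kill the jobs) once the benchmark finishes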
From: Arseniy Mukhin
Hi,

On Fri, Sep 5, 2025 at 1:53 AM Rishu Bagga <rishu.postgres@gmail.com> wrote:
>
> Attached is an initial patch that implements this idea.

Thank you for sharing the patch and for working on this! I briefly read the patch and there are some points:

1) There are some compilation warnings about unused functions.

2) I thought the idea was to add notification_data_entry_lsn to the transaction's commit WAL entry, but the patch introduced a new XLOG_ASYNC_NOTIFY_COMMIT WAL entry and put the LSN in it. Also, XLOG_ASYNC_NOTIFY_COMMIT is written in AtCommit_Notify, when the transaction is already committed, so we write it after the commit entry. I don't think we can write anything about the transaction to the WAL after we have written the transaction commit record. In my understanding, all transaction stuff in WAL should come before the commit / abort entries, at least because during committing we sync WAL to disk up to the commit entry, and we can't guarantee that anything after a commit entry will survive a crash. Anyway, I think we just don't need this new WAL entry.

3) I see you removed the global lock, but I don't see where you use a lock for step 3 [0] to provide the commit order of notifications. I thought step 3 should look something like this:

if (have_notifications)
    acquire_lock();
...
XactLogCommitRecord();
...
if (have_notifications)
{
    asyncQueueAddCompactEntry();
    release_lock();
}

This way we will have the same order for commit records in WAL and entries in the listen/notify queue. I'm not sure at what level we should add this stuff, but it seems that to minimise the time we hold the lock we need to release it before calling XLogFlush(). I'm not sure about it.

Best regards,
Arseniy Mukhin
From: "Matheus Alcantara"
On Thu Sep 4, 2025 at 7:53 PM -03, Rishu Bagga wrote:
> Attached is an initial patch that implements this idea.

Thanks for working on this! I haven't looked at the code yet, but have some questions related to the issue discussed at [1].

> Additionally, this would help solve the issue being discussed in a
> separate thread [1], where listeners currently rely on the transaction
> log to verify that a transaction they read has indeed committed, but it
> is possible that that portion of the transaction log has been truncated
> by vacuum.

Does your patch already aim to fix the issue? On [2] I implemented a TAP test that reproduces the issue, and when I tried executing it with your patch applied I still see the error. I'm attaching the TAP test isolated; maybe we could incorporate it into your patch series to ensure that the issue is fixed? What do you think?

[1] https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFY6G8cJm73_MM9SuynZUqtqcaTuepUDgDuvS661oLW7U0dgsg%40mail.gmail.com

--
Matheus Alcantara
Attachments
Hi Joel, Arseniy, Matheus, thanks for taking a look. I’ve attached an updated patch, rebased on the latest commits, that fixes the correctness issues.

On Fri, Sep 5, 2025 at 2:38 AM Joel Jacobson <joel@compiler.org> wrote:

> What's the definition of the test table?

It’s just a one-column integer table, defined as:

CREATE TABLE test(x int);

> In your benchmark, how many backends were doing
> `LISTEN benchmark_channel;`?

I just tested the notify codepath throughput, with no listeners.

> It would be interesting if you could run the benchmark and vary the
> number of listeners, e.g. with 1, 10, 100 listeners, to understand the
> effect of this patch for different types of workloads.

IIUC, the current benchmark wouldn’t be affected by listeners — the commit of the notifying transaction is independent of the listener reading the transaction. I agree that it would be useful to test listener performance as well. I will work on this shortly.

On Fri, Sep 5, 2025 at 10:31 AM Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

> 1) There are some compilation warnings about unused functions.

Taken care of in the attached patch.

> 2) I thought the idea was to add notification_data_entry_lsn to the
> transaction's commit WAL entry, but the patch introduced a new
> XLOG_ASYNC_NOTIFY_COMMIT WAL entry and put the LSN in it. Also,
> XLOG_ASYNC_NOTIFY_COMMIT is written in AtCommit_Notify, when the
> transaction is already committed, so we write it after the commit
> entry. I don't think we can write anything about the transaction to
> the WAL after we have written the transaction commit record. In my
> understanding, all transaction stuff in WAL should come before the
> commit / abort entries, at least because during committing we sync
> WAL to disk up to the commit entry, and we can't guarantee that
> anything after a commit entry will survive a crash. Anyway, I think
> we just don't need this new WAL entry.

I removed the XLOG_ASYNC_NOTIFY_COMMIT record in the new patch, in favor of including the LSN of the notification as a new field in the commit record itself. The reasoning here is that notify data is WAL'd before transaction commit, so during WAL replay we need some way to know whether the notification should be added to the notify queue. In the previous patch we were using the XLOG_ASYNC_NOTIFY_COMMIT record, but actually this doesn’t work, because we could write this record before commit, or we could commit and crash before writing this record. So the best solution is to have a field within the commit record that tells us which log record contains the notifications for that transaction, if there are any.

> 3) I see you removed the global lock, but I don't see where you use a lock
> for step 3 [0] to provide the commit order of notifications.

Fixed this in the updated patch.

On Sat, Sep 6, 2025 at 7:52 AM Matheus Alcantara <matheusssilv97@gmail.com> wrote:

> Does your patch already aim to fix the issue? On [2] I implemented a TAP
> test that reproduces the issue, and when I tried executing it with your
> patch applied I still see the error. I'm attaching the TAP test isolated;
> maybe we could incorporate it into your patch series to ensure that the
> issue is fixed? What do you think?

I wasn’t able to run the TAP tests; however, in the updated patch, we can be confident that entries in the queue are from committed transactions. If there is a crash before committing and after writing to the queue, this would be within the critical section, so a notification from an uncommitted transaction would never be read from the queue. That allows us to remove the XidInMVCCSnapshot and TransactionIdDidCommit checks.

Thanks,
Rishu
Attachments
From: "Matheus Alcantara"
On Mon Sep 8, 2025 at 9:08 PM -03, Rishu Bagga wrote:
> Hi Joel, Arseniy, Matheus, thanks for taking a look. I’ve attached an
> updated patch, rebased on the latest commits, that fixes the
> correctness issues.

I think your latest patch is missing the asyncdesc.c file. I'm getting a compile error:

src/backend/access/rmgrdesc/meson.build:4:20: ERROR: File asyncdesc.c does not exist.

--
Matheus Alcantara
On Tue, Sep 9, 2025 at 3:34 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote:
> On Mon Sep 8, 2025 at 9:08 PM -03, Rishu Bagga wrote:
>> Hi Joel, Arseniy, Matheus, thanks for taking a look. I’ve attached an
>> updated patch, rebased on the latest commits, that fixes the
>> correctness issues.
>
> I think your latest patch is missing the asyncdesc.c file. I'm
> getting a compile error:
>
> src/backend/access/rmgrdesc/meson.build:4:20: ERROR: File asyncdesc.c does not exist.
>
> --
> Matheus Alcantara

Sorry about that — added the asyncdesc.c file in the attached patch.
Attachments
From: "Matheus Alcantara"
On Tue Sep 9, 2025 at 7:49 PM -03, Rishu Bagga wrote:
> On Tue, Sep 9, 2025 at 3:34 PM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>> On Mon Sep 8, 2025 at 9:08 PM -03, Rishu Bagga wrote:
>>> Hi Joel, Arseniy, Matheus, thanks for taking a look. I’ve attached an
>>> updated patch, rebased on the latest commits, that fixes the
>>> correctness issues.
>>
>> I think your latest patch is missing the asyncdesc.c file. I'm
>> getting a compile error:
>>
>> src/backend/access/rmgrdesc/meson.build:4:20: ERROR: File asyncdesc.c does not exist.
>
> Sorry about that — added the asyncdesc.c file in the attached patch.

I'm still seeing some compiler errors:

FAILED: src/backend/postgres_lib.a.p/access_rmgrdesc_asyncdesc.c.o
ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -I/opt/homebrew/Cellar/icu4c@77/77.1/include -I/opt/homebrew/opt/lz4/include -I/opt/homebrew/Cellar/openssl@3/3.5.2/include -I/opt/homebrew/opt/zstd/include -I/opt/homebrew/opt/openssl@3/include -fdiagnostics-color=always -Wall -Winvalid-pch -O0 -g -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX15.2.sdk -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wmissing-prototypes -Wpointer-arith -Werror=vla -Werror=unguarded-availability-new -Wendif-labels -Wmissing-format-attribute -Wcast-function-type -Wformat-security -Wdeclaration-after-statement -Wmissing-variable-declarations -Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -Wno-cast-function-type-strict -DBUILDING_DLL -MD -MQ src/backend/postgres_lib.a.p/access_rmgrdesc_asyncdesc.c.o -MF src/backend/postgres_lib.a.p/access_rmgrdesc_asyncdesc.c.o.d -o src/backend/postgres_lib.a.p/access_rmgrdesc_asyncdesc.c.o -c ../src/backend/access/rmgrdesc/asyncdesc.c
../src/backend/access/rmgrdesc/asyncdesc.c:17:10: fatal error: 'access/async_xlog.h' file not found
   17 | #include "access/async_xlog.h"
      |          ^~~~~~~~~~~~~~~~~~~~~
1 error generated.

Am I missing something here? I'm using meson+ninja; I removed the build directory, ran meson setup, and then ninja -C build install, and got this error.

--
Matheus Alcantara
On Tue, Sep 9, 2025 at 4:02 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote:
>
> FAILED: src/backend/postgres_lib.a.p/access_rmgrdesc_asyncdesc.c.o
> ccache cc [...] -o src/backend/postgres_lib.a.p/access_rmgrdesc_asyncdesc.c.o -c ../src/backend/access/rmgrdesc/asyncdesc.c
> ../src/backend/access/rmgrdesc/asyncdesc.c:17:10: fatal error: 'access/async_xlog.h' file not found
>    17 | #include "access/async_xlog.h"
>       |          ^~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
>
> Am I missing something here? I'm using meson+ninja; I removed the build
> directory, ran meson setup, and then ninja -C build install, and got
> this error.

Oops again — I didn't "git add" the new files, so they weren't showing up in the patch. I added async_xlog.h as well now, and tested to make sure the patch applies and compiles. Sorry about that; it should work now.
Attachments
From: "Matheus Alcantara"
On Tue Sep 9, 2025 at 8:14 PM -03, Rishu Bagga wrote:
> Oops again — I didn't "git add" the new files, so they weren't showing
> up in the patch. I added async_xlog.h as well now, and tested to make
> sure the patch applies and compiles. Sorry about that; it should work now.

Thanks, it's compiling now.

You wrote:

> I wasn’t able to run the TAP tests; however, in the updated patch, we
> can be confident that entries in the queue are from committed
> transactions. If there is a crash before committing and after writing to
> the queue, this would be within the critical section, so a notification
> from an uncommitted transaction would never be read from the queue.
> That allows us to remove the XidInMVCCSnapshot and
> TransactionIdDidCommit checks.

I've executed the TAP test that I attached on [1] and the test is failing due to an assert failure:

TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1372, PID: 79968
0   postgres   0x0000000105661860 ExceptionalCondition + 236
1   postgres   0x00000001056b7a0c palloc + 248
2   postgres   0x0000000104f96850 SignalBackends + 36
3   postgres   0x0000000104edd50c XactLogCommitRecord + 1632
4   postgres   0x0000000104ee0188 RecordTransactionCommit + 860
5   postgres   0x0000000104edcb8c CommitTransaction + 896
6   postgres   0x0000000104ed7b38 CommitTransactionCommandInternal + 256
7   postgres   0x0000000104ed7a24 CommitTransactionCommand + 16
8   postgres   0x0000000105402220 finish_xact_command + 32
9   postgres   0x00000001053ffb60 exec_simple_query + 1556
10  postgres   0x00000001053feb78 PostgresMain + 3424
11  postgres   0x00000001053f5504 BackendInitialize + 0
12  postgres   0x00000001052c11f4 postmaster_child_launch + 492
13  postgres   0x00000001052c9660 BackendStartup + 336
14  postgres   0x00000001052c70e0 ServerLoop + 432
15  postgres   0x00000001052c5a38 PostmasterMain + 7096
16  postgres   0x0000000105166488 main + 952
17  dyld       0x000000019145ab98 start + 6076

There are also some other tests failing, such as isolation, regress, and others.

[1] https://www.postgresql.org/message-id/DCLSWKOKDAX4.3HS2NBE53P0M2%40gmail.com

--
Matheus Alcantara
From: Arseniy Mukhin
Hi,

Thank you for the new version! There are several points I noticed:

1)

On Tue, Sep 9, 2025 at 3:08 AM Rishu Bagga <rishu.postgres@gmail.com> wrote:
>
> ...
> On Sat, Sep 6, 2025 at 7:52 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>
>> Does your patch already aim to fix the issue? On [2] I implemented a TAP
>> test that reproduces the issue, and when I tried executing it with your
>> patch applied I still see the error. I'm attaching the TAP test isolated;
>> maybe we could incorporate it into your patch series to ensure that the
>> issue is fixed? What do you think?
>
> I wasn’t able to run the TAP tests; however, in the updated patch, we
> can be confident that entries in the queue are from committed
> transactions. If there is a crash before committing and after writing to
> the queue, this would be within the critical section, so a notification
> from an uncommitted transaction would never be read from the queue.
> That allows us to remove the XidInMVCCSnapshot and
> TransactionIdDidCommit checks.

I agree that listeners no longer need to check whether the notifying transaction committed, but it seems that listeners still need to wait until the notifying transaction is completed before sending its notifications. I believe we should continue using XidInMVCCSnapshot as the current version does.

2) Now we have two calls to SignalBackends (is it intentional?). The first, in AtCommit_Notify(), seems correct to me. The second, in XactLogCommitRecord(), seems too early, because other backends can't process new notifications at this point (if the point about XidInMVCCSnapshot is correct). And there is an Assert failure (Matheus' report is about it).

3) I think we still need to check whether the queue is full while adding new entries, and ereport if it is (entries are smaller now, but we can still run out of space). And it seems to me that we should do this check before entering the critical section, because in the critical section it will result in a PANIC. Master results in an ERROR in case of a full queue, so should we do the same here?

4) I don't know, but it seems to me that XactLogCommitRecord is not the right place for adding listen/notify logic; it seems to be only about WAL stuff.

5) Do we want to use NotifyQueueLock as the lock that provides the commit order? Maybe we can introduce another lock to avoid unnecessary contention? For example, if we use NotifyQueueLock, we block listeners that want to read new notifications while we are inserting a commit record, which seems unnecessary.

6) We have fixed-size queue entries, so I think we don't need the "padding" logic at the end of the page anymore, because we know how many entries we can have on each page.

BTW, what do you think about creating a separate thread for the patch? The current thread's subject seems a bit irrelevant.

Best regards,
Arseniy Mukhin
From: Masahiko Sawada
On Thu, Sep 4, 2025 at 3:53 PM Rishu Bagga <rishu.postgres@gmail.com> wrote:
>
> On Fri, Jul 18, 2025 at 10:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > After thinking about this for awhile, I have a rough idea of
> > something we could do to improve parallelism of NOTIFY.
> > As a bonus, this'd allow processes on hot standby servers to
> > receive NOTIFYs from processes on the primary, which is a
> > feature many have asked for.
> >
> > The core thought here was to steal some implementation ideas
> > from two-phase commit. I initially thought maybe we could
> > remove the SLRU queue entirely, and maybe we can still find
> > a way to do that, but in this sketch it's still there with
> > substantially reduced traffic.
> >
> > The idea basically is to use the WAL log rather than SLRU
> > as transport for notify messages.
> >
> > 1. In PreCommit_Notify(), gather up all the notifications this
> > transaction has emitted, and write them into a WAL log message.
> > Remember the LSN of this message. (I think this part should be
> > parallelizable, because of work that's previously been done to
> > allow parallel writes to WAL.)
> >
> > 2. When writing the transaction's commit WAL log entry, include
> > the LSN of the previous notify-data entry.
> >
> > 3. Concurrently with writing the commit entry, send a message
> > to the notify SLRU queue. This would be a small fixed-size
> > message with the transaction's XID, database ID, and the LSN
> > of the notify-data WAL entry. (The DBID is there to let
> > listeners quickly ignore traffic from senders in other DBs.)
> >
> > 4. Signal listening backends to check the queue, as we do now.
> >
> > 5. Listeners read the SLRU queue and then, if in same database,
> > pull the notify data out of the WAL log. (I believe we already
> > have enough infrastructure to make that cheap, because 2-phase
> > commit does it too.)
> >
> > In the simplest implementation of this idea, step 3 would still
> > require a global lock, to ensure that SLRU entries are made in
> > commit order. However, that lock only needs to be held for the
> > duration of step 3, which is much shorter than what happens now.
>
> Attached is an initial patch that implements this idea.
>
> There is still some work to be done around how to handle truncation /
> vacuum for the new approach, and testing replication of notifications
> onto a reader instance.
>
> That being said, I ran some basic benchmarking to stress concurrent
> notifications.
>
> With the following SQL script, I ran
> pgbench -T 100 -c 100 -j 8 -f pgbench_transaction_notify.sql -d postgres
>
> BEGIN;
> INSERT INTO test VALUES(1);
> NOTIFY benchmark_channel, 'transaction_completed';
> COMMIT;
>
> With the patch, 3 runs showed the following TPS:
>
> tps = 66372.705917
> tps = 63445.909465
> tps = 64412.544339
>
> Without the patch, we got the following TPS:
>
> tps = 30212.390982
> tps = 30908.865812
> tps = 29191.388601
>
> So, there is about a 2x increase in TPS at 100 connections, which establishes
> some promise in the approach.

Looks like a promising improvement.

> Additionally, this would help solve the issue being discussed in a
> separate thread [1], where listeners currently rely on the transaction
> log to verify that a transaction they read has indeed committed, but it
> is possible that that portion of the transaction log has been truncated
> by vacuum.

With your patch, since the backends get notifications by reading WAL records, do we need to prevent WAL records that potentially contain unconsumed notifications from being removed by the checkpointer? Or we could save unconsumed notifications somewhere during the checkpoint, as we do for 2PC transactions.

Also, could you add this patch to the next commit fest [1] if you haven't yet?

Regards,

[1] https://commitfest.postgresql.org/56/

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: Masahiko Sawada
On Wed, Sep 10, 2025 at 12:00 PM Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
>
> 6) We have fixed-size queue entries, so I think we don't need the
> "padding" logic at the end of the page anymore, because we know how
> many entries we can have on each page.

+1

Probably we also no longer need to use a pair of page number and offset number as a queue entry's position, since AsyncQueueEntry is now a fixed size.

> BTW, what do you think about creating a separate thread for the patch?
> The current thread's subject seems a bit irrelevant.

+1

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com