Discussion: Report bytes and transactions actually sent downtream

Report bytes and transactions actually sent downtream

From: Ashutosh Bapat
Date: June 30, 2025, 12:53:59

Hi All,
In a recent logical replication issue, there were multiple replication
slots involved, each using a different publication. Thus the amount of
data replicated through each slot was expected to be different.
However, as expected, total_bytes and total_txns were reported the
same for all the replication slots. One of the slots started lagging
and we were trying to figure out whether it was the WAL sender slowing
down or the consumer (in this case Debezium). The lagging slot then
showed lower total_txns and total_bytes than the other slots, giving
the impression that the WAL sender was processing the data slowly.
Had pg_stat_replication_slots reported the amount of data actually
sent downstream, it would have been easier to compare it with the
amount of data received by the consumer and thus pinpoint the
bottleneck.

Here's a patch to do that. It adds two columns to
pg_stat_replication_slots:
    - sent_txns: The total number of transactions sent downstream.
    - sent_bytes: The total number of bytes sent downstream in data messages
sent_bytes includes only the bytes sent as part of 'd' messages and
does not include, for example, keepalive messages or CopyDone
messages. But those are very few and can be ignored. If others feel
that those are important to be included, we can make that change.

Plugins may choose not to send an empty transaction downstream. It's
better to increment the sent_txns counter in the plugin code when it
actually sends a BEGIN message, for example in pgoutput_send_begin()
and pg_output_begin(). This means that every plugin will need to be
modified to increment the counter for it to be reported correctly.

I first thought of incrementing sent_bytes in OutputPluginWrite()
which is a central function for all logical replication message
writes. But that calls LogicalDecodingContext::write() which may
further add bytes to the message e.g. WalSndWriteData() and
LogicalOutputWrite(). So it's better to increment the counter in
implementations of LogicalDecodingContext::write(), so that we count
the exact number of bytes. These implementations are within core code
so they won't miss updating sent_bytes.
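
To illustrate why the counting point matters, here is a minimal, self-contained sketch in C. It is not PostgreSQL code: SlotStats, plugin_write(), transport_write() and the 5-byte FRAME_OVERHEAD are hypothetical stand-ins for OutputPluginWrite(), an implementation of LogicalDecodingContext::write(), and whatever framing that implementation adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing overhead added by the transport layer, in bytes. */
#define FRAME_OVERHEAD 5

typedef struct SlotStats
{
    uint64_t payload_bytes; /* counted where the plugin hands over its data */
    uint64_t sent_bytes;    /* counted where the message is actually written */
} SlotStats;

/* Stand-in for a write() implementation: frames the payload and "sends" it. */
static size_t
transport_write(SlotStats *stats, size_t payload_len)
{
    size_t wire_len = payload_len + FRAME_OVERHEAD;

    assert(stats != NULL);
    stats->sent_bytes += wire_len;
    return wire_len;
}

/* Stand-in for the central, plugin-facing write function. */
static size_t
plugin_write(SlotStats *stats, size_t payload_len)
{
    assert(stats != NULL);
    stats->payload_bytes += payload_len;
    return transport_write(stats, payload_len);
}
```

In this model, a counter kept in plugin_write() undercounts by FRAME_OVERHEAD per message, while the counter in transport_write() matches what goes on the wire.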

I think we should rename total_txns and total_bytes to reordered_txns
and reordered_bytes respectively, and also update the documentation
accordingly to make better sense of those numbers. But these patches
do not contain that change. If others feel the same way, I will
provide a patch with that change.

-- 
Best Wishes,
Ashutosh Bapat

Attachments

0001-Report-data-sent-statistics-in-pg_stat_repl-20250630.patch

Re: Report bytes and transactions actually sent downtream

From: Amit Kapila
Date: July 1, 2025, 13:52:55

On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Plugins may choose not to send an empty transaction downstream. It's
> better to increment sent_txns counter in the plugin code when it
> actually sends a BEGIN message, for example in pgoutput_send_begin()
> and pg_output_begin(). This means that every plugin will need to be
> modified to increment the counter for it to reported correctly.
>

What if some plugin doesn't implement it or does it incorrectly?
Users will then complain that the PG view is showing an incorrect value.
Shouldn't the plugin-specific stats be shown differently? For example,
one may be interested in how much data the plugin has filtered because
it was not published or because something like row_filter caused it
to skip sending such data.

--
With Regards,
Amit Kapila.

Re: Report bytes and transactions actually sent downtream

From: Ashutosh Bapat
Date: July 1, 2025, 17:05:18

On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > Plugins may choose not to send an empty transaction downstream. It's
> > better to increment sent_txns counter in the plugin code when it
> > actually sends a BEGIN message, for example in pgoutput_send_begin()
> > and pg_output_begin(). This means that every plugin will need to be
> > modified to increment the counter for it to reported correctly.
> >
>
> What if some plugin didn't implemented it or does it incorrectly?
> Users will then complain that PG view is showing incorrect value.

That is right.

To fix the problem of plugins not implementing the counter increment
logic we could use logic similar to how we track whether
OutputPluginPrepareWrite() has been called or not. In
ReorderBufferTXN, we add a new member sent_status which would be an
enum with 3 values: UNKNOWN, SENT, NOT_SENT. Initially sent_status =
UNKNOWN. We provide a function called
plugin_sent_txn(ReorderBufferTXN *txn, bool sent) which will set
sent_status = SENT when sent = true and sent_status = NOT_SENT when
sent = false. In all the end-of-transaction callback wrappers like
commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
sent_status = UNKNOWN, we throw an error. If sent_status = SENT, we
increment sent_txns. That will catch any plugin which does not call
plugin_sent_txn(). The plugin may still call plugin_sent_txn() with
sent = true when it should have called it with sent = false or vice
versa, but that's hard to monitor and control.
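
A minimal sketch of that tri-state idea, written as standalone C rather than against the PostgreSQL tree. TxnModel and SlotCounters are hypothetical stand-ins for the reorder buffer transaction and the slot's counters, and end_of_txn_check() returns false at the point where the real wrapper would raise an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum TxnSentStatus
{
    TXN_SENT_UNKNOWN = 0,   /* plugin has not reported anything yet */
    TXN_SENT,
    TXN_NOT_SENT
} TxnSentStatus;

/* Hypothetical stand-in for the per-transaction state. */
typedef struct TxnModel
{
    TxnSentStatus sent_status;
} TxnModel;

/* Hypothetical stand-in for the per-slot counters. */
typedef struct SlotCounters
{
    uint64_t sent_txns;
} SlotCounters;

/* What the plugin would call once it knows whether it sent the transaction. */
static void
plugin_sent_txn(TxnModel *txn, bool sent)
{
    assert(txn != NULL);
    txn->sent_status = sent ? TXN_SENT : TXN_NOT_SENT;
}

/*
 * What an end-of-transaction wrapper would do. Returns false where the
 * proposal would raise an error because the plugin never reported a status.
 */
static bool
end_of_txn_check(SlotCounters *counters, const TxnModel *txn)
{
    assert(counters != NULL && txn != NULL);
    if (txn->sent_status == TXN_SENT_UNKNOWN)
        return false;
    if (txn->sent_status == TXN_SENT)
        counters->sent_txns++;
    return true;
}
```

The check catches a plugin that never reports, but, as noted above, it cannot catch a plugin that reports the wrong value.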

Additionally, we should highlight in the documentation that sent_txns is
as reported by the output plugin so that users know where to look
in case they see a wrong/dubious value. I see this as similar to what
we do with pg_stat_replication::reply_time, which may be wrong if a
non-PG standby reports the wrong value. The documentation says "Send time
of last reply message received from standby server", so the users know
where to look in case they spot an error.

Does that look good?

I am open to other suggestions.

> Shouldn't the plugin specific stats be shown differently, for example,
> one may be interested in how much plugin has filtered the data because
> it was not published or because something like row_filter caused it
> skip sending such data?
>

That looks useful. We could track the ReorderBufferChanges that were
not sent downstream and add their sizes to another counter,
ReorderBuffer::filtered_bytes, and report it in
pg_stat_replication_slots. I think we will need to devise a mechanism
similar to the above by which the plugin tells core whether a change has
been filtered or not. However, that will not be a replacement for
sent_bytes, since neither filtered_bytes nor total_bytes - filtered_bytes
will tell us how much data was sent downstream, which is crucial to the
purpose stated in my earlier email.
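
As a small illustration of that last point, the sketch below uses the four rows of sample pg_stat_replication_slots output that shveta malik posts later in this thread. The struct and function names are made up for the sketch. If total minus filtered predicted the bytes sent, unexplained_bytes() would return zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the sample output posted later in this thread. */
typedef struct SlotSample
{
    const char *slot_name;
    int64_t     total_bytes;
    int64_t     filtered_bytes;
    int64_t     sent_bytes;
} SlotSample;

static const SlotSample samples[] = {
    {"slot1", 800636, 793188, 211},
    {"sub1", 401496, 132712, 84041},
    {"sub2", 401496, 396184, 674},
    {"sub3", 401496, 145912, 79959},
};

#define NUM_SAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Bytes left unexplained if one assumed total = filtered + sent. */
static int64_t
unexplained_bytes(const SlotSample *s)
{
    assert(s != NULL);
    return s->total_bytes - s->filtered_bytes - s->sent_bytes;
}
```

For sub1, for instance, 401496 - 132712 - 84041 leaves 184743 bytes unaccounted for, so the sent figure has to be tracked on its own.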

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From: Amit Kapila
Date: July 13, 2025, 14:04:14

On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > Plugins may choose not to send an empty transaction downstream. It's
> > > better to increment sent_txns counter in the plugin code when it
> > > actually sends a BEGIN message, for example in pgoutput_send_begin()
> > > and pg_output_begin(). This means that every plugin will need to be
> > > modified to increment the counter for it to reported correctly.
> > >
> >
> > What if some plugin didn't implemented it or does it incorrectly?
> > Users will then complain that PG view is showing incorrect value.
>
> That is right.
>
> To fix the problem of plugins not implementing the counter increment
> logic we could use logic similar to how we track whether
> OutputPluginPrepareWrite() has been called or not. In
> ReorderBufferTxn, we add a new member sent_status which would be an
> enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
> = UNKNOWN. We provide a function called
> plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
> sent_status = SENT when sent = true and sent_status = NOT_SENT when
> sent = false. In all the end transaction callback wrappers like
> commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
> stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
> tsent_status = UNKNOWN, we throw an error.
>

I think we don't want to make it mandatory for plugins to implement
these stats, so instead of throwing an ERROR, the view should show that
the plugin doesn't provide stats. How about having OutputPluginStats,
similar to the OutputPluginCallbacks and OutputPluginOptions members in
LogicalDecodingContext? It will have members like stats_available,
txns_sent or txns_skipped, txns_filtered, etc. I am thinking it will
be better to provide this information in a separate view like
pg_stat_plugin_stats or something like that. There we can report
slot_name, plugin_name, and then the other stats we want to implement as
part of OutputPluginStats.

--
With Regards,
Amit Kapila.

Re: Report bytes and transactions actually sent downtream

From: Ashutosh Bapat
Date: July 14, 2025, 08:24:58

On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > Plugins may choose not to send an empty transaction downstream. It's
> > > > better to increment sent_txns counter in the plugin code when it
> > > > actually sends a BEGIN message, for example in pgoutput_send_begin()
> > > > and pg_output_begin(). This means that every plugin will need to be
> > > > modified to increment the counter for it to reported correctly.
> > > >
> > >
> > > What if some plugin didn't implemented it or does it incorrectly?
> > > Users will then complain that PG view is showing incorrect value.
> >
> > That is right.
> >
> > To fix the problem of plugins not implementing the counter increment
> > logic we could use logic similar to how we track whether
> > OutputPluginPrepareWrite() has been called or not. In
> > ReorderBufferTxn, we add a new member sent_status which would be an
> > enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
> > = UNKNOWN. We provide a function called
> > plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
> > sent_status = SENT when sent = true and sent_status = NOT_SENT when
> > sent = false. In all the end transaction callback wrappers like
> > commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
> > stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
> > tsent_status = UNKNOWN, we throw an error.
> >
>
> I think we don't want to make it mandatory for plugins to implement
> these stats, so instead of throwing ERROR, the view should show that
> the plugin doesn't provide stats. How about having OutputPluginStats
> similar to OutputPluginCallbacks and OutputPluginOptions members in
> LogicalDecodingContext? It will have members like stats_available,
> txns_sent or txns_skipped, txns_filtered, etc.

Not making it mandatory looks useful. I can try your suggestion. Rather
than having stats_available as a member of OutputPluginStats, it's
better to have a NULL value for the corresponding member in
LogicalDecodingContext. We don't want an output plugin to reset
stats_available once set. Will that work?

> I am thinking it will
> be better to provide this information in a separate view like
> pg_stat_plugin_stats or something like that, here we can report
> slot_name, plugin_name, then the other stats we want to implement part
> of OutputPluginStats.

As you have previously pointed out, the view should make it explicit
that the new stats are maintained by the plugin and not core. I agree
with that intention. However, we already have three views:
pg_replication_slots (which has slot name and plugin name),
pg_stat_replication, which is about stats maintained by a WAL sender
or running replication, and pg_stat_replication_slots, which is
about accumulated statistics for replication through a given
replication slot. It's already a bit hard to keep track of who's who
when debugging an issue. Adding one more view will add to the confusion.

Instead of adding a new view, how about:
a. naming the columns plugin_sent_txns, plugin_sent_bytes, and
plugin_filtered_change_bytes to make it clear that these columns are
maintained by the plugin
b. reporting these as NULL if stats_available = false OR OutputPluginStats
is not set in LogicalDecodingContext
c. documenting that a NULL value for these columns indicates that the
plugin is not maintaining/reporting these stats
d. adding the plugin name to pg_stat_replication_slots, which will make it
easy for users to know which plugin they should look at in case of
dubious or unavailable stats
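
Points (a) to (d) can be modelled roughly as follows. This is a standalone sketch, not the patch: PluginStatsModel, DecodingContextModel, NullableU64, SlotStatsRow and build_row() are hypothetical names, and a NULL stats pointer stands for "OutputPluginStats is not set in LogicalDecodingContext".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters an output plugin would maintain. */
typedef struct PluginStatsModel
{
    uint64_t sent_txns;
    uint64_t sent_bytes;
    uint64_t filtered_bytes;
} PluginStatsModel;

/* Hypothetical slice of a decoding context. */
typedef struct DecodingContextModel
{
    const char       *plugin_name;
    PluginStatsModel *stats;    /* NULL: the plugin does not report stats */
} DecodingContextModel;

/* A column value that may be SQL NULL. */
typedef struct NullableU64
{
    bool     isnull;
    uint64_t value;
} NullableU64;

/* The plugin-related columns of one view row. */
typedef struct SlotStatsRow
{
    const char *plugin;
    NullableU64 plugin_sent_txns;
    NullableU64 plugin_sent_bytes;
    NullableU64 plugin_filtered_change_bytes;
} SlotStatsRow;

/* Build the row; all plugin_* columns are NULL when stats are not provided. */
static SlotStatsRow
build_row(const DecodingContextModel *ctx)
{
    SlotStatsRow row;
    bool         missing;

    assert(ctx != NULL);
    missing = (ctx->stats == NULL);

    row.plugin = ctx->plugin_name;
    row.plugin_sent_txns.isnull = missing;
    row.plugin_sent_txns.value = missing ? 0 : ctx->stats->sent_txns;
    row.plugin_sent_bytes.isnull = missing;
    row.plugin_sent_bytes.value = missing ? 0 : ctx->stats->sent_bytes;
    row.plugin_filtered_change_bytes.isnull = missing;
    row.plugin_filtered_change_bytes.value = missing ? 0 : ctx->stats->filtered_bytes;
    return row;
}
```

The plugin name is always present in the row, so a reader who sees NULLs knows which plugin to look at.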

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From: Amit Kapila
Date: July 14, 2025, 13:00:57

On Mon, Jul 14, 2025 at 10:55 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > I think we don't want to make it mandatory for plugins to implement
> > these stats, so instead of throwing ERROR, the view should show that
> > the plugin doesn't provide stats. How about having OutputPluginStats
> > similar to OutputPluginCallbacks and OutputPluginOptions members in
> > LogicalDecodingContext? It will have members like stats_available,
> > txns_sent or txns_skipped, txns_filtered, etc.
>
> Not making mandatory looks useful. I can try your suggestion. Rather
> than having stats_available as a member of OutputPluginStats, it's
> better to have a NULL value for the corresponding member in
> LogicalDecodingContext. We don't want an output plugin to reset
> stats_available once set. Will that work?
>

We can try that.

> > I am thinking it will
> > be better to provide this information in a separate view like
> > pg_stat_plugin_stats or something like that, here we can report
> > slot_name, plugin_name, then the other stats we want to implement part
> > of OutputPluginStats.
>
> As you have previously pointed out, the view should make it explicit
> that the new stats are maintained by the plugin and not core. I agree
> with that intention. However, already have three views
> pg_replication_slots (which has slot name and plugin name), then
> pg_replication_stats which is about stats maintained by a WAL sender
> or running replication and then pg_stat_replication_slots, which is
> about accumulated statistics for a replication through a given
> replication slot. It's already a bit hard to keep track of who's who
> when debugging an issue. Adding one more view will add to confusion.
>
> Instead of adding a new view how about
> a. name the columns as plugin_sent_txns, plugin_sent_bytes,
> plugin_filtered_change_bytes to make it clear that these columns are
> maintained by plugin
> b. report these NULL if stats_available = false OR OutputPluginStats
> is not set in LogicalDecodingContext
> c. Document that NULL value for these columns indicates that the
> plugin is not maintaining/reporting these stats
> d. adding plugin name to pg_stat_replication_slots, that will make it
> easy for users to know which plugin they should look at in case of
> dubious or unavailable stats
>

Sounds reasonable.

--
With Regards,
Amit Kapila.

Re: Report bytes and transactions actually sent downtream

From: Ashutosh Bapat
Date: July 24, 2025, 09:54:26

Hi Amit,

On Mon, Jul 14, 2025 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 14, 2025 at 10:55 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > I think we don't want to make it mandatory for plugins to implement
> > > these stats, so instead of throwing ERROR, the view should show that
> > > the plugin doesn't provide stats. How about having OutputPluginStats
> > > similar to OutputPluginCallbacks and OutputPluginOptions members in
> > > LogicalDecodingContext? It will have members like stats_available,
> > > txns_sent or txns_skipped, txns_filtered, etc.
> >
> > Not making mandatory looks useful. I can try your suggestion. Rather
> > than having stats_available as a member of OutputPluginStats, it's
> > better to have a NULL value for the corresponding member in
> > LogicalDecodingContext. We don't want an output plugin to reset
> > stats_available once set. Will that work?
> >
>
> We can try that.
>
> > > I am thinking it will
> > > be better to provide this information in a separate view like
> > > pg_stat_plugin_stats or something like that, here we can report
> > > slot_name, plugin_name, then the other stats we want to implement part
> > > of OutputPluginStats.
> >
> > As you have previously pointed out, the view should make it explicit
> > that the new stats are maintained by the plugin and not core. I agree
> > with that intention. However, already have three views
> > pg_replication_slots (which has slot name and plugin name), then
> > pg_replication_stats which is about stats maintained by a WAL sender
> > or running replication and then pg_stat_replication_slots, which is
> > about accumulated statistics for a replication through a given
> > replication slot. It's already a bit hard to keep track of who's who
> > when debugging an issue. Adding one more view will add to confusion.
> >
> > Instead of adding a new view how about
> > a. name the columns as plugin_sent_txns, plugin_sent_bytes,
> > plugin_filtered_change_bytes to make it clear that these columns are
> > maintained by plugin
> > b. report these NULL if stats_available = false OR OutputPluginStats
> > is not set in LogicalDecodingContext
> > c. Document that NULL value for these columns indicates that the
> > plugin is not maintaining/reporting these stats
> > d. adding plugin name to pg_stat_replication_slots, that will make it
> > easy for users to know which plugin they should look at in case of
> > dubious or unavailable stats
> >
>
> Sounds reasonable.

Here's the next patch which considers all the discussion so far. It
adds four fields to pg_stat_replication_slots.
    - plugin - name of the output plugin
    - plugin_filtered_bytes - reports the amount of changes filtered
out by the output plugin
    - plugin_sent_txns - the number of transactions sent downstream by
the output plugin
    - plugin_sent_bytes - the amount of data sent downstream by the
output plugin.
There are some points up for discussion:
1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
calling pgstat_reset() or pgstat_reset_of_kind(), which don't know
about the contents of the entry. So
PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
stats are reported as NULL, instead of zero, immediately after reset.
The same happens when the stats are queried immediately after the
statistics are initialized and before any stats are reported. We could
instead make it report zero, if we save plugin_has_stats and restore
it after reset. But doing that in pgstat_reset_of_kind() seems like
extra overhead, plus we will need to write a function to find all
replication slot entries. Resetting the stats would be a rare event in
practice. Trying to report 0 instead of NULL in that rare case doesn't
seem to be worth the effort and code. Given that the core code doesn't
know whether a given plugin reports stats or not, I think this
behaviour is appropriate as long as we document it. Please let me know
if the documentation in the patch is clear enough.

2. There's also a bit of asymmetry in the way sent_bytes is handled.
The code which actually sends the logical changes downstream is
part of the core code, but the format of the change and hence the
number of bytes sent is decided by the plugin. It's a stat related to
the plugin but maintained by the core code. The patch implements it as
a plugin stat (so the corresponding column has the "plugin" prefix and
is also reported as NULL upon reset etc.), but we may want to
reconsider how to report and maintain it.

3. The names of new columns have the prefix "plugin_" but the internal
variables tracking those don't for the sake of brevity. If you prefer
to have the same prefix for the internal variables, I can change that.
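
The reset behaviour described in point 1 can be modelled with a few lines of standalone C. SlotEntryModel and the three functions are hypothetical; only the plugin_has_stats flag name is taken from the description above. The point of the sketch is that a reset which zeroes the whole entry also clears the flag, so the column reads as NULL until the plugin reports again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified statistics entry for one replication slot. */
typedef struct SlotEntryModel
{
    uint64_t total_bytes;
    bool     plugin_has_stats;
    uint64_t plugin_sent_bytes;
} SlotEntryModel;

/* Generic reset: zeroes the whole entry without knowing its layout. */
static void
generic_reset(void *entry, size_t len)
{
    assert(entry != NULL);
    memset(entry, 0, len);
}

/* The plugin reports some bytes: the flag is set as a side effect. */
static void
report_plugin_bytes(SlotEntryModel *entry, uint64_t nbytes)
{
    assert(entry != NULL);
    entry->plugin_has_stats = true;
    entry->plugin_sent_bytes += nbytes;
}

/* Reader side: true when the column should be shown as NULL. */
static bool
plugin_sent_bytes_is_null(const SlotEntryModel *entry)
{
    assert(entry != NULL);
    return !entry->plugin_has_stats;
}
```

Saving and restoring the flag across the reset would make the column read zero instead, at the cost of teaching the generic reset path about this entry type.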

I think I have covered all the cases where filteredbytes should be
incremented, but please let me know if I have missed any.

--
Best Wishes,
Ashutosh Bapat

Attachments

0001-Report-output-plugin-statistics-in-pg_stat_-20250724.patch

Re: Report bytes and transactions actually sent downtream

From: Bertrand Drouvot
Date: August 27, 2025, 16:43:49

Hi,

On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:
> Here's the next patch which considers all the discussion so far. It
> adds four fields to pg_stat_replication_slots.
>     - plugin - name of the output plugin

Is this one needed? (we could get it with a join on pg_replication_slots)

>     - plugin_filtered_bytes - reports the amount of changes filtered
> out by the output plugin
>     - plugin_sent_txns - the amount of transactions sent downstream by
> the output plugin
>     - plugin_sent_bytes - the amount of data sent downstream by the
> outputplugin.
> 
> There are some points up for a discussion:
> 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
> calling pgstat_reset() or pgstat_reset_of_kind() which don't know
> about the contents of the entry. So
> PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
> stats are reported as NULL, instead of zero, immediately after reset.
> This is the same case when the stats is queried immediately after the
> statistics is initialized and before any stats are reported. We could
> instead make it report
> zero, if we save the plugin_has_stats and restore it after reset. But
> doing that in pgstat_reset_of_kind() seems like an extra overhead + we
> will need to write a function to find all replication slot entries.

Could we store plugin_has_stats in ReplicationSlotPersistentData instead? That
way it would not be reset. We would need to access ReplicationSlotPersistentData
in pg_stat_get_replication_slot though.

Also, would it make sense to expose plugin_has_stats in pg_replication_slots?

> 2. There's also a bit of asymmetry in the way sent_bytes is handled.
> The code which actually sends the logical changes to the downstream is
> part of the core code
> but the format of the change and hence the number of bytes sent is
> decided by the plugin. It's a stat related to plugin but maintained by
> the core code. The patch implements it as a plugin stat (so the
> corresponding column has "plugin" prefix

The way it is done makes sense to me.
 
> 3. The names of new columns have the prefix "plugin_" but the internal
> variables tracking those don't for the sake of brevity. If you prefer
> to have the same prefix for the internal variables, I can change that.

Just my taste: I do prefer when they match.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From: shveta malik
Date: September 18, 2025, 08:21:50

On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:
> > Here's the next patch which considers all the discussion so far. It
> > adds four fields to pg_stat_replication_slots.
> >     - plugin - name of the output plugin
>
> Is this one needed? (we could get it with a join on pg_replication_slots)
>

In my opinion, when there are other plugin_* fields present, including
the plugin name directly here seems like a better approach. So, +1 for
the plugin field.

> >     - plugin_filtered_bytes - reports the amount of changes filtered
> > out by the output plugin
> >     - plugin_sent_txns - the amount of transactions sent downstream by
> > the output plugin
> >     - plugin_sent_bytes - the amount of data sent downstream by the
> > outputplugin.
> >
> > There are some points up for a discussion:
> > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
> > calling pgstat_reset() or pgstat_reset_of_kind() which don't know
> > about the contents of the entry. So
> > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
> > stats are reported as NULL, instead of zero, immediately after reset.
> > This is the same case when the stats is queried immediately after the
> > statistics is initialized and before any stats are reported. We could
> > instead make it report
> > zero, if we save the plugin_has_stats and restore it after reset. But
> > doing that in pgstat_reset_of_kind() seems like an extra overhead + we
> > will need to write a function to find all replication slot entries.

I tried to think of an approach where we can differentiate between the
'not initialized' and 'reset' cases by value. Say, instead
of plugin_has_stats, we have plugin_stats_status; then we can
maintain a status like -1 (not initialized), 0 (reset). But this too will
complicate the code further. Personally, I'm okay with NULL values
appearing even after a reset, especially since the documentation
explains this clearly.

>
> > 2. There's also a bit of asymmetry in the way sent_bytes is handled.
> > The code which actually sends the logical changes to the downstream is
> > part of the core code
> > but the format of the change and hence the number of bytes sent is
> > decided by the plugin. It's a stat related to plugin but maintained by
> > the core code. The patch implements it as a plugin stat (so the
> > corresponding column has "plugin" prefix
>
> The way it is done makes sense to me.
>
> > 3. The names of new columns have the prefix "plugin_" but the internal
> > variables tracking those don't for the sake of brevity. If you prefer
> > to have the same prefix for the internal variables, I can change that.
>

I am okay either way.

Few comments:

1)
postgres=# select slot_name,
total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
pg_stat_replication_slots order by slot_name;
 slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
-----------+-------------+-----------------------+-------------------
 slot1     |      800636 |                793188 |               211
 sub1      |      401496 |                132712 |             84041
 sub2      |      401496 |                396184 |               674
 sub3      |      401496 |                145912 |             79959
(4 rows)

Currently it looks quite confusing. 'total_bytes' gives the impression that
it has to be the sum of filtered and sent, but the columns are not related
that way. Earlier in the thread there was a proposal to change the names to
reordered_txns and reordered_bytes. That looks better to me. It will give
clarity without anyone having to dig into the docs.

2)
Tried to verify all filtered data tests, seems to work well. Also  I
tried tracking the usage of OutputPluginWrite() to see if there is any
other place where data needs to be considered as filtered-data.
Encountered this:

send_relation_and_attrs has:
                if (!logicalrep_should_publish_column(att, columns,
                                                      include_gencols_type))
                        continue;
                if (att->atttypid < FirstGenbkiObjectId)
                        continue;

But I don't think it needs to be considered as filtered data. This is
mostly schema related info. But I wanted to confirm once. Thoughts?

3)
+-- total_txns may vary based on the background activity but sent_txns should
+-- always be 1 since the background transactions are always skipped. Filtered
+-- bytes would be set only when there's a change that was passed to the plugin
+-- but was filtered out. Depending upon the background transactions, filtered
+-- bytes may or may not be zero.
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes,
plugin_filtered_bytes >= 0 AS filtered_bytes FROM
pg_stat_replication_slots ORDER BY slot_name;

In comment either we can say plugin_sent_txns instead of sent_txns or
in the query we can fetch plugin_sent_txns AS  sent_txns, so that we
can relate comment and query.


4)
+      <literal>sentTxns</literal> is the number of transactions sent downstream
+      by the output plugin. <literal>sentBytes</literal> is the amount of data
+      sent downstream by the output plugin.
+      <function>OutputPluginWrite</function> is expected to update this counter
+      if <literal>ctx->stats</literal> is initialized by the output plugin.
+      <literal>filteredBytes</literal> is the size of changes in bytes that are
+      filtered out by the output plugin. Function
+      <literal>ReorderBufferChangeSize</literal> may be used to find the size of
+      filtered <literal>ReorderBufferChange</literal>.
+     </para>

Either we can mention units as 'bytes' for both filteredBytes and
sentBytes or for none. Currently filteredBytes says 'in bytes' while
sentBytes does not.
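
Coming back to comment 1), the mismatch can be made concrete with a small C
sketch that takes the four rows from the query output and computes
total_bytes - (plugin_filtered_bytes + plugin_sent_bytes). The struct and
function names are made up for illustration; only the numbers come from the
output above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical row type mirroring the columns selected in comment 1). */
typedef struct SlotRow
{
	const char *slot_name;
	int64_t		total_bytes;
	int64_t		filtered_bytes;
	int64_t		sent_bytes;
} SlotRow;

/* Values copied from the query output in comment 1). */
static const SlotRow rows[] = {
	{"slot1", 800636, 793188, 211},
	{"sub1", 401496, 132712, 84041},
	{"sub2", 401496, 396184, 674},
	{"sub3", 401496, 145912, 79959},
};

/* Non-zero result means the three columns do not add up for this slot. */
int64_t
unaccounted_bytes(const SlotRow *row)
{
	assert(row != NULL);
	return row->total_bytes - (row->filtered_bytes + row->sent_bytes);
}

/* Print the gap for every slot. */
void
print_gaps(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-6s gap = %lld\n", rows[i].slot_name,
			   (long long) unaccounted_bytes(&rows[i]));
}
```

None of the gaps is zero, which is expected: sent bytes are measured in the
plugin's output format, not in reorder buffer change sizes.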

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

18 September 2025, 13:24:32

Hi Shveta, Bertrand,

Replying to both of your review comments together.

On Thu, Sep 18, 2025 at 10:52 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:
> > > Here's the next patch which considers all the discussion so far. It
> > > adds four fields to pg_stat_replication_slots.
> > >     - plugin - name of the output plugin
> >
> > Is this one needed? (we could get it with a join on pg_replication_slots)
> >
>
> In my opinion, when there are other plugin_* fields present, including
> the plugin name directly here seems like a better approach. So, +1 for
> the plugin field.

Yeah. I think so too.

>
> > >     - plugin_filtered_bytes - reports the amount of changes filtered
> > > out by the output plugin
> > >     - plugin_sent_txns - the amount of transactions sent downstream by
> > > the output plugin
> > >     - plugin_sent_bytes - the amount of data sent downstream by the
> > > outputplugin.
> > >
> > > There are some points up for a discussion:
> > > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
> > > calling pgstat_reset() or pgstat_reset_of_kind() which don't know
> > > about the contents of the entry. So
> > > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
> > > stats are reported as NULL, instead of zero, immediately after reset.
> > > This is the same case when the stats is queried immediately after the
> > > statistics is initialized and before any stats are reported. We could
> > > instead make it report
> > > zero, if we save the plugin_has_stats and restore it after reset. But
> > > doing that in pgstat_reset_of_kind() seems like an extra overhead + we
> > > will need to write a function to find all replication slot entries.
>
> I tried to think of an approach where we can differentiate between the
> cases 'not initialized' and 'reset' ones with the values. Say instead
> of plugin_has_stats, if we have plugin_stats_status, then we can
> maintain status like -1(not initialized), 0(reset). But this too will
> complicate the code further. Personally, I’m okay with NULL values
> appearing even after a reset, especially since the documentation
> explains this clearly.

Ok. Great.

>
> > Could we store plugin_has_stats in ReplicationSlotPersistentData instead? That
> > way it would not be reset. We would need to access ReplicationSlotPersistentData
> > in pg_stat_get_replication_slot though.
>
> > Also would that make sense to expose plugin_has_stats in pg_replication_slots?
>

A plugin may change its decision to support the stats across versions;
we won't be able to tell when it changes that decision and thus can't
reflect it accurately in ReplicationSlotPersistentData. Doing it in
startup gives the plugin the opportunity to change it as often as
it wants, or even based on some plugin-specific configuration. Further,
ReplicationSlotPersistentData is maintained by the core. It would not
be a good place to store something plugin-specific.

> >
> > > 2. There's also a bit of asymmetry in the way sent_bytes is handled.
> > > The code which actually sends the logical changes to the downstream is
> > > part of the core code
> > > but the format of the change and hence the number of bytes sent is
> > > decided by the plugin. It's a stat related to plugin but maintained by
> > > the core code. The patch implements it as a plugin stat (so the
> > > corresponding column has "plugin" prefix
> >
> > The way it is done makes sense to me.

Great.

> >
> > > 3. The names of new columns have the prefix "plugin_" but the internal
> > > variables tracking those don't for the sake of brevity. If you prefer
> > > to have the same prefix for the internal variables, I can change that.
> >
>
> I am okay either way.
>
> > Just my taste: I do prefer when they match.

I don't see a strong preference to change what's there in the patch.
Let's wait for more reviews.

>
> Few comments:
>
> 1)
> postgres=# select slot_name,
> total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
> pg_stat_replication_slots order by slot_name;
>  slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> -----------+-------------+-----------------------+-------------------
>  slot1     |      800636 |                793188 |               211
>  sub1      |      401496 |                132712 |             84041
>  sub2      |      401496 |                396184 |               674
>  sub3      |      401496 |                145912 |             79959
> (4 rows)
>
> Currently it looks quite confusing. 'total_bytes' gives a sense that
> it has to be a sum of filtered and sent. But they are no way like
> that. In the thread earlier there was a proposal to change the name to
> reordered_txns, reordered_bytes. That looks better to me. It will give
> clarity without even someone digging into docs.

I also agree with that. But that will break backward compatibility. Do
you think other columns like spill_* and stream_* should also be
renamed with the prefix "reordered"?

>
> 2)
> Tried to verify all filtered data tests, seems to work well. Also  I
> tried tracking the usage of OutputPluginWrite() to see if there is any
> other place where data needs to be considered as filtered-data.
> Encountered this:
>
> send_relation_and_attrs has:
>                 if (!logicalrep_should_publish_column(att, columns,
>
>                    include_gencols_type))
>                         continue;
>                 if (att->atttypid < FirstGenbkiObjectId)
>                         continue;
>
> But I don't think it needs to be considered as filtered data. This is
> mostly schema related info. But I wanted to confirm once. Thoughts?

Yeah. It's part of metadata which in turn is sent only when needed.
It's not part of, say, transaction changes. So it can't be considered
as filtering.

>
> 3)
> +-- total_txns may vary based on the background activity but sent_txns should
> +-- always be 1 since the background transactions are always skipped. Filtered
> +-- bytes would be set only when there's a change that was passed to the plugin
> +-- but was filtered out. Depending upon the background transactions, filtered
> +-- bytes may or may not be zero.
> +SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
> spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
> total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes,
> plugin_filtered_bytes >= 0 AS filtered_bytes FROM
> pg_stat_replication_slots ORDER BY slot_name;
>
> In comment either we can say plugin_sent_txns instead of sent_txns or
> in the query we can fetch plugin_sent_txns AS  sent_txns, so that we
> can relate comment and query.
>

Used plugin_sent_txns in the comment as well as query.

>
> 4)
> +      <literal>sentTxns</literal> is the number of transactions sent downstream
> +      by the output plugin. <literal>sentBytes</literal> is the amount of data
> +      sent downstream by the output plugin.
> +      <function>OutputPluginWrite</function> is expected to update this counter
> +      if <literal>ctx->stats</literal> is initialized by the output plugin.
> +      <literal>filteredBytes</literal> is the size of changes in bytes that are
> +      filtered out by the output plugin. Function
> +      <literal>ReorderBufferChangeSize</literal> may be used to find
> the size of
> +      filtered <literal>ReorderBufferChange</literal>.
> +     </para>
>
> Either we can mention units as 'bytes' for both filteredBytes and
> sentBytes or for none. Currently filteredBytes says 'in bytes' while
> sentBytes does not.

Used 'in bytes' in both the places.

Thanks for your review. I will include these changes in the next set of patches.

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

19 September 2025, 09:18:18

On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
>
> >
> > Few comments:
> >
> > 1)
> > postgres=# select slot_name,
> > total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
> > pg_stat_replication_slots order by slot_name;
> >  slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> > -----------+-------------+-----------------------+-------------------
> >  slot1     |      800636 |                793188 |               211
> >  sub1      |      401496 |                132712 |             84041
> >  sub2      |      401496 |                396184 |               674
> >  sub3      |      401496 |                145912 |             79959
> > (4 rows)
> >
> > Currently it looks quite confusing. 'total_bytes' gives a sense that
> > it has to be a sum of filtered and sent. But they are no way like
> > that. In the thread earlier there was a proposal to change the name to
> > reordered_txns, reordered_bytes. That looks better to me. It will give
> > clarity without even someone digging into docs.
>
> I also agree with that. But that will break backward compatibility.

Yes, that it will do.

> Do
> you think other columns like spill_* and stream_* should also be
> renamed with the prefix "reordered"?
>

Okay, I see that all fields in pg_stat_replication_slots are related
to the ReorderBuffer. On reconsideration, I’m unsure whether it's
appropriate to prefix all of them with reordered_. For example,
renaming spill_bytes and stream_bytes to reordered_spill_bytes and
reordered_stream_bytes. These names start to feel overly long, and I
also noticed that ReorderBuffer isn’t clearly defined anywhere in the
documentation (or at least I couldn’t find it), even though the term
'reorder buffer' does appear in a few places.

As an example, see ReorderBufferRead, ReorderBufferWrite  wait-types
at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
are adding: ReorderBufferChangeSize, ReorderBufferChange

This makes me wonder whether it would be better to leave
pg_stat_replication_slots as is and add a brief ReorderBuffer section
under Logical Decoding concepts [3], just before Output Plugins.
pg_stat_replication_slots can then refer to that section, clarifying
that the bytes, counts, and txn fields pertain to the ReorderBuffer
(without changing any of the fields).

And then to define plugin related data, we can have a new view, say
pg_stat_plugin_stats (as Amit suggested earlier) or
pg_stat_replication_plugins. I understand that adding a new view might
not be desirable, but it provides better clarity without requiring
changes to the existing fields in pg_stat_replication_slots. I also
strongly feel that to properly tie all this information together, a
brief definition of the ReorderBuffer is needed. Other pages that
reference this term can then point to that section. Thoughts?

[1]: https://www.postgresql.org/docs/17/monitoring-stats.html#WAIT-EVENT-IO-TABLE
[2]: https://www.postgresql.org/docs/17/logicaldecoding-output-plugin.html
[3]: https://www.postgresql.org/docs/17/logicaldecoding-explanation.html#LOGICALDECODING-EXPLANATION

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

19 September 2025, 17:41:23

On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> >
> > >
> > > Few comments:
> > >
> > > 1)
> > > postgres=# select slot_name,
> > > total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
> > > pg_stat_replication_slots order by slot_name;
> > >  slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> > > -----------+-------------+-----------------------+-------------------
> > >  slot1     |      800636 |                793188 |               211
> > >  sub1      |      401496 |                132712 |             84041
> > >  sub2      |      401496 |                396184 |               674
> > >  sub3      |      401496 |                145912 |             79959
> > > (4 rows)
> > >
> > > Currently it looks quite confusing. 'total_bytes' gives a sense that
> > > it has to be a sum of filtered and sent. But they are no way like
> > > that. In the thread earlier there was a proposal to change the name to
> > > reordered_txns, reordered_bytes. That looks better to me. It will give
> > > clarity without even someone digging into docs.
> >
> > I also agree with that. But that will break backward compatibility.
>
> Yes, that it will do.
>
> > Do
> > you think other columns like spill_* and stream_* should also be
> > renamed with the prefix "reordered"?
> >
>
> Okay, I see that all fields in pg_stat_replication_slots are related
> to the ReorderBuffer. On reconsideration, I’m unsure whether it's
> appropriate to prefix all of them with reorderd_. For example,
> renaming spill_bytes and stream_bytes to reordered_spill_bytes and
> reordered_stream_bytes. These names start to feel overly long, and I
> also noticed that ReorderBuffer isn’t clearly defined anywhere in the
> documentation (or at least I couldn’t find it), even though the term
> 'reorder buffer' does appear in a few places.
>
> As an example, see ReorderBufferRead, ReorderBufferWrite  wait-types
> at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
> are adding: ReorderBufferChangeSize, ReorderBufferChange
>
> This gives me a feeling, will it be better to let
> pg_stat_replication_slots as is and add a brief ReorderBuffer section
> under Logical Decoding concepts [3] just before Output Plugins. And
> then, pg_stat_replication_slots can refer to that section, clarifying
> that the bytes, counts, and txn fields pertain to ReorderBuffer
> (without changing any of the fields).
>
> And then to define plugin related data, we can have a new view, say
> pg_stat_plugin_stats (as Amit suggested earlier) or
> pg_stat_replication_plugins. I understand that adding a new view might
> not be desirable, but it provides better clarity without requiring
> changes to the existing fields in pg_stat_replication_slots. I also
> strongly feel that to properly tie all this information together, a
> brief definition of the ReorderBuffer is needed. Other pages that
> reference this term can then point to that section. Thoughts?

Even if we keep two views, when they are joined, users will still get
confused by the total_* names. So it's not solving the underlying problem.
Andres had raised the point about renaming the total_* fields with me
off-list earlier. He suggested the names total_wal_bytes and
total_wal_txns in an off-list discussion today. I think those convey
the true meaning: that these are txns and bytes that come from WAL.
I have used those in the attached patches. The prefix "reordered" would
give away lower-level details, so I didn't use it.

I agree that it would be good to mention ReorderBuffer in the logical
decoding concepts section since it mentions structures ReorderBuffer*.
But that would be a separate patch since we aren't using "reordered"
in the names of the fields.

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

--
Best Wishes,
Ashutosh Bapat

Attachments

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

22 September 2025, 08:14:31

On Fri, Sep 19, 2025 at 8:11 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > >
> > > >
> > > > Few comments:
> > > >
> > > > 1)
> > > > postgres=# select slot_name,
> > > > total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
> > > > pg_stat_replication_slots order by slot_name;
> > > >  slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> > > > -----------+-------------+-----------------------+-------------------
> > > >  slot1     |      800636 |                793188 |               211
> > > >  sub1      |      401496 |                132712 |             84041
> > > >  sub2      |      401496 |                396184 |               674
> > > >  sub3      |      401496 |                145912 |             79959
> > > > (4 rows)
> > > >
> > > > Currently it looks quite confusing. 'total_bytes' gives a sense that
> > > > it has to be a sum of filtered and sent. But they are no way like
> > > > that. In the thread earlier there was a proposal to change the name to
> > > > reordered_txns, reordered_bytes. That looks better to me. It will give
> > > > clarity without even someone digging into docs.
> > >
> > > I also agree with that. But that will break backward compatibility.
> >
> > Yes, that it will do.
> >
> > > Do
> > > you think other columns like spill_* and stream_* should also be
> > > renamed with the prefix "reordered"?
> > >
> >
> > Okay, I see that all fields in pg_stat_replication_slots are related
> > to the ReorderBuffer. On reconsideration, I’m unsure whether it's
> > appropriate to prefix all of them with reorderd_. For example,
> > renaming spill_bytes and stream_bytes to reordered_spill_bytes and
> > reordered_stream_bytes. These names start to feel overly long, and I
> > also noticed that ReorderBuffer isn’t clearly defined anywhere in the
> > documentation (or at least I couldn’t find it), even though the term
> > 'reorder buffer' does appear in a few places.
> >
> > As an example, see ReorderBufferRead, ReorderBufferWrite  wait-types
> > at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
> > are adding: ReorderBufferChangeSize, ReorderBufferChange
> >
> > This gives me a feeling, will it be better to let
> > pg_stat_replication_slots as is and add a brief ReorderBuffer section
> > under Logical Decoding concepts [3] just before Output Plugins. And
> > then, pg_stat_replication_slots can refer to that section, clarifying
> > that the bytes, counts, and txn fields pertain to ReorderBuffer
> > (without changing any of the fields).
> >
> > And then to define plugin related data, we can have a new view, say
> > pg_stat_plugin_stats (as Amit suggested earlier) or
> > pg_stat_replication_plugins. I understand that adding a new view might
> > not be desirable, but it provides better clarity without requiring
> > changes to the existing fields in pg_stat_replication_slots. I also
> > strongly feel that to properly tie all this information together, a
> > brief definition of the ReorderBuffer is needed. Other pages that
> > reference this term can then point to that section. Thoughts?
>
> Even if we keep two views, when they are joined, users will still get
> confused by total_* names. So it's not solving the underlying problem.

Okay, I see your point.

> Andres had raised the point about renaming total_* fields with me
> off-list earlier. He suggested names total_wal_bytes, and
> total_wal_txns in an off list discussion today. I think those convey
> the true meaning - that these are txns and bytes that come from WAL.

I agree.

> Used those in the attached patches. Prefix reordered would give away
> lower level details, so I didn't use it.
>
> I agree that it would be good to mention ReorderBuffer in the logical
> decoding concepts section since it mentions structures ReorderBuffer*.
> But that would be a separate patch since we aren't using "reordered"
> in the names of the fields.

Okay.

> 0001 is the previous patch
> 0002 changes addressing your and Bertrand's comments.
>

Few trivial comments:

1)
Currently the doc says:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. OutputPluginWrite will update this counter if
ctx->stats is initialized by the output plugin. filteredBytes is the
size of changes, in bytes, that are filtered out by the output plugin.
Function ReorderBufferChangeSize may be used to find the size of
filtered ReorderBufferChange.

Shall we rearrange it to:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. filteredBytes is the size of changes, in bytes,
that are filtered out by the output plugin. OutputPluginWrite will
update these counters if ctx->stats is initialized by the output
plugin.
The function ReorderBufferChangeSize can be used to compute the size
of a filtered ReorderBufferChange, i.e., the filteredBytes.

2)
My preference will be to rename the fields 'total_txns' and
'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
'total_wal_bytes' for better clarity. Additionally, upon rethinking,
it seems better to me that plugin-related fields are also named as
plugin_* to clearly indicate their association. OTOH, in
OutputPluginStats, the field names are fine as is, since the structure
name itself clearly indicates these are plugin-related fields.
PgStat_StatReplSlotEntry lacks such context and thus using full
descriptive names there would improve clarity.

3)
LogicalOutputWrite:
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) + sizeof(TransactionId);
  p->returned_rows++;

A blank line after the new change will increase readability.

~~

In my testing, the patch works as expected. Thanks!

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

23 September 2025, 09:44:31

Hi,

On Fri, Sep 19, 2025 at 08:11:23PM +0530, Ashutosh Bapat wrote:
> On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> 0001 is the previous patch
> 0002 changes addressing your and Bertrand's comments.

Thanks for the new patch version!

I did not look closely at the code yet but did some testing and I have one remark
regarding plugin_filtered_bytes: it looks ok when a publication is doing row
filtering but when I:

- create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '0')
then I see plugin_sent_bytes increasing (which makes sense).

- create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
see plugin_filtered_bytes increasing. I think it would make sense to also increase
plugin_filtered_bytes in this case (and for the other options that would skip
sending data). Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

23 September 2025, 13:36:33

On Mon, Sep 22, 2025 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Few trivial comments:
>
> 1)
> Currently the doc says:
>
> sentTxns is the number of transactions sent downstream by the output
> plugin. sentBytes is the amount of data, in bytes, sent downstream by
> the output plugin. OutputPluginWrite will update this counter if
> ctx->stats is initialized by the output plugin. filteredBytes is the
> size of changes, in bytes, that are filtered out by the output plugin.
> Function ReorderBufferChangeSize may be used to find the size of
> filtered ReorderBufferChange.
>
> Shall we rearrange it to:
>
> sentTxns is the number of transactions sent downstream by the output
> plugin. sentBytes is the amount of data, in bytes, sent downstream by
> the output plugin. filteredBytes is the size of changes, in bytes,
> that are filtered out by the output plugin. OutputPluginWrite will
> update these counters if ctx->stats is initialized by the output
> plugin.
> The function ReorderBufferChangeSize can be used to compute the size
> of a filtered ReorderBufferChange, i.e., the filteredBytes.
>

Only sentBytes is incremented by OutputPluginWrite(), so saying that
it will update counters is not correct. But I think you intend to keep
the description of all the fields together, followed by any additional
information. How about the following:
      <literal>sentTxns</literal> is the number of transactions sent downstream
      by the output plugin. <literal>sentBytes</literal> is the amount of data,
      in bytes, sent downstream by the output plugin.
      <literal>filteredBytes</literal> is the size of changes, in bytes, that
      are filtered out by the output plugin.
      <function>OutputPluginWrite</function> will update
      <literal>sentBytes</literal> if <literal>ctx->stats</literal> is
      initialized by the output plugin. Function
      <literal>ReorderBufferChangeSize</literal> may be used to find the size of
      filtered <literal>ReorderBufferChange</literal>.
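
As a rough illustration of that split of responsibilities, here is a
self-contained C sketch. It does not use the PostgreSQL headers: MockContext
and the mock_* functions are hypothetical, the struct layout is assumed, and
only the field names (sentTxns, sentBytes, filteredBytes) and the
ctx->stats check follow the patch. The write path touches sentBytes only;
the plugin counts sentTxns when it actually emits a BEGIN and adds to
filteredBytes when it drops a change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow the patch; the layout here is assumed. */
typedef struct OutputPluginStats
{
	int64_t		sentTxns;
	int64_t		sentBytes;
	int64_t		filteredBytes;
} OutputPluginStats;

/* Stand-in for LogicalDecodingContext, reduced to what the sketch needs. */
typedef struct MockContext
{
	size_t		out_len;		/* bytes gathered for the current message */
	OutputPluginStats *stats;	/* NULL when the plugin keeps no stats */
} MockContext;

/* Stand-in for the core write path: the only place sentBytes changes. */
void
mock_output_plugin_write(MockContext *ctx)
{
	assert(ctx != NULL);
	if (ctx->stats)
		ctx->stats->sentBytes += (int64_t) ctx->out_len;
	ctx->out_len = 0;
}

/* Plugin side: count the transaction when BEGIN is really sent. */
void
mock_plugin_send_begin(MockContext *ctx, size_t begin_len)
{
	assert(ctx != NULL);
	if (ctx->stats)
		ctx->stats->sentTxns++;
	ctx->out_len = begin_len;
	mock_output_plugin_write(ctx);
}

/* Plugin side: a change the plugin decided not to send. */
void
mock_plugin_filter_change(MockContext *ctx, size_t change_size)
{
	assert(ctx != NULL);
	if (ctx->stats)
		ctx->stats->filteredBytes += (int64_t) change_size;
}
```

In the real patch the size passed to the filtering step would come from
ReorderBufferChangeSize; here it is just a number supplied by the caller.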

> 2)
> My preference will be to rename the fields 'total_txns' and
> 'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
> 'total_wal_bytes' for better clarity. Additionally, upon rethinking,
> it seems better to me that plugin-related fields are also named as
> plugin_* to clearly indicate their association. OTOH, in
> OutputPluginStats, the field names are fine as is, since the structure
> name itself clearly indicates these are plugin-related fields.
> PgStat_StatReplSlotEntry lacks such context and thus using full
> descriptive names there would improve clarity.

Ok. Done.

>
> 3)
> LogicalOutputWrite:
> + if (ctx->stats)
> + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
> sizeof(TransactionId);
>   p->returned_rows++;
>
> A blank line after the new change will increase readability.
>

Ok.

> ~~
>
> In my testing, the patch works as expected. Thanks!

Thanks for testing. Can we include any of your tests in the patch? Are
the tests in the patch enough?

Applied those suggestions in my repository. Do you have any further
review comments?

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

23 September 2025, 13:45:14

On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Fri, Sep 19, 2025 at 08:11:23PM +0530, Ashutosh Bapat wrote:
> > On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > 0001 is the previous patch
> > 0002 changes addressing your and Bertrand's comments.
>
> Thanks for the new patch version!
>
> I did not look closely to the code yet but did some testing and I've one remark
> regarding plugin_filtered_bytes: It looks ok when a publication is doing rows
> filtering but when I:
>
> - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '0')
> then I see plugin_sent_bytes increasing (which makes sense).
>
> - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> see plugin_filtered_bytes increasing. I think that would make sense to also increase
> plugin_filtered_bytes in this case (and for the other options that would skip
> sending data). Thoughts?

Thanks for bringing this up. I don't think we discussed this
explicitly in the thread. The changes which are filtered out by the
core itself, e.g. changes to the catalogs, changes to other databases,
or changes from undesired origins, are not added to the reorder buffer.
They are not counted in total_bytes. The transactions containing only
such changes are not added to the reorder buffer, so even total_txns does
not count such empty transactions. If we count these changes and
transactions in plugin_filtered_bytes and plugin_filtered_txns, that
would create an anomaly: filtered counts being higher than total
counts. Further, since core does not add these changes and transactions
to the reorder buffer, there is no way for a plugin to know about
their existence and hence count them. Does that make sense?
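
A toy model of that flow, in plain C, may help. Everything in it is
hypothetical (the MockChange kinds, the counters and mock_decode() are
invented for illustration, and "unpublished table" is just one example of
plugin-side filtering); it only encodes the rule described above: changes
dropped by core never reach the reorder buffer, so they show up in neither
the total nor the filtered count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum MockChangeKind
{
	CHANGE_OTHER_DATABASE,		/* dropped by core before the reorder buffer */
	CHANGE_UNPUBLISHED_TABLE,	/* reaches the plugin, which filters it out */
	CHANGE_PUBLISHED_TABLE		/* reaches the plugin, which sends it */
} MockChangeKind;

typedef struct MockChange
{
	MockChangeKind kind;
	int64_t		size;
} MockChange;

typedef struct MockCounters
{
	int64_t		total_bytes;	/* what entered the reorder buffer */
	int64_t		filtered_bytes; /* what the plugin chose not to send */
	int64_t		changes_sent;	/* changes handed on for sending */
} MockCounters;

/* Walk the changes and apply the accounting rules described above. */
void
mock_decode(MockCounters *c, const MockChange *changes, size_t n)
{
	assert(c != NULL && changes != NULL);
	for (size_t i = 0; i < n; i++)
	{
		if (changes[i].kind == CHANGE_OTHER_DATABASE)
			continue;			/* never queued: invisible to the plugin */

		c->total_bytes += changes[i].size;

		if (changes[i].kind == CHANGE_UNPUBLISHED_TABLE)
			c->filtered_bytes += changes[i].size;
		else
			c->changes_sent++;
	}
}
```

With this accounting, filtered_bytes can never exceed total_bytes, which is
the invariant that counting core-filtered changes would break.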

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Sharma

Date:

23 September 2025, 15:58:18

> 0001 is the previous patch
> 0002 changes addressing your and Bertrand's comments.
>

@@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
  /* output previously gathered data in a CopyData packet */
  pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);

+ /*
+ * If output plugin maintains statistics, update the amount of data sent
+ * downstream.
+ */
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
+

Just a small observation: I think it’s actually pq_flush_if_writable()
that writes the buffered data to the socket, not pq_putmessage_noblock()
(which only gathers data in the buffer and does not send it). So
it might make more sense to increment the sentBytes counter after the
call to pq_flush_if_writable().

Should we also account for pg_hton32((uint32) (len + 4)), i.e. the
additional 4 bytes of length data added to the send buffer?

--
With Regards,
Ashutosh Sharma.
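
For reference, the byte counts under discussion can be worked out from the framing of a protocol message: one type byte, a 4-byte length word that counts itself, then the payload. The sketch below is not PostgreSQL source; `copydata_wire_size` and `counted_sent_bytes` are hypothetical helper names, and the second one simply mirrors the `len + 1` expression in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes that a CopyData ('d') message occupies on the wire. */
static size_t
copydata_wire_size(size_t payload_len)
{
    const size_t type_byte = 1;                  /* the 'd' message type */
    const size_t length_word = sizeof(uint32_t); /* holds payload_len + 4 */

    /* The length word must be able to represent payload_len + 4. */
    assert(payload_len <= UINT32_MAX - 4);

    return type_byte + length_word + payload_len;
}

/* What the quoted hunk adds to sentBytes: payload plus the type byte. */
static size_t
counted_sent_bytes(size_t payload_len)
{
    return payload_len + 1;
}
```

For a 100-byte payload the message occupies 105 bytes on the wire while the hunk counts 101; the difference is the 4-byte length word raised in the question.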

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

24 September 2025, 07:42:47

On Tue, Sep 23, 2025 at 4:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Sep 22, 2025 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Few trivial comments:
> >
> > 1)
> > Currently the doc says:
> >
> > sentTxns is the number of transactions sent downstream by the output
> > plugin. sentBytes is the amount of data, in bytes, sent downstream by
> > the output plugin. OutputPluginWrite will update this counter if
> > ctx->stats is initialized by the output plugin. filteredBytes is the
> > size of changes, in bytes, that are filtered out by the output plugin.
> > Function ReorderBufferChangeSize may be used to find the size of
> > filtered ReorderBufferChange.
> >
> > Shall we rearrange it to:
> >
> > sentTxns is the number of transactions sent downstream by the output
> > plugin. sentBytes is the amount of data, in bytes, sent downstream by
> > the output plugin. filteredBytes is the size of changes, in bytes,
> > that are filtered out by the output plugin. OutputPluginWrite will
> > update these counters if ctx->stats is initialized by the output
> > plugin.
> > The function ReorderBufferChangeSize can be used to compute the size
> > of a filtered ReorderBufferChange, i.e., the filteredBytes.
> >
>
> Only sentBytes is incremented by OutputPluginWrite(), so saying that
> it will update counters is not correct. But I think you intend to keep
> description of all the fields together followed by any additional
> information. How about the following
>       <literal>sentTxns</literal> is the number of transactions sent downstream
>       by the output plugin. <literal>sentBytes</literal> is the amount of data,
>       in bytes, sent downstream by the output plugin.
>       <literal>filteredBytes</literal> is the size of changes, in bytes, that
>       are filtered out by the output plugin.
>       <function>OutputPluginWrite</function> will update
>       <literal>sentBytes</literal> if <literal>ctx->stats</literal> is
>       initialized by the output plugin. Function
>       <literal>ReorderBufferChangeSize</literal> may be used to find the size of
>       filtered <literal>ReorderBufferChange</literal>.

Yes, this looks good.

>
> > 2)
> > My preference will be to rename the fields 'total_txns' and
> > 'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
> > 'total_wal_bytes' for better clarity. Additionally, upon rethinking,
> > it seems better to me that plugin-related fields are also named as
> > plugin_* to clearly indicate their association. OTOH, in
> > OutputPluginStats, the field names are fine as is, since the structure
> > name itself clearly indicates these are plugin-related fields.
> > PgStat_StatReplSlotEntry lacks such context and thus using full
> > descriptive names there would improve clarity.
>
> Ok. Done.
>
> >
> > 3)
> > LogicalOutputWrite:
> > + if (ctx->stats)
> > + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
> > sizeof(TransactionId);
> >   p->returned_rows++;
> >
> > A blank line after the new change will increase readability.
> >
>
> Ok.
>
> > ~~
> >
> > In my testing, the patch works as expected. Thanks!
>
> Thanks for testing. Can we include any of your tests in the patch? Are
> the tests in patch enough?

I tested the flows with
a) logical replication slot and get-changes.
b) filtered data flows: pub-sub creation with row_filters, 'publish'
options. I tried to verify plugin fields as compared to total_wal*
fields.
c) reset flow.

While tests for a and c are present already, I don't see tests for b
anywhere when it comes to stats. Do you think we should add a test for
filtered data using row-filter somewhere?

>
> Applied those suggestions in my repository. Do you have any further
> review comments?
>

No, I think that is all.

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

24 September 2025, 08:38:32

On Tue, Sep 23, 2025 at 6:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> > 0001 is the previous patch
> > 0002 changes addressing your and Bertrand's comments.
> >
>
> @@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx,
> XLogRecPtr lsn, TransactionId xid,
>   /* output previously gathered data in a CopyData packet */
>   pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);
>
> + /*
> + * If output plugin maintains statistics, update the amount of data sent
> + * downstream.
> + */
> + if (ctx->stats)
> + ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
> +
>
> Just a small observation: I think it’s actually pq_flush_if_writable()
> that writes the buffered data to the socket, not pq_putmessage_noblock
> (which is actually gathering data in the buffer and not sending). So
> it might make more sense to increment the sent pointer after the call
> to pq_flush_if_writable().

That's a good point. I placed it after pq_putmessage_noblock() so that
it's easy to link the increment to sentBytes and the actual bytes
being sent. You are right that the bytes won't be sent unless
pq_flush_if_writable() is called but it will be called for sure before
the next UpdateDecodingStats(). So the reported bytes are never wrong.
I would prefer readability over seeming accuracy.

>
> Should we also consider - pg_hton32((uint32) (len + 4)); -- the
> additional 4 bytes of data added to the send buffer.
>

In WalSndWriteData() we can't rely on what happens in a low-level API
like socket_putmessage(). And we are counting the number of bytes in
the logically decoded message. So, I actually wonder whether we should
count 1 byte of 'd' in sentBytes. Shveta, Bertrand, what do you think?

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

24 September 2025, 09:08:30

On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Sep 23, 2025 at 6:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > > 0001 is the previous patch
> > > 0002 changes addressing your and Bertrand's comments.
> > >
> >
> > @@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx,
> > XLogRecPtr lsn, TransactionId xid,
> >   /* output previously gathered data in a CopyData packet */
> >   pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);
> >
> > + /*
> > + * If output plugin maintains statistics, update the amount of data sent
> > + * downstream.
> > + */
> > + if (ctx->stats)
> > + ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
> > +
> >
> > Just a small observation: I think it’s actually pq_flush_if_writable()
> > that writes the buffered data to the socket, not pq_putmessage_noblock
> > (which is actually gathering data in the buffer and not sending). So
> > it might make more sense to increment the sent pointer after the call
> > to pq_flush_if_writable().
>
> That's a good point. I placed it after pq_putmessage_noblock() so that
> it's easy to link the increment to sentBytes and the actual bytes
> being sent. You are right that the bytes won't be sent unless
> pq_flush_if_writable() is called but it will be called for sure before
> the next UpdateDecodingStats(). So the reported bytes are never wrong.
> I would prefer readability over seeming accuracy.
>
> >
> > Should we also consider - pg_hton32((uint32) (len + 4)); -- the
> > additional 4 bytes of data added to the send buffer.
> >
>
> In WalSndWriteData() we can't rely on what happens in a low level API
> like socket_putmessage(). And we are counting the number of bytes in
> the logically decoded message. So, I actually wonder whether we should
> > count 1 byte of 'd' in sentBytes. Shveta, Bertrand, what do you think?
>

If we are not counting all such metadata bytes (or can't reliably do
so), then IMO, we should skip counting msgtype as well.

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

24 September 2025, 09:42:04

Hi,

On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote:
> On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > In WalSndWriteData() we can't rely on what happens in a low level API
> > like socket_putmessage(). And we are counting the number of bytes in
> > the logically decoded message. So, I actually wonder whether we should
> > > count 1 byte of 'd' in sentBytes. Shveta, Bertrand, what do you think?
> >
> 
> > If we are not counting all such metadata bytes (or can't reliably do
> so), then IMO, we shall skip counting msgtype as well.

Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not
taken into account?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

24 September 2025, 10:02:40

Hi,

On Tue, Sep 23, 2025 at 04:15:14PM +0530, Ashutosh Bapat wrote:
> On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> > see plugin_filtered_bytes increasing. I think that would make sense to also increase
> > plugin_filtered_bytes in this case (and for the other options that would skip
> > sending data). Thoughts?
> 
> Thanks for bringing this up. I don't think we discussed this
> explicitly in the thread. The changes which are filtered out by the
> core itself e.g. changes to the catalogs or changes to other databases
> or changes from undesired origins are not added to the reorder buffer.
> They are not counted in total_bytes. The transactions containing only
> such changes are not added to reorder buffer, so even total_txns does
> not count such empty transactions. If we count these changes and
> transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
> would create an anomaly - filtered counts being higher than total
> counts. Further since core does not add these changes and transactions
> to the reorder buffer, there is no way for a plugin to know about
> their existence and hence count them. Does that make sense?

Yes. Do you think that the doc in the patch is clear enough regarding this point?
I mean the doc looks correct (mentioning the output plugin) but would that make
sense to insist that core filtering is not taken into account?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

24 September 2025, 10:17:09

On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> I tested the flows with
> a) logical replication slot and get-changes.
> b) filtered data flows: pub-sub creation with row_filters, 'publish'
> options. I tried to verify plugin fields as compared to total_wal*
> fields.
> c) reset flow.
>
> While tests for a and c are present already. I don't see tests for b
> anywhere when it comes to stats. Do you think we shall add a test for
> filtered data using row-filter somewhere?

Added a test in 028_row_filter. Please find it in the attached
patchset. I didn't find tests which test table-level filtering or
operation-level filtering. Can you please point me to such tests? I
will add a similar test to other places. Once you review the test in
028_row_filter, I will replicate it to the other places you point out.

On Wed, Sep 24, 2025 at 12:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote:
> > On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > In WalSndWriteData() we can't rely on what happens in a low level API
> > > like socket_putmessage(). And we are counting the number of bytes in
> > > the logically decoded message. So, I actually wonder whether we should
> > > > count 1 byte of 'd' in sentBytes. Shveta, Bertrand, what do you think?
> > >
> >
> > > If we are not counting all such metadata bytes (or can't reliably do
> > so), then IMO, we shall skip counting msgtype as well.
>
> Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not
> taken into account?

We are counting sentBytes in the central places through which all the
logically decoded messages flow, so we are not missing any metadata
bytes. Given that these bytes are part of the logically decoded
message itself, I think we should count them in sentBytes. The
remaining question is whether to count the 4 bytes for the length in
the message itself. The logical decoding code cannot control that and
thus should not account for it. So I am leaving the bytes for
pg_hton32((uint32) (len + 4)) out of the sentBytes calculation.
--
Best Wishes,
Ashutosh Bapat

Attachments

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

24 September 2025, 10:21:29

On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Sep 23, 2025 at 04:15:14PM +0530, Ashutosh Bapat wrote:
> > On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> > > see plugin_filtered_bytes increasing. I think that would make sense to also increase
> > > plugin_filtered_bytes in this case (and for the other options that would skip
> > > sending data). Thoughts?
> >
> > Thanks for bringing this up. I don't think we discussed this
> > explicitly in the thread. The changes which are filtered out by the
> > core itself e.g. changes to the catalogs or changes to other databases
> > or changes from undesired origins are not added to the reorder buffer.
> > They are not counted in total_bytes. The transactions containing only
> > such changes are not added to reorder buffer, so even total_txns does
> > not count such empty transactions. If we count these changes and
> > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
> > would create an anomaly - filtered counts being higher than total
> > counts. Further since core does not add these changes and transactions
> > to the reorder buffer, there is no way for a plugin to know about
> > their existence and hence count them. Does that make sense?
>
> Yes. Do you think that the doc in the patch is clear enough regarding this point?
> I mean the doc looks correct (mentioning the output plugin) but would that make
> sense to insist that core filtering is not taken into account?

Do you mean, should we mention in the docs that core filtering is not
taken into account? I would question whether that's called filtering
at all, in the context of logical decoding. The view should be read in
the context of logical decoding. For example, we aren't mentioning
that total_bytes does not include changes from other databases.

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

24 September 2025, 11:25:50

Hi,

On Wed, Sep 24, 2025 at 12:51:29PM +0530, Ashutosh Bapat wrote:
> On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> > > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> > > > see plugin_filtered_bytes increasing. I think that would make sense to also increase
> > > > plugin_filtered_bytes in this case (and for the other options that would skip
> > > > sending data). Thoughts?
> > >
> > > Thanks for bringing this up. I don't think we discussed this
> > > explicitly in the thread. The changes which are filtered out by the
> > > core itself e.g. changes to the catalogs or changes to other databases
> > > or changes from undesired origins are not added to the reorder buffer.
> > > They are not counted in total_bytes. The transactions containing only
> > > such changes are not added to reorder buffer, so even total_txns does
> > > not count such empty transactions. If we count these changes and
> > > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
> > > would create an anomaly - filtered counts being higher than total
> > > counts. Further since core does not add these changes and transactions
> > > to the reorder buffer, there is no way for a plugin to know about
> > > their existence and hence count them. Does that make sense?
> >
> > Yes. Do you think that the doc in the patch is clear enough regarding this point?
> > I mean the doc looks correct (mentioning the output plugin) but would that make
> > sense to insist that core filtering is not taken into account?
> 
> Do you mean, should we mention in the docs that core filtering is not
> taken into account?
> I would question whether that's called filtering
> at all, in the context of logical decoding. The view should be read in
> the context of logical decoding. For example, we aren't mentioning
> that total_bytes does not include changes from other database.

Right. But, in the example above, do you consider "skip-empty-xacts" as "core"
or "plugin" filtering?

It's an option of the "test_decoding" plugin, so it's the plugin's choice
not to display empty xacts (should the option be set accordingly). Then should it
be reported in plugin_filtered_bytes? (One could write a plugin and decide to
skip/filter empty xacts or whatever in the plugin callbacks: should that be
reported as plugin_filtered_bytes?)

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

24 September 2025, 12:07:56

On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I tested the flows with
> > a) logical replication slot and get-changes.
> > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > options. I tried to verify plugin fields as compared to total_wal*
> > fields.
> > c) reset flow.
> >
> > While tests for a and c are present already. I don't see tests for b
> > anywhere when it comes to stats. Do you think we shall add a test for
> > filtered data using row-filter somewhere?
>
> Added a test in 028_row_filter. Please find it in the attached
> patchset.

Test looks good.

> I didn't find tests which test table level filtering or
> operation level filtering. Can you please point me to such tests. I
> will add similar test to other places. Once you review the test in
> 028_row_filter, I will replicate it to other places you point out.
>

I can see a few tests of operation level filtering present in
'subscription/t/001_rep_changes.pl'  and
'subscription/t/010_truncate.pl'

> On Wed, Sep 24, 2025 at 12:12 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote:
> > > On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > In WalSndWriteData() we can't rely on what happens in a low level API
> > > > like socket_putmessage(). And we are counting the number of bytes in
> > > > the logically decoded message. So, I actually wonder whether we should
> > > > > count 1 byte of 'd' in sentBytes. Shveta, Bertrand, what do you think?
> > > >
> > >
> > > > If we are not counting all such metadata bytes (or can't reliably do
> > > so), then IMO, we shall skip counting msgtype as well.
> >
> > Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not
> > taken into account?
>
> We are counting the sentBytes in central places through which all the
> logically decoded messages flow. So we are not missing on any metadata
> bytes. Given that these bytes are part of the logically decoded
> message itself, I think we should count them in the sentBytes. Now the
> question remains is whether to count 4 bytes for length in the message
> itself? The logical decoding code can not control that and thus should
> not account for it. So I am leaving bytes counted for
> pg_hton32((uint32) (len + 4)) out of sentBytes calculation.
> --

Okay.

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

24 September 2025, 13:07:07

On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 24, 2025 at 12:51:29PM +0530, Ashutosh Bapat wrote:
> > On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> > > > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> > > > > see plugin_filtered_bytes increasing. I think that would make sense to also increase
> > > > > plugin_filtered_bytes in this case (and for the other options that would skip
> > > > > sending data). Thoughts?
> > > >
> > > > Thanks for bringing this up. I don't think we discussed this
> > > > explicitly in the thread. The changes which are filtered out by the
> > > > core itself e.g. changes to the catalogs or changes to other databases
> > > > or changes from undesired origins are not added to the reorder buffer.
> > > > They are not counted in total_bytes. The transactions containing only
> > > > such changes are not added to reorder buffer, so even total_txns does
> > > > not count such empty transactions. If we count these changes and
> > > > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
> > > > would create an anomaly - filtered counts being higher than total
> > > > counts. Further since core does not add these changes and transactions
> > > > to the reorder buffer, there is no way for a plugin to know about
> > > > their existence and hence count them. Does that make sense?
> > >
> > > Yes. Do you think that the doc in the patch is clear enough regarding this point?
> > > I mean the doc looks correct (mentioning the output plugin) but would that make
> > > sense to insist that core filtering is not taken into account?
> >
> > Do you mean, should we mention in the docs that core filtering is not
> > taken into account?
> > I would question whether that's called filtering
> > at all, in the context of logical decoding. The view should be read in
> > the context of logical decoding. For example, we aren't mentioning
> > that total_bytes does not include changes from other database.
>
> Right. But, in the example above, do you consider "skip-empty-xacts" as "core"
> or "plugin" filtering?
>
> It's an option part of the "test_decoding" plugin, so it's the plugin choice to
> not display empty xacts (should the option be set accordingly). Then should it
> be reported in plugin_filtered_bytes? (one could write a plugin, decide to
> skip/filter empty xacts or whatever in the plugin callbacks: should that be
> reported as plugin_filtered_bytes?)

If a transaction becomes empty because the plugin filtered all the
changes then plugin_filtered_bytes will be incremented by the amount
of filtered changes. If the transaction was empty because core didn't
send any of the changes to the output plugin, there was nothing
filtered by the output plugin so plugin_filtered_bytes will not be
affected.

skip_empty_xacts controls whether BEGIN and COMMIT are sent for an
empty transaction or not. It does not filter "changes". It affects
"sent_bytes".

--
Best Wishes,
Ashutosh Bapat
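
The distinction drawn in the message above can be sketched with a toy plugin that defers BEGIN until the first change it actually sends, the pattern the thread associates with `pgoutput_send_begin()` and `pg_output_begin()`. Everything below is a hypothetical model: `ToyStats`, `ToyTxn`, `toy_send_begin`, `toy_change`, `toy_commit`, and the fixed `BEGIN_COMMIT_BYTES` cost are invented for illustration and are not test_decoding's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BEGIN_COMMIT_BYTES 10   /* made-up size of a BEGIN or COMMIT message */

typedef struct ToyStats
{
    int64_t sentTxns;
    int64_t sentBytes;
    int64_t filteredBytes;
} ToyStats;

typedef struct ToyTxn
{
    bool begin_sent;            /* has BEGIN gone downstream yet? */
} ToyTxn;

/* Send BEGIN at most once per transaction and count the transaction. */
static void
toy_send_begin(ToyStats *stats, ToyTxn *txn)
{
    if (txn->begin_sent)
        return;
    txn->begin_sent = true;
    stats->sentTxns++;
    stats->sentBytes += BEGIN_COMMIT_BYTES;
}

/* Handle one change that core handed to the plugin. */
static void
toy_change(ToyStats *stats, ToyTxn *txn, int64_t change_size,
           int64_t output_size, bool plugin_filters_it)
{
    if (plugin_filters_it)
    {
        stats->filteredBytes += change_size;    /* the plugin's own decision */
        return;
    }
    toy_send_begin(stats, txn);
    stats->sentBytes += output_size;
}

/* At commit, an empty transaction is either skipped or sent as BEGIN/COMMIT. */
static void
toy_commit(ToyStats *stats, ToyTxn *txn, bool skip_empty_xacts)
{
    if (!txn->begin_sent)
    {
        if (skip_empty_xacts)
            return;             /* nothing sent, nothing filtered */
        toy_send_begin(stats, txn);
    }
    stats->sentBytes += BEGIN_COMMIT_BYTES;
    assert(stats->sentTxns > 0);
}
```

In this model the skip-empty option only changes the sent counters, while `filteredBytes` moves only when the plugin itself drops a change it was given.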

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

24 September 2025, 14:58:44

On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > I tested the flows with
> > > a) logical replication slot and get-changes.
> > > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > > options. I tried to verify plugin fields as compared to total_wal*
> > > fields.
> > > c) reset flow.
> > >
> > > While tests for a and c are present already. I don't see tests for b
> > > anywhere when it comes to stats. Do you think we shall add a test for
> > > filtered data using row-filter somewhere?
> >
> > Added a test in 028_row_filter. Please find it in the attached
> > patchset.
>
> Test looks good.

Thanks. Added to three more files. I think we have covered all the
cases where filtering can occur.

PFA patches.

--
Best Wishes,
Ashutosh Bapat

Attachments

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

24 September 2025, 16:13:10

Hi,

On Wed, Sep 24, 2025 at 03:37:07PM +0530, Ashutosh Bapat wrote:
> On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Right. But, in the example above, do you consider "skip-empty-xacts" as "core"
> > or "plugin" filtering?
> >
> > It's an option part of the "test_decoding" plugin, so it's the plugin choice to
> > not display empty xacts (should the option be set accordingly). Then should it
> > be reported in plugin_filtered_bytes? (one could write a plugin, decide to
> > skip/filter empty xacts or whatever in the plugin callbacks: should that be
> > reported as plugin_filtered_bytes?)
> 
> If a transaction becomes empty because the plugin filtered all the
> changes then plugin_filtered_bytes will be incremented by the amount
> of filtered changes. If the transaction was empty because core didn't
> send any of the changes to the output plugin, there was nothing
> filtered by the output plugin so plugin_filtered_bytes will not be
> affected.
> 
> skip_empty_xacts controls whether BEGIN and COMMIT are sent for an
> empty transaction or not. It does not filter "changes". It affects
> "sent_bytes".

skip_empty_xacts was just an example. I mean a plugin could decide to filter all
the inserts, for example (not saying it makes sense). But I think we're saying the
same thing: if a plugin wants to filter the inserts, then it's its responsibility to
increment ctx->stats->filteredBytes in its "change_cb" callback for the
REORDER_BUFFER_CHANGE_INSERT action, right? If so, I wonder if it would make
sense to provide an example in the test_decoding plugin (I can see it's done
for pgoutput, but contrib might be a more natural place to look if one is
searching for an example).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
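
A plugin that drops inserts might account for them roughly as follows. This is a sketch against mock types, not the real output-plugin API: `MockChange`, `MockContext`, `mock_change_size` and `mock_change_cb` are hypothetical, and only the names `filteredBytes`, `sentBytes` and `ctx->stats` come from the thread. In the patch as discussed, the size would come from `ReorderBufferChangeSize()`, and `sentBytes` is said to be updated by `OutputPluginWrite()` rather than by the callback; the mock uses the change size as a stand-in for the output length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum MockAction
{
    MOCK_CHANGE_INSERT,
    MOCK_CHANGE_UPDATE,
    MOCK_CHANGE_DELETE
} MockAction;

typedef struct MockChange
{
    MockAction  action;
    size_t      size;           /* stand-in for the change's in-memory size */
} MockChange;

typedef struct MockStats
{
    int64_t sentBytes;
    int64_t filteredBytes;
} MockStats;

typedef struct MockContext
{
    MockStats  *stats;          /* NULL when the plugin keeps no statistics */
} MockContext;

/* Hypothetical counterpart of a change-size helper. */
static size_t
mock_change_size(const MockChange *change)
{
    return change->size;
}

/* Change callback of a toy plugin that filters out every insert. */
static void
mock_change_cb(MockContext *ctx, const MockChange *change)
{
    assert(ctx != NULL && change != NULL);

    if (change->action == MOCK_CHANGE_INSERT)
    {
        /* Filtered by the plugin: count it, send nothing. */
        if (ctx->stats)
            ctx->stats->filteredBytes += (int64_t) mock_change_size(change);
        return;
    }

    /* Not filtered: the write path accounts for what goes downstream. */
    if (ctx->stats)
        ctx->stats->sentBytes += (int64_t) mock_change_size(change);
}
```

The `ctx->stats` check mirrors the guard in the quoted hunks, so a plugin that never initializes statistics pays no accounting cost.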

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

24 September 2025, 17:41:23

Hi,

On Wed, Sep 24, 2025 at 05:28:44PM +0530, Ashutosh Bapat wrote:
> On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > I tested the flows with
> > > > a) logical replication slot and get-changes.
> > > > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > > > options. I tried to verify plugin fields as compared to total_wal*
> > > > fields.
> > > > c) reset flow.
> > > >
> > > > While tests for a and c are present already. I don't see tests for b
> > > > anywhere when it comes to stats. Do you think we shall add a test for
> > > > filtered data using row-filter somewhere?
> > >
> > > Added a test in 028_row_filter. Please find it in the attached
> > > patchset.
> >
> > Test looks good.
> 
> Thanks. Added to three more files. I think we have covered all the
> cases where filtering can occur.
> 
> PFA patches.

Thanks for the new version!

A few random comments:

=== 1

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>plugin_filtered_bytes</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Amount of changes, from <structfield>total_wal_bytes</structfield>, filtered
+        out by the output plugin and not sent downstream. Please note that it
+        does not include the changes filtered before a change is sent to
+        the output plugin, e.g. the changes filtered by origin. The count is
+        maintained by the output plugin mentioned in
+        <structfield>plugin</structfield>.

I found "The count" somehow ambiguous. What about "This statistic" instead?

=== 2

+        subtransactions. These transactions are subset of transctions sent to

s/transctions/transactions

=== 3

+        the decoding plugin. Hence this count is expected to be lesser than or

s/be lesser/be less/? (not 100% sure)

=== 4

+extern Size ReorderBufferChangeSize(ReorderBufferChange *change);

Another approach could be to pass the change's size as an argument to the
callbacks? That would avoid exposing ReorderBufferChangeSize publicly.

=== 5

        ctx->output_plugin_private = data;
+       ctx->stats = palloc0(sizeof(OutputPluginStats));

I was wondering if we need to free this in pg_decode_shutdown, but it looks
like it's done through FreeDecodingContext() anyway.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

25 September 2025, 06:59:31

On Wed, Sep 24, 2025 at 5:28 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > I tested the flows with
> > > > a) logical replication slot and get-changes.
> > > > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > > > options. I tried to verify plugin fields as compared to total_wal*
> > > > fields.
> > > > c) reset flow.
> > > >
> > > > While tests for a and c are present already. I don't see tests for b
> > > > anywhere when it comes to stats. Do you think we shall add a test for
> > > > filtered data using row-filter somewhere?
> > >
> > > Added a test in 028_row_filter. Please find it in the attached
> > > patchset.
> >
> > Test looks good.
>
> Thanks. Added to three more files. I think we have covered all the
> cases where filtering can occur.
>

Yes. The test looks good now.

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

25 September 2025, 07:23:05

On Wed, Sep 24, 2025 at 6:43 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 24, 2025 at 03:37:07PM +0530, Ashutosh Bapat wrote:
> > On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > Right. But, in the example above, do you consider "skip-empty-xacts" as "core"
> > > or "plugin" filtering?
> > >
> > > It's an option part of the "test_decoding" plugin, so it's the plugin choice to
> > > not display empty xacts (should the option be set accordingly). Then should it
> > > be reported in plugin_filtered_bytes? (one could write a plugin, decide to
> > > skip/filter empty xacts or whatever in the plugin callbacks: should that be
> > > reported as plugin_filtered_bytes?)
> >
> > If a transaction becomes empty because the plugin filtered all the
> > changes then plugin_filtered_bytes will be incremented by the amount
> > of filtered changes. If the transaction was empty because core didn't
> > send any of the changes to the output plugin, there was nothing
> > filtered by the output plugin so plugin_filtered_bytes will not be
> > affected.
> >
> > skip_empty_xacts controls whether BEGIN and COMMIT are sent for an
> > empty transaction or not. It does not filter "changes". It affects
> > "sent_bytes".
>
> skip_empty_xacts was just an example. I mean a plugin could decide to filter all
> the inserts for example (not saying it makes sense). But I think we're saying the
> same: say a plugin wants to filter the inserts then it's its responsibility to
> increment ctx->stats->filteredBytes in its "change_cb" callback for the
> REORDER_BUFFER_CHANGE_INSERT action, right?

Right.

> If so, I wonder if it would make
> sense to provide an example in the test_decoding plugin (I can see it's done
> for pgoutput but that might sound more natural to look in contrib if one is
> searching for an example).

test_decoding does not make use of publications at all. Publications
control filtering, so test_decoding does not have any examples of
filtering code. It doesn't make sense to add code to manipulate
filteredBytes there.
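
For readers who want to see the shape of the bookkeeping agreed on above, here is a minimal, self-contained sketch. It is not code from the patch or from any plugin: DemoPluginStats, DemoChange, DemoContext and demo_change_cb are hypothetical stand-ins for OutputPluginStats, ReorderBufferChange, LogicalDecodingContext and a change_cb callback, and the size field stands in for whatever ReorderBufferChangeSize() would return. It only illustrates a plugin that filters inserts and accounts for them in filteredBytes, while doing nothing when it has not opted in to stats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the patch's OutputPluginStats (assumed layout). */
typedef struct DemoPluginStats
{
    int64_t sentTxns;
    int64_t filteredBytes;
} DemoPluginStats;

typedef enum DemoAction { DEMO_INSERT, DEMO_UPDATE, DEMO_DELETE } DemoAction;

/* Stand-in for a decoded change; size models ReorderBufferChangeSize(). */
typedef struct DemoChange
{
    DemoAction action;
    size_t     size;
} DemoChange;

/* Stand-in for the decoding context; stats is NULL when the plugin
 * does not maintain statistics. */
typedef struct DemoContext
{
    DemoPluginStats *stats;
} DemoContext;

/*
 * Hypothetical change callback that drops every INSERT.
 * Returns true if the change would be sent downstream, false if filtered.
 */
bool
demo_change_cb(DemoContext *ctx, const DemoChange *change)
{
    assert(ctx != NULL && change != NULL);

    if (change->action == DEMO_INSERT)
    {
        /* The plugin filtered the change, so the plugin counts it. */
        if (ctx->stats != NULL)
            ctx->stats->filteredBytes += (int64_t) change->size;
        return false;
    }

    return true;
}
```

In this model the cost of computing the size is paid only on the filtered path, and a plugin that never sets stats is unaffected.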

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

September 25, 2025, 07:46:35

On Wed, Sep 24, 2025 at 8:11 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 24, 2025 at 05:28:44PM +0530, Ashutosh Bapat wrote:
> > On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > > >
> > > > > I tested the flows with
> > > > > a) logical replication slot and get-changes.
> > > > > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > > > > options. I tried to verify plugin fields as compared to total_wal*
> > > > > fields.
> > > > > c) reset flow.
> > > > >
> > > > > While tests for a and c are present already. I don't see tests for b
> > > > > anywhere when it comes to stats. Do you think we shall add a test for
> > > > > filtered data using row-filter somewhere?
> > > >
> > > > Added a test in 028_row_filter. Please find it in the attached
> > > > patchset.
> > >
> > > Test looks good.
> >
> > Thanks. Added to three more files. I think we have covered all the
> > cases where filtering can occur.
> >
> > PFA patches.
>
> Thanks for the new version!
>
> A few random comments:
>
> === 1
>
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +        <structfield>plugin_filtered_bytes</structfield> <type>bigint</type>
> +       </para>
> +       <para>
> +        Amount of changes, from <structfield>total_wal_bytes</structfield>, filtered
> +        out by the output plugin and not sent downstream. Please note that it
> +        does not include the changes filtered before a change is sent to
> +        the output plugin, e.g. the changes filtered by origin. The count is
> +        maintained by the output plugin mentioned in
> +        <structfield>plugin</structfield>.
>
> I found "The count" somehow ambiguous. What about "This statistic" instead?

Existing fields use the term "The counter". Changed "The count" to "The counter".

>
> === 2
>
> +        subtransactions. These transactions are subset of transctions sent to
>
> s/transctions/transactions

Done.

>
> === 3
>
> +        the decoding plugin. Hence this count is expected to be lesser than or
>
> s/be lesser/be less/? (not 100% sure)

Less than is correct. Fixed.

>
> === 4
>
> +extern Size ReorderBufferChangeSize(ReorderBufferChange *change);
>
> Another approach could be to pass the change's size as an argument to the
> callbacks? That would avoid to expose ReorderBufferChangeSize publicly.

Do you see any problem in exposing ReorderBufferChangeSize()? It's a
pretty small function and may be quite handy to output plugins
otherwise as well. And we expose many ReorderBuffer related functions;
so this isn't the first.

If we were to do as you say, it will change other external facing APIs
like change_cb(). Output plugins will need to change their code
accordingly even when they don't want to support plugin statistics.
Given that we have made maintaining plugin statistics optional,
forcing API change does not make sense. For example, test_decoding
which does not filter anything would unnecessarily have to change its
code.

I considered adding a size field to ReorderBufferChange itself. But
that means we increase the amount of memory used in the reorder
buffer, which seems to have become prime real estate these days. So I
rejected that idea as well.

Advantage of this change is that the minimal cost of calculating the
size and maintaining the code change is incurred only when filtering
happens, by the plugins which want to filter and maintain statistics.

>
> === 5
>
>         ctx->output_plugin_private = data;
> +       ctx->stats = palloc0(sizeof(OutputPluginStats));
>
> I was wondering if we need to free this in pg_decode_shutdown, but it looks
> like it's done through FreeDecodingContext() anyway.

That's correct. Even output_plugin_private is freed when the decoding
memory context is freed.

Thanks for the review comments. I have addressed the comments in my
repository and the changes will be included in the next set of
patches.

Do you have any further review comments?

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

September 25, 2025, 16:01:55

Hi,

On Thu, Sep 25, 2025 at 10:16:35AM +0530, Ashutosh Bapat wrote:
> On Wed, Sep 24, 2025 at 8:11 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > === 4
> >
> > +extern Size ReorderBufferChangeSize(ReorderBufferChange *change);
> >
> > Another approach could be to pass the change's size as an argument to the
> > callbacks? That would avoid to expose ReorderBufferChangeSize publicly.
> 
> Do you see any problem in exposing ReorderBufferChangeSize(). It's a
> pretty small function and may be quite handy to output plugins
> otherwise as well. And we expose many ReorderBuffer related functions;
> so this isn't the first.

Right. I don't see a problem per se, just thinking that the less we expose
publicly to be used by extensions/plugins, the better.

> If we were to do as you say, it will change other external facing APIs
> like change_cb(). Output plugins will need to change their code
> accordingly even when they don't want to support plugin statistics.

Correct.

> Given that we have made maintaining plugin statistics optional,
> forcing API change does not make sense. For example, test_decoding
> which does not filter anything would unnecessarily have to change its
> code.

That's right.

> I considered adding a field size to ReorderBufferChange itself. But
> that means we increase the amount of memory used in the reorder
> buffer, which seems to have become prime estate these days. So
> rejected that idea as well.
> 
> Advantage of this change is that the minimal cost of calculating the
> size and maintaining the code change is incurred only when filtering
> happens, by the plugins which want to filter and maintain statistics.

Yes, anyway as it's unlikely that we have to fix a bug in a minor release that
would need a signature change to ReorderBufferChangeSize(), I think that's fine
as proposed.

> >
> > === 5
> >
> >         ctx->output_plugin_private = data;
> > +       ctx->stats = palloc0(sizeof(OutputPluginStats));
> >
> > I was wondering if we need to free this in pg_decode_shutdown, but it looks
> > like it's done through FreeDecodingContext() anyway.
> 
> That's correct. Even output_plugin_private is freed when the decoding
> memory context is freed.
> 
> Thanks for the review comments. I have addressed the comments in my
> repository and the changes will be included in the next set of
> patches.

Thanks!

> Do you have any further review comments?

Not right now. I'll give it another look by early next week at the latest.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

September 26, 2025, 14:13:19

Hi,

On Thu, Sep 25, 2025 at 01:01:55PM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Thu, Sep 25, 2025 at 10:16:35AM +0530, Ashutosh Bapat wrote:
> > Do you have any further review comments?
> 
> Not right now. I'll give it another look by early next week the latest.
> 

=== 1

@@ -173,6 +173,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
        data->only_local = false;

        ctx->output_plugin_private = data;
+       ctx->stats = palloc0(sizeof(OutputPluginStats));

I was not sure where it's allocated, but looking at:

Breakpoint 1, pg_decode_startup (ctx=0x1ba853a0, opt=0x1ba85478, is_init=false) at test_decoding.c:164
164             bool            enable_streaming = false;
(gdb) n
166             data = palloc0(sizeof(TestDecodingData));
(gdb)
167             data->context = AllocSetContextCreate(ctx->context,
(gdb)
170             data->include_xids = true;
(gdb)
171             data->include_timestamp = false;
(gdb)
172             data->skip_empty_xacts = false;
(gdb)
173             data->only_local = false;
(gdb)
175             ctx->output_plugin_private = data;
(gdb)
176             ctx->stats = palloc0(sizeof(OutputPluginStats));
(gdb)
178             opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
(gdb) p CurrentMemoryContext
$7 = (MemoryContext) 0x1ba852a0
(gdb) p (*CurrentMemoryContext).name
$8 = 0xe4057d "Logical decoding context"
(gdb) p ctx->context
$9 = (MemoryContext) 0x1ba852a0

I can see that CurrentMemoryContext is "ctx->context", so the palloc0 calls here
are done in the right context.

=== 2

Playing with "has stats" a bit.

-- Issue 1:

Say, plugin has stats enabled and I get:

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |                9
(1 row)

If the engine is shut down and the plugin is now replaced by a version that
does not provide stats, then, right after startup, I still get:

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |                9
(1 row)

And that will be the case until the plugin decodes something (so that 
statent->plugin_has_stats gets replaced in pgstat_report_replslot()).

That's because plugin_has_stats is stored in PgStat_StatReplSlotEntry
and so it's restored from the stat file when the engine starts.

Now, let's do some inserts and decode:

postgres=# insert into t1 values ('a');
INSERT 0 1
postgres=# insert into t1 values ('a');
INSERT 0 1
postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL);
    lsn     | xid |                                          data
------------+-----+-----------------------------------------------------------------------------------------
 0/407121C0 | 766 | xid 766: lsn:0/40712190 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
 0/40712268 | 767 | xid 767: lsn:0/40712238 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
(2 rows)

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |
(1 row)

All good. 

Issue 1 is that before any decoding happens, pg_stat_replication_slots is still
showing stale plugin statistics from a plugin that may no longer support stats.

I'm not sure how we could easily fix this issue, as we don't know the plugin's
stats capability until we actually use it.

-- Issue 2:

Let's shut down, replace the plugin with a version that has stats enabled and
restart.

Same behavior as before:

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |
(1 row)

Until pgstat_report_replslot() is called, statent->plugin_has_stats is
not updated. So it displays the stats as they were before the shutdown. But that's
not an issue in this case (when switching from non stats to stats).

Now, let's do some inserts and decode:

postgres=# insert into t1 values ('a');
INSERT 0 1
postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL);
    lsn     | xid |                                          data
------------+-----+-----------------------------------------------------------------------------------------
 0/407125B0 | 768 | xid 768: lsn:0/40712580 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
(1 row)

and check the stats:

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |               10
(1 row)

Now it reports 10, that's the 9 before we changed the plugin to not have stats
enabled plus this new one.

Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it
shows accumulated values from before the non-stats switch.

PFA a proposal to fix Issue 2.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

fix_issue2.txt

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

September 26, 2025, 15:44:28

On Fri, Sep 26, 2025 at 4:43 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
>
> === 2
>
> Playing with "has stats" a bit.
>
> -- Issue 1:
>

Thanks for the experiments, and for bringing this up.

> Say, plugin has stats enabled and I get:
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |                9
> (1 row)
>
> If the engine is shutdown and the plugin is now replaced by a version that
> does not provide stats, then, right after startup, I still get:
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |                9
> (1 row)
>
> And that will be the case until the plugin decodes something (so that
> statent->plugin_has_stats gets replaced in pgstat_report_replslot()).
>
> That's because plugin_has_stats is stored in PgStat_StatReplSlotEntry
> and so it's restored from the stat file when the engine starts.
>
> Now, let's do some inserts and decode:
>
> postgres=# insert into t1 values ('a');
> INSERT 0 1
> postgres=# insert into t1 values ('a');
> INSERT 0 1
> postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL);
>     lsn     | xid |                                          data
> ------------+-----+-----------------------------------------------------------------------------------------
>  0/407121C0 | 766 | xid 766: lsn:0/40712190 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
>  0/40712268 | 767 | xid 767: lsn:0/40712238 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
> (2 rows)
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |
> (1 row)
>
> All good.
>
> Issue 1 is that before any decoding happens, pg_stat_replication_slots is still
> showing stale plugin statistics from a plugin that may no longer support stats.
>
> I'm not sure how we could easily fix this issue, as we don't know the plugin's
> stats capability until we actually use it.
>

I don't think this is an issue. There is no way for the core to tell
whether the plugin will provide stats or not, unless it sets that
ctx->stats which happens in the startup callback. Till then it is
rightly providing the values accumulated so far. Once the decoding
starts, we know that the plugin is not providing any stats and we
don't display anything.

> -- Issue 2:
>
> Let's shutdown, replace the plugin with a version that has stats enabled and
> restart.
>
> Same behavior as before:
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |
> (1 row)
>
> Until pgstat_report_replslot() is not called, the statent->plugin_has_stats is
> not updated. So it displays the stats as they were before the shutdown. But that's
> not an issue in this case (when switching from non stats to stats).
>
> Now, let's do some inserts and decode:
>
> postgres=# insert into t1 values ('a');
> INSERT 0 1
> postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL);
>     lsn     | xid |                                          data
> ------------+-----+-----------------------------------------------------------------------------------------
>  0/407125B0 | 768 | xid 768: lsn:0/40712580 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
> (1 row)
>
> and check the stats:
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |               10
> (1 row)
>
> Now it reports 10, that's the 9 before we changed the plugin to not have stats
> enabled plus this new one.
>
> Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it
> shows accumulated values from before the non-stats switch.

This too seems to be a non-issue to me. The stats in the view get
reset only when a user resets them. So we shouldn't wipe out the
already accumulated values just because the plugin stopped providing
it. If the plugin keeps flip-flopping, only the partial statistics
provided by the plugin will be accumulated. That's the plugin's
responsibility. Realistically a plugin will either decide to provide
statistics in some version and then continue forever OR it will decide
against it. Flip-flopping won't happen in practice.

If at all we decide to reset the stats when the plugin does not
provide them, I think a better fix is to set them to 0 in
pgstat_report_replslot() independent of previous state of has_stats.
It will be more or less the same CPU instructions, like below:
    if (repSlotStat->plugin_has_stats)
    {
        REPLSLOT_ACC(plugin_sent_txns);
        REPLSLOT_ACC(plugin_sent_bytes);
        REPLSLOT_ACC(plugin_filtered_bytes);
    }
    else
    {
        statent->plugin_sent_txns = 0;
        statent->plugin_sent_bytes = 0;
        statent->plugin_filtered_bytes = 0;
    }

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

September 26, 2025, 19:58:09

Hi,

On Fri, Sep 26, 2025 at 06:14:28PM +0530, Ashutosh Bapat wrote:
> On Fri, Sep 26, 2025 at 4:43 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> >
> > === 2
> >
> > Issue 1 is that before any decoding happens, pg_stat_replication_slots is still
> > showing stale plugin statistics from a plugin that may no longer support stats.
> >
> > I'm not sure how we could easily fix this issue, as we don't know the plugin's
> > stats capability until we actually use it.
> >
> 
> I don't think this is an issue. There is no way for the core to tell
> whether the plugin will provide stats or not, unless it sets that
> ctx->stats which happens in the startup callback. Till then it is
> rightly providing the values accumulated so far. Once the decoding
> starts, we know that the plugin is not providing any stats and we
> don't display anything.

Yeah, I got the technical reasons, but I think there's a valid user experience
concern here: seeing statistics for a plugin that doesn't actually support
statistics is misleading.

What we need is a call to pgstat_report_replslot() to display stats that reflect
the current plugin behavior. We can't just call pgstat_report_replslot()
in say RestoreSlotFromDisk() because we really need the decoding to start.

So one idea could be to set a flag (per slot) when pgstat_report_replslot()
has been called (for good reasons) and check for this flag in
pg_stat_get_replication_slot().

If the flag is not set, then set the plugin fields to NULL.
If the flag is set, then display their values (like now).

And we should document that the plugin stats are not available (i.e are NULL)
until the decoding has valid stats to report after startup.

What do you think?

> 
> > -- Issue 2:
> >
> > Now it reports 10, that's the 9 before we changed the plugin to not have stats
> > enabled plus this new one.
> >
> > Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it
> > shows accumulated values from before the non-stats switch.
> 
> This too seems to be a non-issue to me. The stats in the view get
> reset only when a user resets them. So we shouldn't wipe out the
> already accumulated values just because the plugin stopped providing
> it. If the plugin keeps flip-flopping and only partial statistics
> provided by the plugin will be accumulated. That's the plugin's
> responsibility.

Okay but then I think that the plugin is missing some flexibility.

For example, how could the plugin set ctx->stats->sentTxns
to zero if it decides not to enable stats (while they were previously enabled)?

Indeed, not enabling stats, means not doing
"ctx->stats = palloc0(sizeof(OutputPluginStats))" which means not having control
over the stats anymore.

So, with the current design, it has no other choice but to have its previous
stats not reset to zero.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

September 29, 2025, 10:24:24

On Fri, Sep 26, 2025 at 10:28 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > >
> >
> > I don't think this is an issue. There is no way for the core to tell
> > whether the plugin will provide stats or not, unless it sets that
> > ctx->stats which happens in the startup callback. Till then it is
> > rightly providing the values accumulated so far. Once the decoding
> > starts, we know that the plugin is not providing any stats and we
> > don't display anything.
>
> Yeah, I got the technical reasons, but I think there's a valid user experience
> concern here: seeing statistics for a plugin that doesn't actually support
> statistics is misleading.
>

1. If the plugin never supported statistics, we will never report
stats. So nothing misleading there.
2. If the plugin starts supporting statistics and continues to do so,
we will report the stats since the time they are made available and
continue to do so. Nothing misleading there.
3. If the plugin starts supporting statistics and midway discontinues
its support, it already has a problem with backward compatibility.

Practically it would be 1 or 2, which are working fine.

I don't think we will encounter case 3 practically. Do you have a
practical use case where a plugin would discontinue supporting stats?

Even in case 3, I think we need to consider the fact that these stats
are "cumulative". So if a plugin discontinues reporting stats, they
should go NULL only when the next accumulation action happens, not
before that.

> What we need is a call to pgstat_report_replslot() to display stats that reflect
> the current plugin behavior. We can't just call pgstat_report_replslot()
> in say RestoreSlotFromDisk() because we really need the decoding to start.
>
> So one idea could be to set a flag (per slot) when pgstat_report_replslot()
> has been called (for good reasons) and check for this flag in
> pg_stat_get_replication_slot().
>
> If the flag is not set, then set the plugin fields to NULL.
> If the flag is set, then display their values (like now).

This approach will have the same problem. Till
pgstat_report_replslot() is called, the old statistics will continue
to be shown.

>
> And we should document that the plugin stats are not available (i.e are NULL)
> until the decoding has valid stats to report after startup.

After "startup" would mislead users since then they will think that
the statistics will be NULL just before the decoding (re)starts.

The current documentation is " It is NULL when statistics is not
initialized or immediately after a reset or when not maintained by the
output plugin.". I think that covers all the cases.

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

September 30, 2025, 09:52:46

Hi,

On Mon, Sep 29, 2025 at 12:54:24PM +0530, Ashutosh Bapat wrote:
> On Fri, Sep 26, 2025 at 10:28 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > >
> > >
> > > I don't think this is an issue. There is no way for the core to tell
> > > whether the plugin will provide stats or not, unless it sets that
> > > ctx->stats which happens in the startup callback. Till then it is
> > > rightly providing the values accumulated so far. Once the decoding
> > > starts, we know that the plugin is not providing any stats and we
> > > don't display anything.
> >
> > Yeah, I got the technical reasons, but I think there's a valid user experience
> > concern here: seeing statistics for a plugin that doesn't actually support
> > statistics is misleading.
> >
> 
> 3. If the plugin starts supporting statistics and midway discontinues
> its support, it already has a problem with backward compatibility.
> 
> Practically it would be 1 or 2, which are working fine.
> 
> I don't think we will encounter case 3 practically. Do you have a
> practical use case where a plugin would discontinue supporting stats?

Not that I can think of currently. That looks unlikely but wanted to raise
the point though. Maybe others see a use case and/or have a different point
of view.

> > What we need is a call to pgstat_report_replslot() to display stats that reflect
> > the current plugin behavior. We can't just call pgstat_report_replslot()
> > in say RestoreSlotFromDisk() because we really need the decoding to start.
> >
> > So one idea could be to set a flag (per slot) when pgstat_report_replslot()
> > has been called (for good reasons) and check for this flag in
> > pg_stat_get_replication_slot().
> >
> > If the flag is not set, then set the plugin fields to NULL.
> > If the flag is set, then display their values (like now).
> 
> This approach will have the same problem. Till
> pgstat_report_replslot() is called, the old statistics will continue
> to be shown.

I don't think so because the flag would not be set.

> > And we should document that the plugin stats are not available (i.e are NULL)
> > until the decoding has valid stats to report after startup.
> 
> The current documentation is " It is NULL when statistics is not
> initialized or immediately after a reset or when not maintained by the
> output plugin.". I think that covers all the cases.

Do you think the doc covers the case we discussed above? i.e when a plugin
discontinues supporting stats, it would display stats until the decoding actually
starts.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

October 3, 2025, 09:52:05

On Tue, Sep 30, 2025 at 12:22 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Mon, Sep 29, 2025 at 12:54:24PM +0530, Ashutosh Bapat wrote:
> > On Fri, Sep 26, 2025 at 10:28 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > > >
> > > >
> > > > I don't think this is an issue. There is no way for the core to tell
> > > > whether the plugin will provide stats or not, unless it sets that
> > > > ctx->stats which happens in the startup callback. Till then it is
> > > > rightly providing the values accumulated so far. Once the decoding
> > > > starts, we know that the plugin is not providing any stats and we
> > > > don't display anything.
> > >
> > > Yeah, I got the technical reasons, but I think there's a valid user experience
> > > concern here: seeing statistics for a plugin that doesn't actually support
> > > statistics is misleading.
> > >
> >
> > 3. If the plugin starts supporting statistics and midway discontinues
> > its support, it already has a problem with backward compatibility.
> >
> > Practically it would be 1 or 2, which are working fine.
> >
> > I don't think we will encounter case 3 practically. Do you have a
> > practical use case where a plugin would discontinue supporting stats?
>
> Not that I can think of currently. That looks unlikely but wanted to raise
> the point though. Maybe others see a use case and/or have a different point
> of view.
>
> > > What we need is a call to pgstat_report_replslot() to display stats that reflect
> > > the current plugin behavior. We can't just call pgstat_report_replslot()
> > > in say RestoreSlotFromDisk() because we really need the decoding to start.
> > >
> > > So one idea could be to set a flag (per slot) when pgstat_report_replslot()
> > > has been called (for good reasons) and check for this flag in
> > > pg_stat_get_replication_slot().
> > >
> > > If the flag is not set, then set the plugin fields to NULL.
> > > If the flag is set, then display their values (like now).
> >
> > This approach will have the same problem. Till
> > pgstat_report_replslot() is called, the old statistics will continue
> > to be shown.
>
> I don't think so because the flag would not be set.
>
> > > And we should document that the plugin stats are not available (i.e are NULL)
> > > until the decoding has valid stats to report after startup.
> >
> > The current documentation is " It is NULL when statistics is not
> > initialized or immediately after a reset or when not maintained by the
> > output plugin.". I think that covers all the cases.
>
> Do you think the doc covers the case we discussed above? i.e when a plugin
> discontinue supporting stats, it would display stats until the decoding actually
> starts.



Here's a patchset addressing two issues:

Issue 1: A plugin supports stats in version X. It stopped supporting
the stats in version X + 1. It again started supporting stats in
version X + 2. Plugin stats will be accumulated when it was at version
X. When X + 1 is loaded, the view will continue to report the stats
accumulated (by version X) till the first startup callback call for that
replication slot happens. If the user knows (from documentation say)
that X + 1 does not support stats, seeing statistics will mislead
them. We don't know whether there's a practical need to do so. A
plugin which flip-flops on stats is breaking backward compatibility. I
have added a note in documentation for plugin authors, warning them
that this isn't expected. I don't think it's worth adding complexity
in code to support such a case unless we see a practical need for the
same.

Issue 2: Once X + 2 is loaded, further statistics are accumulated on
top of the statistics accumulated by version X. The attached patch fixes
issue 2 by zeroing out the stats when the plugin does not report the
statistics.
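
A minimal sketch of the Issue 2 behaviour, as a simplified stand-alone model and not the actual pgstat code. The types PluginStats and SlotStatEntry and the function report_slot_stats() are hypothetical names introduced only for illustration; a NULL pointer stands in for "the plugin does not maintain stats".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters kept by a plugin between two reports. */
typedef struct PluginStats
{
    int64_t sent_txns;
    int64_t sent_bytes;
    int64_t filtered_bytes;
} PluginStats;

/* Hypothetical accumulated statistics entry for one replication slot. */
typedef struct SlotStatEntry
{
    int64_t total_bytes;
    int64_t plugin_sent_txns;
    int64_t plugin_sent_bytes;
    int64_t plugin_filtered_bytes;
} SlotStatEntry;

/*
 * Fold one report into the accumulated entry.  When the plugin reports no
 * stats (ps == NULL), the plugin counters are reset to zero so that a later
 * plugin version does not accumulate on top of stale values.
 */
void
report_slot_stats(SlotStatEntry *ent, int64_t total_bytes, const PluginStats *ps)
{
    assert(ent != NULL);

    ent->total_bytes += total_bytes;

    if (ps != NULL)
    {
        ent->plugin_sent_txns += ps->sent_txns;
        ent->plugin_sent_bytes += ps->sent_bytes;
        ent->plugin_filtered_bytes += ps->filtered_bytes;
    }
    else
    {
        ent->plugin_sent_txns = 0;
        ent->plugin_sent_bytes = 0;
        ent->plugin_filtered_bytes = 0;
    }
}
```

In this model, a report from version X followed by a report without plugin stats (version X + 1) leaves the plugin counters at zero, so version X + 2 starts accumulating from scratch while total_bytes keeps growing.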

The patchset also addresses your earlier review comments.
--
Best Wishes,
Ashutosh Bapat

Attachments

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

03 October 2025, 16:47:34

Hi,

On Fri, Oct 03, 2025 at 12:22:05PM +0530, Ashutosh Bapat wrote:
> Here's patchset addressing two issues:

Thanks for the patch update!

> I
> have added a note in documentation for plugin authors, warning them
> that this isn't expected.

What note are you referring to? (I'm failing to see it).

> I don't think it's worth adding complexity
> in code to support such a case unless we see a practical need for the
> same.

Sounds good.


> Issue 2: Once X + 2 is loaded, further statistics are accumulated on
> the top of statistics accumulated by version X. Attached patch fixes
> issue 2 by zero'ing out the stats when the plugin does not report the
> statistics.

+#define REPLSLOT_SET_TO_ZERO(fld) statent->fld = 0

It looks like the associated "undef" is missing.
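
For context, a small sketch of the convention being asked for: a helper macro that only makes sense inside one function is undefined again right after its last use. Only the macro text is taken from the quoted patch line; the struct StatEnt and the function reset_plugin_fields() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stats entry; the real one lives in PostgreSQL's pgstat code. */
typedef struct StatEnt
{
    int64_t sent_txns;
    int64_t sent_bytes;
    int64_t filtered_bytes;
} StatEnt;

void
reset_plugin_fields(StatEnt *statent)
{
    assert(statent != NULL);

/* The macro relies on a local variable named statent being in scope. */
#define REPLSLOT_SET_TO_ZERO(fld) statent->fld = 0
    REPLSLOT_SET_TO_ZERO(sent_txns);
    REPLSLOT_SET_TO_ZERO(sent_bytes);
    REPLSLOT_SET_TO_ZERO(filtered_bytes);
#undef REPLSLOT_SET_TO_ZERO		/* keep the macro from leaking into later code */
}
```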

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

06 October 2025, 08:02:57

On Fri, Oct 3, 2025 at 7:17 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Fri, Oct 03, 2025 at 12:22:05PM +0530, Ashutosh Bapat wrote:
> > Here's patchset addressing two issues:
>
> Thanks for the patch update!
>
> > I
> > have added a note in documentation for plugin authors, warning them
> > that this isn't expected.
>
> What note are you referring to? (I'm failing to see it).

Patch 0002, changes in logicaldecoding.sgml. I am a bit hesitant to
add more detail about what "misleading" means, since spelling it out
might be seen as documented behaviour that plugin authors could then
rely on.

>
> > I don't think it's worth adding complexity
> > in code to support such a case unless we see a practical need for the
> > same.
>
> Sounds good.
>
>
> > Issue 2: Once X + 2 is loaded, further statistics are accumulated on
> > the top of statistics accumulated by version X. Attached patch fixes
> > issue 2 by zero'ing out the stats when the plugin does not report the
> > statistics.
>
> +#define REPLSLOT_SET_TO_ZERO(fld) statent->fld = 0
>
> It looks like that the associated "undef" is missing.

Good catch. Fixed.


--
Best Wishes,
Ashutosh Bapat

Attachments

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

24 October 2025, 12:53:44

On Mon, Oct 6, 2025 at 10:32 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, Oct 3, 2025 at 7:17 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Fri, Oct 03, 2025 at 12:22:05PM +0530, Ashutosh Bapat wrote:
> > > Here's patchset addressing two issues:
> >
> > Thanks for the patch update!
> >
> > > I
> > > have added a note in documentation for plugin authors, warning them
> > > that this isn't expected.
> >
> > What note are you referring to? (I'm failing to see it).
>
> Patch 0002, changes in logicaldecoding.sgml. I am a bit hesitant to
> add more details as to what "misleading" means since mentioning so
> might be seen as a documented behaviour and thus plugin authors
> relying on it.
>
> >
> > > I don't think it's worth adding complexity
> > > in code to support such a case unless we see a practical need for the
> > > same.
> >
> > Sounds good.
> >
> >
> > > Issue 2: Once X + 2 is loaded, further statistics are accumulated on
> > > the top of statistics accumulated by version X. Attached patch fixes
> > > issue 2 by zero'ing out the stats when the plugin does not report the
> > > statistics.
> >
> > +#define REPLSLOT_SET_TO_ZERO(fld) statent->fld = 0
> >
> > It looks like that the associated "undef" is missing.
>
> Good catch. Fixed.
>

Squashed patches into one and rebased.

--
Best Wishes,
Ashutosh Bapat

Attachments

0001-Report-output-plugin-statistics-in-pg_stat_-20251024.patch

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

24 October 2025, 15:40:30

Hi,

On Fri, Oct 24, 2025 at 03:23:44PM +0530, Ashutosh Bapat wrote:
> On Mon, Oct 6, 2025 at 10:32 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Fri, Oct 3, 2025 at 7:17 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > > Issue 2: Once X + 2 is loaded, further statistics are accumulated on
> > > > the top of statistics accumulated by version X. Attached patch fixes
> > > > issue 2 by zero'ing out the stats when the plugin does not report the
> > > > statistics.
> > >
> > > +#define REPLSLOT_SET_TO_ZERO(fld) statent->fld = 0
> > >
> > > It looks like that the associated "undef" is missing.
> >
> > Good catch. Fixed.
> >
> 
> Squashed patches into one and rebased.

Thanks for the new version!

LGTM, except for the plugin flip-flop behaviour that we discussed up-thread.
That said, I don't think it hurts that much, and maybe it's just me and others
don't have a concern with it (in which case that's fine by me).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

27 October 2025, 13:20:39

On Fri, Oct 3, 2025 at 12:22 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
>
> Here's patchset addressing two issues:
>
> Issue 1: A plugin supports stats in version X. It stopped supporting
> the stats in version X + 1. It again started supporting stats in
> version X + 2. Plugin stats will be accumulated when it was at version
> X. When X + 1 is loaded, the stats will continue to report the stats
> accumulated (by version X) till the first startup_call for that
> replication slot happens. If the user knows (from documentation say)
> that X + 1 does not support stats, seeing statistics will mislead
> them. We don't know whether there's a practical need to do so. A
> plugin which flip-flops on stats is breaking backward compatibility. I
> have added a note in documentation for plugin authors, warning them
> that this isn't expected. I don't think it's worth adding complexity
> in code to support such a case unless we see a practical need for the
> same.

I agree. The current Note saying 'result may be misleading' looks good to me.

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

27 October 2025, 14:17:00

Few comments:

1)
pgoutput_truncate:

    if (nrelids > 0)
    {
        OutputPluginPrepareWrite(ctx, true);
        logicalrep_write_truncate(ctx->out,
                                  xid,
                                  nrelids,
                                  relids,
                                  change->data.truncate.cascade,
                                  change->data.truncate.restart_seqs);
        OutputPluginWrite(ctx, true);
    }
+   else
+       ctx->stats->filteredBytes += ReorderBufferChangeSize(change);
+

It seems that filteredBytes is only counted for TRUNCATE when nrelids
is 0. Can nrelids only be 0 or the same as nrelations?

The code below makes me think that nrelids can be any number between 0
and nrelations, depending on which relations are publishable and which
support publishing TRUNCATE. If that’s true, shouldn’t we count
filteredBytes in each such skipped case?

    if (!is_publishable_relation(relation))
        continue;

    relentry = get_rel_sync_entry(data, relation);

    if (!relentry->pubactions.pubtruncate)
        continue;
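
To make the question concrete, here is a stand-alone model of the skip logic quoted above. It is only a sketch: RelInfo and collect_truncate_relids() are hypothetical, and the two boolean flags stand in for is_publishable_relation() and relentry->pubactions.pubtruncate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one relation named in a TRUNCATE change. */
typedef struct RelInfo
{
    unsigned int relid;
    bool        publishable;	/* stand-in for is_publishable_relation() */
    bool        pubtruncate;	/* stand-in for pubactions.pubtruncate */
} RelInfo;

/*
 * Copy the ids of relations that survive both checks into relids[] and
 * return how many there are.  The caller must provide room for nrelations
 * entries.
 */
int
collect_truncate_relids(const RelInfo *rels, int nrelations, unsigned int *relids)
{
    int         nrelids = 0;

    assert(nrelations >= 0);

    for (int i = 0; i < nrelations; i++)
    {
        if (!rels[i].publishable)
            continue;
        if (!rels[i].pubtruncate)
            continue;
        relids[nrelids++] = rels[i].relid;
    }

    return nrelids;
}
```

With four relations of which two pass both checks, the function returns 2, which is neither 0 nor nrelations; this is the partial-filtering case the comment asks about.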


2)
+ int64 filteredBytes; /* amount of data from reoder buffer that was

reoder --> reorder

3)
One small nitpick:

+   /*
+    * If output plugin has chosen to maintain its stats, update the amount of
+    * data sent downstream.
+    */
+   if (ctx->stats)
+       ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) + sizeof(TransactionId);

The way sentBytes is updated here feels a bit unnatural; we’re adding
the lengths for values[2], then [0], and then [1]. Would it be cleaner
to introduce a len[3] array similar to the existing values[3] and
nulls[3] arrays? We could initialize len[i] alongside values[i], and
later just sum up all three elements when updating
ctx->stats->sentBytes. It would be easier to understand as well.
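
A rough sketch of the len[3] idea, written as a stand-alone function and not as the patch's code. The typedefs LsnSketch and XidSketch and the function row_sent_bytes() are hypothetical; the widths (64-bit LSN, 32-bit transaction id) mirror PostgreSQL's XLogRecPtr and TransactionId.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t LsnSketch;		/* stand-in for XLogRecPtr */
typedef uint32_t XidSketch;		/* stand-in for TransactionId */

/*
 * Compute the bytes attributed to one output row by recording a length next
 * to each of the three values and summing them in index order.
 */
size_t
row_sent_bytes(size_t data_len)
{
    size_t      len[3];
    size_t      total = 0;

    len[0] = sizeof(LsnSketch);	/* values[0]: LSN */
    len[1] = sizeof(XidSketch);	/* values[1]: transaction id */
    len[2] = data_len;			/* values[2]: plugin output */

    for (int i = 0; i < 3; i++)
        total += len[i];

    return total;
}
```

The sum is the same as ctx->out->len + sizeof(XLogRecPtr) + sizeof(TransactionId); the array only makes the correspondence with values[0..2] explicit.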

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

28 October 2025, 10:15:59

On Mon, Oct 27, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Few comments:
>
> 1)
> pgoutput_truncate:
>
> if (nrelids > 0)
> {
> OutputPluginPrepareWrite(ctx, true);
> logicalrep_write_truncate(ctx->out,
>   xid,
>   nrelids,
>   relids,
>   change->data.truncate.cascade,
>   change->data.truncate.restart_seqs);
> OutputPluginWrite(ctx, true);
> }
> + else
> + ctx->stats->filteredBytes += ReorderBufferChangeSize(change);
> +
>
> It seems that filteredBytes are only counted for TRUNCATE when nrelids
> is 0. Can nrelids only be 0 or same as nrelations?
>
> The below code makes me think that nrelids can be any number between 0
> and nrelations, depending on which relations are publishable and which
> supports publishing TRUNCATE. If that’s true, shouldn’t we count
> filteredBytes in each such skipped case?

IIUC, you are suggesting that we should add
ReorderBufferChangeSize(change) for every relation which is not part
of the publication or whose truncate is not published. I think that
won't be correct, since it can lead to a situation where filtered
bytes exceed total bytes, which should never happen. Even if there is
a single publishable relation whose truncate is published, the change
should not be considered filtered, since something would be output
downstream. Otherwise both filtered bytes and sent bytes would be
incremented, causing an inconsistency (which would be hard to notice,
since total bytes - filtered bytes has something to do with the sent
bytes but the exact correlation is hard to express in a formula).

We may increment filteredBytes by sizeof(Oid) for every relation we
skip here, or by ReorderBufferChangeSize(change) if all the relations
are filtered, but that depends too much on how the WAL record is
encoded, and adding that dependency to output plugin code seems hard
to manage.
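
The all-or-nothing accounting argued for here can be sketched as below. This is a simplified model, not the patch itself: TruncateStats and account_truncate() are hypothetical, change_size stands in for ReorderBufferChangeSize(change), and msg_len for the size of the message written downstream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the plugin counters discussed in the thread. */
typedef struct TruncateStats
{
    int64_t     sentBytes;
    int64_t     filteredBytes;
} TruncateStats;

/*
 * A change is counted either as sent or as filtered, never both.  If at
 * least one relation survives filtering, only sentBytes grows (by the size
 * of the message actually written); otherwise the whole change size is
 * added to filteredBytes.
 */
void
account_truncate(TruncateStats *stats, int nrelids,
                 int64_t change_size, int64_t msg_len)
{
    assert(stats != NULL);

    if (nrelids > 0)
        stats->sentBytes += msg_len;
    else
        stats->filteredBytes += change_size;
}
```

Because each change contributes its size to filteredBytes at most once, filteredBytes can never exceed the total size of the changes processed.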

If you are suggesting something else, maybe sharing actual code
changes would help.

>
>
> 2)
> + int64 filteredBytes; /* amount of data from reoder buffer that was
>
> reoder --> reorder

Done.

>
> 3)
> One small nitpick:
>
> + /*
> + * If output plugin has chosen to maintain its stats, update the amount of
> + * data sent downstream.
> + */
> + if (ctx->stats)
> + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
> sizeof(TransactionId);
>
> The way sentBytes is updated here feels a bit unnatural; we’re adding
> the lengths for values[2], then [0], and then [1]. Would it be cleaner
> to introduce a len[3] array similar to the existing values[3] and
> nulls[3] arrays? We could initialize len[i] alongside values[i], and
> later just sum up all three elements when updating
> ctx->stats->sentBytes. It would be easier to understand as well.

Instead of an array of length 3, we could keep a counter sentBytes to
accumulate all lengths. It will be assigned to ctx->stats->sentBytes
at the end if ctx->stats != NULL. That might look as if we perform
the additions even when the result is never used, but that's not the
case, since this plugin always maintains stats. Changed it that way.

--
Best Wishes,
Ashutosh Bapat

Attachments

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

29 October 2025, 06:43:47

On Tue, Oct 28, 2025 at 12:46 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Oct 27, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Few comments:
> >
> > 1)
> > pgoutput_truncate:
> >
> > if (nrelids > 0)
> > {
> > OutputPluginPrepareWrite(ctx, true);
> > logicalrep_write_truncate(ctx->out,
> >   xid,
> >   nrelids,
> >   relids,
> >   change->data.truncate.cascade,
> >   change->data.truncate.restart_seqs);
> > OutputPluginWrite(ctx, true);
> > }
> > + else
> > + ctx->stats->filteredBytes += ReorderBufferChangeSize(change);
> > +
> >
> > It seems that filteredBytes are only counted for TRUNCATE when nrelids
> > is 0. Can nrelids only be 0 or same as nrelations?
> >
> > The below code makes me think that nrelids can be any number between 0
> > and nrelations, depending on which relations are publishable and which
> > supports publishing TRUNCATE. If that’s true, shouldn’t we count
> > filteredBytes in each such skipped case?
>
> IIIUC, you are suggesting that we should add
> ReorderBufferChangeSize(change) for every relation which is not part
> of the publication or whose truncate is not published.

No, that will be wrong.

> I think that
> won't be correct since it can lead to a situation where filtered bytes
> > total bytes which should never happen. Even if there is a single
> publishable relation whose truncate is published, the change should
> not be considered as filtered since something would be output
> downstream.

Yes, the entire change should not be treated as filtered. The idea is
that, for example, if there are 20 relations belonging to different
publications and only one of them supports publishing TRUNCATE, then
when a TRUNCATE is triggered on all, the data for that one relation
should be counted as sent (which is currently happening based on
nrelids), while the data for the remaining 19 should be considered
filtered — which is not happening right now.

> Otherwise filtered bytes as well as sent bytes both will
> be incremented causing an inconsistency (which would be hard to notice
> since total bytes - filtered bytes has something to do with the sent
> bytes but the exact correlation is hard to grasp in a formula).
>
> We may increment filteredBytes by sizeof(OID) for every relation we
> skip here OR by ReoderBufferChangeSize(change) if all the relations
> are filtered, but that's too much dependent on how the WAL record is
> encoded; and adding that dependency in an output plugin code seems
> hard to manage.
>

Yes, that was the idea, to increment filteredBytes in this way. But I
see your point. I can’t think of a better solution at the moment. If
you also don’t have any better ideas, then at least adding a comment
in this function would be helpful. Right now, it looks like we
overlooked the fact that some relations should contribute to
filteredBytes while others should go to sentBytes.

> If you are suggesting something else, maybe sharing actual code
> changes would help.
>
> >
> >
> > 2)
> > + int64 filteredBytes; /* amount of data from reoder buffer that was
> >
> > reoder --> reorder
>
> Done.
>
> >
> > 3)
> > One small nitpick:
> >
> > + /*
> > + * If output plugin has chosen to maintain its stats, update the amount of
> > + * data sent downstream.
> > + */
> > + if (ctx->stats)
> > + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
> > sizeof(TransactionId);
> >
> > The way sentBytes is updated here feels a bit unnatural; we’re adding
> > the lengths for values[2], then [0], and then [1]. Would it be cleaner
> > to introduce a len[3] array similar to the existing values[3] and
> > nulls[3] arrays? We could initialize len[i] alongside values[i], and
> > later just sum up all three elements when updating
> > ctx->stats->sentBytes. It would be easier to understand as well.
>
> Instead of an array of length 3, we could keep a counter sentBytes to
> accumulate all lengths. It will be assigned to ctx->stats->sentBytes
> at the end if ctx->stats != NULL. But that might appear as if we are
> performing additions even if it won't be used ultimately. That's not
> true, since this plugin will always maintain stats. Changed that way.
>

Looks good.

Apart from the above discussion, I have no more comments on this patch.

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

29 October 2025, 17:55:16

On Wed, Oct 29, 2025 at 9:14 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Oct 28, 2025 at 12:46 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Mon, Oct 27, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > Few comments:
> > >
> > > 1)
> > > pgoutput_truncate:
> > >
> > > if (nrelids > 0)
> > > {
> > > OutputPluginPrepareWrite(ctx, true);
> > > logicalrep_write_truncate(ctx->out,
> > >   xid,
> > >   nrelids,
> > >   relids,
> > >   change->data.truncate.cascade,
> > >   change->data.truncate.restart_seqs);
> > > OutputPluginWrite(ctx, true);
> > > }
> > > + else
> > > + ctx->stats->filteredBytes += ReorderBufferChangeSize(change);
> > > +
> > >
> > > It seems that filteredBytes are only counted for TRUNCATE when nrelids
> > > is 0. Can nrelids only be 0 or same as nrelations?
> > >
> > > The below code makes me think that nrelids can be any number between 0
> > > and nrelations, depending on which relations are publishable and which
> > > supports publishing TRUNCATE. If that’s true, shouldn’t we count
> > > filteredBytes in each such skipped case?
> >
> > IIIUC, you are suggesting that we should add
> > ReorderBufferChangeSize(change) for every relation which is not part
> > of the publication or whose truncate is not published.
>
> No, that will be wrong.
>
> > I think that
> > won't be correct since it can lead to a situation where filtered bytes
> > > total bytes which should never happen. Even if there is a single
> > publishable relation whose truncate is published, the change should
> > not be considered as filtered since something would be output
> > downstream.
>
> Yes, the entire change should not be treated as filtered. The idea is
> that, for example, if there are 20 relations belonging to different
> publications and only one of them supports publishing TRUNCATE, then
> when a TRUNCATE is triggered on all, the data for that one relation
> should be counted as sent (which is currently happening based on
> nrelids), while the data for the remaining 19 should be considered
> filtered — which is not happening right now.
>
> > Otherwise filtered bytes as well as sent bytes both will
> > be incremented causing an inconsistency (which would be hard to notice
> > since total bytes - filtered bytes has something to do with the sent
> > bytes but the exact correlation is hard to grasp in a formula).
> >
> > We may increment filteredBytes by sizeof(OID) for every relation we
> > skip here OR by ReoderBufferChangeSize(change) if all the relations
> > are filtered, but that's too much dependent on how the WAL record is
> > encoded; and adding that dependency in an output plugin code seems
> > hard to manage.
> >
>
> Yes, that was the idea, to increment filteredBytes in this way. But I
> see your point. I can’t think of a better solution at the moment. If
> you also don’t have any better ideas, then at least adding a comment
> in this function would be helpful. Right now, it looks like we
> overlooked the fact that some relationships should contribute to
> filteredBytes while others should go to sentBytes.

I noticed that we do something similar while filtering columns. I
think we need to add a comment in that code as well. How about
something like below?

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 4b35f2de6aa..f2d6e20a702 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1621,7 +1621,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,

        OutputPluginPrepareWrite(ctx, true);

-       /* Send the data */
+       /*
+        * Send the data. Even if we end up filtering some columns while sending the
+        * message, we won't consider the change, as a whole, to be filtered out.
+        * Instead the filtered columns will be reflected as a smaller sentBytes
+        * count.
+        */
        switch (action)
        {
                case REORDER_BUFFER_CHANGE_INSERT:
@@ -1728,6 +1733,13 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                                          change->data.truncate.cascade,
                                          change->data.truncate.restart_seqs);
                OutputPluginWrite(ctx, true);
+
+               /*
+                * Even if we filtered out some relations, we still send a TRUNCATE
+                * message for the remaining relations. Since the change, as a whole, is
+                * not filtered out, we don't count modify filteredBytes. The filtered
+                * out relations will be reflected as a smaller sentBytes count.
+                */
        }
        else
                ctx->stats->filteredBytes += ReorderBufferChangeSize(change);

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

30 October 2025, 06:38:27

On Wed, Oct 29, 2025 at 8:25 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Oct 29, 2025 at 9:14 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Oct 28, 2025 at 12:46 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > On Mon, Oct 27, 2025 at 4:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > Few comments:
> > > >
> > > > 1)
> > > > pgoutput_truncate:
> > > >
> > > > if (nrelids > 0)
> > > > {
> > > > OutputPluginPrepareWrite(ctx, true);
> > > > logicalrep_write_truncate(ctx->out,
> > > >   xid,
> > > >   nrelids,
> > > >   relids,
> > > >   change->data.truncate.cascade,
> > > >   change->data.truncate.restart_seqs);
> > > > OutputPluginWrite(ctx, true);
> > > > }
> > > > + else
> > > > + ctx->stats->filteredBytes += ReorderBufferChangeSize(change);
> > > > +
> > > >
> > > > It seems that filteredBytes are only counted for TRUNCATE when nrelids
> > > > is 0. Can nrelids only be 0 or same as nrelations?
> > > >
> > > > The below code makes me think that nrelids can be any number between 0
> > > > and nrelations, depending on which relations are publishable and which
> > > > supports publishing TRUNCATE. If that’s true, shouldn’t we count
> > > > filteredBytes in each such skipped case?
> > >
> > > IIIUC, you are suggesting that we should add
> > > ReorderBufferChangeSize(change) for every relation which is not part
> > > of the publication or whose truncate is not published.
> >
> > No, that will be wrong.
> >
> > > I think that
> > > won't be correct since it can lead to a situation where filtered bytes
> > > > total bytes which should never happen. Even if there is a single
> > > publishable relation whose truncate is published, the change should
> > > not be considered as filtered since something would be output
> > > downstream.
> >
> > Yes, the entire change should not be treated as filtered. The idea is
> > that, for example, if there are 20 relations belonging to different
> > publications and only one of them supports publishing TRUNCATE, then
> > when a TRUNCATE is triggered on all, the data for that one relation
> > should be counted as sent (which is currently happening based on
> > nrelids), while the data for the remaining 19 should be considered
> > filtered — which is not happening right now.
> >
> > > Otherwise filtered bytes as well as sent bytes both will
> > > be incremented causing an inconsistency (which would be hard to notice
> > > since total bytes - filtered bytes has something to do with the sent
> > > bytes but the exact correlation is hard to grasp in a formula).
> > >
> > > We may increment filteredBytes by sizeof(OID) for every relation we
> > > skip here OR by ReoderBufferChangeSize(change) if all the relations
> > > are filtered, but that's too much dependent on how the WAL record is
> > > encoded; and adding that dependency in an output plugin code seems
> > > hard to manage.
> > >
> >
> > Yes, that was the idea, to increment filteredBytes in this way. But I
> > see your point. I can’t think of a better solution at the moment. If
> > you also don’t have any better ideas, then at least adding a comment
> > in this function would be helpful. Right now, it looks like we
> > overlooked the fact that some relationships should contribute to
> > filteredBytes while others should go to sentBytes.
>
> I noticed that we do something similar while filtering columns. I
> think we need to add a comment in that code as well. How about
> something like below?
>
> diff --git a/src/backend/replication/pgoutput/pgoutput.c
> b/src/backend/replication/pgoutput/pgoutput.c
> index 4b35f2de6aa..f2d6e20a702 100644
> --- a/src/backend/replication/pgoutput/pgoutput.c
> +++ b/src/backend/replication/pgoutput/pgoutput.c
> @@ -1621,7 +1621,12 @@ pgoutput_change(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
>
>         OutputPluginPrepareWrite(ctx, true);
>
> -       /* Send the data */
> +       /*
> +        * Send the data. Even if we end up filtering some columns
> while sending the
> +        * message, we won't consider the change, as a whole, to be
> filtered out.
> +        * Instead the filtered columns will be reflected as a smaller sentBytes
> +        * count.
> +        */
>         switch (action)
>         {
>                 case REORDER_BUFFER_CHANGE_INSERT:
> @@ -1728,6 +1733,13 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
>
> change->data.truncate.cascade,
>
> change->data.truncate.restart_seqs);
>                 OutputPluginWrite(ctx, true);
> +
> +               /*
> +                * Even if we filtered out some relations, we still
> send a TRUNCATE
> +                * message for the remaining relations. Since the
> change, as a whole, is
> +                * not filtered out, we don't count modify
> filteredBytes. The filtered
> +                * out relations will be reflected as a smaller sentBytes count.
> +                */
>         }
>         else
>                 ctx->stats->filteredBytes += ReorderBufferChangeSize(change);
>

> +                * not filtered out, we don't count modify filteredBytes. The filtered

Something is wrong in this sentence.

Also, regarding "The filtered out relations will be reflected as a
smaller sentBytes count."
Can you please point me to the code where it happens? From what I have
understood, pgoutput_truncate() completely skips the relations which
do not support publishing truncate. Then it sends 'BEGIN', then
schema info of non-filtered relations and then TRUNCATE for
non-filtered relations (based on nrelids).

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

03 November 2025, 09:53:14

On Thu, Oct 30, 2025 at 9:08 AM shveta malik <shveta.malik@gmail.com> wrote:

> >
>
> > +                * not filtered out, we don't count modify filteredBytes. The filtered
>
> Something is wrong in this sentence.

:) Here's a better one:

/*
* Even if we filtered out some relations, we still send a TRUNCATE
* message for the remaining relations. Since the change, as a whole, is
* not filtered out we don't increment filteredBytes. The filtered
* out relations will be reflected as a smaller sentBytes count.
*/

>
> Also, regarding "The filtered out relations will be reflected as a
> smaller sentBytes count."
> Can you please point me to the code where it happens? From what I have
> understood, pgoutput_truncate() completely skips the relations which
> do not support publishing truncate. Then it sends 'BEGIN',  then
> schema info of non-filtered relations and then TRUNCATE for
> non-filtered relations (based on nrelids).

Let's take an example. Assume the TRUNCATE WAL record had relids X, Y,
Z and W, of which X and Y were filtered out. Then the message sent
downstream will contain only Z and W, let's say "TRUNCATE Z W" - 12
bytes (hypothetically). So sentBytes will be incremented by 12.
However, if no relation was filtered, the message would be "TRUNCATE X
Y Z W" - 16 bytes, and thus sentBytes will be incremented by 16.
Thus when relations are filtered from the truncate message, sentBytes
is incremented by a smaller value than when no relations are
filtered. So, even if filteredBytes is the same in both cases (some
relations filtered vs. no relation filtered), sentBytes indicates the
difference. The same applies to column-level filtering.
However, reading this again, it seems to add more confusion than it
removes. So I propose to just add the following comments

in pgoutput_truncate()
/*
* Even if we filtered out some relations, we still send a TRUNCATE
* message for the remaining relations. Since the change, as a whole, is
* not filtered out we don't increment filteredBytes.
*/

and in pgoutput_change
/*
* Send the data. Even if we end up filtering some columns while sending the
* message, we won't consider the change, as a whole, to be filtered out. Hence
* won't increment the filteredBytes.
*/

Does that look good?
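
The hypothetical TRUNCATE example above (relids X, Y, Z and W) can be reproduced with a small model. It treats the message as plain text, exactly as the example does, and is not the real pgoutput wire format; truncate_msg_len() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Length of a text message of the form "TRUNCATE A B ..." that lists only
 * the relations not marked as filtered.
 */
size_t
truncate_msg_len(const char *names[], const bool filtered[], int n)
{
    size_t      len = strlen("TRUNCATE");

    assert(n >= 0);

    for (int i = 0; i < n; i++)
    {
        if (filtered[i])
            continue;
        len += 1 + strlen(names[i]);	/* separating space plus the name */
    }

    return len;
}
```

For the names X, Y, Z, W this gives 12 when X and Y are filtered and 16 when nothing is filtered, so in this model the difference shows up in sentBytes while filteredBytes stays the same.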

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

shveta malik

Date:

03 November 2025, 12:54:54

On Mon, Nov 3, 2025 at 12:23 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Oct 30, 2025 at 9:08 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > >
> >
> > > +                * not filtered out, we don't count modify filteredBytes. The filtered
> >
> > Something is wrong in this sentence.
>
> :), here's better one
>
> /*
> * Even if we filtered out some relations, we still send a TRUNCATE
> * message for the remaining relations. Since the change, as a whole, is
> * not filtered out we don't increment filteredBytes. The filtered
> * out relations will be reflected as a smaller sentBytes count.
> */
>
> >
> > Also, regarding "The filtered out relations will be reflected as a
> > smaller sentBytes count."
> > Can you please point me to the code where it happens? From what I have
> > understood, pgoutput_truncate() completely skips the relations which
> > do not support publishing truncate. Then it sends 'BEGIN',  then
> > schema info of non-filtered relations and then TRUNCATE for
> > non-filtered relations (based on nrelids).
>
> Let's take an example. Assume the TRUNCATE WAL record had relids X, Y,
> Z and W. Out of those X and Y were filtered out. Then the message sent
> to the downstream will have only Z, W, let's say "TRUNCATE Z W" - 12
> bytes (hypothetically). So sentBytes will be incremented by 12.
> However, if no relation was filtered, the message would be "TRUNCATE X
> Y Z W" ~ 16 bytes and thus sentBytes will be incremented by 16 bytes.
> Thus when the relations are filtered from the truncate message,
> sentBytes is incremented by a smaller value than those when no
> relations are filtered. So, even if filteredBytes is same in both
> cases (filtered some relations vs no relation was filtered), sentBytes
> indicates the difference.

I understand the point, but I didn’t find the message clearly reflecting it.

> Similarly for column level filtering.
> However, reading this again, it seems adding more confusion than
> reducing it.

Right.

> So I propose to just add comment
>
> in pgoutput_truncate()
> /*
> * Even if we filtered out some relations, we still send a TRUNCATE
> * message for the remaining relations. Since the change, as a whole, is
> * not filtered out we don't increment filteredBytes.
> */
>
> and in pgoutput_change
> /*
> * Send the data. Even if we end up filtering some columns while sending the
> * message, we won't consider the change, as a whole, to be filtered out. Hence
> * won't increment the filteredBytes.
> */
>
> Does that look good?

Yes. Works for me.

thanks
Shveta

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

03 November 2025, 17:23:30

On Mon, Nov 3, 2025 at 3:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > So I propose to just add comment
> >
> > in pgoutput_truncate()
> > /*
> > * Even if we filtered out some relations, we still send a TRUNCATE
> > * message for the remaining relations. Since the change, as a whole, is
> > * not filtered out we don't increment filteredBytes.
> > */
> >
> > and in pgoutput_change
> > /*
> > * Send the data. Even if we end up filtering some columns while sending the
> > * message, we won't consider the change, as a whole, to be filtered out. Hence
> > * won't increment the filteredBytes.
> > */
> >
> > Does that look good?
>
> Yes. Works for me.

Here's a patch with all comments addressed.

--
Best Wishes,
Ashutosh Bapat

Attachments

0001-Report-output-plugin-statistics-in-pg_stat_-20251103.patch

Re: Report bytes and transactions actually sent downtream

From

Andres Freund

Date:

3 November 2025, 18:20:57

Hi,

On 2025-11-03 19:53:30 +0530, Ashutosh Bapat wrote:
> This commit adds following fields to pg_stat_replication_slots
> - plugin_filtered_bytes is the amount of changes filtered out by the
>   output plugin
> - plugin_sent_txns is the amount of transactions sent downstream by the
>   output plugin
> - plugin_sent_bytes is the amount of data sent downstream by the output
>   plugin.
> 
> The prefix "plugin_" indicates that these counters are related to and
> maintained by the output plugin. An output plugin may choose not to
> initialize LogicalDecodingContext::stats, which holds these counters, in
> which case the above columns will be reported as NULL.

I continue to be uncomfortable with doing all this tracking explicitly in
output plugins. This still seems like something core infrastructure should
take care of, instead of re-implementing it in different output plugins, with
the inevitable behaviour differences that will entail.

Greetings,

Andres Freund

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

4 November 2025, 13:58:55

On Mon, Nov 3, 2025 at 8:50 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-11-03 19:53:30 +0530, Ashutosh Bapat wrote:
> > This commit adds following fields to pg_stat_replication_slots
> > - plugin_filtered_bytes is the amount of changes filtered out by the
> >   output plugin
> > - plugin_sent_txns is the amount of transactions sent downstream by the
> >   output plugin
> > - plugin_sent_bytes is the amount of data sent downstream by the output
> >   plugin.
> >
> > The prefix "plugin_" indicates that these counters are related to and
> > maintained by the output plugin. An output plugin may choose not to
> > initialize LogicalDecodingContext::stats, which holds these counters, in
> > which case the above columns will be reported as NULL.
>
> I continue to be uncomfortable with doing all this tracking explicitly in
> output plugins. This still seems like something core infrastructure should
> take care of, instead of re-implementing it in different output plugins, with
> the inevitable behaviour differences that will entail.

I understand your concern, and while I agree that it's ideal to keep
as much of the stats bookkeeping in core as possible, there are some
nuances here which make it hard, as explained below.

My first patch [1] had the stats placed in ReorderBuffer directly. It
was evident from the patch that the sentTxns needs to be set somewhere
in the output plugin code since the output plugin may decide to filter
out or send transaction when processing a change in that transaction
(not necessarily when in begin_cb). Filtered bytes is also something
that is in plugin's control and needs to be updated in the output
plugin code. A few emails, starting from [2], discussed possible
approaches to maintain those in the core vs maintain those in the
output plugin. We decided to let the output plugin maintain them for
the following reasons:

a. sentTxns and filteredBytes need to be modified in the output plugin
code. The behaviour there is inherently output plugin specific, and
requires output plugin specific implementation.
b. an output plugin may or may not want to update their code to track
the statistics for various logistic and technical reasons. We need to
be flexible about that if possible.

The current approach requires only the output plugin specific changes
to be made to the output plugin code and also makes it optional for
them to do those changes. The only changes in output plugin code are
for a. indicating whether it updates the stats and b. updating
filteredBytes and sentBytes at appropriate places. I don't see a way
to avoid that. Rest of the logic is actually in the core. Unless
there's anything we've overlooked in the thread the current approach
seems to balance the constraints quite well. Do you have an
alternative design in mind?
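
The opt-in part of this approach (a plugin that does not initialize LogicalDecodingContext::stats gets NULL columns) could look roughly like the sketch below. This is only an illustration under simplifying assumptions: PluginStats, DecodingContext, account_sent_bytes() and report_sent_bytes() are hypothetical stand-ins, not the structures or functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the plugin counters discussed in this thread. */
typedef struct PluginStats
{
    int64_t     sentTxns;
    int64_t     sentBytes;
    int64_t     filteredBytes;
} PluginStats;

/* Simplified stand-in for LogicalDecodingContext; only the stats pointer. */
typedef struct DecodingContext
{
    PluginStats *stats;         /* NULL when the plugin did not opt in */
} DecodingContext;

/* Core side: account for bytes written downstream, if the plugin opted in. */
static void
account_sent_bytes(DecodingContext *ctx, int64_t nbytes)
{
    assert(ctx != NULL && nbytes >= 0);
    if (ctx->stats != NULL)
        ctx->stats->sentBytes += nbytes;
}

/*
 * Reporting side: returns false when the column should be shown as NULL,
 * otherwise stores the counter in *value and returns true.
 */
static bool
report_sent_bytes(const DecodingContext *ctx, int64_t *value)
{
    assert(ctx != NULL && value != NULL);
    if (ctx->stats == NULL)
        return false;
    *value = ctx->stats->sentBytes;
    return true;
}
```

A plugin that never sets the stats pointer is left untouched by the core accounting and is reported as NULL, which is what makes the changes optional for plugin authors.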

This has been a long thread with many patch versions, and the commit
message might need some rewording to describe the proposed
functionality better.  I hope the above explanation is clearer, and if
so I can reword the commit message to include more of it.

sentBytes is a slightly different story. The core code updates it. But
it's a stat about output plugin's behaviour. Hence it's still exposed
as a plugin stats and maintained in LogicalDecodingContext::stats. It
can be maintained in ReorderBuffer directly and can be projected as a
core stats, renaming it as just "sent_bytes". Please let me know if
you would like it that way.

[1] https://www.postgresql.org/message-id/CAExHW5s6KntzUyUoMbKR5dgwRmdV2Ay_2%2BAnTgYGAzo%3DQv61wA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Amit Kapila

Date:

18 November 2025, 12:54:20

On Tue, Nov 4, 2025 at 4:29 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Nov 3, 2025 at 8:50 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2025-11-03 19:53:30 +0530, Ashutosh Bapat wrote:
> > > This commit adds following fields to pg_stat_replication_slots
> > > - plugin_filtered_bytes is the amount of changes filtered out by the
> > >   output plugin
> > > - plugin_sent_txns is the amount of transactions sent downstream by the
> > >   output plugin
> > > - plugin_sent_bytes is the amount of data sent downstream by the output
> > >   plugin.
> > >
> > > The prefix "plugin_" indicates that these counters are related to and
> > > maintained by the output plugin. An output plugin may choose not to
> > > initialize LogicalDecodingContext::stats, which holds these counters, in
> > > which case the above columns will be reported as NULL.
> >
> > I continue to be uncomfortable with doing all this tracking explicitly in
> > output plugins. This still seems like something core infrastructure should
> > take care of, instead of re-implementing it in different output plugins, with
> > the inevitable behaviour differences that will entail.
>
> I understand your concern, and while I agree that it's ideal to keep
> as much of the stats bookkeeping in core there are some nuances here
> which makes it hard as explained below.
>
> My first patch [1] had the stats placed in ReorderBuffer directly. It
> was evident from the patch that the sentTxns needs to be set somewhere
> in the output plugin code since the output plugin may decide to filter
> out or send transaction when processing a change in that transaction
> (not necessarily when in begin_cb). Filtered bytes is also something
> that is in plugin's control and needs to be updated in the output
> plugin code. Few emails, starting from [2], discussed possible
> approaches to maintain those in the core vs maintain those in the
> output plugin. We decided to let output plugin maintain it for
> following reasons
>
> a. sentTxns and filteredBytes need to be modified in the output plugin
> code. The behaviour there is inherently output plugin specific, and
> requires output plugin specific implementation.
>

Is it possible that we allow change callback (LogicalDecodeChangeCB)
to return a boolean such that if the change is decoded and sent, it
returns true, otherwise, false? If so, the caller could deduce from it
the filtered bytes, and if none of the change calls returns true, this
means the entire transaction is not sent.

I think this should address Andres's concern of explicitly tracking
these stats in plugins, what do you think?

--
With Regards,
Amit Kapila.

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

18 November 2025, 13:35:42

On Tue, Nov 18, 2025 at 3:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 4, 2025 at 4:29 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Mon, Nov 3, 2025 at 8:50 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2025-11-03 19:53:30 +0530, Ashutosh Bapat wrote:
> > > > This commit adds following fields to pg_stat_replication_slots
> > > > - plugin_filtered_bytes is the amount of changes filtered out by the
> > > >   output plugin
> > > > - plugin_sent_txns is the amount of transactions sent downstream by the
> > > >   output plugin
> > > > - plugin_sent_bytes is the amount of data sent downstream by the output
> > > >   plugin.
> > > >
> > > > The prefix "plugin_" indicates that these counters are related to and
> > > > maintained by the output plugin. An output plugin may choose not to
> > > > initialize LogicalDecodingContext::stats, which holds these counters, in
> > > > which case the above columns will be reported as NULL.
> > >
> > > I continue to be uncomfortable with doing all this tracking explicitly in
> > > output plugins. This still seems like something core infrastructure should
> > > take care of, instead of re-implementing it in different output plugins, with
> > > the inevitable behaviour differences that will entail.
> >
> > I understand your concern, and while I agree that it's ideal to keep
> > as much of the stats bookkeeping in core there are some nuances here
> > which makes it hard as explained below.
> >
> > My first patch [1] had the stats placed in ReorderBuffer directly. It
> > was evident from the patch that the sentTxns needs to be set somewhere
> > in the output plugin code since the output plugin may decide to filter
> > out or send transaction when processing a change in that transaction
> > (not necessarily when in begin_cb). Filtered bytes is also something
> > that is in plugin's control and needs to be updated in the output
> > plugin code. Few emails, starting from [2], discussed possible
> > approaches to maintain those in the core vs maintain those in the
> > output plugin. We decided to let output plugin maintain it for
> > following reasons
> >
> > a. sentTxns and filteredBytes need to be modified in the output plugin
> > code. The behaviour there is inherently output plugin specific, and
> > requires output plugin specific implementation.
> >
>
> Is it possible that we allow change callback (LogicalDecodeChangeCB)
> to return a boolean such that if the change is decoded and sent, it
> returns true, otherwise, false? If so, the caller could deduce from it
> the filtered bytes, and if none of the change calls returns true, this
> means the entire transaction is not sent.
>
> I think this should address Andres's concern of explicitly tracking
> these stats in plugins, what do you think?
>

I was thinking about a similar thing. But I am skeptical since the
calling logic is not straightforward - there's an indirection in
between. Second, it means that all the plugins have to adapt to the
new callback definition. It is optional in my current approach. Since
both of us have thought of this approach, I think it's worth a try.

"if none of the change calls returns true, this means the entire
transaction is not sent" isn't true. A plugin may still send an empty
transaction. I was thinking of making commit/abort/prepare callbacks
to return true/false to indicate whether a transaction was sent or not
and increment the counter accordingly. The plugin has to take care of
not returning true for both prepare and commit or prepare and abort.
So maybe just commit and abort should be made to return true or
false. What do you think?
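
A minimal sketch of the idea, assuming heavily reduced callback signatures: the real callbacks take a LogicalDecodingContext, a ReorderBufferTXN and more, and SlotStats, ChangeCB, CommitCB and the toy plugin below are hypothetical names used only for illustration. The callbacks just report what they did, and the wrappers in core own the counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counters owned by core in this sketch. */
typedef struct SlotStats
{
    int64_t     sentTxns;
    int64_t     filteredBytes;
} SlotStats;

/* Reduced stand-ins for the change and commit callback types. */
typedef bool (*ChangeCB) (void *plugin_private, size_t change_size);
typedef bool (*CommitCB) (void *plugin_private);

/* Core wrapper: a change the plugin did not send counts as filtered. */
static void
change_cb_wrapper(SlotStats *stats, ChangeCB cb, void *priv, size_t change_size)
{
    assert(stats != NULL && cb != NULL);
    if (!cb(priv, change_size))
        stats->filteredBytes += (int64_t) change_size;
}

/* Core wrapper: the transaction counts as sent only if commit says so. */
static void
commit_cb_wrapper(SlotStats *stats, CommitCB cb, void *priv)
{
    assert(stats != NULL && cb != NULL);
    if (cb(priv))
        stats->sentTxns++;
}

/* Toy plugin: filters every change of an unpublished table ... */
typedef struct ToyPlugin
{
    bool        table_published;
} ToyPlugin;

static bool
toy_change(void *priv, size_t change_size)
{
    (void) change_size;
    return ((ToyPlugin *) priv)->table_published;
}

/* ... but still sends BEGIN/COMMIT, i.e. an empty transaction. */
static bool
toy_commit(void *priv)
{
    (void) priv;
    return true;
}
```

Running two changes of 100 and 50 bytes through a toy plugin whose table is not published, followed by a commit, leaves filteredBytes at 150 and sentTxns at 1. That is the empty-transaction case: no change callback returned true, yet the transaction was sent.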

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Amit Kapila

Date:

18 November 2025, 13:44:12

On Tue, Nov 18, 2025 at 4:05 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Nov 18, 2025 at 3:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 4, 2025 at 4:29 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > a. sentTxns and filteredBytes need to be modified in the output plugin
> > > code. The behaviour there is inherently output plugin specific, and
> > > requires output plugin specific implementation.
> > >
> >
> > Is it possible that we allow change callback (LogicalDecodeChangeCB)
> > to return a boolean such that if the change is decoded and sent, it
> > returns true, otherwise, false? If so, the caller could deduce from it
> > the filtered bytes, and if none of the change calls returns true, this
> > means the entire transaction is not sent.
> >
> > I think this should address Andres's concern of explicitly tracking
> > these stats in plugins, what do you think?
> >
>
> I was thinking about a similar thing. But I am skeptical since the
> calling logic is not straight forward - there's an indirection in
> between. Second, it means that all the plugins have to adapt to the
> new callback definition. It is optional in my current approach. Since
> both of us have thought of this approach, I think it's worth a try.
>
> "if none of the change calls returns true, this means the entire
> transaction is not sent" isn't true. A plugin may still send an empty
> transaction. I was thinking of making commit/abort/prepare callbacks
> to return true/false to indicate whether a transaction was sent or not
> and increment the counter accordingly. The plugin has to take care of
> not returning true for both prepare and commit or prepare and abort.
> So may be just commit and abort should be made to return true or
> false. What do you think?
>

Sounds reasonable to me.

--
With Regards,
Amit Kapila.

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

11 December 2025, 07:59:42

Hi All,


On Tue, Nov 18, 2025 at 4:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 18, 2025 at 4:05 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Tue, Nov 18, 2025 at 3:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Nov 4, 2025 at 4:29 PM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > a. sentTxns and filteredBytes need to be modified in the output plugin
> > > > code. The behaviour there is inherently output plugin specific, and
> > > > requires output plugin specific implementation.
> > > >
> > >
> > > Is it possible that we allow change callback (LogicalDecodeChangeCB)
> > > to return a boolean such that if the change is decoded and sent, it
> > > returns true, otherwise, false? If so, the caller could deduce from it
> > > the filtered bytes, and if none of the change calls returns true, this
> > > means the entire transaction is not sent.
> > >
> > > I think this should address Andres's concern of explicitly tracking
> > > these stats in plugins, what do you think?
> > >
> >
> > I was thinking about a similar thing. But I am skeptical since the
> > calling logic is not straight forward - there's an indirection in
> > between. Second, it means that all the plugins have to adapt to the
> > new callback definition. It is optional in my current approach. Since
> > both of us have thought of this approach, I think it's worth a try.
> >
> > "if none of the change calls returns true, this means the entire
> > transaction is not sent" isn't true. A plugin may still send an empty
> > transaction. I was thinking of making commit/abort/prepare callbacks
> > to return true/false to indicate whether a transaction was sent or not
> > and increment the counter accordingly. The plugin has to take care of
> > not returning true for both prepare and commit or prepare and abort.
> > So may be just commit and abort should be made to return true or
> > false. What do you think?
> >
>
> Sounds reasonable to me.

Sorry for the delayed response. PFA the patch implementing the idea
discussed above. It relies on the output plugin callback to return
correct boolean but maintains the statistics in the core itself.

I have reviewed all the previous comments and applied the ones which
are relevant to the new approach again. Following two are worth noting
here.

In order to address Amit's concern [1] that an inaccuracy in these
counts caused by a bug in output plugin code may be blamed on the
core, I have added a note to the documentation of the view
pg_stat_replication_slots. The note is meant to avoid such blame and to
direct users to the plugin they should investigate.

With the statistics being maintained by the core, Bertrand's concern
about stale statistics [2] is also addressed. Also it does not have
the asymmetry mentioned in point 2 in [3].

Please review.

[1] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/aNZ1T5vYC1BtKs4M@ip-10-97-1-34.eu-west-3.compute.internal
[3] https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachments

v20251211-0001-Report-output-plugin-statistics-in-pg_stat.patch

Re: Report bytes and transactions actually sent downtream

From

Chao Li

Date:

11 December 2025, 12:38:40

Hi, Ashutosh,

I just quickly went through the patch. Obviously I need more time to fully understand the patch, I will do a deep review today. In the meantime, I just caught a nit issue.

> On Dec 11, 2025, at 12:59, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
>
> Please review.
>
> [1] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/aNZ1T5vYC1BtKs4M@ip-10-97-1-34.eu-west-3.compute.internal
> [3] https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com
>
> --
> Best Wishes,
> Ashutosh Bapat
> <v20251211-0001-Report-output-plugin-statistics-in-pg_stat.patch>

1
```
+    linkend="logicaldecoding-output-plugin-callbacks"/>. A descripancy in those
```

Typo: descripancy => discrepancy


Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

17 December 2025, 08:55:41

Hi Chao,

On Thu, Dec 11, 2025 at 3:09 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi, Ashutosh,
>
> I just quickly went through the patch. Obviously I need more time to fully understand the patch, I will do a deep review today. In the meantime, I just caught a nit issue.
>

Thanks for your review.

> > On Dec 11, 2025, at 12:59, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >
> >
> > Please review.
> >
> > [1] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
> > [2] https://www.postgresql.org/message-id/aNZ1T5vYC1BtKs4M@ip-10-97-1-34.eu-west-3.compute.internal
> > [3] https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat
> > <v20251211-0001-Report-output-plugin-statistics-in-pg_stat.patch>
>
> 1
> ```
> +    linkend="logicaldecoding-output-plugin-callbacks"/>. A descripancy in those
> ```
>
> Typo: descripancy => discrepancy
>

Thanks for pointing this out. I have fixed it in my code. However, at
this point I am looking for a design review, especially to verify that
the new implementation addresses Andres's concern raised in [1] while
not introducing any design issues raised earlier e.g. those raised in
threads [2], [3] and [4]

[1] https://www.postgresql.org/message-id/zzidfgaowvlv4opptrcdlw57vmulnh7gnes4aerl6u35mirelm@tj2vzseptkjk
[2] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
[3] https://www.postgresql.org/message-id/aNZ1T5vYC1BtKs4M@ip-10-97-1-34.eu-west-3.compute.internal
[4] https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

17 December 2025, 11:42:15

Hi,

On Thu, Dec 11, 2025 at 10:29:42AM +0530, Ashutosh Bapat wrote:
> Sorry for the delayed response. PFA the patch implementing the idea
> discussed above. It relies on the output plugin callback to return
> correct boolean but maintains the statistics in the core itself.

Thanks for the new patch version!

What worries me is all those API changes:

-typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
+typedef bool (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,

Those changes will break existing third-party logical decoding plugins, even ones
that don't want the new statistics features.

What about not changing those and just add a single new optional callback, say?

typedef void (*LogicalDecodeReportStatsCB)(
    LogicalDecodingContext *ctx,
    ReorderBufferTXN *txn,
    bool *transaction_sent,
    size_t *bytes_filtered
);

This way:

- Existing plugins can still work without modification
- New or existing plugins can choose to provide statistics

Thoughts?
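
As a rough illustration of how such an optional callback might be consumed by core, here is a reduced sketch. It assumes the callback is invoked once per transaction, which the proposal above does not spell out, and it drops the ctx and txn arguments. SlotStats, PluginCallbacks, collect_plugin_stats() and toy_report_stats() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SlotStats
{
    int64_t     sentTxns;
    int64_t     filteredBytes;
} SlotStats;

/* Reduced form of the proposed optional callback (ctx and txn omitted). */
typedef void (*ReportStatsCB) (void *plugin_private,
                               bool *transaction_sent,
                               size_t *bytes_filtered);

typedef struct PluginCallbacks
{
    ReportStatsCB report_stats_cb;  /* NULL: plugin provides no statistics */
} PluginCallbacks;

/*
 * Core side: ask the plugin for its numbers if it registered the callback.
 * Returns false when the plugin does not report statistics at all.
 */
static bool
collect_plugin_stats(const PluginCallbacks *cbs, void *plugin_private,
                     SlotStats *stats)
{
    bool        sent = false;
    size_t      filtered = 0;

    assert(cbs != NULL && stats != NULL);
    if (cbs->report_stats_cb == NULL)
        return false;

    cbs->report_stats_cb(plugin_private, &sent, &filtered);
    if (sent)
        stats->sentTxns++;
    stats->filteredBytes += (int64_t) filtered;
    return true;
}

/* Toy plugin callback: reports one sent transaction, 42 filtered bytes. */
static void
toy_report_stats(void *plugin_private, bool *transaction_sent,
                 size_t *bytes_filtered)
{
    (void) plugin_private;
    *transaction_sent = true;
    *bytes_filtered = 42;
}
```

An existing plugin that leaves report_stats_cb as NULL compiles and runs unchanged and its counters stay untouched, while a plugin that registers the callback gets its numbers folded into the slot statistics.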

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Chao Li

Date:

18 December 2025, 05:25:45

> On Dec 17, 2025, at 13:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> Thanks for pointing this out. I have fixed it my code. However, at
> this point I am looking for a design review, especially to verify that
> the new implementation addresses Andres's concern raised in [1] while
> not introducing any design issues raised earlier e.g. those raised in
> threads [2], [3] and [4]
>
> [1] https://www.postgresql.org/message-id/zzidfgaowvlv4opptrcdlw57vmulnh7gnes4aerl6u35mirelm@tj2vzseptkjk
>>> [2] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
>>> [3] https://www.postgresql.org/message-id/aNZ1T5vYC1BtKs4M@ip-10-97-1-34.eu-west-3.compute.internal
>>> [4] https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com
>
> --
> Best Wishes,
> Ashutosh Bapat

Hi Ashutosh,

Yeah, I owe you a review. I committed to review this patch but I forgot, sorry about that.

From a design perspective, I agree that increasing counters should belong to the core, and the plugin should return proper values following the contract. And I have some more comments:

1. I just feel a bool return value might not be clear enough. For example:

```
-    ctx->callbacks.change_cb(ctx, txn, relation, change);
+    if (!ctx->callbacks.change_cb(ctx, txn, relation, change))
+        cache->filteredBytes += ReorderBufferChangeSize(change);
```

You increase filteredBytes when change_cb returns false. But if we look at pgoutput_change(), there are many reasons to return false. Counting all the cases toward filteredBytes seems wrong.

2.
```
-    ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations, change);
+    if (!ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations, change))
+        cache->filteredBytes += ReorderBufferChangeSize(change);
```

Row filter doesn’t impact TRUNCATE, why increase filteredBytes after truncate_cb()?

3.
```
-    ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
+    if (ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn))
+        cache->sentTxns++;
```

For 2-phase commit, it increases sentTxns after prepare_cb, and
```
+    if (ctx->callbacks.stream_abort_cb(ctx, txn, abort_lsn))
+        cache->sentTxns++;
```

If the transaction is aborted, sentTxns is increased again, which is confusing. Though for aborting there is some data (a notification) streamed, I don’t think that should be counted as a transaction.

After commit, sentTxns is also increased, so that a 2-phase commit is counted as two transactions, which also feels confusing. IMO, a 2-phase commit should still be counted as one transaction.

4. You add sentBytes and filteredBytes. I am wondering whether it makes sense to also add sentRows and filteredRows. Because tables could be big or small, bytes + rows could show a clearer picture to users.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

18 December 2025, 15:52:40

Hi Bertrand,

On Wed, Dec 17, 2025 at 2:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> What worries me is all those API changes:
>
> -typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
> +typedef bool (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
>
> Those changes will break existing third party logical decoding plugin, even ones
> that don't want the new statistics features.
>
> What about not changing those and just add a single new optional callback, say?
>
> typedef void (*LogicalDecodeReportStatsCB)(
>     LogicalDecodingContext *ctx,
>     ReorderBufferTXN *txn,
>     bool *transaction_sent,
>     size_t *bytes_filtered
> );
>
> This way:
>
> - Existing plugins can still work without modification
> - New or existing plugins can choose to provide statistics
>

I think that it will bring back the same problems that the previous
design had or am I missing something? Let me elaborate:
1. If every plugin implements the calculation of filtered_bytes
differently, the same set of WAL passed through different output
plugins would report different filtered bytes, even if they filtered
the same changes. I think Andres wants minimal changes in the output
plugins to avoid these divergences.
2. This also has the problem that you had raised. What if an output
plugin had calls to this callback in one version but removed them in
the next.
3. An output plugin may simply not realise that it can use this
function to maintain statistics. Or the plugin may not call the
function in all the places that it needs to. Or it may not realise it
needs to call this function in a new callback added in a new
PostgreSQL version. There are many ways an output plugin may get it
wrong. I think this is also the reason Andres wants minimal changes
in output plugins for maintaining statistics.
4. filteredBytes and sentTxns are not updated at the same place, so
the plugins have to send one of those values as 0 always when calling
the function. We need two functions one for each sentTxns and
filteredBytes. That means more chances of error and divergence.

The new implementation does not have these problems
1. As the API is changed in the new implementation, every output
plugin is forced to change their implementation. Amit and I discussed
this aspect starting [1]. The plugins will detect the change when
compiling their code against PG 19, so they won't miss it. The change
expected from every plugin is minimal and well documented. They have
to simply return true or false and rest will be taken care of by the
core. So there is less chance of error or divergence.
2. The plugin can not go back and forth on maintaining the statistics
- an issue you raised. The API will force it to always return the
required status.
3. I think getting the correct statistics is more important than
making it optional, especially when the changes expected from the
plugin are simple. Thinking more about it, users wouldn't want to
change their output plugin just because other output plugin supports
statistics.

Ideally, it would have been better if this had been raised when Amit
and I discussed this proposal [1], a month ago; before I spent time and
effort implementing the design. But better now than before a commit.

[1] https://www.postgresql.org/message-id/CAA4eK1K4Pq=acoXx3dEF7us_NFrDVU+M7f_j7KXm+Q2ywY+LSQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

18 December 2025, 15:52:52

On Thu, Dec 18, 2025 at 7:56 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> > On Dec 17, 2025, at 13:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > Thanks for pointing this out. I have fixed it my code. However, at
> > this point I am looking for a design review, especially to verify that
> > the new implementation addresses Andres's concern raised in [1] while
> > not introducing any design issues raised earlier e.g. those raised in
> > threads [2], [3] and [4]
> >
> > [1] https://www.postgresql.org/message-id/zzidfgaowvlv4opptrcdlw57vmulnh7gnes4aerl6u35mirelm@tj2vzseptkjk
> >>> [2] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
> >>> [3] https://www.postgresql.org/message-id/aNZ1T5vYC1BtKs4M@ip-10-97-1-34.eu-west-3.compute.internal
> >>> [4] https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat
>
>
> Hi Ashutosh,
>
> Yeah, I owe you a review. I committed to review this patch but I forgot, sorry about that.
>
> From a design perspective, I agree that increasing counters should belong to the core, and the plugin should return proper values following the contract. And I have some more comments:
>
> 1. I just feel a bool return value might not be clear enough. For example:
>
> ```
> -       ctx->callbacks.change_cb(ctx, txn, relation, change);
> +       if (!ctx->callbacks.change_cb(ctx, txn, relation, change))
> +               cache->filteredBytes += ReorderBufferChangeSize(change);
> ```
>
> You increase filteredBytes when change_cb returns false. But if we look at pgoutput_change(), there are many reasons to return false. Counting all the cases toward filteredBytes seems wrong.

I am not able to understand this. Every "return false" from
pgoutput_change() indicates that the change was filtered out and hence
the size of corresponding change is being added to filteredBytes by
the caller. Which "return false" does not indicate a filtered out
change?

>
> 2.
> ```
> -       ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations, change);
> +       if (!ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations, change))
> +               cache->filteredBytes += ReorderBufferChangeSize(change);
> ```
>
> Row filter doesn’t impact TRUNCATE, why increase filteredBytes after truncate_cb()?

A TRUNCATE of a relation which is not part of the publication will be
filtered out.

>
> 3.
> ```
> -       ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
> +       if (ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn))
> +               cache->sentTxns++;
> ```
>
> For 2-phase commit, it increase sentTxns after prepare_cb, and
> ```
> +       if (ctx->callbacks.stream_abort_cb(ctx, txn, abort_lsn))
> +               cache->sentTxns++;
> ```
>
> If the transaction is aborted, sentTxns is increased again, which is confusing. Though for aborting there is some data (a notification) streamed, I don’t think that should be counted as a transaction.
>
> After commit, sentTxns is also increased, so that a 2-phase commit is counted as two transactions, which also feels confusing. IMO, a 2-phase commit should still be counted as one transaction.

stream_commit/abort_cb is called after stream_prepare_cb not after prepare_cb.

>
> 4. You add sentBytes and filteredBytes. I am wondering whether it makes sense to also add sentRows and filteredRows. Because tables could be big or small, bytes + rows could show a clearer picture to users.

We don't have corresponding total_rows and streamed_rows counts. I
think that's because we haven't come across a use case for them. Do
you have a use case in mind?

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

December 18, 2025, 21:22:48

Hi,

On Thu, Dec 18, 2025 at 06:22:40PM +0530, Ashutosh Bapat wrote:
> Hi Bertrand,
> 
> On Wed, Dec 17, 2025 at 2:12 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > What worries me is all those API changes:
> >
> > -typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
> > +typedef bool (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
> >
> > Those changes will break existing third party logical decoding plugin, even ones
> > that don't want the new statistics features.
> >
> > What about not changing those and just add a single new optional callback, say?
> >
> > typedef void (*LogicalDecodeReportStatsCB)(
> >     LogicalDecodingContext *ctx,
> >     ReorderBufferTXN *txn,
> >     bool *transaction_sent,
> >     size_t *bytes_filtered
> > );
> >
> > This way:
> >
> > - Existing plugins can still work without modification
> > - New or existing plugins can choose to provide statistics
> >
> 
> I think that it will bring back the same problems that the previous
> design had or am I missing something?

I think that my example was confusing due to "size_t *bytes_filtered". I think
that what we could do is something like:

"
typedef void (*LogicalDecodeReportStatsCB)(
    LogicalDecodingContext *ctx,
    LogicalDecodeEventType event_type,
    bool *filtered,
    bool *txn_sent);
"

Note that there is no more size_t.

Then, for example in change_cb_wrapper(), we could do:

"
ctx->callbacks.change_cb(ctx, txn, relation, change);

if (ctx->callbacks.report_stats_cb)
{
    bool filtered = false;

    ctx->callbacks.report_stats_cb(ctx, LOGICALDECODE_CHANGE,
                                &filtered, NULL);
        
    if (filtered)
        cache->filteredBytes += ReorderBufferChangeSize(change);
}
"

The plugin would need to "remember" that it filtered (so that it can
reply to the callback). It could do that by adding say "last_event_filtered" to
its output_plugin_private structure.

That's more work on the plugin side and we would probably need to provide some
examples from our side.
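
A minimal sketch of what such an example might look like, covering both the plugin side ("remembering" the outcome) and the core side (asking for it afterwards). Everything here is a stand-in invented for illustration (CbDemoContext, CbDemoPluginState, cbdemo_* functions); it does not use the real PostgreSQL headers, and the NULL handling for the out-parameters is just one possible convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum CbDemoEventType
{
    CBDEMO_EVENT_CHANGE,
    CBDEMO_EVENT_COMMIT
} CbDemoEventType;

/* Hypothetical plugin-private state, in the spirit of "last_event_filtered". */
typedef struct CbDemoPluginState
{
    bool last_event_filtered;
    bool txn_sent;
} CbDemoPluginState;

typedef struct CbDemoContext CbDemoContext;

typedef void (*CbDemoChangeCB) (CbDemoContext *ctx, bool row_matches);
typedef void (*CbDemoReportStatsCB) (CbDemoContext *ctx,
                                     CbDemoEventType event_type,
                                     bool *filtered,
                                     bool *txn_sent);

/* Stand-in for the decoding context plus the core's stats cache. */
struct CbDemoContext
{
    CbDemoChangeCB      change_cb;
    CbDemoReportStatsCB report_stats_cb;    /* optional, may be NULL */
    CbDemoPluginState  *output_plugin_private;
    size_t              filteredBytes;
};

/* Plugin: keeps its void signature and only records what it did. */
static void
cbdemo_change(CbDemoContext *ctx, bool row_matches)
{
    CbDemoPluginState *state = ctx->output_plugin_private;

    state->last_event_filtered = !row_matches;
    if (row_matches)
        state->txn_sent = true;     /* something went downstream */
}

/* Plugin: answers the core's question; tolerates NULL out-parameters. */
static void
cbdemo_report_stats(CbDemoContext *ctx, CbDemoEventType event_type,
                    bool *filtered, bool *txn_sent)
{
    CbDemoPluginState *state = ctx->output_plugin_private;

    (void) event_type;              /* unused in this sketch */
    if (filtered != NULL)
        *filtered = state->last_event_filtered;
    if (txn_sent != NULL)
        *txn_sent = state->txn_sent;
}

/* Core: calls the unchanged callback, then asks for stats if offered. */
static void
cbdemo_change_cb_wrapper(CbDemoContext *ctx, bool row_matches,
                         size_t change_size)
{
    assert(ctx != NULL && ctx->change_cb != NULL);

    ctx->change_cb(ctx, row_matches);

    if (ctx->report_stats_cb)
    {
        bool filtered = false;

        ctx->report_stats_cb(ctx, CBDEMO_EVENT_CHANGE, &filtered, NULL);
        if (filtered)
            ctx->filteredBytes += change_size;
    }
}
```

A plugin that leaves report_stats_cb unset is simply skipped by the wrapper, which is the "no breaking changes" property; the cost is the extra state the plugin has to keep in sync.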

I think the pros are that:

- plugins that don't want to report stats would have nothing to do (no breaking
changes)
- the core does the computation

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Chao Li

Date:

December 19, 2025, 04:53:53


> On Dec 18, 2025, at 20:52, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 7:56 AM Chao Li <li.evan.chao@gmail.com> wrote:
>>
>>
>>
>>> On Dec 17, 2025, at 13:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>>>
>>> Thanks for pointing this out. I have fixed it in my code. However, at
>>> this point I am looking for a design review, especially to verify that
>>> the new implementation addresses Andres's concern raised in [1] while
>>> not introducing any design issues raised earlier e.g. those raised in
>>> threads [2], [3] and [4]
>>>
>>> [1] https://www.postgresql.org/message-id/zzidfgaowvlv4opptrcdlw57vmulnh7gnes4aerl6u35mirelm@tj2vzseptkjk
>>> [2] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
>>> [3] https://www.postgresql.org/message-id/aNZ1T5vYC1BtKs4M@ip-10-97-1-34.eu-west-3.compute.internal
>>> [4] https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com
>>>
>>> --
>>> Best Wishes,
>>> Ashutosh Bapat
>>
>>
>> Hi Ashutosh,
>>
>> Yeah, I owe you a review. I committed to review this patch but I forgot, sorry about that.
>>
>> From a design perspective, I agree that incrementing the counters belongs in the core; the plugin should return proper values following the contract. I have some more comments:
>>
>> 1. I just feel a bool return value might not be clear enough. For example:
>>
>> ```
>> -       ctx->callbacks.change_cb(ctx, txn, relation, change);
>> +       if (!ctx->callbacks.change_cb(ctx, txn, relation, change))
>> +               cache->filteredBytes += ReorderBufferChangeSize(change);
>> ```
>>
>> You increase filteredBytes when change_cb returns false. But if we look at pgoutput_change(), there are many reasons to return false. Counting all the cases to filteredBytes seems wrong.
>
> I am not able to understand this. Every "return false" from
> pgoutput_change() indicates that the change was filtered out and hence
> the size of corresponding change is being added to filteredBytes by
> the caller. Which "return false" does not indicate a filtered out
> change?

I think the confusion comes from the counter name “filteredBytes”: what does “filtered” mean? There are 3 types of data that are not streamed out:

a. WAL data of tables that don’t belong to the publication
b. The table belongs to the publication, but the action doesn’t. For example, with FOR ALL TABLES (INSERT), update/delete will not be streamed out
c. Filtered by row filter (WHERE)

I thought only c should be counted in filteredBytes; thinking it over again, maybe b should also be counted. But I still don’t think a should be counted.

IMO, sentBytes + filteredBytes == supposedToSendBytes. If a table doesn’t belong to a publication, then it should not be counted in supposedToSendBytes, so it should not be counted in filteredBytes.

The other point is that, if we count a in filteredBytes, we end up with totalBytes == sentBytes + filteredBytes; if that’s true, why not compute that number as (totalBytes - sentBytes) on the client side?

If we insist on counting a, then maybe we need to consider a better counter name.

>>
>> 2.
>> ```
>> -       ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations, change);
>> +       if (!ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations, change))
>> +               cache->filteredBytes += ReorderBufferChangeSize(change);
>> ```
>>
>> Row filter doesn’t impact TRUNCATE, why increase filteredBytes after truncate_cb()?
>
> A TRUNCATE of a relation which is not part of the publication will be
> filtered out.

Same as 1.

>
>>
>> 3.
>> ```
>> -       ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
>> +       if (ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn))
>> +               cache->sentTxns++;
>> ```
>>
>> For 2-phase commit, it increases sentTxns after prepare_cb, and
>> ```
>> +       if (ctx->callbacks.stream_abort_cb(ctx, txn, abort_lsn))
>> +               cache->sentTxns++;
>> ```
>>
>> If the transaction is aborted, sentTxns is increased again, which is confusing. Though some data (a notification) is streamed for the abort, I don’t think that should be counted as a transaction.
>>
>> After commit, sentTxns is also increased, so a 2-phase commit is counted as two transactions, which also feels confusing. IMO, a 2-phase commit should still be counted as one transaction.
>
> stream_commit/abort_cb is called after stream_prepare_cb not after prepare_cb.

That’s my typo, but the problem is still there. Should we count a 2-phase-commit as 2 transactions?

>
>>
>> 4. You add sentBytes and filteredBytes. I am wondering whether it makes sense to also add sentRows and filteredRows. Because tables could be big or small, bytes + rows could give users a clearer picture.
>
> We don't have corresponding total_rows and streamed_rows counts. I
> think that's because we haven't come across a use case for them. Do
> you have a use case in mind?
>

That’s still related to 1. totalBytes includes tables that don’t belong to the publication, so totalRows doesn’t make much sense. But sentRows will only include those rows belonging to the publication. For filteredRows, if we exclude a, then I believe filteredRows also makes sense.

If you argue that the “rows” request should be handled in a separate thread, I’ll be okay with that.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

December 19, 2025, 10:02:49

On Thu, Dec 18, 2025 at 11:52 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Dec 18, 2025 at 06:22:40PM +0530, Ashutosh Bapat wrote:
> > Hi Bertrand,
> >
> > On Wed, Dec 17, 2025 at 2:12 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > What worries me is all those API changes:
> > >
> > > -typedef void (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
> > > +typedef bool (*LogicalDecodeChangeCB) (struct LogicalDecodingContext *ctx,
> > >
> > > Those changes will break existing third party logical decoding plugin, even ones
> > > that don't want the new statistics features.
> > >
> > > What about not changing those and just add a single new optional callback, say?
> > >
> > > typedef void (*LogicalDecodeReportStatsCB)(
> > >     LogicalDecodingContext *ctx,
> > >     ReorderBufferTXN *txn,
> > >     bool *transaction_sent,
> > >     size_t *bytes_filtered
> > > );
> > >
> > > This way:
> > >
> > > - Existing plugins can still work without modification
> > > - New or existing plugins can choose to provide statistics
> > >
> >
> > I think that it will bring back the same problems that the previous
> > design had or am I missing something?
>
> I think that my example was confusing due to "size_t *bytes_filtered". I think
> that what we could do is something like:
>
> "
> typedef void (*LogicalDecodeReportStatsCB)(
>     LogicalDecodingContext *ctx,
>     LogicalDecodeEventType event_type,
>     bool *filtered,
>     bool *txn_sent);
> "
>
> Note that there is no more size_t.
>

Thanks for the clarification. It fixes the problem of filteredBytes
divergence. Since the core is calling stats callback, the problem of
plugin not calling the function at appropriate places is also not
there. IIUC, it still has some problems from the previous solution and
some new problems as explained below.

> Then, for example in change_cb_wrapper(), we could do:
>
> "
> ctx->callbacks.change_cb(ctx, txn, relation, change);
>
> if (ctx->callbacks.report_stats_cb)
> {
>         bool filtered = false;
>
>         ctx->callbacks.report_stats_cb(ctx, LOGICALDECODE_CHANGE,
>                                                                 &filtered, NULL);
>
>         if (filtered)
>                 cache->filteredBytes += ReorderBufferChangeSize(change);
> }
> "
>
> The plugin would need to "remember" that it filtered (so that it can
> reply to the callback). It could do that by adding say "last_event_filtered" to
> its output_plugin_private structure.

Why does the core send NULL for the second parameter? Does the output
plugin have to take care of NULL references too?

I think the core will end up calling this or similar stanza at every
callback since it won't know when the output plugin will have
statistics to report. That's more complexity and wasted CPU cycles in
core.

>
> That's more work on the plugin side and we would probably need to provide some
> examples from our side.

Andres is objecting to this exact thing. IIUC, the code changes there
were far simpler than this proposal. Am I missing something?

My feeling is that the core will end up
>
> I think the pros are that:
>
> - plugins that don't want to report stats would have nothing to do (no breaking
> changes)

I don't think there will be an output plugin which wouldn't want to
take advantage of the statistics. The easier it is for them to adopt
the statistics, as is with my proposal, the better. With this proposal
output plugins have to do more work if they want to support
statistics. That itself will create a barrier for them to adopt the
statistics. We want the output plugins to support statistics so that
users can benefit. Let's make it easier for the output plugins to
implement them.

I feel this proposal makes both sides, the core and the output plugin,
complex in pursuit of a goal which is not worth it.

This still has the problem that you had raised. What if an output
plugin stops supporting statistics across versions?

--
Best Wishes,
Ashutosh Bapat

Re: Report bytes and transactions actually sent downtream

From

Bertrand Drouvot

Date:

December 19, 2025, 10:51:06

Hi,

On Fri, Dec 19, 2025 at 12:32:49PM +0530, Ashutosh Bapat wrote:
> On Thu, Dec 18, 2025 at 11:52 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > I think that my example was confusing due to "size_t *bytes_filtered". I think
> > that what we could do is something like:
> >
> > "
> > typedef void (*LogicalDecodeReportStatsCB)(
> >     LogicalDecodingContext *ctx,
> >     LogicalDecodeEventType event_type,
> >     bool *filtered,
> >     bool *txn_sent);
> > "
> >
> > Note that there is no more size_t.
> >
> 
> Thanks for the clarification. It fixes the problem of filteredBytes
> divergence. Since the core is calling stats callback, the problem of
> plugin not calling the function at appropriate places is also not
> there.

Yeah.

> IIUC, it still has some problems from the previous solution and
> some new problems as explained below.
> 
> > Then, for example in change_cb_wrapper(), we could do:
> >
> > "
> > ctx->callbacks.change_cb(ctx, txn, relation, change);
> >
> > if (ctx->callbacks.report_stats_cb)
> > {
> >         bool filtered = false;
> >
> >         ctx->callbacks.report_stats_cb(ctx, LOGICALDECODE_CHANGE,
> >                                                                 &filtered, NULL);
> >
> >         if (filtered)
> >                 cache->filteredBytes += ReorderBufferChangeSize(change);
> > }
> > "
> >
> > The plugin would need to "remember" that it filtered (so that it can
> > reply to the callback). It could do that by adding say "last_event_filtered" to
> > its output_plugin_private structure.
> 
> 
> Why does the core send NULL for the second parameter? Does the output
> plugin have to take care of NULL references too?

It was just a quick example. I was more focused on demonstrating the concept than
the exact API details.

> 
> I think the core will end up calling this or similar stanza at every
> callback since it won't know when the output plugin will have
> statistics to report.

Yes.

> That's more complexity and wasted CPU cycles in core.

I think that should be negligible as compared to what the logical decoding is
already doing at those places.

> > That's more work on the plugin side and we would probably need to provide some
> > examples from our side.
> 
> Andres is objecting to this exact thing. IIUC, the code changes there
> were far simpler than this proposal. Am I missing something?

You are right. My main motivation with this idea was to avoid the API break.
But maybe that's not worth it.

> I don't think there will be an output plugin which wouldn't want to
> take advantage of the statistics. The easier it is for them to adopt
> the statistics, as is with my proposal, the better. With this proposal
> output plugins have to do more work if they want to support
> statistics. That itself will create a barrier for them to adopt the
> statistics. We want the output plugins to support statistics so that
> users can benefit. Let's make it easier for the output plugins to
> implement them.

That was the main point. With your proposal, the API break will occur (and so
plugins will need some changes) even if they don't want the stats. But if
we are confident that most (all?) would want to use them, then I agree that your
proposal is better, and I'm fine with moving forward with yours.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Report bytes and transactions actually sent downtream

From

Ashutosh Bapat

Date:

December 19, 2025, 11:53:04

On Fri, Dec 19, 2025 at 7:24 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> > On Dec 18, 2025, at 20:52, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 7:56 AM Chao Li <li.evan.chao@gmail.com> wrote:
> >>
> >>
> >>
> >>> On Dec 17, 2025, at 13:55, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >>>
> >>> Thanks for pointing this out. I have fixed it in my code. However, at
> >>> this point I am looking for a design review, especially to verify that
> >>> the new implementation addresses Andres's concern raised in [1] while
> >>> not introducing any design issues raised earlier e.g. those raised in
> >>> threads [2], [3] and [4]
> >>>
> >>> [1] https://www.postgresql.org/message-id/zzidfgaowvlv4opptrcdlw57vmulnh7gnes4aerl6u35mirelm@tj2vzseptkjk
> >>> [2] https://www.postgresql.org/message-id/CAA4eK1KzYaq9dcaa20Pv44ewomUPj_PbbeLfEnvzuXYMZtNw0A%40mail.gmail.com
> >>> [3] https://www.postgresql.org/message-id/aNZ1T5vYC1BtKs4M@ip-10-97-1-34.eu-west-3.compute.internal
> >>> [4] https://www.postgresql.org/message-id/CAExHW5tfVHABuv1moL_shp7oPrWmg8ha7T8CqwZxiMrKror7iw%40mail.gmail.com
> >>>
> >>> --
> >>> Best Wishes,
> >>> Ashutosh Bapat
> >>
> >>
> >> Hi Ashutosh,
> >>
> >> Yeah, I owe you a review. I committed to review this patch but I forgot, sorry about that.
> >>
> >> From a design perspective, I agree that incrementing the counters belongs in the core; the plugin should return proper values following the contract. I have some more comments:
> >>
> >> 1. I just feel a bool return value might not be clear enough. For example:
> >>
> >> ```
> >> -       ctx->callbacks.change_cb(ctx, txn, relation, change);
> >> +       if (!ctx->callbacks.change_cb(ctx, txn, relation, change))
> >> +               cache->filteredBytes += ReorderBufferChangeSize(change);
> >> ```
> >>
> >> You increase filteredBytes when change_cb returns false. But if we look at pgoutput_change(), there are many reasons to return false. Counting all the cases to filteredBytes seems wrong.
> >
> > I am not able to understand this. Every "return false" from
> > pgoutput_change() indicates that the change was filtered out and hence
> > the size of corresponding change is being added to filteredBytes by
> > the caller. Which "return false" does not indicate a filtered out
> > change?
>
> I think the confusion comes from the counter name “filteredBytes”: what does “filtered” mean? There are 3 types of data that are not streamed out:
>
> a. WAL data of tables that don’t belong to the publication
> b. The table belongs to the publication, but the action doesn’t. For example, with FOR ALL TABLES (INSERT), update/delete will not be streamed out
> c. Filtered by row filter (WHERE)
>
> I thought only c should be counted in filteredBytes; thinking it over again, maybe b should also be counted. But I still don’t think a should be counted.
>
> IMO, sentBytes + filteredBytes == supposedToSendBytes. If a table doesn’t belong to a publication, then it should not be counted in supposedToSendBytes, so it should not be counted in filteredBytes.
>
> The other point is that, if we count a in filteredBytes, we end up with totalBytes == sentBytes + filteredBytes; if that’s true, why not compute that number as (totalBytes - sentBytes) on the client side?
>
> If we insist on counting a, then maybe we need to consider a better counter name.


>
> >>
> >> 2.
> >> ```
> >> -       ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations, change);
> >> +       if (!ctx->callbacks.truncate_cb(ctx, txn, nrelations, relations, change))
> >> +               cache->filteredBytes += ReorderBufferChangeSize(change);
> >> ```
> >>
> >> Row filter doesn’t impact TRUNCATE, why increase filteredBytes after truncate_cb()?
> >
> > A TRUNCATE of a relation which is not part of the publication will be
> > filtered out.
>
> Same as 1.
>

filtered_bytes is the amount of data filtered out of total_bytes.
Since total_bytes accounts for the changes from tables which are not
included in the publications, filtered_bytes should include them since
they are "filtered" from total_bytes. Hence a is included in
filtered_bytes. Quoting from the documentation:
--
        Amount of changes, from <structfield>total_wal_bytes</structfield>,
        filtered out by the output plugin and not sent downstream. Please note
        that it does not include the changes filtered before a change is sent to
        the output plugin, e.g. the changes filtered by origin.
--

sent_bytes is a related but different metric. From the documentation:
--
        Amount of transaction changes, in the output format, sent downstream for
        this slot by the output plugin.
--
The assumption that sentBytes + filteredBytes == supposedToSendBytes is wrong.
Since filtered_bytes were never converted into the output format, we
don't know how many bytes would have been sent downstream had those
bytes not been filtered. We will never know how many supposedToSend
bytes there would be. Also note that sent_bytes + filtered_bytes is not the
same as total_wal_bytes.
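
To make the unit mismatch concrete, here is a small sketch with invented numbers. It assumes, for illustration only, that the total and filtered counters are measured in decoded change size while the sent counter is measured in output-format bytes; UnitsChange, UnitsStats and units_account() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One decoded change with two different size measurements (made up). */
typedef struct UnitsChange
{
    size_t decoded_size;    /* size of the decoded change */
    size_t output_size;     /* size after conversion to the output format */
    bool   filtered;        /* did the output plugin filter it out? */
} UnitsChange;

typedef struct UnitsStats
{
    size_t total_bytes;     /* decoded size of everything handed to the plugin */
    size_t filtered_bytes;  /* decoded size of the filtered changes */
    size_t sent_bytes;      /* output-format size of the sent changes */
} UnitsStats;

static UnitsStats
units_account(const UnitsChange *changes, size_t nchanges)
{
    UnitsStats stats = {0, 0, 0};

    assert(changes != NULL || nchanges == 0);

    for (size_t i = 0; i < nchanges; i++)
    {
        stats.total_bytes += changes[i].decoded_size;

        if (changes[i].filtered)
            stats.filtered_bytes += changes[i].decoded_size;    /* never converted */
        else
            stats.sent_bytes += changes[i].output_size;         /* different unit */
    }

    return stats;
}
```

With changes of decoded/output size 200/unknown (filtered), 120/45 and 80/30 (sent), this gives total 400, filtered 200 and sent 75; the sum of sent and filtered is 275, not 400, and the output size the filtered change would have had is never computed.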

> >
> >>
> >> 3.
> >> ```
> >> -       ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn);
> >> +       if (ctx->callbacks.prepare_cb(ctx, txn, prepare_lsn))
> >> +               cache->sentTxns++;
> >> ```
> >>
> >> For 2-phase commit, it increases sentTxns after prepare_cb, and
> >> ```
> >> +       if (ctx->callbacks.stream_abort_cb(ctx, txn, abort_lsn))
> >> +               cache->sentTxns++;
> >> ```
> >>
> >> If the transaction is aborted, sentTxns is increased again, which is confusing. Though some data (a notification) is streamed for the abort, I don’t think that should be counted as a transaction.
> >>
> >> After commit, sentTxns is also increased, so a 2-phase commit is counted as two transactions, which also feels confusing. IMO, a 2-phase commit should still be counted as one transaction.
> >
> > stream_commit/abort_cb is called after stream_prepare_cb not after prepare_cb.
>
> That’s my typo, but the problem is still there. Should we count a 2-phase-commit as 2 transactions?
>

Can you please provide me a repro where a prepared transaction gets
counted twice as sent_txns?

> >
> >>
> >> 4. You add sentBytes and filteredBytes. I am wondering whether it makes sense to also add sentRows and filteredRows. Because tables could be big or small, bytes + rows could give users a clearer picture.
> >
> > We don't have corresponding total_rows and streamed_rows counts. I
> > think that's because we haven't come across a use case for them. Do
> > you have a use case in mind?
> >
>
> That’s still related to 1. totalBytes includes tables that don’t belong to the publication, so totalRows doesn’t make much sense. But sentRows will only include those rows belonging to the publication. For filteredRows, if we exclude a, then I believe filteredRows also makes sense.
>
> If you argue that “rows” request should be treated in a separate thread, I’ll be okay with that.

I think so. It will be good to provide examples of how these statistics
will be used. A separate thread will be better.

--
Best Wishes,
Ashutosh Bapat
