Обсуждение: Report bytes and transactions actually sent downtream

Поиск
Список
Период
Сортировка

Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
Hi All,
In a recent logical replication issue, there were multiple replication
slots involved, each using a different publication. Thus the amount of
data that was replicated through each slot was expected to be
different. However, total_bytes and total_txns were reported the same
for all the replication slots as expected. One of the slots started
lagging and we were trying to figure out whether its the WAL sender
slowing down or the consumer  (in this case Debezium). The lagging
slot then showed total_txns and total_bytes lesser than other slots
giving an impression that the WAL sender is processing the data
slowly. Had pg_stat_replication_slot reported the amount of data
actually sent downstream, it would have been easier to compare it with
the amount of data received by the consumer and thus pinpoint the
bottleneck.

Here's a patch to do the same. It adds two columns
    - sent_txns: The total number of transactions sent downstream.
    - sent_bytes: The total number of bytes sent downstream in data messages
to pg_stat_replication_slots. sent_bytes includes only the bytes sent
as part of 'd' messages and does not include keep alive messages or
CopyDone messages for example. But those are very few and can be
ignored. If others feel that those are important to be included, we
can make that change.

Plugins may choose not to send an empty transaction downstream. It's
better to increment sent_txns counter in the plugin code when it
actually sends a BEGIN message, for example in pgoutput_send_begin()
and pg_output_begin(). This means that every plugin will need to be
modified to increment the counter for it to reported correctly.

I first thought of incrementing sent_bytes in OutputPluginWrite()
which is a central function for all logical replication message
writes. But that calls LogicalDecodingContext::write() which may
further add bytes to the message e.g. WalSndWriteData() and
LogicalOutputWrite(). So it's better to increment the counter in
implementations of LogicalDecodingContext::write(), so that we count
the exact number of bytes. These implementations are within core code
so they won't miss updating sent_bytes.

I think we should rename total_txns and total_bytes to reordered_txns
and reordered_bytes respectively, and also update the documentation
accordingly to make better sense of those numbers. But these patches
do not contain that change. If others feel the same way, I will
provide a patch with that change.

-- 
Best Wishes,
Ashutosh Bapat

Вложения

Re: Report bytes and transactions actually sent downtream

От
Amit Kapila
Дата:
On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi All,
> In a recent logical replication issue, there were multiple replication
> slots involved, each using a different publication. Thus the amount of
> data that was replicated through each slot was expected to be
> different. However, total_bytes and total_txns were reported the same
> for all the replication slots as expected. One of the slots started
> lagging and we were trying to figure out whether its the WAL sender
> slowing down or the consumer  (in this case Debezium). The lagging
> slot then showed total_txns and total_bytes lesser than other slots
> giving an impression that the WAL sender is processing the data
> slowly. Had pg_stat_replication_slot reported the amount of data
> actually sent downstream, it would have been easier to compare it with
> the amount of data received by the consumer and thus pinpoint the
> bottleneck.
>
> Here's a patch to do the same. It adds two columns
>     - sent_txns: The total number of transactions sent downstream.
>     - sent_bytes: The total number of bytes sent downstream in data messages
> to pg_stat_replication_slots. sent_bytes includes only the bytes sent
> as part of 'd' messages and does not include keep alive messages or
> CopyDone messages for example. But those are very few and can be
> ignored. If others feel that those are important to be included, we
> can make that change.
>
> Plugins may choose not to send an empty transaction downstream. It's
> better to increment sent_txns counter in the plugin code when it
> actually sends a BEGIN message, for example in pgoutput_send_begin()
> and pg_output_begin(). This means that every plugin will need to be
> modified to increment the counter for it to reported correctly.
>

What if some plugin didn't implemented it or does it incorrectly?
Users will then complain that PG view is showing incorrect value.
Shouldn't the plugin specific stats be shown differently, for example,
one may be interested in how much plugin has filtered the data because
it was not published or because something like row_filter caused it
skip sending such data?

--
With Regards,
Amit Kapila.



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > Hi All,
> > In a recent logical replication issue, there were multiple replication
> > slots involved, each using a different publication. Thus the amount of
> > data that was replicated through each slot was expected to be
> > different. However, total_bytes and total_txns were reported the same
> > for all the replication slots as expected. One of the slots started
> > lagging and we were trying to figure out whether its the WAL sender
> > slowing down or the consumer  (in this case Debezium). The lagging
> > slot then showed total_txns and total_bytes lesser than other slots
> > giving an impression that the WAL sender is processing the data
> > slowly. Had pg_stat_replication_slot reported the amount of data
> > actually sent downstream, it would have been easier to compare it with
> > the amount of data received by the consumer and thus pinpoint the
> > bottleneck.
> >
> > Here's a patch to do the same. It adds two columns
> >     - sent_txns: The total number of transactions sent downstream.
> >     - sent_bytes: The total number of bytes sent downstream in data messages
> > to pg_stat_replication_slots. sent_bytes includes only the bytes sent
> > as part of 'd' messages and does not include keep alive messages or
> > CopyDone messages for example. But those are very few and can be
> > ignored. If others feel that those are important to be included, we
> > can make that change.
> >
> > Plugins may choose not to send an empty transaction downstream. It's
> > better to increment sent_txns counter in the plugin code when it
> > actually sends a BEGIN message, for example in pgoutput_send_begin()
> > and pg_output_begin(). This means that every plugin will need to be
> > modified to increment the counter for it to reported correctly.
> >
>
> What if some plugin didn't implemented it or does it incorrectly?
> Users will then complain that PG view is showing incorrect value.

That is right.

To fix the problem of plugins not implementing the counter increment
logic we could use logic similar to how we track whether
OutputPluginPrepareWrite() has been called or not. In
ReorderBufferTxn, we add a new member sent_status which would be an
enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
= UNKNOWN. We provide a function called
plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
sent_status = SENT when sent = true and sent_status = NOT_SENT when
sent = false. In all the end transaction callback wrappers like
commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
tsent_status = UNKNOWN, we throw an error. If sent_status = SENT, we
increment sent_txns. That will catch any plugin which does not call
plugin_set_txn(). The plugin may still call plugin_sent_txn() with
sent = true when it should have called it with sent = false or vice
versa, but that's hard to monitor and control.

Additionally, we should highlight in the document that sent_txns is as
per report from the output plugin so  that users know where to look
for in case they see a wrong/dubious value. I see this similar to what
we do with pg_stat_replication::reply_time which may be wrong if a
non-PG standby reports the wrong value. Documentation says "Send time
of last reply message received from standby server", so the users know
where to look for incase they spot the error.

Does that look good?

I am open to other suggestions.

> Shouldn't the plugin specific stats be shown differently, for example,
> one may be interested in how much plugin has filtered the data because
> it was not published or because something like row_filter caused it
> skip sending such data?
>

That looks useful, we could track the ReorderBufferChange's that were
not sent downstream and add their sizes to another counter
ReorderBuffer::filtered_bytes and report it in
pg_stat_replication_slots. I think we will need to devise a mechanism
similar to above by which the plugin tells core whether a change has
been filtered or not. However, that will not be a replacement for
sent_bytes, since filtered_bytes or total_bytes - filtered_bytes won't
tell us how much data was sent downstream, which is crucial to the
purpose stated in my earlier email.

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Amit Kapila
Дата:
On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > Hi All,
> > > In a recent logical replication issue, there were multiple replication
> > > slots involved, each using a different publication. Thus the amount of
> > > data that was replicated through each slot was expected to be
> > > different. However, total_bytes and total_txns were reported the same
> > > for all the replication slots as expected. One of the slots started
> > > lagging and we were trying to figure out whether its the WAL sender
> > > slowing down or the consumer  (in this case Debezium). The lagging
> > > slot then showed total_txns and total_bytes lesser than other slots
> > > giving an impression that the WAL sender is processing the data
> > > slowly. Had pg_stat_replication_slot reported the amount of data
> > > actually sent downstream, it would have been easier to compare it with
> > > the amount of data received by the consumer and thus pinpoint the
> > > bottleneck.
> > >
> > > Here's a patch to do the same. It adds two columns
> > >     - sent_txns: The total number of transactions sent downstream.
> > >     - sent_bytes: The total number of bytes sent downstream in data messages
> > > to pg_stat_replication_slots. sent_bytes includes only the bytes sent
> > > as part of 'd' messages and does not include keep alive messages or
> > > CopyDone messages for example. But those are very few and can be
> > > ignored. If others feel that those are important to be included, we
> > > can make that change.
> > >
> > > Plugins may choose not to send an empty transaction downstream. It's
> > > better to increment sent_txns counter in the plugin code when it
> > > actually sends a BEGIN message, for example in pgoutput_send_begin()
> > > and pg_output_begin(). This means that every plugin will need to be
> > > modified to increment the counter for it to reported correctly.
> > >
> >
> > What if some plugin didn't implemented it or does it incorrectly?
> > Users will then complain that PG view is showing incorrect value.
>
> That is right.
>
> To fix the problem of plugins not implementing the counter increment
> logic we could use logic similar to how we track whether
> OutputPluginPrepareWrite() has been called or not. In
> ReorderBufferTxn, we add a new member sent_status which would be an
> enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
> = UNKNOWN. We provide a function called
> plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
> sent_status = SENT when sent = true and sent_status = NOT_SENT when
> sent = false. In all the end transaction callback wrappers like
> commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
> stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
> tsent_status = UNKNOWN, we throw an error.
>

I think we don't want to make it mandatory for plugins to implement
these stats, so instead of throwing ERROR, the view should show that
the plugin doesn't provide stats. How about having OutputPluginStats
similar to OutputPluginCallbacks and OutputPluginOptions members in
LogicalDecodingContext? It will have members like stats_available,
txns_sent or txns_skipped, txns_filtered, etc. I am thinking it will
be better to provide this information in a separate view like
pg_stat_plugin_stats or something like that, here we can report
slot_name, plugin_name, then the other stats we want to implement part
of OutputPluginStats.

--
With Regards,
Amit Kapila.



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > Hi All,
> > > > In a recent logical replication issue, there were multiple replication
> > > > slots involved, each using a different publication. Thus the amount of
> > > > data that was replicated through each slot was expected to be
> > > > different. However, total_bytes and total_txns were reported the same
> > > > for all the replication slots as expected. One of the slots started
> > > > lagging and we were trying to figure out whether its the WAL sender
> > > > slowing down or the consumer  (in this case Debezium). The lagging
> > > > slot then showed total_txns and total_bytes lesser than other slots
> > > > giving an impression that the WAL sender is processing the data
> > > > slowly. Had pg_stat_replication_slot reported the amount of data
> > > > actually sent downstream, it would have been easier to compare it with
> > > > the amount of data received by the consumer and thus pinpoint the
> > > > bottleneck.
> > > >
> > > > Here's a patch to do the same. It adds two columns
> > > >     - sent_txns: The total number of transactions sent downstream.
> > > >     - sent_bytes: The total number of bytes sent downstream in data messages
> > > > to pg_stat_replication_slots. sent_bytes includes only the bytes sent
> > > > as part of 'd' messages and does not include keep alive messages or
> > > > CopyDone messages for example. But those are very few and can be
> > > > ignored. If others feel that those are important to be included, we
> > > > can make that change.
> > > >
> > > > Plugins may choose not to send an empty transaction downstream. It's
> > > > better to increment sent_txns counter in the plugin code when it
> > > > actually sends a BEGIN message, for example in pgoutput_send_begin()
> > > > and pg_output_begin(). This means that every plugin will need to be
> > > > modified to increment the counter for it to reported correctly.
> > > >
> > >
> > > What if some plugin didn't implemented it or does it incorrectly?
> > > Users will then complain that PG view is showing incorrect value.
> >
> > That is right.
> >
> > To fix the problem of plugins not implementing the counter increment
> > logic we could use logic similar to how we track whether
> > OutputPluginPrepareWrite() has been called or not. In
> > ReorderBufferTxn, we add a new member sent_status which would be an
> > enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
> > = UNKNOWN. We provide a function called
> > plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
> > sent_status = SENT when sent = true and sent_status = NOT_SENT when
> > sent = false. In all the end transaction callback wrappers like
> > commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
> > stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
> > tsent_status = UNKNOWN, we throw an error.
> >
>
> I think we don't want to make it mandatory for plugins to implement
> these stats, so instead of throwing ERROR, the view should show that
> the plugin doesn't provide stats. How about having OutputPluginStats
> similar to OutputPluginCallbacks and OutputPluginOptions members in
> LogicalDecodingContext? It will have members like stats_available,
> txns_sent or txns_skipped, txns_filtered, etc.

Not making mandatory looks useful. I can try your suggestion. Rather
than having stats_available as a member of OutputPluginStats, it's
better to have a NULL value for the corresponding member in
LogicalDecodingContext. We don't want an output plugin to reset
stats_available once set. Will that work?

> I am thinking it will
> be better to provide this information in a separate view like
> pg_stat_plugin_stats or something like that, here we can report
> slot_name, plugin_name, then the other stats we want to implement part
> of OutputPluginStats.

As you have previously pointed out, the view should make it explicit
that the new stats are maintained by the plugin and not core. I agree
with that intention. However, already have three views
pg_replication_slots (which has slot name and plugin name), then
pg_replication_stats which is about stats maintained by a WAL sender
or running replication and then pg_stat_replication_slots, which is
about accumulated statistics for a replication through a given
replication slot. It's already a bit hard to keep track of who's who
when debugging an issue. Adding one more view will add to confusion.

Instead of adding a new view how about
a. name the columns as plugin_sent_txns, plugin_sent_bytes,
plugin_filtered_change_bytes to make it clear that these columns are
maintained by plugin
b. report these NULL if stats_available = false OR OutputPluginStats
is not set in LogicalDecodingContext
c. Document that NULL value for these columns indicates that the
plugin is not maintaining/reporting these stats
d. adding plugin name to pg_stat_replication_slots, that will make it
easy for users to know which plugin they should look at in case of
dubious or unavailable stats

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Amit Kapila
Дата:
On Mon, Jul 14, 2025 at 10:55 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > I think we don't want to make it mandatory for plugins to implement
> > these stats, so instead of throwing ERROR, the view should show that
> > the plugin doesn't provide stats. How about having OutputPluginStats
> > similar to OutputPluginCallbacks and OutputPluginOptions members in
> > LogicalDecodingContext? It will have members like stats_available,
> > txns_sent or txns_skipped, txns_filtered, etc.
>
> Not making mandatory looks useful. I can try your suggestion. Rather
> than having stats_available as a member of OutputPluginStats, it's
> better to have a NULL value for the corresponding member in
> LogicalDecodingContext. We don't want an output plugin to reset
> stats_available once set. Will that work?
>

We can try that.

> > I am thinking it will
> > be better to provide this information in a separate view like
> > pg_stat_plugin_stats or something like that, here we can report
> > slot_name, plugin_name, then the other stats we want to implement part
> > of OutputPluginStats.
>
> As you have previously pointed out, the view should make it explicit
> that the new stats are maintained by the plugin and not core. I agree
> with that intention. However, already have three views
> pg_replication_slots (which has slot name and plugin name), then
> pg_replication_stats which is about stats maintained by a WAL sender
> or running replication and then pg_stat_replication_slots, which is
> about accumulated statistics for a replication through a given
> replication slot. It's already a bit hard to keep track of who's who
> when debugging an issue. Adding one more view will add to confusion.
>
> Instead of adding a new view how about
> a. name the columns as plugin_sent_txns, plugin_sent_bytes,
> plugin_filtered_change_bytes to make it clear that these columns are
> maintained by plugin
> b. report these NULL if stats_available = false OR OutputPluginStats
> is not set in LogicalDecodingContext
> c. Document that NULL value for these columns indicates that the
> plugin is not maintaining/reporting these stats
> d. adding plugin name to pg_stat_replication_slots, that will make it
> easy for users to know which plugin they should look at in case of
> dubious or unavailable stats
>

Sounds reasonable.

--
With Regards,
Amit Kapila.



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
Hi Amit,

On Mon, Jul 14, 2025 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 14, 2025 at 10:55 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > I think we don't want to make it mandatory for plugins to implement
> > > these stats, so instead of throwing ERROR, the view should show that
> > > the plugin doesn't provide stats. How about having OutputPluginStats
> > > similar to OutputPluginCallbacks and OutputPluginOptions members in
> > > LogicalDecodingContext? It will have members like stats_available,
> > > txns_sent or txns_skipped, txns_filtered, etc.
> >
> > Not making mandatory looks useful. I can try your suggestion. Rather
> > than having stats_available as a member of OutputPluginStats, it's
> > better to have a NULL value for the corresponding member in
> > LogicalDecodingContext. We don't want an output plugin to reset
> > stats_available once set. Will that work?
> >
>
> We can try that.
>
> > > I am thinking it will
> > > be better to provide this information in a separate view like
> > > pg_stat_plugin_stats or something like that, here we can report
> > > slot_name, plugin_name, then the other stats we want to implement part
> > > of OutputPluginStats.
> >
> > As you have previously pointed out, the view should make it explicit
> > that the new stats are maintained by the plugin and not core. I agree
> > with that intention. However, already have three views
> > pg_replication_slots (which has slot name and plugin name), then
> > pg_replication_stats which is about stats maintained by a WAL sender
> > or running replication and then pg_stat_replication_slots, which is
> > about accumulated statistics for a replication through a given
> > replication slot. It's already a bit hard to keep track of who's who
> > when debugging an issue. Adding one more view will add to confusion.
> >
> > Instead of adding a new view how about
> > a. name the columns as plugin_sent_txns, plugin_sent_bytes,
> > plugin_filtered_change_bytes to make it clear that these columns are
> > maintained by plugin
> > b. report these NULL if stats_available = false OR OutputPluginStats
> > is not set in LogicalDecodingContext
> > c. Document that NULL value for these columns indicates that the
> > plugin is not maintaining/reporting these stats
> > d. adding plugin name to pg_stat_replication_slots, that will make it
> > easy for users to know which plugin they should look at in case of
> > dubious or unavailable stats
> >
>
> Sounds reasonable.

Here's the next patch which considers all the discussion so far. It
adds four fields to pg_stat_replication_slots.
    - plugin - name of the output plugin
    - plugin_filtered_bytes - reports the amount of changes filtered
out by the output plugin
    - plugin_sent_txns - the amount of transactions sent downstream by
the output plugin
    - plugin_sent_bytes - the amount of data sent downstream by the
outputplugin.

There are some points up for a discussion:
1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
calling pgstat_reset() or pgstat_reset_of_kind() which don't know
about the contents of the entry. So
PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
stats are reported as NULL, instead of zero, immediately after reset.
This is the same case when the stats is queried immediately after the
statistics is initialized and before any stats are reported. We could
instead make it report
zero, if we save the plugin_has_stats and restore it after reset. But
doing that in pgstat_reset_of_kind() seems like an extra overhead + we
will need to write a function to find all replication slot entries.
Resetting the stats would be a rare event in practice. Trying to
report 0 instead of NULL in that rare case doesn't seem to be worth
the efforts and code. Given that the core code doesn't know whether a
given plugin reports stats or not, I think this behaviour is
appropriate as long as we document it. Please let me know if the
documentation in the patch is clear enough.

2. There's also a bit of asymmetry in the way sent_bytes is handled.
The code which actually sends the logical changes to the downstream is
part of the core code
but the format of the change and hence the number of bytes sent is
decided by the plugin. It's a stat related to plugin but maintained by
the core code. The patch implements it as a plugin stat (so the
corresponding column has "plugin" prefix and is also reported as NULL
upon reset etc.), but we may want to reconsider how to report and
maintain it.

3. The names of new columns have the prefix "plugin_" but the internal
variables tracking those don't for the sake of brevity. If you prefer
to have the same prefix for the internal variables, I can change that.

I think I have covered all the cases where filteredbytes should be
incremented, but please let me know if I have missed any.

--
Best Wishes,
Ashutosh Bapat

Вложения

Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:
> Here's the next patch which considers all the discussion so far. It
> adds four fields to pg_stat_replication_slots.
>     - plugin - name of the output plugin

Is this one needed? (we could get it with a join on pg_replication_slots)

>     - plugin_filtered_bytes - reports the amount of changes filtered
> out by the output plugin
>     - plugin_sent_txns - the amount of transactions sent downstream by
> the output plugin
>     - plugin_sent_bytes - the amount of data sent downstream by the
> outputplugin.
> 
> There are some points up for a discussion:
> 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
> calling pgstat_reset() or pgstat_reset_of_kind() which don't know
> about the contents of the entry. So
> PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
> stats are reported as NULL, instead of zero, immediately after reset.
> This is the same case when the stats is queried immediately after the
> statistics is initialized and before any stats are reported. We could
> instead make it report
> zero, if we save the plugin_has_stats and restore it after reset. But
> doing that in pgstat_reset_of_kind() seems like an extra overhead + we
> will need to write a function to find all replication slot entries.

Could we store plugin_has_stats in ReplicationSlotPersistentData instead? That
way it would not be reset. We would need to access ReplicationSlotPersistentData
in pg_stat_get_replication_slot though.

Also would that make sense to expose plugin_has_stats in pg_replication_slots?

> 2. There's also a bit of asymmetry in the way sent_bytes is handled.
> The code which actually sends the logical changes to the downstream is
> part of the core code
> but the format of the change and hence the number of bytes sent is
> decided by the plugin. It's a stat related to plugin but maintained by
> the core code. The patch implements it as a plugin stat (so the
> corresponding column has "plugin" prefix

The way it is done makes sense to me.
 
> 3. The names of new columns have the prefix "plugin_" but the internal
> variables tracking those don't for the sake of brevity. If you prefer
> to have the same prefix for the internal variables, I can change that.

Just my taste: I do prefer when they match.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
shveta malik
Дата:
On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:
> > Here's the next patch which considers all the discussion so far. It
> > adds four fields to pg_stat_replication_slots.
> >     - plugin - name of the output plugin
>
> Is this one needed? (we could get it with a join on pg_replication_slots)
>

In my opinion, when there are other plugin_* fields present, including
the plugin name directly here seems like a better approach. So, +1 for
the plugin field.

> >     - plugin_filtered_bytes - reports the amount of changes filtered
> > out by the output plugin
> >     - plugin_sent_txns - the amount of transactions sent downstream by
> > the output plugin
> >     - plugin_sent_bytes - the amount of data sent downstream by the
> > outputplugin.
> >
> > There are some points up for a discussion:
> > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
> > calling pgstat_reset() or pgstat_reset_of_kind() which don't know
> > about the contents of the entry. So
> > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
> > stats are reported as NULL, instead of zero, immediately after reset.
> > This is the same case when the stats is queried immediately after the
> > statistics is initialized and before any stats are reported. We could
> > instead make it report
> > zero, if we save the plugin_has_stats and restore it after reset. But
> > doing that in pgstat_reset_of_kind() seems like an extra overhead + we
> > will need to write a function to find all replication slot entries.

I tried to think of an approach where we can differentiate between the
cases 'not initialized' and 'reset' ones with the values. Say instead
of plugin_has_stats, if we have plugin_stats_status, then we can
maintain status like -1(not initialized), 0(reset). But this too will
complicate the code further. Personally, I’m okay with NULL values
appearing even after a reset, especially since the documentation
explains this clearly.

>
> > 2. There's also a bit of asymmetry in the way sent_bytes is handled.
> > The code which actually sends the logical changes to the downstream is
> > part of the core code
> > but the format of the change and hence the number of bytes sent is
> > decided by the plugin. It's a stat related to plugin but maintained by
> > the core code. The patch implements it as a plugin stat (so the
> > corresponding column has "plugin" prefix
>
> The way it is done makes sense to me.
>
> > 3. The names of new columns have the prefix "plugin_" but the internal
> > variables tracking those don't for the sake of brevity. If you prefer
> > to have the same prefix for the internal variables, I can change that.
>

I am okay either way.

Few comments:

1)
postgres=# select slot_name,
total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
pg_stat_replication_slots order by slot_name;
 slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
-----------+-------------+-----------------------+-------------------
 slot1     |      800636 |                793188 |               211
 sub1      |      401496 |                132712 |             84041
 sub2      |      401496 |                396184 |               674
 sub3      |      401496 |                145912 |             79959
(4 rows)

Currently it looks quite confusing. 'total_bytes' gives a sense that
it has to be a sum of filtered and sent. But they are no way like
that. In the thread earlier there was a proposal to change the name to
reordered_txns, reordered_bytes. That looks better to me. It will give
clarity without even someone digging into docs.

2)
Tried to verify all filtered data tests, seems to work well. Also  I
tried tracking the usage of OutputPluginWrite() to see if there is any
other place where data needs to be considered as filtered-data.
Encountered this:

send_relation_and_attrs has:
                if (!logicalrep_should_publish_column(att, columns,

                   include_gencols_type))
                        continue;
                if (att->atttypid < FirstGenbkiObjectId)
                        continue;

But I don't think it needs to be considered as filtered data. This is
mostly schema related info. But I wanted to confirm once. Thoughts?

3)
+-- total_txns may vary based on the background activity but sent_txns should
+-- always be 1 since the background transactions are always skipped. Filtered
+-- bytes would be set only when there's a change that was passed to the plugin
+-- but was filtered out. Depending upon the background transactions, filtered
+-- bytes may or may not be zero.
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes,
plugin_filtered_bytes >= 0 AS filtered_bytes FROM
pg_stat_replication_slots ORDER BY slot_name;

In comment either we can say plugin_sent_txns instead of sent_txns or
in the query we can fetch plugin_sent_txns AS  sent_txns, so that we
can relate comment and query.


4)
+      <literal>sentTxns</literal> is the number of transactions sent downstream
+      by the output plugin. <literal>sentBytes</literal> is the amount of data
+      sent downstream by the output plugin.
+      <function>OutputPluginWrite</function> is expected to update this counter
+      if <literal>ctx->stats</literal> is initialized by the output plugin.
+      <literal>filteredBytes</literal> is the size of changes in bytes that are
+      filtered out by the output plugin. Function
+      <literal>ReorderBufferChangeSize</literal> may be used to find
the size of
+      filtered <literal>ReorderBufferChange</literal>.
+     </para>

Either we can mention units as 'bytes' for both filteredBytes and
sentBytes or for none. Currently filteredBytes says 'in bytes' while
sentBytes does not.

thanks
Shveta



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
Hi Shveta, Bertrand,

Replying to both of your review comments together.

On Thu, Sep 18, 2025 at 10:52 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:
> > > Here's the next patch which considers all the discussion so far. It
> > > adds four fields to pg_stat_replication_slots.
> > >     - plugin - name of the output plugin
> >
> > Is this one needed? (we could get it with a join on pg_replication_slots)
> >
>
> In my opinion, when there are other plugin_* fields present, including
> the plugin name directly here seems like a better approach. So, +1 for
> the plugin field.

Yeah. I think so too.

>
> > >     - plugin_filtered_bytes - reports the amount of changes filtered
> > > out by the output plugin
> > >     - plugin_sent_txns - the amount of transactions sent downstream by
> > > the output plugin
> > >     - plugin_sent_bytes - the amount of data sent downstream by the
> > > outputplugin.
> > >
> > > There are some points up for a discussion:
> > > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
> > > calling pgstat_reset() or pgstat_reset_of_kind() which don't know
> > > about the contents of the entry. So
> > > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
> > > stats are reported as NULL, instead of zero, immediately after reset.
> > > This is the same case when the stats is queried immediately after the
> > > statistics is initialized and before any stats are reported. We could
> > > instead make it report
> > > zero, if we save the plugin_has_stats and restore it after reset. But
> > > doing that in pgstat_reset_of_kind() seems like an extra overhead + we
> > > will need to write a function to find all replication slot entries.
>
> I tried to think of an approach where we can differentiate between the
> cases 'not initialized' and 'reset' ones with the values. Say instead
> of plugin_has_stats, if we have plugin_stats_status, then we can
> maintain status like -1(not initialized), 0(reset). But this too will
> complicate the code further. Personally, I’m okay with NULL values
> appearing even after a reset, especially since the documentation
> explains this clearly.

Ok. Great.

>
> > Could we store plugin_has_stats in ReplicationSlotPersistentData instead? That
> > way it would not be reset. We would need to access ReplicationSlotPersistentData
> > in pg_stat_get_replication_slot though.
>
> > Also would that make sense to expose plugin_has_stats in pg_replication_slots?
>

A plugin may change its decision to support the stats across versions,
we won't be able to tell when it changes that decision and thus
reflect it accurately in ReplicationSlotPersistentData. Doing it in
startup gives the opportunity to the plugin to change it as often as
it wants OR even based on some plugin specific configurations. Further
ReplicationSlotPersistentData is maintained by the core. It will not
be a good place to store something plugin specific.

> >
> > > 2. There's also a bit of asymmetry in the way sent_bytes is handled.
> > > The code which actually sends the logical changes to the downstream is
> > > part of the core code
> > > but the format of the change and hence the number of bytes sent is
> > > decided by the plugin. It's a stat related to plugin but maintained by
> > > the core code. The patch implements it as a plugin stat (so the
> > > corresponding column has "plugin" prefix
> >
> > The way it is done makes sense to me.

Great.

> >
> > > 3. The names of new columns have the prefix "plugin_" but the internal
> > > variables tracking those don't for the sake of brevity. If you prefer
> > > to have the same prefix for the internal variables, I can change that.
> >
>
> I am okay either way.
>
> > Just my taste: I do prefer when they match.

I don't see a strong preference to change what's there in the patch.
Let's wait for more reviews.

>
> Few comments:
>
> 1)
> postgres=# select slot_name,
> total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
> pg_stat_replication_slots order by slot_name;
>  slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> -----------+-------------+-----------------------+-------------------
>  slot1     |      800636 |                793188 |               211
>  sub1      |      401496 |                132712 |             84041
>  sub2      |      401496 |                396184 |               674
>  sub3      |      401496 |                145912 |             79959
> (4 rows)
>
> Currently it looks quite confusing. 'total_bytes' gives a sense that
> it has to be a sum of filtered and sent. But they are no way like
> that. In the thread earlier there was a proposal to change the name to
> reordered_txns, reordered_bytes. That looks better to me. It will give
> clarity without even someone digging into docs.

I also agree with that. But that will break backward compatibility. Do
you think other columns like spill_* and stream_* should also be
renamed with the prefix "reordered"?

>
> 2)
> Tried to verify all filtered data tests, seems to work well. Also  I
> tried tracking the usage of OutputPluginWrite() to see if there is any
> other place where data needs to be considered as filtered-data.
> Encountered this:
>
> send_relation_and_attrs has:
>                 if (!logicalrep_should_publish_column(att, columns,
>
>                    include_gencols_type))
>                         continue;
>                 if (att->atttypid < FirstGenbkiObjectId)
>                         continue;
>
> But I don't think it needs to be considered as filtered data. This is
> mostly schema related info. But I wanted to confirm once. Thoughts?

Yeah. It's part of metadata which in turn is sent only when needed.
It's not part of, say, transaction changes. So it can't be considered
as filtering.

>
> 3)
> +-- total_txns may vary based on the background activity but sent_txns should
> +-- always be 1 since the background transactions are always skipped. Filtered
> +-- bytes would be set only when there's a change that was passed to the plugin
> +-- but was filtered out. Depending upon the background transactions, filtered
> +-- bytes may or may not be zero.
> +SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
> spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
> total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes,
> plugin_filtered_bytes >= 0 AS filtered_bytes FROM
> pg_stat_replication_slots ORDER BY slot_name;
>
> In comment either we can say plugin_sent_txns instead of sent_txns or
> in the query we can fetch plugin_sent_txns AS  sent_txns, so that we
> can relate comment and query.
>

Used plugin_sent_txns in the comment as well as query.

>
> 4)
> +      <literal>sentTxns</literal> is the number of transactions sent downstream
> +      by the output plugin. <literal>sentBytes</literal> is the amount of data
> +      sent downstream by the output plugin.
> +      <function>OutputPluginWrite</function> is expected to update this counter
> +      if <literal>ctx->stats</literal> is initialized by the output plugin.
> +      <literal>filteredBytes</literal> is the size of changes in bytes that are
> +      filtered out by the output plugin. Function
> +      <literal>ReorderBufferChangeSize</literal> may be used to find
> the size of
> +      filtered <literal>ReorderBufferChange</literal>.
> +     </para>
>
> Either we can mention units as 'bytes' for both filteredBytes and
> sentBytes or for none. Currently filteredBytes says 'in bytes' while
> sentBytes does not.

Used 'in bytes' in both the places.

Thanks for your review. I will include these changes in the next set of patches.

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
shveta malik
Дата:
On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
>
> >
> > Few comments:
> >
> > 1)
> > postgres=# select slot_name,
> > total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
> > pg_stat_replication_slots order by slot_name;
> >  slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> > -----------+-------------+-----------------------+-------------------
> >  slot1     |      800636 |                793188 |               211
> >  sub1      |      401496 |                132712 |             84041
> >  sub2      |      401496 |                396184 |               674
> >  sub3      |      401496 |                145912 |             79959
> > (4 rows)
> >
> > Currently it looks quite confusing. 'total_bytes' gives a sense that
> > it has to be a sum of filtered and sent. But they are no way like
> > that. In the thread earlier there was a proposal to change the name to
> > reordered_txns, reordered_bytes. That looks better to me. It will give
> > clarity without even someone digging into docs.
>
> I also agree with that. But that will break backward compatibility.

Yes, that it will do.

> Do
> you think other columns like spill_* and stream_* should also be
> renamed with the prefix "reordered"?
>

Okay, I see that all fields in pg_stat_replication_slots are related
to the ReorderBuffer. On reconsideration, I’m unsure whether it's
appropriate to prefix all of them with reorderd_. For example,
renaming spill_bytes and stream_bytes to reordered_spill_bytes and
reordered_stream_bytes. These names start to feel overly long, and I
also noticed that ReorderBuffer isn’t clearly defined anywhere in the
documentation (or at least I couldn’t find it), even though the term
'reorder buffer' does appear in a few places.

As an example, see ReorderBufferRead, ReorderBufferWrite  wait-types
at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
are adding: ReorderBufferChangeSize, ReorderBufferChange

This gives me a feeling, will it be better to let
pg_stat_replication_slots as is and add a brief ReorderBuffer section
under Logical Decoding concepts [3] just before Output Plugins. And
then, pg_stat_replication_slots can refer to that section, clarifying
that the bytes, counts, and txn fields pertain to ReorderBuffer
(without changing any of the fields).

And then to define plugin related data, we can have a new view, say
pg_stat_plugin_stats (as Amit suggested earlier) or
pg_stat_replication_plugins. I understand that adding a new view might
not be desirable, but it provides better clarity without requiring
changes to the existing fields in pg_stat_replication_slots. I also
strongly feel that to properly tie all this information together, a
brief definition of the ReorderBuffer is needed. Other pages that
reference this term can then point to that section. Thoughts?

[1]: https://www.postgresql.org/docs/17/monitoring-stats.html#WAIT-EVENT-IO-TABLE
[2]: https://www.postgresql.org/docs/17/logicaldecoding-output-plugin.html
[3]: https://www.postgresql.org/docs/17/logicaldecoding-explanation.html#LOGICALDECODING-EXPLANATION

thanks
Shveta



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> >
> > >
> > > Few comments:
> > >
> > > 1)
> > > postgres=# select slot_name,
> > > total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
> > > pg_stat_replication_slots order by slot_name;
> > >  slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> > > -----------+-------------+-----------------------+-------------------
> > >  slot1     |      800636 |                793188 |               211
> > >  sub1      |      401496 |                132712 |             84041
> > >  sub2      |      401496 |                396184 |               674
> > >  sub3      |      401496 |                145912 |             79959
> > > (4 rows)
> > >
> > > Currently it looks quite confusing. 'total_bytes' gives a sense that
> > > it has to be a sum of filtered and sent. But they are no way like
> > > that. In the thread earlier there was a proposal to change the name to
> > > reordered_txns, reordered_bytes. That looks better to me. It will give
> > > clarity without even someone digging into docs.
> >
> > I also agree with that. But that will break backward compatibility.
>
> Yes, that it will do.
>
> > Do
> > you think other columns like spill_* and stream_* should also be
> > renamed with the prefix "reordered"?
> >
>
> Okay, I see that all fields in pg_stat_replication_slots are related
> to the ReorderBuffer. On reconsideration, I’m unsure whether it's
> appropriate to prefix all of them with reorderd_. For example,
> renaming spill_bytes and stream_bytes to reordered_spill_bytes and
> reordered_stream_bytes. These names start to feel overly long, and I
> also noticed that ReorderBuffer isn’t clearly defined anywhere in the
> documentation (or at least I couldn’t find it), even though the term
> 'reorder buffer' does appear in a few places.
>
> As an example, see ReorderBufferRead, ReorderBufferWrite  wait-types
> at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
> are adding: ReorderBufferChangeSize, ReorderBufferChange
>
> This gives me a feeling, will it be better to let
> pg_stat_replication_slots as is and add a brief ReorderBuffer section
> under Logical Decoding concepts [3] just before Output Plugins. And
> then, pg_stat_replication_slots can refer to that section, clarifying
> that the bytes, counts, and txn fields pertain to ReorderBuffer
> (without changing any of the fields).
>
> And then to define plugin related data, we can have a new view, say
> pg_stat_plugin_stats (as Amit suggested earlier) or
> pg_stat_replication_plugins. I understand that adding a new view might
> not be desirable, but it provides better clarity without requiring
> changes to the existing fields in pg_stat_replication_slots. I also
> strongly feel that to properly tie all this information together, a
> brief definition of the ReorderBuffer is needed. Other pages that
> reference this term can then point to that section. Thoughts?

Even if we keep two views, when they are joined, users will still get
confused by total_* names. So it's not solving the underlying problem.
Andres had raised the point about renaming total_* fields with me
off-list earlier. He suggested names total_wal_bytes, and
total_wal_txns in an off list discussion today. I think those convey
the true meaning - that these are txns and bytes that come from WAL.
Used those in the attached patches. Prefix reordered would give away
lower level details, so I didn't use it.

I agree that it would be good to mention ReorderBuffer in the logical
decoding concepts section since it mentions structures ReorderBuffer*.
But that would be a separate patch since we aren't using "reordered"
in the names of the fields.

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

--
Best Wishes,
Ashutosh Bapat

Вложения

Re: Report bytes and transactions actually sent downtream

От
shveta malik
Дата:
On Fri, Sep 19, 2025 at 8:11 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > >
> > > >
> > > > Few comments:
> > > >
> > > > 1)
> > > > postgres=# select slot_name,
> > > > total_bytes,plugin_filtered_bytes,plugin_sent_bytes  from
> > > > pg_stat_replication_slots order by slot_name;
> > > >  slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> > > > -----------+-------------+-----------------------+-------------------
> > > >  slot1     |      800636 |                793188 |               211
> > > >  sub1      |      401496 |                132712 |             84041
> > > >  sub2      |      401496 |                396184 |               674
> > > >  sub3      |      401496 |                145912 |             79959
> > > > (4 rows)
> > > >
> > > > Currently it looks quite confusing. 'total_bytes' gives a sense that
> > > > it has to be a sum of filtered and sent. But they are no way like
> > > > that. In the thread earlier there was a proposal to change the name to
> > > > reordered_txns, reordered_bytes. That looks better to me. It will give
> > > > clarity without even someone digging into docs.
> > >
> > > I also agree with that. But that will break backward compatibility.
> >
> > Yes, that it will do.
> >
> > > Do
> > > you think other columns like spill_* and stream_* should also be
> > > renamed with the prefix "reordered"?
> > >
> >
> > Okay, I see that all fields in pg_stat_replication_slots are related
> > to the ReorderBuffer. On reconsideration, I’m unsure whether it's
> > appropriate to prefix all of them with reorderd_. For example,
> > renaming spill_bytes and stream_bytes to reordered_spill_bytes and
> > reordered_stream_bytes. These names start to feel overly long, and I
> > also noticed that ReorderBuffer isn’t clearly defined anywhere in the
> > documentation (or at least I couldn’t find it), even though the term
> > 'reorder buffer' does appear in a few places.
> >
> > As an example, see ReorderBufferRead, ReorderBufferWrite  wait-types
> > at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
> > are adding: ReorderBufferChangeSize, ReorderBufferChange
> >
> > This gives me a feeling, will it be better to let
> > pg_stat_replication_slots as is and add a brief ReorderBuffer section
> > under Logical Decoding concepts [3] just before Output Plugins. And
> > then, pg_stat_replication_slots can refer to that section, clarifying
> > that the bytes, counts, and txn fields pertain to ReorderBuffer
> > (without changing any of the fields).
> >
> > And then to define plugin related data, we can have a new view, say
> > pg_stat_plugin_stats (as Amit suggested earlier) or
> > pg_stat_replication_plugins. I understand that adding a new view might
> > not be desirable, but it provides better clarity without requiring
> > changes to the existing fields in pg_stat_replication_slots. I also
> > strongly feel that to properly tie all this information together, a
> > brief definition of the ReorderBuffer is needed. Other pages that
> > reference this term can then point to that section. Thoughts?
>
> Even if we keep two views, when they are joined, users will still get
> confused by total_* names. So it's not solving the underlying problem.

Okay, I see your point.

> Andres had raised the point about renaming total_* fields with me
> off-list earlier. He suggested names total_wal_bytes, and
> total_wal_txns in an off list discussion today. I think those convey
> the true meaning - that these are txns and bytes that come from WAL.

I agree.

> Used those in the attached patches. Prefix reordered would give away
> lower level details, so I didn't use it.
>
> I agree that it would be good to mention ReorderBuffer in the logical
> decoding concepts section since it mentions structures ReorderBuffer*.
> But that would be a separate patch since we aren't using "reordered"
> in the names of the fields.

Okay.

> 0001 is the previous patch
> 0002 changes addressing your and Bertrand's comments.
>

Few trivial comments:

1)
Currently the doc says:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. OutputPluginWrite will update this counter if
ctx->stats is initialized by the output plugin. filteredBytes is the
size of changes, in bytes, that are filtered out by the output plugin.
Function ReorderBufferChangeSize may be used to find the size of
filtered ReorderBufferChange.

Shall we rearrange it to:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. filteredBytes is the size of changes, in bytes,
that are filtered out by the output plugin. OutputPluginWrite will
update these counters if ctx->stats is initialized by the output
plugin.
The function ReorderBufferChangeSize can be used to compute the size
of a filtered ReorderBufferChange, i.e., the filteredBytes.

2)
My preference will be to rename the fields 'total_txns' and
'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
'total_wal_bytes' for better clarity. Additionally, upon rethinking,
it seems better to me that plugin-related fields are also named as
plugin_* to clearly indicate their association. OTOH, in
OutputPluginStats, the field names are fine as is, since the structure
name itself clearly indicates these are plugin-related fields.
PgStat_StatReplSlotEntry lacks such context and thus using full
descriptive names there would improve clarity.

3)
LogicalOutputWrite:
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
sizeof(TransactionId);
  p->returned_rows++;

A blank line after the new change will increase readability.

~~

In my testing, the patch works as expected. Thanks!

thanks
Shveta



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Fri, Sep 19, 2025 at 08:11:23PM +0530, Ashutosh Bapat wrote:
> On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> 0001 is the previous patch
> 0002 changes addressing your and Bertrand's comments.

Thanks for the new patch version!

I did not look closely to the code yet but did some testing and I've one remark
regarding plugin_filtered_bytes: It looks ok when a publication is doing rows
filtering but when I:

- create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '0')
then I see plugin_sent_bytes increasing (which makes sense).

- create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
see plugin_filtered_bytes increasing. I think that would make sense to also increase
plugin_filtered_bytes in this case (and for the other options that would skip
sending data). Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Mon, Sep 22, 2025 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Few trivial comments:
>
> 1)
> Currently the doc says:
>
> sentTxns is the number of transactions sent downstream by the output
> plugin. sentBytes is the amount of data, in bytes, sent downstream by
> the output plugin. OutputPluginWrite will update this counter if
> ctx->stats is initialized by the output plugin. filteredBytes is the
> size of changes, in bytes, that are filtered out by the output plugin.
> Function ReorderBufferChangeSize may be used to find the size of
> filtered ReorderBufferChange.
>
> Shall we rearrange it to:
>
> sentTxns is the number of transactions sent downstream by the output
> plugin. sentBytes is the amount of data, in bytes, sent downstream by
> the output plugin. filteredBytes is the size of changes, in bytes,
> that are filtered out by the output plugin. OutputPluginWrite will
> update these counters if ctx->stats is initialized by the output
> plugin.
> The function ReorderBufferChangeSize can be used to compute the size
> of a filtered ReorderBufferChange, i.e., the filteredBytes.
>

Only sentBytes is incremented by OutputPluginWrite(), so saying that
it will update counters is not correct. But I think you intend to keep
description of all the fields together followed by any additional
information. How about the following
      <literal>sentTxns</literal> is the number of transactions sent downstream
      by the output plugin. <literal>sentBytes</literal> is the amount of data,
      in bytes, sent downstream by the output plugin.
      <literal>filteredBytes</literal> is the size of changes, in bytes, that
      are filtered out by the output plugin.
      <function>OutputPluginWrite</function> will update
      <literal>sentBytes</literal> if <literal>ctx->stats</literal> is
      initialized by the output plugin. Function
      <literal>ReorderBufferChangeSize</literal> may be used to find the size of
      filtered <literal>ReorderBufferChange</literal>.

> 2)
> My preference will be to rename the fields 'total_txns' and
> 'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
> 'total_wal_bytes' for better clarity. Additionally, upon rethinking,
> it seems better to me that plugin-related fields are also named as
> plugin_* to clearly indicate their association. OTOH, in
> OutputPluginStats, the field names are fine as is, since the structure
> name itself clearly indicates these are plugin-related fields.
> PgStat_StatReplSlotEntry lacks such context and thus using full
> descriptive names there would improve clarity.

Ok. Done.

>
> 3)
> LogicalOutputWrite:
> + if (ctx->stats)
> + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
> sizeof(TransactionId);
>   p->returned_rows++;
>
> A blank line after the new change will increase readability.
>

Ok.

> ~~
>
> In my testing, the patch works as expected. Thanks!

Thanks for testing. Can we include any of your tests in the patch? Are
the tests in patch enough?

Applied those suggestions in my repository. Do you have any further
review comments?

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Fri, Sep 19, 2025 at 08:11:23PM +0530, Ashutosh Bapat wrote:
> > On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > 0001 is the previous patch
> > 0002 changes addressing your and Bertrand's comments.
>
> Thanks for the new patch version!
>
> I did not look closely to the code yet but did some testing and I've one remark
> regarding plugin_filtered_bytes: It looks ok when a publication is doing rows
> filtering but when I:
>
> - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '0')
> then I see plugin_sent_bytes increasing (which makes sense).
>
> - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> see plugin_filtered_bytes increasing. I think that would make sense to also increase
> plugin_filtered_bytes in this case (and for the other options that would skip
> sending data). Thoughts?

Thanks for bringing this up. I don't think we discussed this
explicitly in the thread. The changes which are filtered out by the
core itself e.g. changes to the catalogs or changes to other databases
or changes from undesired origins are not added to the reorder buffer.
They are not counted in total_bytes. The transactions containing only
such changes are not added to reorder buffer, so even total_txns does
not count such empty transactions. If we count these changes and
transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
would create an anomaly - filtered counts being higher than total
counts. Further since core does not add these changes and transactions
to the reorder buffer, there is no way for a plugin to know about
their existence and hence count them. Does that make sense?

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Sharma
Дата:
> 0001 is the previous patch
> 0002 changes addressing your and Bertrand's comments.
>

@@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx,
XLogRecPtr lsn, TransactionId xid,
  /* output previously gathered data in a CopyData packet */
  pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);

+ /*
+ * If output plugin maintains statistics, update the amount of data sent
+ * downstream.
+ */
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
+

Just a small observation: I think it’s actually pq_flush_if_writable()
that writes the buffered data to the socket, not pq_putmessage_noblock
(which is actually gathering data in the buffer and not sending). So
it might make more sense to increment the sent pointer after the call
to pq_flush_if_writable().

Should we also consider - pg_hton32((uint32) (len + 4)); -- the
additional 4 bytes of data added to the send buffer.

--
With Regards,
Ashutosh Sharma.



Re: Report bytes and transactions actually sent downtream

От
shveta malik
Дата:
On Tue, Sep 23, 2025 at 4:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Sep 22, 2025 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Few trivial comments:
> >
> > 1)
> > Currently the doc says:
> >
> > sentTxns is the number of transactions sent downstream by the output
> > plugin. sentBytes is the amount of data, in bytes, sent downstream by
> > the output plugin. OutputPluginWrite will update this counter if
> > ctx->stats is initialized by the output plugin. filteredBytes is the
> > size of changes, in bytes, that are filtered out by the output plugin.
> > Function ReorderBufferChangeSize may be used to find the size of
> > filtered ReorderBufferChange.
> >
> > Shall we rearrange it to:
> >
> > sentTxns is the number of transactions sent downstream by the output
> > plugin. sentBytes is the amount of data, in bytes, sent downstream by
> > the output plugin. filteredBytes is the size of changes, in bytes,
> > that are filtered out by the output plugin. OutputPluginWrite will
> > update these counters if ctx->stats is initialized by the output
> > plugin.
> > The function ReorderBufferChangeSize can be used to compute the size
> > of a filtered ReorderBufferChange, i.e., the filteredBytes.
> >
>
> Only sentBytes is incremented by OutputPluginWrite(), so saying that
> it will update counters is not correct. But I think you intend to keep
> description of all the fields together followed by any additional
> information. How about the following
>       <literal>sentTxns</literal> is the number of transactions sent downstream
>       by the output plugin. <literal>sentBytes</literal> is the amount of data,
>       in bytes, sent downstream by the output plugin.
>       <literal>filteredBytes</literal> is the size of changes, in bytes, that
>       are filtered out by the output plugin.
>       <function>OutputPluginWrite</function> will update
>       <literal>sentBytes</literal> if <literal>ctx->stats</literal> is
>       initialized by the output plugin. Function
>       <literal>ReorderBufferChangeSize</literal> may be used to find the size of
>       filtered <literal>ReorderBufferChange</literal>.

Yes, this looks good.

>
> > 2)
> > My preference will be to rename the fields 'total_txns' and
> > 'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
> > 'total_wal_bytes' for better clarity. Additionally, upon rethinking,
> > it seems better to me that plugin-related fields are also named as
> > plugin_* to clearly indicate their association. OTOH, in
> > OutputPluginStats, the field names are fine as is, since the structure
> > name itself clearly indicates these are plugin-related fields.
> > PgStat_StatReplSlotEntry lacks such context and thus using full
> > descriptive names there would improve clarity.
>
> Ok. Done.
>
> >
> > 3)
> > LogicalOutputWrite:
> > + if (ctx->stats)
> > + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
> > sizeof(TransactionId);
> >   p->returned_rows++;
> >
> > A blank line after the new change will increase readability.
> >
>
> Ok.
>
> > ~~
> >
> > In my testing, the patch works as expected. Thanks!
>
> Thanks for testing. Can we include any of your tests in the patch? Are
> the tests in patch enough?

I tested the flows with
a) logical replication slot and get-changes.
b) filtered data flows: pub-sub creation with row_filters, 'publish'
options. I tried to verify plugin fields as compared to total_wal*
fields.
c) reset flow.

While tests for a and c are present already. I don't see tests for b
anywhere when it comes to stats. Do you think we shall add a test for
filtered data using row-filter somewhere?

>
> Applied those suggestions in my repository. Do you have any further
> review comments?
>

No, I think that is all.

thanks
Shveta



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Tue, Sep 23, 2025 at 6:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> > 0001 is the previous patch
> > 0002 changes addressing your and Bertrand's comments.
> >
>
> @@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx,
> XLogRecPtr lsn, TransactionId xid,
>   /* output previously gathered data in a CopyData packet */
>   pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);
>
> + /*
> + * If output plugin maintains statistics, update the amount of data sent
> + * downstream.
> + */
> + if (ctx->stats)
> + ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
> +
>
> Just a small observation: I think it’s actually pq_flush_if_writable()
> that writes the buffered data to the socket, not pq_putmessage_noblock
> (which is actually gathering data in the buffer and not sending). So
> it might make more sense to increment the sent pointer after the call
> to pq_flush_if_writable().

That's a good point. I placed it after pq_putmessage_noblock() so that
it's easy to link the increment to sentBytes and the actual bytes
being sent. You are right that the bytes won't be sent unless
pq_flush_if_writable() is called but it will be called for sure before
the next UpdateDecodingStats(). So the reported bytes are never wrong.
I would prefer readability over seeming accuracy.

>
> Should we also consider - pg_hton32((uint32) (len + 4)); -- the
> additional 4 bytes of data added to the send buffer.
>

In WalSndWriteData() we can't rely on what happens in a low level API
like socket_putmessage(). And we are counting the number of bytes in
the logically decoded message. So, I actually wonder whether we should
count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think?

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
shveta malik
Дата:
On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Sep 23, 2025 at 6:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > > 0001 is the previous patch
> > > 0002 changes addressing your and Bertrand's comments.
> > >
> >
> > @@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx,
> > XLogRecPtr lsn, TransactionId xid,
> >   /* output previously gathered data in a CopyData packet */
> >   pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);
> >
> > + /*
> > + * If output plugin maintains statistics, update the amount of data sent
> > + * downstream.
> > + */
> > + if (ctx->stats)
> > + ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
> > +
> >
> > Just a small observation: I think it’s actually pq_flush_if_writable()
> > that writes the buffered data to the socket, not pq_putmessage_noblock
> > (which is actually gathering data in the buffer and not sending). So
> > it might make more sense to increment the sent pointer after the call
> > to pq_flush_if_writable().
>
> That's a good point. I placed it after pq_putmessage_noblock() so that
> it's easy to link the increment to sentBytes and the actual bytes
> being sent. You are right that the bytes won't be sent unless
> pq_flush_if_writable() is called but it will be called for sure before
> the next UpdateDecodingStats(). So the reported bytes are never wrong.
> I would prefer readability over seeming accuracy.
>
> >
> > Should we also consider - pg_hton32((uint32) (len + 4)); -- the
> > additional 4 bytes of data added to the send buffer.
> >
>
> In WalSndWriteData() we can't rely on what happens in a low level API
> like socket_putmessage(). And we are counting the number of bytes in
> the logically decoded message. So, I actually wonder whether we should
> count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think?
>

If we are not counting all such metadata bytes ((or can't reliably do
so), then IMO, we shall skip counting msgtype as well.

thanks
Shveta



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote:
> On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > In WalSndWriteData() we can't rely on what happens in a low level API
> > like socket_putmessage(). And we are counting the number of bytes in
> > the logically decoded message. So, I actually wonder whether we should
> > count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think?
> >
> 
> If we are not counting all such metadata bytes ((or can't reliably do
> so), then IMO, we shall skip counting msgtype as well.

Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not
taken into account?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Tue, Sep 23, 2025 at 04:15:14PM +0530, Ashutosh Bapat wrote:
> On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> > see plugin_filtered_bytes increasing. I think that would make sense to also increase
> > plugin_filtered_bytes in this case (and for the other options that would skip
> > sending data). Thoughts?
> 
> Thanks for bringing this up. I don't think we discussed this
> explicitly in the thread. The changes which are filtered out by the
> core itself e.g. changes to the catalogs or changes to other databases
> or changes from undesired origins are not added to the reorder buffer.
> They are not counted in total_bytes. The transactions containing only
> such changes are not added to reorder buffer, so even total_txns does
> not count such empty transactions. If we count these changes and
> transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
> would create an anomaly - filtered counts being higher than total
> counts. Further since core does not add these changes and transactions
> to the reorder buffer, there is no way for a plugin to know about
> their existence and hence count them. Does that make sense?

Yes. Do you think that the doc in the patch is clear enough regarding this point?
I mean the doc looks correct (mentioning the output plugin) but would that make
sense to insist that core filtering is not taken into account?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> I tested the flows with
> a) logical replication slot and get-changes.
> b) filtered data flows: pub-sub creation with row_filters, 'publish'
> options. I tried to verify plugin fields as compared to total_wal*
> fields.
> c) reset flow.
>
> While tests for a and c are present already. I don't see tests for b
> anywhere when it comes to stats. Do you think we shall add a test for
> filtered data using row-filter somewhere?

Added a test in 028_row_filter. Please find it in the attached
patchset. I didn't find tests which test table level filtering or
operation level filtering. Can you please point me to such tests. I
will add similar test to other places. Once you review the test in
028_row_filter, I will replicate it to other places you point out.

On Wed, Sep 24, 2025 at 12:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote:
> > On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > In WalSndWriteData() we can't rely on what happens in a low level API
> > > like socket_putmessage(). And we are counting the number of bytes in
> > > the logically decoded message. So, I actually wonder whether we should
> > > count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think?
> > >
> >
> > If we are not counting all such metadata bytes ((or can't reliably do
> > so), then IMO, we shall skip counting msgtype as well.
>
> Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not
> taken into account?

We are counting the sentBytes in central places through which all the
logically decoded messages flow. So we are not missing on any metadata
bytes. Given that these bytes are part of the logically decoded
message itself, I think we should count them in the sentBytes. Now the
question remains is whether to count 4 bytes for length in the message
itself? The logical decoding code can not control that and thus should
not account for it. So I am leaving bytes counted for
pg_hton32((uint32) (len + 4)) out of sentBytes calculation.
--
Best Wishes,
Ashutosh Bapat

Вложения

Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Sep 23, 2025 at 04:15:14PM +0530, Ashutosh Bapat wrote:
> > On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> > > see plugin_filtered_bytes increasing. I think that would make sense to also increase
> > > plugin_filtered_bytes in this case (and for the other options that would skip
> > > sending data). Thoughts?
> >
> > Thanks for bringing this up. I don't think we discussed this
> > explicitly in the thread. The changes which are filtered out by the
> > core itself e.g. changes to the catalogs or changes to other databases
> > or changes from undesired origins are not added to the reorder buffer.
> > They are not counted in total_bytes. The transactions containing only
> > such changes are not added to reorder buffer, so even total_txns does
> > not count such empty transactions. If we count these changes and
> > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
> > would create an anomaly - filtered counts being higher than total
> > counts. Further since core does not add these changes and transactions
> > to the reorder buffer, there is no way for a plugin to know about
> > their existence and hence count them. Does that make sense?
>
> Yes. Do you think that the doc in the patch is clear enough regarding this point?
> I mean the doc looks correct (mentioning the output plugin) but would that make
> sense to insist that core filtering is not taken into account?

Do you mean, should we mention in the docs that core filtering is not
taken into account? I would question whether that's called filtering
at all, in the context of logical decoding. The view should be read in
the context of logical decoding. For example, we aren't mentioning
that total_bytes does not include changes from other database.

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Wed, Sep 24, 2025 at 12:51:29PM +0530, Ashutosh Bapat wrote:
> On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> > > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> > > > see plugin_filtered_bytes increasing. I think that would make sense to also increase
> > > > plugin_filtered_bytes in this case (and for the other options that would skip
> > > > sending data). Thoughts?
> > >
> > > Thanks for bringing this up. I don't think we discussed this
> > > explicitly in the thread. The changes which are filtered out by the
> > > core itself e.g. changes to the catalogs or changes to other databases
> > > or changes from undesired origins are not added to the reorder buffer.
> > > They are not counted in total_bytes. The transactions containing only
> > > such changes are not added to reorder buffer, so even total_txns does
> > > not count such empty transactions. If we count these changes and
> > > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
> > > would create an anomaly - filtered counts being higher than total
> > > counts. Further since core does not add these changes and transactions
> > > to the reorder buffer, there is no way for a plugin to know about
> > > their existence and hence count them. Does that make sense?
> >
> > Yes. Do you think that the doc in the patch is clear enough regarding this point?
> > I mean the doc looks correct (mentioning the output plugin) but would that make
> > sense to insist that core filtering is not taken into account?
> 
> Do you mean, should we mention in the docs that core filtering is not
> taken into account?
> I would question whether that's called filtering
> at all, in the context of logical decoding. The view should be read in
> the context of logical decoding. For example, we aren't mentioning
> that total_bytes does not include changes from other database.

Right. But, in the example above, do you consider "skip-empty-xacts" as "core"
or "plugin" filtering?

It's an option part of the "test_decoding" plugin, so it's the plugin choice to
not display empty xacts (should the option be set accordingly). Then should it
be reported in plugin_filtered_bytes? (one could write a plugin, decide to
skip/filter empty xacts or whatever in the plugin callbacks: should that be
reported as plugin_filtered_bytes?)

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
shveta malik
Дата:
On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I tested the flows with
> > a) logical replication slot and get-changes.
> > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > options. I tried to verify plugin fields as compared to total_wal*
> > fields.
> > c) reset flow.
> >
> > While tests for a and c are present already. I don't see tests for b
> > anywhere when it comes to stats. Do you think we shall add a test for
> > filtered data using row-filter somewhere?
>
> Added a test in 028_row_filter. Please find it in the attached
> patchset.

Test looks good.

> I didn't find tests which test table level filtering or
> operation level filtering. Can you please point me to such tests. I
> will add similar test to other places. Once you review the test in
> 028_row_filter, I will replicate it to other places you point out.
>

I can see a few tests of operation level filtering present in
'subscription/t/001_rep_changes.pl'  and
'subscription/t/010_truncate.pl'

> On Wed, Sep 24, 2025 at 12:12 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote:
> > > On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > In WalSndWriteData() we can't rely on what happens in a low level API
> > > > like socket_putmessage(). And we are counting the number of bytes in
> > > > the logically decoded message. So, I actually wonder whether we should
> > > > count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think?
> > > >
> > >
> > > If we are not counting all such metadata bytes ((or can't reliably do
> > > so), then IMO, we shall skip counting msgtype as well.
> >
> > Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not
> > taken into account?
>
> We are counting the sentBytes in central places through which all the
> logically decoded messages flow. So we are not missing on any metadata
> bytes. Given that these bytes are part of the logically decoded
> message itself, I think we should count them in the sentBytes. Now the
> question remains is whether to count 4 bytes for length in the message
> itself? The logical decoding code can not control that and thus should
> not account for it. So I am leaving bytes counted for
> pg_hton32((uint32) (len + 4)) out of sentBytes calculation.
> --

Okay.

thanks
Shveta



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 24, 2025 at 12:51:29PM +0530, Ashutosh Bapat wrote:
> > On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
> > > > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
> > > > > see plugin_filtered_bytes increasing. I think that would make sense to also increase
> > > > > plugin_filtered_bytes in this case (and for the other options that would skip
> > > > > sending data). Thoughts?
> > > >
> > > > Thanks for bringing this up. I don't think we discussed this
> > > > explicitly in the thread. The changes which are filtered out by the
> > > > core itself e.g. changes to the catalogs or changes to other databases
> > > > or changes from undesired origins are not added to the reorder buffer.
> > > > They are not counted in total_bytes. The transactions containing only
> > > > such changes are not added to reorder buffer, so even total_txns does
> > > > not count such empty transactions. If we count these changes and
> > > > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
> > > > would create an anomaly - filtered counts being higher than total
> > > > counts. Further since core does not add these changes and transactions
> > > > to the reorder buffer, there is no way for a plugin to know about
> > > > their existence and hence count them. Does that make sense?
> > >
> > > Yes. Do you think that the doc in the patch is clear enough regarding this point?
> > > I mean the doc looks correct (mentioning the output plugin) but would that make
> > > sense to insist that core filtering is not taken into account?
> >
> > Do you mean, should we mention in the docs that core filtering is not
> > taken into account?
> > I would question whether that's called filtering
> > at all, in the context of logical decoding. The view should be read in
> > the context of logical decoding. For example, we aren't mentioning
> > that total_bytes does not include changes from other database.
>
> Right. But, in the example above, do you consider "skip-empty-xacts" as "core"
> or "plugin" filtering?
>
> It's an option part of the "test_decoding" plugin, so it's the plugin choice to
> not display empty xacts (should the option be set accordingly). Then should it
> be reported in plugin_filtered_bytes? (one could write a plugin, decide to
> skip/filter empty xacts or whatever in the plugin callbacks: should that be
> reported as plugin_filtered_bytes?)

If a transaction becomes empty because the plugin filtered all the
changes then plugin_filtered_bytes will be incremented by the amount
of filtered changes. If the transaction was empty because core didn't
send any of the changes to the output plugin, there was nothing
filtered by the output plugin so plugin_filtered_bytes will not be
affected.

skip_empty_xacts controls whether BEGIN and COMMIT are sent for an
empty transaction or not. It does not filter "changes". It affects
"sent_bytes".

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > I tested the flows with
> > > a) logical replication slot and get-changes.
> > > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > > options. I tried to verify plugin fields as compared to total_wal*
> > > fields.
> > > c) reset flow.
> > >
> > > While tests for a and c are present already. I don't see tests for b
> > > anywhere when it comes to stats. Do you think we shall add a test for
> > > filtered data using row-filter somewhere?
> >
> > Added a test in 028_row_filter. Please find it in the attached
> > patchset.
>
> Test looks good.

Thanks. Added to three more files. I think we have covered all the
cases where filtering can occur.

PFA patches.

--
Best Wishes,
Ashutosh Bapat

Вложения

Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Wed, Sep 24, 2025 at 03:37:07PM +0530, Ashutosh Bapat wrote:
> On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Right. But, in the example above, do you consider "skip-empty-xacts" as "core"
> > or "plugin" filtering?
> >
> > It's an option part of the "test_decoding" plugin, so it's the plugin choice to
> > not display empty xacts (should the option be set accordingly). Then should it
> > be reported in plugin_filtered_bytes? (one could write a plugin, decide to
> > skip/filter empty xacts or whatever in the plugin callbacks: should that be
> > reported as plugin_filtered_bytes?)
> 
> If a transaction becomes empty because the plugin filtered all the
> changes then plugin_filtered_bytes will be incremented by the amount
> of filtered changes. If the transaction was empty because core didn't
> send any of the changes to the output plugin, there was nothing
> filtered by the output plugin so plugin_filtered_bytes will not be
> affected.
> 
> skip_empty_xacts controls whether BEGIN and COMMIT are sent for an
> empty transaction or not. It does not filter "changes". It affects
> "sent_bytes".

skip_empty_xacts was just an example. I mean a plugin could decide to filter all
the inserts for example (not saying it makes sense). But I think we'are saying the
same: say a plugin wants to filter the inserts then it's its responsability to
increment ctx->stats->filteredBytes in its "change_cb" callback for the 
REORDER_BUFFER_CHANGE_INSERT action, right? If so, I wonder if it would make
sense to provide an example in the test_decoding plugin (I can see it's done
for pgoutput but that might sound more natural to look in contrib if one is
searching for an example).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Wed, Sep 24, 2025 at 05:28:44PM +0530, Ashutosh Bapat wrote:
> On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > I tested the flows with
> > > > a) logical replication slot and get-changes.
> > > > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > > > options. I tried to verify plugin fields as compared to total_wal*
> > > > fields.
> > > > c) reset flow.
> > > >
> > > > While tests for a and c are present already. I don't see tests for b
> > > > anywhere when it comes to stats. Do you think we shall add a test for
> > > > filtered data using row-filter somewhere?
> > >
> > > Added a test in 028_row_filter. Please find it in the attached
> > > patchset.
> >
> > Test looks good.
> 
> Thanks. Added to three more files. I think we have covered all the
> cases where filtering can occur.
> 
> PFA patches.

Thanks for the new version!

A few random comments:

=== 1

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>plugin_filtered_bytes</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Amount of changes, from <structfield>total_wal_bytes</structfield>, filtered
+        out by the output plugin and not sent downstream. Please note that it
+        does not include the changes filtered before a change is sent to
+        the output plugin, e.g. the changes filtered by origin. The count is
+        maintained by the output plugin mentioned in
+        <structfield>plugin</structfield>.

I found "The count" somehow ambiguous. What about "This statistic" instead?

=== 2

+        subtransactions. These transactions are subset of transctions sent to

s/transctions/transactions

=== 3

+        the decoding plugin. Hence this count is expected to be lesser than or

s/be lesser/be less/? (not 100% sure)

=== 4

+extern Size ReorderBufferChangeSize(ReorderBufferChange *change);

Another approach could be to pass the change's size as an argument to the
callbacks? That would avoid to expose ReorderBufferChangeSize publicly.

=== 5

        ctx->output_plugin_private = data;
+       ctx->stats = palloc0(sizeof(OutputPluginStats));

I was wondering if we need to free this in pg_decode_shutdown, but it looks
like it's done through FreeDecodingContext() anyway.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
shveta malik
Дата:
On Wed, Sep 24, 2025 at 5:28 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > I tested the flows with
> > > > a) logical replication slot and get-changes.
> > > > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > > > options. I tried to verify plugin fields as compared to total_wal*
> > > > fields.
> > > > c) reset flow.
> > > >
> > > > While tests for a and c are present already. I don't see tests for b
> > > > anywhere when it comes to stats. Do you think we shall add a test for
> > > > filtered data using row-filter somewhere?
> > >
> > > Added a test in 028_row_filter. Please find it in the attached
> > > patchset.
> >
> > Test looks good.
>
> Thanks. Added to three more files. I think we have covered all the
> cases where filtering can occur.
>

Yes. The test looks good now.

thanks
Shveta



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Wed, Sep 24, 2025 at 6:43 PM Bertrand Drouvot

<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 24, 2025 at 03:37:07PM +0530, Ashutosh Bapat wrote:
> > On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > Right. But, in the example above, do you consider "skip-empty-xacts" as "core"
> > > or "plugin" filtering?
> > >
> > > It's an option part of the "test_decoding" plugin, so it's the plugin choice to
> > > not display empty xacts (should the option be set accordingly). Then should it
> > > be reported in plugin_filtered_bytes? (one could write a plugin, decide to
> > > skip/filter empty xacts or whatever in the plugin callbacks: should that be
> > > reported as plugin_filtered_bytes?)
> >
> > If a transaction becomes empty because the plugin filtered all the
> > changes then plugin_filtered_bytes will be incremented by the amount
> > of filtered changes. If the transaction was empty because core didn't
> > send any of the changes to the output plugin, there was nothing
> > filtered by the output plugin so plugin_filtered_bytes will not be
> > affected.
> >
> > skip_empty_xacts controls whether BEGIN and COMMIT are sent for an
> > empty transaction or not. It does not filter "changes". It affects
> > "sent_bytes".
>
> skip_empty_xacts was just an example. I mean a plugin could decide to filter all
> the inserts for example (not saying it makes sense). But I think we'are saying the
> same: say a plugin wants to filter the inserts then it's its responsability to
> increment ctx->stats->filteredBytes in its "change_cb" callback for the
> REORDER_BUFFER_CHANGE_INSERT action, right?

Right.

> If so, I wonder if it would make
> sense to provide an example in the test_decoding plugin (I can see it's done
> for pgoutput but that might sound more natural to look in contrib if one is
> searching for an example).

test_decoding does not make use of publication at all. Publication
controls filtering and so test_decoding does not have any examples of
filtering code. Doesn't make sense to add code to manipulate
filteredBytes there.

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Wed, Sep 24, 2025 at 8:11 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 24, 2025 at 05:28:44PM +0530, Ashutosh Bapat wrote:
> > On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat
> > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > >
> > > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > > >
> > > > > I tested the flows with
> > > > > a) logical replication slot and get-changes.
> > > > > b) filtered data flows: pub-sub creation with row_filters, 'publish'
> > > > > options. I tried to verify plugin fields as compared to total_wal*
> > > > > fields.
> > > > > c) reset flow.
> > > > >
> > > > > While tests for a and c are present already. I don't see tests for b
> > > > > anywhere when it comes to stats. Do you think we shall add a test for
> > > > > filtered data using row-filter somewhere?
> > > >
> > > > Added a test in 028_row_filter. Please find it in the attached
> > > > patchset.
> > >
> > > Test looks good.
> >
> > Thanks. Added to three more files. I think we have covered all the
> > cases where filtering can occur.
> >
> > PFA patches.
>
> Thanks for the new version!
>
> A few random comments:
>
> === 1
>
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +        <structfield>plugin_filtered_bytes</structfield> <type>bigint</type>
> +       </para>
> +       <para>
> +        Amount of changes, from <structfield>total_wal_bytes</structfield>, filtered
> +        out by the output plugin and not sent downstream. Please note that it
> +        does not include the changes filtered before a change is sent to
> +        the output plugin, e.g. the changes filtered by origin. The count is
> +        maintained by the output plugin mentioned in
> +        <structfield>plugin</structfield>.
>
> I found "The count" somehow ambiguous. What about "This statistic" instead?

Existing fields use term "The counter". Changed "The count" to "The counter".

>
> === 2
>
> +        subtransactions. These transactions are subset of transctions sent to
>
> s/transctions/transactions

Done.

>
> === 3
>
> +        the decoding plugin. Hence this count is expected to be lesser than or
>
> s/be lesser/be less/? (not 100% sure)

Less than is correct. Fixed.

>
> === 4
>
> +extern Size ReorderBufferChangeSize(ReorderBufferChange *change);
>
> Another approach could be to pass the change's size as an argument to the
> callbacks? That would avoid to expose ReorderBufferChangeSize publicly.

Do you see any problem in exposing ReorderBufferChangeSize(). It's a
pretty small function and may be quite handy to output plugins
otherwise as well. And we expose many ReorderBuffer related functions;
so this isn't the first.

If we were to do as you say, it will change other external facing APIs
like change_cb(). Output plugins will need to change their code
accordingly even when they don't want to support plugin statistics.
Given that we have made maintaining plugin statistics optional,
forcing API change does not make sense. For example, test_decoding
which does not filter anything would unnecessarily have to change its
code.

I considered adding a field size to ReorderBufferChange itself. But
that means we increase the amount of memory used in the reorder
buffer, which seems to have become prime estate these days. So
rejected that idea as well.

Advantage of this change is that the minimal cost of calculating the
size and maintaining the code change is incurred only when filtering
happens, by the plugins which want to filter and maintain statistics.

>
> === 5
>
>         ctx->output_plugin_private = data;
> +       ctx->stats = palloc0(sizeof(OutputPluginStats));
>
> I was wondering if we need to free this in pg_decode_shutdown, but it looks
> like it's done through FreeDecodingContext() anyway.

That's correct. Even output_plugin_private is freed when the decoding
memory context is freed.

Thanks for the review comments. I have addressed the comments in my
repository and the changes will be included in the next set of
patches.

Do you have any further review comments?

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Thu, Sep 25, 2025 at 10:16:35AM +0530, Ashutosh Bapat wrote:
> On Wed, Sep 24, 2025 at 8:11 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > === 4
> >
> > +extern Size ReorderBufferChangeSize(ReorderBufferChange *change);
> >
> > Another approach could be to pass the change's size as an argument to the
> > callbacks? That would avoid to expose ReorderBufferChangeSize publicly.
> 
> Do you see any problem in exposing ReorderBufferChangeSize(). It's a
> pretty small function and may be quite handy to output plugins
> otherwise as well. And we expose many ReorderBuffer related functions;
> so this isn't the first.

Right. I don't see a problem per say, just thinking that the less we expose
publicly to be used by extensions/plugins, the better.

> If we were to do as you say, it will change other external facing APIs
> like change_cb(). Output plugins will need to change their code
> accordingly even when they don't want to support plugin statistics.

Correct.

> Given that we have made maintaining plugin statistics optional,
> forcing API change does not make sense. For example, test_decoding
> which does not filter anything would unnecessarily have to change its
> code.

That's right.

> I considered adding a field size to ReorderBufferChange itself. But
> that means we increase the amount of memory used in the reorder
> buffer, which seems to have become prime estate these days. So
> rejected that idea as well.
> 
> Advantage of this change is that the minimal cost of calculating the
> size and maintaining the code change is incurred only when filtering
> happens, by the plugins which want to filter and maintain statistics.

Yes, anyway as it's unlikely that we have to fix a bug in a minor release that
would need a signature change to ReorderBufferChangeSize(), I think that's fine
as proposed.

> >
> > === 5
> >
> >         ctx->output_plugin_private = data;
> > +       ctx->stats = palloc0(sizeof(OutputPluginStats));
> >
> > I was wondering if we need to free this in pg_decode_shutdown, but it looks
> > like it's done through FreeDecodingContext() anyway.
> 
> That's correct. Even output_plugin_private is freed when the decoding
> memory context is freed.
> 
> Thanks for the review comments. I have addressed the comments in my
> repository and the changes will be included in the next set of
> patches.

Thanks!

> Do you have any further review comments?

Not right now. I'll give it another look by early next week the latest.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Thu, Sep 25, 2025 at 01:01:55PM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Thu, Sep 25, 2025 at 10:16:35AM +0530, Ashutosh Bapat wrote:
> > Do you have any further review comments?
> 
> Not right now. I'll give it another look by early next week the latest.
> 

=== 1

@@ -173,6 +173,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
        data->only_local = false;

        ctx->output_plugin_private = data;
+       ctx->stats = palloc0(sizeof(OutputPluginStats));

I was not sure where it's allocated, but looking at:

Breakpoint 1, pg_decode_startup (ctx=0x1ba853a0, opt=0x1ba85478, is_init=false) at test_decoding.c:164
164             bool            enable_streaming = false;
(gdb) n
166             data = palloc0(sizeof(TestDecodingData));
(gdb)
167             data->context = AllocSetContextCreate(ctx->context,
(gdb)
170             data->include_xids = true;
(gdb)
171             data->include_timestamp = false;
(gdb)
172             data->skip_empty_xacts = false;
(gdb)
173             data->only_local = false;
(gdb)
175             ctx->output_plugin_private = data;
(gdb)
176             ctx->stats = palloc0(sizeof(OutputPluginStats));
(gdb)
178             opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
(gdb) p CurrentMemoryContext
$7 = (MemoryContext) 0x1ba852a0
(gdb) p (*CurrentMemoryContext).name
$8 = 0xe4057d "Logical decoding context"
(gdb) p ctx->context
$9 = (MemoryContext) 0x1ba852a0

I can see that CurrentMemoryContext is "ctx->context" so the palloc0 done here
are done in the right context.

=== 2

Playing with "has stats" a bit.

-- Issue 1:

Say, plugin has stats enabled and I get:

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |                9
(1 row)

If the engine is shutdown and the plugin is now replaced by a version that
does not provide stats, then, right after startup, I still get:

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |                9
(1 row)

And that will be the case until the plugin decodes something (so that 
statent->plugin_has_stats gets replaced in pgstat_report_replslot()).

That's because plugin_has_stats is stored in PgStat_StatReplSlotEntry
and so it's restored from the stat file when the engine starts.

Now, let's do some inserts and decode:

postgres=# insert into t1 values ('a');
INSERT 0 1
postgres=# insert into t1 values ('a');
INSERT 0 1
postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL);
    lsn     | xid |                                          data
------------+-----+-----------------------------------------------------------------------------------------
 0/407121C0 | 766 | xid 766: lsn:0/40712190 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
 0/40712268 | 767 | xid 767: lsn:0/40712238 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
(2 rows)

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |
(1 row)

All good. 

Issue 1 is that before any decoding happens, pg_stat_replication_slots is still
showing stale plugin statistics from a plugin that may no longer support stats.

I'm not sure how we could easily fix this issue, as we don't know the plugin's
stats capability until we actually use it.

-- Issue 2:

Let's shutdown, replace the plugin with a version that has stats enabled and
restart.

Same behavior as before:

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |
(1 row)

Until pgstat_report_replslot() is not called, the statent->plugin_has_stats is
not updated. So it displays the stats as they were before the shutdown. But that's
not an issue in this case (when switching from non stats to stats).

Now, let's do some inserts and decode:

postgres=# insert into t1 values ('a');
INSERT 0 1
postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL);
    lsn     | xid |                                          data
------------+-----+-----------------------------------------------------------------------------------------
 0/407125B0 | 768 | xid 768: lsn:0/40712580 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
(1 row)

and check the stats:

postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
     plugin     | plugin_sent_txns
----------------+------------------
 pg_commit_info |               10
(1 row)

Now it reports 10, that's the 9 before we changed the plugin to not have stats
enabled plus this new one.

Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it
shows accumulated values from before the non-stats switch.

PFA attached a proposal to fix Issue 2.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Fri, Sep 26, 2025 at 4:43 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
>
> === 2
>
> Playing with "has stats" a bit.
>
> -- Issue 1:
>

Thanks for experiments! Thanks for bringing it up.

> Say, plugin has stats enabled and I get:
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |                9
> (1 row)
>
> If the engine is shutdown and the plugin is now replaced by a version that
> does not provide stats, then, right after startup, I still get:
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |                9
> (1 row)
>
> And that will be the case until the plugin decodes something (so that
> statent->plugin_has_stats gets replaced in pgstat_report_replslot()).
>
> That's because plugin_has_stats is stored in PgStat_StatReplSlotEntry
> and so it's restored from the stat file when the engine starts.
>
> Now, let's do some inserts and decode:
>
> postgres=# insert into t1 values ('a');
> INSERT 0 1
> postgres=# insert into t1 values ('a');
> INSERT 0 1
> postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL);
>     lsn     | xid |                                          data
> ------------+-----+-----------------------------------------------------------------------------------------
>  0/407121C0 | 766 | xid 766: lsn:0/40712190 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
>  0/40712268 | 767 | xid 767: lsn:0/40712238 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
> (2 rows)
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |
> (1 row)
>
> All good.
>
> Issue 1 is that before any decoding happens, pg_stat_replication_slots is still
> showing stale plugin statistics from a plugin that may no longer support stats.
>
> I'm not sure how we could easily fix this issue, as we don't know the plugin's
> stats capability until we actually use it.
>

I don't think this is an issue. There is no way for the core to tell
whether the plugin will provide stats or not, unless it sets that
ctx->stats which happens in the startup callback. Till then it is
rightly providing the values accumulated so far. Once the decoding
starts, we know that the plugin is not providing any stats and we
don't display anything.

> -- Issue 2:
>
> Let's shutdown, replace the plugin with a version that has stats enabled and
> restart.
>
> Same behavior as before:
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |
> (1 row)
>
> Until pgstat_report_replslot() is not called, the statent->plugin_has_stats is
> not updated. So it displays the stats as they were before the shutdown. But that's
> not an issue in this case (when switching from non stats to stats).
>
> Now, let's do some inserts and decode:
>
> postgres=# insert into t1 values ('a');
> INSERT 0 1
> postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL);
>     lsn     | xid |                                          data
> ------------+-----+-----------------------------------------------------------------------------------------
>  0/407125B0 | 768 | xid 768: lsn:0/40712580 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0
> (1 row)
>
> and check the stats:
>
> postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ;
>      plugin     | plugin_sent_txns
> ----------------+------------------
>  pg_commit_info |               10
> (1 row)
>
> Now it reports 10, that's the 9 before we changed the plugin to not have stats
> enabled plus this new one.
>
> Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it
> shows accumulated values from before the non-stats switch.

This too seems to be a non-issue to me. The stats in the view get
reset only when a user resets them. So we shouldn't wipe out the
already accumulated values just because the plugin stopped providing
it. If the plugin keeps flip-flopping and only partial statistics
provided by the plugin will be accumulated. That's the plugin's
responsibility. Realistically a plugin will either decide to provide
statistics in some version and then continue forever OR it will decide
against it. Flip-flopping won't happen in practice.

If at all we decide to reset the stats when the plugin does not
provide them, I think a better fix is to set them to 0 in
pgstat_report_replslot() independent of previous state of has_stats.
It will be more or less same CPU instructions. like below
if (repSlotStat->plugin_has_stats)
{
REPLSLOT_ACC(plugin_sent_txns);
REPLSLOT_ACC(plugin_sent_bytes);
REPLSLOT_ACC(plugin_filtered_bytes);
}
else
{
statent->plugin_sent_txns = 0;
statent->plugin_sent_bytes = 0;
statent->plugin_filtered_bytes = 0
}

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Fri, Sep 26, 2025 at 06:14:28PM +0530, Ashutosh Bapat wrote:
> On Fri, Sep 26, 2025 at 4:43 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> >
> > === 2
> >
> > Issue 1 is that before any decoding happens, pg_stat_replication_slots is still
> > showing stale plugin statistics from a plugin that may no longer support stats.
> >
> > I'm not sure how we could easily fix this issue, as we don't know the plugin's
> > stats capability until we actually use it.
> >
> 
> I don't think this is an issue. There is no way for the core to tell
> whether the plugin will provide stats or not, unless it sets that
> ctx->stats which happens in the startup callback. Till then it is
> rightly providing the values accumulated so far. Once the decoding
> starts, we know that the plugin is not providing any stats and we
> don't display anything.

Yeah, I got the technical reasons, but I think there's a valid user experience
concern here: seeing statistics for a plugin that doesn't actually support
statistics is misleading.

What we need is a call to pgstat_report_replslot() to display stats that reflect
the current plugin behavior. We can't just call pgstat_report_replslot()
in say RestoreSlotFromDisk() because we really need the decoding to start.

So one idea could be to set a flag (per slot) when pgstat_report_replslot()
has been called (for good reasons) and check for this flag in
pg_stat_get_replication_slot().

If the flag is not set, then set the plugin fields to NULL.
If the flag is set, then display their values (like now).

And we should document that the plugin stats are not available (i.e are NULL)
until the decoding has valid stats to report after startup.

What do you think?

> 
> > -- Issue 2:
> >
> > Now it reports 10, that's the 9 before we changed the plugin to not have stats
> > enabled plus this new one.
> >
> > Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it
> > shows accumulated values from before the non-stats switch.
> 
> This too seems to be a non-issue to me. The stats in the view get
> reset only when a user resets them. So we shouldn't wipe out the
> already accumulated values just because the plugin stopped providing
> it. If the plugin keeps flip-flopping and only partial statistics
> provided by the plugin will be accumulated. That's the plugin's
> responsibility.

Okay but then I think that the plugin is missing some flexibility.

For example, how could the plugin set ctx->stats->sentTxns
to zero if it decides not to enable stats (while it was previously enable)?

Indeed, not enabling stats, means not doing
"ctx->stats = palloc0(sizeof(OutputPluginStats))" which means not having control
over the stats anymore.

So, with the current design, it has not other choice but having its previous
stats not reset to zero.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Report bytes and transactions actually sent downtream

От
Ashutosh Bapat
Дата:
On Fri, Sep 26, 2025 at 10:28 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > >
> >
> > I don't think this is an issue. There is no way for the core to tell
> > whether the plugin will provide stats or not, unless it sets that
> > ctx->stats which happens in the startup callback. Till then it is
> > rightly providing the values accumulated so far. Once the decoding
> > starts, we know that the plugin is not providing any stats and we
> > don't display anything.
>
> Yeah, I got the technical reasons, but I think there's a valid user experience
> concern here: seeing statistics for a plugin that doesn't actually support
> statistics is misleading.
>

1. If the plugin never supported statistics, we will never report
stats. So nothing misleading there.
2. If the plugin starts supporting statistics and continues to do so,
we will report the stats since the time they are made available and
continue to do so. Nothing misleading there.
3. If the plugin starts supporting statistics and midway discontinues
its support, it already has a problem with backward compatibility.

Practically it would 1 or 2, which are working fine.

I don't think we will encounter case 3 practically. Do you have a
practical use case where a plugin would discontinue supporting stats?

Even in case 3, I think we need to consider the fact that these stats
are "cumulative". So if a plugin discontinues reporting stats, they
should go NULL only when the next accumulation action happens, not
before that.

> What we need is a call to pgstat_report_replslot() to display stats that reflect
> the current plugin behavior. We can't just call pgstat_report_replslot()
> in say RestoreSlotFromDisk() because we really need the decoding to start.
>
> So one idea could be to set a flag (per slot) when pgstat_report_replslot()
> has been called (for good reasons) and check for this flag in
> pg_stat_get_replication_slot().
>
> If the flag is not set, then set the plugin fields to NULL.
> If the flag is set, then display their values (like now).

This approach will have the same problem. Till
pgstat_report_replslot() is called, the old statistics will continue
to be shown.

>
> And we should document that the plugin stats are not available (i.e are NULL)
> until the decoding has valid stats to report after startup.

After "startup" would mislead users since then they will think that
the statistics will be NULL just before the decoding (re)starts.

The current documentation is " It is NULL when statistics is not
initialized or immediately after a reset or when not maintained by the
output plugin.". I think that covers all the cases.

--
Best Wishes,
Ashutosh Bapat



Re: Report bytes and transactions actually sent downtream

От
Bertrand Drouvot
Дата:
Hi,

On Mon, Sep 29, 2025 at 12:54:24PM +0530, Ashutosh Bapat wrote:
> On Fri, Sep 26, 2025 at 10:28 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > >
> > >
> > > I don't think this is an issue. There is no way for the core to tell
> > > whether the plugin will provide stats or not, unless it sets that
> > > ctx->stats which happens in the startup callback. Till then it is
> > > rightly providing the values accumulated so far. Once the decoding
> > > starts, we know that the plugin is not providing any stats and we
> > > don't display anything.
> >
> > Yeah, I got the technical reasons, but I think there's a valid user experience
> > concern here: seeing statistics for a plugin that doesn't actually support
> > statistics is misleading.
> >
> 
> 3. If the plugin starts supporting statistics and midway discontinues
> its support, it already has a problem with backward compatibility.
> 
> Practically it would 1 or 2, which are working fine.
> 
> I don't think we will encounter case 3 practically. Do you have a
> practical use case where a plugin would discontinue supporting stats?

Not that I can think of currently. That looks unlikely but wanted to raise
the point though. Maybe others see a use case and/or have a different point
of view.

> > What we need is a call to pgstat_report_replslot() to display stats that reflect
> > the current plugin behavior. We can't just call pgstat_report_replslot()
> > in say RestoreSlotFromDisk() because we really need the decoding to start.
> >
> > So one idea could be to set a flag (per slot) when pgstat_report_replslot()
> > has been called (for good reasons) and check for this flag in
> > pg_stat_get_replication_slot().
> >
> > If the flag is not set, then set the plugin fields to NULL.
> > If the flag is set, then display their values (like now).
> 
> This approach will have the same problem. Till
> pgstat_report_replslot() is called, the old statistics will continue
> to be shown.

I don't think so because the flag would not be set.

> > And we should document that the plugin stats are not available (i.e are NULL)
> > until the decoding has valid stats to report after startup.
> 
> The current documentation is " It is NULL when statistics is not
> initialized or immediately after a reset or when not maintained by the
> output plugin.". I think that covers all the cases.

Do you think the doc covers the case we discussed above? i.e when a plugin
discontinue supporting stats, it would display stats until the decoding actually
starts.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com