Обсуждение: Report bytes and transactions actually sent downtream
Hi All, In a recent logical replication issue, there were multiple replication slots involved, each using a different publication. Thus the amount of data that was replicated through each slot was expected to be different. However, total_bytes and total_txns were reported the same for all the replication slots as expected. One of the slots started lagging and we were trying to figure out whether its the WAL sender slowing down or the consumer (in this case Debezium). The lagging slot then showed total_txns and total_bytes lesser than other slots giving an impression that the WAL sender is processing the data slowly. Had pg_stat_replication_slot reported the amount of data actually sent downstream, it would have been easier to compare it with the amount of data received by the consumer and thus pinpoint the bottleneck. Here's a patch to do the same. It adds two columns - sent_txns: The total number of transactions sent downstream. - sent_bytes: The total number of bytes sent downstream in data messages to pg_stat_replication_slots. sent_bytes includes only the bytes sent as part of 'd' messages and does not include keep alive messages or CopyDone messages for example. But those are very few and can be ignored. If others feel that those are important to be included, we can make that change. Plugins may choose not to send an empty transaction downstream. It's better to increment sent_txns counter in the plugin code when it actually sends a BEGIN message, for example in pgoutput_send_begin() and pg_output_begin(). This means that every plugin will need to be modified to increment the counter for it to reported correctly. I first thought of incrementing sent_bytes in OutputPluginWrite() which is a central function for all logical replication message writes. But that calls LogicalDecodingContext::write() which may further add bytes to the message e.g. WalSndWriteData() and LogicalOutputWrite(). So it's better to increment the counter in implementations of LogicalDecodingContext::write(), so that we count the exact number of bytes. These implementations are within core code so they won't miss updating sent_bytes. I think we should rename total_txns and total_bytes to reordered_txns and reordered_bytes respectively, and also update the documentation accordingly to make better sense of those numbers. But these patches do not contain that change. If others feel the same way, I will provide a patch with that change. -- Best Wishes, Ashutosh Bapat
Вложения
On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Hi All, > In a recent logical replication issue, there were multiple replication > slots involved, each using a different publication. Thus the amount of > data that was replicated through each slot was expected to be > different. However, total_bytes and total_txns were reported the same > for all the replication slots as expected. One of the slots started > lagging and we were trying to figure out whether its the WAL sender > slowing down or the consumer (in this case Debezium). The lagging > slot then showed total_txns and total_bytes lesser than other slots > giving an impression that the WAL sender is processing the data > slowly. Had pg_stat_replication_slot reported the amount of data > actually sent downstream, it would have been easier to compare it with > the amount of data received by the consumer and thus pinpoint the > bottleneck. > > Here's a patch to do the same. It adds two columns > - sent_txns: The total number of transactions sent downstream. > - sent_bytes: The total number of bytes sent downstream in data messages > to pg_stat_replication_slots. sent_bytes includes only the bytes sent > as part of 'd' messages and does not include keep alive messages or > CopyDone messages for example. But those are very few and can be > ignored. If others feel that those are important to be included, we > can make that change. > > Plugins may choose not to send an empty transaction downstream. It's > better to increment sent_txns counter in the plugin code when it > actually sends a BEGIN message, for example in pgoutput_send_begin() > and pg_output_begin(). This means that every plugin will need to be > modified to increment the counter for it to reported correctly. > What if some plugin didn't implemented it or does it incorrectly? Users will then complain that PG view is showing incorrect value. Shouldn't the plugin specific stats be shown differently, for example, one may be interested in how much plugin has filtered the data because it was not published or because something like row_filter caused it skip sending such data? -- With Regards, Amit Kapila.
On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > Hi All, > > In a recent logical replication issue, there were multiple replication > > slots involved, each using a different publication. Thus the amount of > > data that was replicated through each slot was expected to be > > different. However, total_bytes and total_txns were reported the same > > for all the replication slots as expected. One of the slots started > > lagging and we were trying to figure out whether its the WAL sender > > slowing down or the consumer (in this case Debezium). The lagging > > slot then showed total_txns and total_bytes lesser than other slots > > giving an impression that the WAL sender is processing the data > > slowly. Had pg_stat_replication_slot reported the amount of data > > actually sent downstream, it would have been easier to compare it with > > the amount of data received by the consumer and thus pinpoint the > > bottleneck. > > > > Here's a patch to do the same. It adds two columns > > - sent_txns: The total number of transactions sent downstream. > > - sent_bytes: The total number of bytes sent downstream in data messages > > to pg_stat_replication_slots. sent_bytes includes only the bytes sent > > as part of 'd' messages and does not include keep alive messages or > > CopyDone messages for example. But those are very few and can be > > ignored. If others feel that those are important to be included, we > > can make that change. > > > > Plugins may choose not to send an empty transaction downstream. It's > > better to increment sent_txns counter in the plugin code when it > > actually sends a BEGIN message, for example in pgoutput_send_begin() > > and pg_output_begin(). This means that every plugin will need to be > > modified to increment the counter for it to reported correctly. > > > > What if some plugin didn't implemented it or does it incorrectly? > Users will then complain that PG view is showing incorrect value. That is right. To fix the problem of plugins not implementing the counter increment logic we could use logic similar to how we track whether OutputPluginPrepareWrite() has been called or not. In ReorderBufferTxn, we add a new member sent_status which would be an enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status = UNKNOWN. We provide a function called plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set sent_status = SENT when sent = true and sent_status = NOT_SENT when sent = false. In all the end transaction callback wrappers like commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(), stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if tsent_status = UNKNOWN, we throw an error. If sent_status = SENT, we increment sent_txns. That will catch any plugin which does not call plugin_set_txn(). The plugin may still call plugin_sent_txn() with sent = true when it should have called it with sent = false or vice versa, but that's hard to monitor and control. Additionally, we should highlight in the document that sent_txns is as per report from the output plugin so that users know where to look for in case they see a wrong/dubious value. I see this similar to what we do with pg_stat_replication::reply_time which may be wrong if a non-PG standby reports the wrong value. Documentation says "Send time of last reply message received from standby server", so the users know where to look for incase they spot the error. Does that look good? I am open to other suggestions. > Shouldn't the plugin specific stats be shown differently, for example, > one may be interested in how much plugin has filtered the data because > it was not published or because something like row_filter caused it > skip sending such data? > That looks useful, we could track the ReorderBufferChange's that were not sent downstream and add their sizes to another counter ReorderBuffer::filtered_bytes and report it in pg_stat_replication_slots. I think we will need to devise a mechanism similar to above by which the plugin tells core whether a change has been filtered or not. However, that will not be a replacement for sent_bytes, since filtered_bytes or total_bytes - filtered_bytes won't tell us how much data was sent downstream, which is crucial to the purpose stated in my earlier email. -- Best Wishes, Ashutosh Bapat
On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > Hi All, > > > In a recent logical replication issue, there were multiple replication > > > slots involved, each using a different publication. Thus the amount of > > > data that was replicated through each slot was expected to be > > > different. However, total_bytes and total_txns were reported the same > > > for all the replication slots as expected. One of the slots started > > > lagging and we were trying to figure out whether its the WAL sender > > > slowing down or the consumer (in this case Debezium). The lagging > > > slot then showed total_txns and total_bytes lesser than other slots > > > giving an impression that the WAL sender is processing the data > > > slowly. Had pg_stat_replication_slot reported the amount of data > > > actually sent downstream, it would have been easier to compare it with > > > the amount of data received by the consumer and thus pinpoint the > > > bottleneck. > > > > > > Here's a patch to do the same. It adds two columns > > > - sent_txns: The total number of transactions sent downstream. > > > - sent_bytes: The total number of bytes sent downstream in data messages > > > to pg_stat_replication_slots. sent_bytes includes only the bytes sent > > > as part of 'd' messages and does not include keep alive messages or > > > CopyDone messages for example. But those are very few and can be > > > ignored. If others feel that those are important to be included, we > > > can make that change. > > > > > > Plugins may choose not to send an empty transaction downstream. It's > > > better to increment sent_txns counter in the plugin code when it > > > actually sends a BEGIN message, for example in pgoutput_send_begin() > > > and pg_output_begin(). This means that every plugin will need to be > > > modified to increment the counter for it to reported correctly. > > > > > > > What if some plugin didn't implemented it or does it incorrectly? > > Users will then complain that PG view is showing incorrect value. > > That is right. > > To fix the problem of plugins not implementing the counter increment > logic we could use logic similar to how we track whether > OutputPluginPrepareWrite() has been called or not. In > ReorderBufferTxn, we add a new member sent_status which would be an > enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status > = UNKNOWN. We provide a function called > plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set > sent_status = SENT when sent = true and sent_status = NOT_SENT when > sent = false. In all the end transaction callback wrappers like > commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(), > stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if > tsent_status = UNKNOWN, we throw an error. > I think we don't want to make it mandatory for plugins to implement these stats, so instead of throwing ERROR, the view should show that the plugin doesn't provide stats. How about having OutputPluginStats similar to OutputPluginCallbacks and OutputPluginOptions members in LogicalDecodingContext? It will have members like stats_available, txns_sent or txns_skipped, txns_filtered, etc. I am thinking it will be better to provide this information in a separate view like pg_stat_plugin_stats or something like that, here we can report slot_name, plugin_name, then the other stats we want to implement part of OutputPluginStats. -- With Regards, Amit Kapila.
On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat > > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > Hi All, > > > > In a recent logical replication issue, there were multiple replication > > > > slots involved, each using a different publication. Thus the amount of > > > > data that was replicated through each slot was expected to be > > > > different. However, total_bytes and total_txns were reported the same > > > > for all the replication slots as expected. One of the slots started > > > > lagging and we were trying to figure out whether its the WAL sender > > > > slowing down or the consumer (in this case Debezium). The lagging > > > > slot then showed total_txns and total_bytes lesser than other slots > > > > giving an impression that the WAL sender is processing the data > > > > slowly. Had pg_stat_replication_slot reported the amount of data > > > > actually sent downstream, it would have been easier to compare it with > > > > the amount of data received by the consumer and thus pinpoint the > > > > bottleneck. > > > > > > > > Here's a patch to do the same. It adds two columns > > > > - sent_txns: The total number of transactions sent downstream. > > > > - sent_bytes: The total number of bytes sent downstream in data messages > > > > to pg_stat_replication_slots. sent_bytes includes only the bytes sent > > > > as part of 'd' messages and does not include keep alive messages or > > > > CopyDone messages for example. But those are very few and can be > > > > ignored. If others feel that those are important to be included, we > > > > can make that change. > > > > > > > > Plugins may choose not to send an empty transaction downstream. It's > > > > better to increment sent_txns counter in the plugin code when it > > > > actually sends a BEGIN message, for example in pgoutput_send_begin() > > > > and pg_output_begin(). This means that every plugin will need to be > > > > modified to increment the counter for it to reported correctly. > > > > > > > > > > What if some plugin didn't implemented it or does it incorrectly? > > > Users will then complain that PG view is showing incorrect value. > > > > That is right. > > > > To fix the problem of plugins not implementing the counter increment > > logic we could use logic similar to how we track whether > > OutputPluginPrepareWrite() has been called or not. In > > ReorderBufferTxn, we add a new member sent_status which would be an > > enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status > > = UNKNOWN. We provide a function called > > plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set > > sent_status = SENT when sent = true and sent_status = NOT_SENT when > > sent = false. In all the end transaction callback wrappers like > > commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(), > > stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if > > tsent_status = UNKNOWN, we throw an error. > > > > I think we don't want to make it mandatory for plugins to implement > these stats, so instead of throwing ERROR, the view should show that > the plugin doesn't provide stats. How about having OutputPluginStats > similar to OutputPluginCallbacks and OutputPluginOptions members in > LogicalDecodingContext? It will have members like stats_available, > txns_sent or txns_skipped, txns_filtered, etc. Not making mandatory looks useful. I can try your suggestion. Rather than having stats_available as a member of OutputPluginStats, it's better to have a NULL value for the corresponding member in LogicalDecodingContext. We don't want an output plugin to reset stats_available once set. Will that work? > I am thinking it will > be better to provide this information in a separate view like > pg_stat_plugin_stats or something like that, here we can report > slot_name, plugin_name, then the other stats we want to implement part > of OutputPluginStats. As you have previously pointed out, the view should make it explicit that the new stats are maintained by the plugin and not core. I agree with that intention. However, already have three views pg_replication_slots (which has slot name and plugin name), then pg_replication_stats which is about stats maintained by a WAL sender or running replication and then pg_stat_replication_slots, which is about accumulated statistics for a replication through a given replication slot. It's already a bit hard to keep track of who's who when debugging an issue. Adding one more view will add to confusion. Instead of adding a new view how about a. name the columns as plugin_sent_txns, plugin_sent_bytes, plugin_filtered_change_bytes to make it clear that these columns are maintained by plugin b. report these NULL if stats_available = false OR OutputPluginStats is not set in LogicalDecodingContext c. Document that NULL value for these columns indicates that the plugin is not maintaining/reporting these stats d. adding plugin name to pg_stat_replication_slots, that will make it easy for users to know which plugin they should look at in case of dubious or unavailable stats -- Best Wishes, Ashutosh Bapat
On Mon, Jul 14, 2025 at 10:55 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I think we don't want to make it mandatory for plugins to implement > > these stats, so instead of throwing ERROR, the view should show that > > the plugin doesn't provide stats. How about having OutputPluginStats > > similar to OutputPluginCallbacks and OutputPluginOptions members in > > LogicalDecodingContext? It will have members like stats_available, > > txns_sent or txns_skipped, txns_filtered, etc. > > Not making mandatory looks useful. I can try your suggestion. Rather > than having stats_available as a member of OutputPluginStats, it's > better to have a NULL value for the corresponding member in > LogicalDecodingContext. We don't want an output plugin to reset > stats_available once set. Will that work? > We can try that. > > I am thinking it will > > be better to provide this information in a separate view like > > pg_stat_plugin_stats or something like that, here we can report > > slot_name, plugin_name, then the other stats we want to implement part > > of OutputPluginStats. > > As you have previously pointed out, the view should make it explicit > that the new stats are maintained by the plugin and not core. I agree > with that intention. However, already have three views > pg_replication_slots (which has slot name and plugin name), then > pg_replication_stats which is about stats maintained by a WAL sender > or running replication and then pg_stat_replication_slots, which is > about accumulated statistics for a replication through a given > replication slot. It's already a bit hard to keep track of who's who > when debugging an issue. Adding one more view will add to confusion. > > Instead of adding a new view how about > a. name the columns as plugin_sent_txns, plugin_sent_bytes, > plugin_filtered_change_bytes to make it clear that these columns are > maintained by plugin > b. report these NULL if stats_available = false OR OutputPluginStats > is not set in LogicalDecodingContext > c. Document that NULL value for these columns indicates that the > plugin is not maintaining/reporting these stats > d. adding plugin name to pg_stat_replication_slots, that will make it > easy for users to know which plugin they should look at in case of > dubious or unavailable stats > Sounds reasonable. -- With Regards, Amit Kapila.
Hi Amit, On Mon, Jul 14, 2025 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jul 14, 2025 at 10:55 AM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > I think we don't want to make it mandatory for plugins to implement > > > these stats, so instead of throwing ERROR, the view should show that > > > the plugin doesn't provide stats. How about having OutputPluginStats > > > similar to OutputPluginCallbacks and OutputPluginOptions members in > > > LogicalDecodingContext? It will have members like stats_available, > > > txns_sent or txns_skipped, txns_filtered, etc. > > > > Not making mandatory looks useful. I can try your suggestion. Rather > > than having stats_available as a member of OutputPluginStats, it's > > better to have a NULL value for the corresponding member in > > LogicalDecodingContext. We don't want an output plugin to reset > > stats_available once set. Will that work? > > > > We can try that. > > > > I am thinking it will > > > be better to provide this information in a separate view like > > > pg_stat_plugin_stats or something like that, here we can report > > > slot_name, plugin_name, then the other stats we want to implement part > > > of OutputPluginStats. > > > > As you have previously pointed out, the view should make it explicit > > that the new stats are maintained by the plugin and not core. I agree > > with that intention. However, already have three views > > pg_replication_slots (which has slot name and plugin name), then > > pg_replication_stats which is about stats maintained by a WAL sender > > or running replication and then pg_stat_replication_slots, which is > > about accumulated statistics for a replication through a given > > replication slot. It's already a bit hard to keep track of who's who > > when debugging an issue. Adding one more view will add to confusion. > > > > Instead of adding a new view how about > > a. name the columns as plugin_sent_txns, plugin_sent_bytes, > > plugin_filtered_change_bytes to make it clear that these columns are > > maintained by plugin > > b. report these NULL if stats_available = false OR OutputPluginStats > > is not set in LogicalDecodingContext > > c. Document that NULL value for these columns indicates that the > > plugin is not maintaining/reporting these stats > > d. adding plugin name to pg_stat_replication_slots, that will make it > > easy for users to know which plugin they should look at in case of > > dubious or unavailable stats > > > > Sounds reasonable. Here's the next patch which considers all the discussion so far. It adds four fields to pg_stat_replication_slots. - plugin - name of the output plugin - plugin_filtered_bytes - reports the amount of changes filtered out by the output plugin - plugin_sent_txns - the amount of transactions sent downstream by the output plugin - plugin_sent_bytes - the amount of data sent downstream by the outputplugin. There are some points up for a discussion: 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by calling pgstat_reset() or pgstat_reset_of_kind() which don't know about the contents of the entry. So PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin stats are reported as NULL, instead of zero, immediately after reset. This is the same case when the stats is queried immediately after the statistics is initialized and before any stats are reported. We could instead make it report zero, if we save the plugin_has_stats and restore it after reset. But doing that in pgstat_reset_of_kind() seems like an extra overhead + we will need to write a function to find all replication slot entries. Resetting the stats would be a rare event in practice. Trying to report 0 instead of NULL in that rare case doesn't seem to be worth the efforts and code. Given that the core code doesn't know whether a given plugin reports stats or not, I think this behaviour is appropriate as long as we document it. Please let me know if the documentation in the patch is clear enough. 2. There's also a bit of asymmetry in the way sent_bytes is handled. The code which actually sends the logical changes to the downstream is part of the core code but the format of the change and hence the number of bytes sent is decided by the plugin. It's a stat related to plugin but maintained by the core code. The patch implements it as a plugin stat (so the corresponding column has "plugin" prefix and is also reported as NULL upon reset etc.), but we may want to reconsider how to report and maintain it. 3. The names of new columns have the prefix "plugin_" but the internal variables tracking those don't for the sake of brevity. If you prefer to have the same prefix for the internal variables, I can change that. I think I have covered all the cases where filteredbytes should be incremented, but please let me know if I have missed any. -- Best Wishes, Ashutosh Bapat
Вложения
Hi, On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote: > Here's the next patch which considers all the discussion so far. It > adds four fields to pg_stat_replication_slots. > - plugin - name of the output plugin Is this one needed? (we could get it with a join on pg_replication_slots) > - plugin_filtered_bytes - reports the amount of changes filtered > out by the output plugin > - plugin_sent_txns - the amount of transactions sent downstream by > the output plugin > - plugin_sent_bytes - the amount of data sent downstream by the > outputplugin. > > There are some points up for a discussion: > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by > calling pgstat_reset() or pgstat_reset_of_kind() which don't know > about the contents of the entry. So > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin > stats are reported as NULL, instead of zero, immediately after reset. > This is the same case when the stats is queried immediately after the > statistics is initialized and before any stats are reported. We could > instead make it report > zero, if we save the plugin_has_stats and restore it after reset. But > doing that in pgstat_reset_of_kind() seems like an extra overhead + we > will need to write a function to find all replication slot entries. Could we store plugin_has_stats in ReplicationSlotPersistentData instead? That way it would not be reset. We would need to access ReplicationSlotPersistentData in pg_stat_get_replication_slot though. Also would that make sense to expose plugin_has_stats in pg_replication_slots? > 2. There's also a bit of asymmetry in the way sent_bytes is handled. > The code which actually sends the logical changes to the downstream is > part of the core code > but the format of the change and hence the number of bytes sent is > decided by the plugin. It's a stat related to plugin but maintained by > the core code. The patch implements it as a plugin stat (so the > corresponding column has "plugin" prefix The way it is done makes sense to me. > 3. The names of new columns have the prefix "plugin_" but the internal > variables tracking those don't for the sake of brevity. If you prefer > to have the same prefix for the internal variables, I can change that. Just my taste: I do prefer when they match. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote: > > Here's the next patch which considers all the discussion so far. It > > adds four fields to pg_stat_replication_slots. > > - plugin - name of the output plugin > > Is this one needed? (we could get it with a join on pg_replication_slots) > In my opinion, when there are other plugin_* fields present, including the plugin name directly here seems like a better approach. So, +1 for the plugin field. > > - plugin_filtered_bytes - reports the amount of changes filtered > > out by the output plugin > > - plugin_sent_txns - the amount of transactions sent downstream by > > the output plugin > > - plugin_sent_bytes - the amount of data sent downstream by the > > outputplugin. > > > > There are some points up for a discussion: > > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by > > calling pgstat_reset() or pgstat_reset_of_kind() which don't know > > about the contents of the entry. So > > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin > > stats are reported as NULL, instead of zero, immediately after reset. > > This is the same case when the stats is queried immediately after the > > statistics is initialized and before any stats are reported. We could > > instead make it report > > zero, if we save the plugin_has_stats and restore it after reset. But > > doing that in pgstat_reset_of_kind() seems like an extra overhead + we > > will need to write a function to find all replication slot entries. I tried to think of an approach where we can differentiate between the cases 'not initialized' and 'reset' ones with the values. Say instead of plugin_has_stats, if we have plugin_stats_status, then we can maintain status like -1(not initialized), 0(reset). But this too will complicate the code further. Personally, I’m okay with NULL values appearing even after a reset, especially since the documentation explains this clearly. > > > 2. There's also a bit of asymmetry in the way sent_bytes is handled. > > The code which actually sends the logical changes to the downstream is > > part of the core code > > but the format of the change and hence the number of bytes sent is > > decided by the plugin. It's a stat related to plugin but maintained by > > the core code. The patch implements it as a plugin stat (so the > > corresponding column has "plugin" prefix > > The way it is done makes sense to me. > > > 3. The names of new columns have the prefix "plugin_" but the internal > > variables tracking those don't for the sake of brevity. If you prefer > > to have the same prefix for the internal variables, I can change that. > I am okay either way. Few comments: 1) postgres=# select slot_name, total_bytes,plugin_filtered_bytes,plugin_sent_bytes from pg_stat_replication_slots order by slot_name; slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes -----------+-------------+-----------------------+------------------- slot1 | 800636 | 793188 | 211 sub1 | 401496 | 132712 | 84041 sub2 | 401496 | 396184 | 674 sub3 | 401496 | 145912 | 79959 (4 rows) Currently it looks quite confusing. 'total_bytes' gives a sense that it has to be a sum of filtered and sent. But they are no way like that. In the thread earlier there was a proposal to change the name to reordered_txns, reordered_bytes. That looks better to me. It will give clarity without even someone digging into docs. 2) Tried to verify all filtered data tests, seems to work well. Also I tried tracking the usage of OutputPluginWrite() to see if there is any other place where data needs to be considered as filtered-data. Encountered this: send_relation_and_attrs has: if (!logicalrep_should_publish_column(att, columns, include_gencols_type)) continue; if (att->atttypid < FirstGenbkiObjectId) continue; But I don't think it needs to be considered as filtered data. This is mostly schema related info. But I wanted to confirm once. Thoughts? 3) +-- total_txns may vary based on the background activity but sent_txns should +-- always be 1 since the background transactions are always skipped. Filtered +-- bytes would be set only when there's a change that was passed to the plugin +-- but was filtered out. Depending upon the background transactions, filtered +-- bytes may or may not be zero. +SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes, plugin_filtered_bytes >= 0 AS filtered_bytes FROM pg_stat_replication_slots ORDER BY slot_name; In comment either we can say plugin_sent_txns instead of sent_txns or in the query we can fetch plugin_sent_txns AS sent_txns, so that we can relate comment and query. 4) + <literal>sentTxns</literal> is the number of transactions sent downstream + by the output plugin. <literal>sentBytes</literal> is the amount of data + sent downstream by the output plugin. + <function>OutputPluginWrite</function> is expected to update this counter + if <literal>ctx->stats</literal> is initialized by the output plugin. + <literal>filteredBytes</literal> is the size of changes in bytes that are + filtered out by the output plugin. Function + <literal>ReorderBufferChangeSize</literal> may be used to find the size of + filtered <literal>ReorderBufferChange</literal>. + </para> Either we can mention units as 'bytes' for both filteredBytes and sentBytes or for none. Currently filteredBytes says 'in bytes' while sentBytes does not. thanks Shveta
Hi Shveta, Bertrand, Replying to both of your review comments together. On Thu, Sep 18, 2025 at 10:52 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote: > > > Here's the next patch which considers all the discussion so far. It > > > adds four fields to pg_stat_replication_slots. > > > - plugin - name of the output plugin > > > > Is this one needed? (we could get it with a join on pg_replication_slots) > > > > In my opinion, when there are other plugin_* fields present, including > the plugin name directly here seems like a better approach. So, +1 for > the plugin field. Yeah. I think so too. > > > > - plugin_filtered_bytes - reports the amount of changes filtered > > > out by the output plugin > > > - plugin_sent_txns - the amount of transactions sent downstream by > > > the output plugin > > > - plugin_sent_bytes - the amount of data sent downstream by the > > > outputplugin. > > > > > > There are some points up for a discussion: > > > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by > > > calling pgstat_reset() or pgstat_reset_of_kind() which don't know > > > about the contents of the entry. So > > > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin > > > stats are reported as NULL, instead of zero, immediately after reset. > > > This is the same case when the stats is queried immediately after the > > > statistics is initialized and before any stats are reported. We could > > > instead make it report > > > zero, if we save the plugin_has_stats and restore it after reset. But > > > doing that in pgstat_reset_of_kind() seems like an extra overhead + we > > > will need to write a function to find all replication slot entries. > > I tried to think of an approach where we can differentiate between the > cases 'not initialized' and 'reset' ones with the values. Say instead > of plugin_has_stats, if we have plugin_stats_status, then we can > maintain status like -1(not initialized), 0(reset). But this too will > complicate the code further. Personally, I’m okay with NULL values > appearing even after a reset, especially since the documentation > explains this clearly. Ok. Great. > > > Could we store plugin_has_stats in ReplicationSlotPersistentData instead? That > > way it would not be reset. We would need to access ReplicationSlotPersistentData > > in pg_stat_get_replication_slot though. > > > Also would that make sense to expose plugin_has_stats in pg_replication_slots? > A plugin may change its decision to support the stats across versions, we won't be able to tell when it changes that decision and thus reflect it accurately in ReplicationSlotPersistentData. Doing it in startup gives the opportunity to the plugin to change it as often as it wants OR even based on some plugin specific configurations. Further ReplicationSlotPersistentData is maintained by the core. It will not be a good place to store something plugin specific. > > > > > 2. There's also a bit of asymmetry in the way sent_bytes is handled. > > > The code which actually sends the logical changes to the downstream is > > > part of the core code > > > but the format of the change and hence the number of bytes sent is > > > decided by the plugin. It's a stat related to plugin but maintained by > > > the core code. The patch implements it as a plugin stat (so the > > > corresponding column has "plugin" prefix > > > > The way it is done makes sense to me. Great. > > > > > 3. The names of new columns have the prefix "plugin_" but the internal > > > variables tracking those don't for the sake of brevity. If you prefer > > > to have the same prefix for the internal variables, I can change that. > > > > I am okay either way. > > > Just my taste: I do prefer when they match. I don't see a strong preference to change what's there in the patch. Let's wait for more reviews. > > Few comments: > > 1) > postgres=# select slot_name, > total_bytes,plugin_filtered_bytes,plugin_sent_bytes from > pg_stat_replication_slots order by slot_name; > slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes > -----------+-------------+-----------------------+------------------- > slot1 | 800636 | 793188 | 211 > sub1 | 401496 | 132712 | 84041 > sub2 | 401496 | 396184 | 674 > sub3 | 401496 | 145912 | 79959 > (4 rows) > > Currently it looks quite confusing. 'total_bytes' gives a sense that > it has to be a sum of filtered and sent. But they are no way like > that. In the thread earlier there was a proposal to change the name to > reordered_txns, reordered_bytes. That looks better to me. It will give > clarity without even someone digging into docs. I also agree with that. But that will break backward compatibility. Do you think other columns like spill_* and stream_* should also be renamed with the prefix "reordered"? > > 2) > Tried to verify all filtered data tests, seems to work well. Also I > tried tracking the usage of OutputPluginWrite() to see if there is any > other place where data needs to be considered as filtered-data. > Encountered this: > > send_relation_and_attrs has: > if (!logicalrep_should_publish_column(att, columns, > > include_gencols_type)) > continue; > if (att->atttypid < FirstGenbkiObjectId) > continue; > > But I don't think it needs to be considered as filtered data. This is > mostly schema related info. But I wanted to confirm once. Thoughts? Yeah. It's part of metadata which in turn is sent only when needed. It's not part of, say, transaction changes. So it can't be considered as filtering. > > 3) > +-- total_txns may vary based on the background activity but sent_txns should > +-- always be 1 since the background transactions are always skipped. Filtered > +-- bytes would be set only when there's a change that was passed to the plugin > +-- but was filtered out. Depending upon the background transactions, filtered > +-- bytes may or may not be zero. > +SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS > spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS > total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes, > plugin_filtered_bytes >= 0 AS filtered_bytes FROM > pg_stat_replication_slots ORDER BY slot_name; > > In comment either we can say plugin_sent_txns instead of sent_txns or > in the query we can fetch plugin_sent_txns AS sent_txns, so that we > can relate comment and query. > Used plugin_sent_txns in the comment as well as query. > > 4) > + <literal>sentTxns</literal> is the number of transactions sent downstream > + by the output plugin. <literal>sentBytes</literal> is the amount of data > + sent downstream by the output plugin. > + <function>OutputPluginWrite</function> is expected to update this counter > + if <literal>ctx->stats</literal> is initialized by the output plugin. > + <literal>filteredBytes</literal> is the size of changes in bytes that are > + filtered out by the output plugin. Function > + <literal>ReorderBufferChangeSize</literal> may be used to find > the size of > + filtered <literal>ReorderBufferChange</literal>. > + </para> > > Either we can mention units as 'bytes' for both filteredBytes and > sentBytes or for none. Currently filteredBytes says 'in bytes' while > sentBytes does not. Used 'in bytes' in both the places. Thanks for your review. I will include these changes in the next set of patches. -- Best Wishes, Ashutosh Bapat
On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > Few comments: > > > > 1) > > postgres=# select slot_name, > > total_bytes,plugin_filtered_bytes,plugin_sent_bytes from > > pg_stat_replication_slots order by slot_name; > > slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes > > -----------+-------------+-----------------------+------------------- > > slot1 | 800636 | 793188 | 211 > > sub1 | 401496 | 132712 | 84041 > > sub2 | 401496 | 396184 | 674 > > sub3 | 401496 | 145912 | 79959 > > (4 rows) > > > > Currently it looks quite confusing. 'total_bytes' gives a sense that > > it has to be a sum of filtered and sent. But they are no way like > > that. In the thread earlier there was a proposal to change the name to > > reordered_txns, reordered_bytes. That looks better to me. It will give > > clarity without even someone digging into docs. > > I also agree with that. But that will break backward compatibility. Yes, that it will do. > Do > you think other columns like spill_* and stream_* should also be > renamed with the prefix "reordered"? > Okay, I see that all fields in pg_stat_replication_slots are related to the ReorderBuffer. On reconsideration, I’m unsure whether it's appropriate to prefix all of them with reorderd_. For example, renaming spill_bytes and stream_bytes to reordered_spill_bytes and reordered_stream_bytes. These names start to feel overly long, and I also noticed that ReorderBuffer isn’t clearly defined anywhere in the documentation (or at least I couldn’t find it), even though the term 'reorder buffer' does appear in a few places. As an example, see ReorderBufferRead, ReorderBufferWrite wait-types at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we are adding: ReorderBufferChangeSize, ReorderBufferChange This gives me a feeling, will it be better to let pg_stat_replication_slots as is and add a brief ReorderBuffer section under Logical Decoding concepts [3] just before Output Plugins. And then, pg_stat_replication_slots can refer to that section, clarifying that the bytes, counts, and txn fields pertain to ReorderBuffer (without changing any of the fields). And then to define plugin related data, we can have a new view, say pg_stat_plugin_stats (as Amit suggested earlier) or pg_stat_replication_plugins. I understand that adding a new view might not be desirable, but it provides better clarity without requiring changes to the existing fields in pg_stat_replication_slots. I also strongly feel that to properly tie all this information together, a brief definition of the ReorderBuffer is needed. Other pages that reference this term can then point to that section. Thoughts? [1]: https://www.postgresql.org/docs/17/monitoring-stats.html#WAIT-EVENT-IO-TABLE [2]: https://www.postgresql.org/docs/17/logicaldecoding-output-plugin.html [3]: https://www.postgresql.org/docs/17/logicaldecoding-explanation.html#LOGICALDECODING-EXPLANATION thanks Shveta
On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > > > Few comments: > > > > > > 1) > > > postgres=# select slot_name, > > > total_bytes,plugin_filtered_bytes,plugin_sent_bytes from > > > pg_stat_replication_slots order by slot_name; > > > slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes > > > -----------+-------------+-----------------------+------------------- > > > slot1 | 800636 | 793188 | 211 > > > sub1 | 401496 | 132712 | 84041 > > > sub2 | 401496 | 396184 | 674 > > > sub3 | 401496 | 145912 | 79959 > > > (4 rows) > > > > > > Currently it looks quite confusing. 'total_bytes' gives a sense that > > > it has to be a sum of filtered and sent. But they are no way like > > > that. In the thread earlier there was a proposal to change the name to > > > reordered_txns, reordered_bytes. That looks better to me. It will give > > > clarity without even someone digging into docs. > > > > I also agree with that. But that will break backward compatibility. > > Yes, that it will do. > > > Do > > you think other columns like spill_* and stream_* should also be > > renamed with the prefix "reordered"? > > > > Okay, I see that all fields in pg_stat_replication_slots are related > to the ReorderBuffer. On reconsideration, I’m unsure whether it's > appropriate to prefix all of them with reorderd_. For example, > renaming spill_bytes and stream_bytes to reordered_spill_bytes and > reordered_stream_bytes. These names start to feel overly long, and I > also noticed that ReorderBuffer isn’t clearly defined anywhere in the > documentation (or at least I couldn’t find it), even though the term > 'reorder buffer' does appear in a few places. > > As an example, see ReorderBufferRead, ReorderBufferWrite wait-types > at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we > are adding: ReorderBufferChangeSize, ReorderBufferChange > > This gives me a feeling, will it be better to let > pg_stat_replication_slots as is and add a brief ReorderBuffer section > under Logical Decoding concepts [3] just before Output Plugins. And > then, pg_stat_replication_slots can refer to that section, clarifying > that the bytes, counts, and txn fields pertain to ReorderBuffer > (without changing any of the fields). > > And then to define plugin related data, we can have a new view, say > pg_stat_plugin_stats (as Amit suggested earlier) or > pg_stat_replication_plugins. I understand that adding a new view might > not be desirable, but it provides better clarity without requiring > changes to the existing fields in pg_stat_replication_slots. I also > strongly feel that to properly tie all this information together, a > brief definition of the ReorderBuffer is needed. Other pages that > reference this term can then point to that section. Thoughts? Even if we keep two views, when they are joined, users will still get confused by total_* names. So it's not solving the underlying problem. Andres had raised the point about renaming total_* fields with me off-list earlier. He suggested names total_wal_bytes, and total_wal_txns in an off list discussion today. I think those convey the true meaning - that these are txns and bytes that come from WAL. Used those in the attached patches. Prefix reordered would give away lower level details, so I didn't use it. I agree that it would be good to mention ReorderBuffer in the logical decoding concepts section since it mentions structures ReorderBuffer*. But that would be a separate patch since we aren't using "reordered" in the names of the fields. 0001 is the previous patch 0002 changes addressing your and Bertrand's comments. -- Best Wishes, Ashutosh Bapat
Вложения
On Fri, Sep 19, 2025 at 8:11 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > > > > > > > Few comments: > > > > > > > > 1) > > > > postgres=# select slot_name, > > > > total_bytes,plugin_filtered_bytes,plugin_sent_bytes from > > > > pg_stat_replication_slots order by slot_name; > > > > slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes > > > > -----------+-------------+-----------------------+------------------- > > > > slot1 | 800636 | 793188 | 211 > > > > sub1 | 401496 | 132712 | 84041 > > > > sub2 | 401496 | 396184 | 674 > > > > sub3 | 401496 | 145912 | 79959 > > > > (4 rows) > > > > > > > > Currently it looks quite confusing. 'total_bytes' gives a sense that > > > > it has to be a sum of filtered and sent. But they are no way like > > > > that. In the thread earlier there was a proposal to change the name to > > > > reordered_txns, reordered_bytes. That looks better to me. It will give > > > > clarity without even someone digging into docs. > > > > > > I also agree with that. But that will break backward compatibility. > > > > Yes, that it will do. > > > > > Do > > > you think other columns like spill_* and stream_* should also be > > > renamed with the prefix "reordered"? > > > > > > > Okay, I see that all fields in pg_stat_replication_slots are related > > to the ReorderBuffer. On reconsideration, I’m unsure whether it's > > appropriate to prefix all of them with reorderd_. For example, > > renaming spill_bytes and stream_bytes to reordered_spill_bytes and > > reordered_stream_bytes. These names start to feel overly long, and I > > also noticed that ReorderBuffer isn’t clearly defined anywhere in the > > documentation (or at least I couldn’t find it), even though the term > > 'reorder buffer' does appear in a few places. > > > > As an example, see ReorderBufferRead, ReorderBufferWrite wait-types > > at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we > > are adding: ReorderBufferChangeSize, ReorderBufferChange > > > > This gives me a feeling, will it be better to let > > pg_stat_replication_slots as is and add a brief ReorderBuffer section > > under Logical Decoding concepts [3] just before Output Plugins. And > > then, pg_stat_replication_slots can refer to that section, clarifying > > that the bytes, counts, and txn fields pertain to ReorderBuffer > > (without changing any of the fields). > > > > And then to define plugin related data, we can have a new view, say > > pg_stat_plugin_stats (as Amit suggested earlier) or > > pg_stat_replication_plugins. I understand that adding a new view might > > not be desirable, but it provides better clarity without requiring > > changes to the existing fields in pg_stat_replication_slots. I also > > strongly feel that to properly tie all this information together, a > > brief definition of the ReorderBuffer is needed. Other pages that > > reference this term can then point to that section. Thoughts? > > Even if we keep two views, when they are joined, users will still get > confused by total_* names. So it's not solving the underlying problem. Okay, I see your point. > Andres had raised the point about renaming total_* fields with me > off-list earlier. He suggested names total_wal_bytes, and > total_wal_txns in an off list discussion today. I think those convey > the true meaning - that these are txns and bytes that come from WAL. I agree. > Used those in the attached patches. Prefix reordered would give away > lower level details, so I didn't use it. > > I agree that it would be good to mention ReorderBuffer in the logical > decoding concepts section since it mentions structures ReorderBuffer*. > But that would be a separate patch since we aren't using "reordered" > in the names of the fields. Okay. > 0001 is the previous patch > 0002 changes addressing your and Bertrand's comments. > Few trivial comments: 1) Currently the doc says: sentTxns is the number of transactions sent downstream by the output plugin. sentBytes is the amount of data, in bytes, sent downstream by the output plugin. OutputPluginWrite will update this counter if ctx->stats is initialized by the output plugin. filteredBytes is the size of changes, in bytes, that are filtered out by the output plugin. Function ReorderBufferChangeSize may be used to find the size of filtered ReorderBufferChange. Shall we rearrange it to: sentTxns is the number of transactions sent downstream by the output plugin. sentBytes is the amount of data, in bytes, sent downstream by the output plugin. filteredBytes is the size of changes, in bytes, that are filtered out by the output plugin. OutputPluginWrite will update these counters if ctx->stats is initialized by the output plugin. The function ReorderBufferChangeSize can be used to compute the size of a filtered ReorderBufferChange, i.e., the filteredBytes. 2) My preference will be to rename the fields 'total_txns' and 'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and 'total_wal_bytes' for better clarity. Additionally, upon rethinking, it seems better to me that plugin-related fields are also named as plugin_* to clearly indicate their association. OTOH, in OutputPluginStats, the field names are fine as is, since the structure name itself clearly indicates these are plugin-related fields. PgStat_StatReplSlotEntry lacks such context and thus using full descriptive names there would improve clarity. 3) LogicalOutputWrite: + if (ctx->stats) + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) + sizeof(TransactionId); p->returned_rows++; A blank line after the new change will increase readability. ~~ In my testing, the patch works as expected. Thanks! thanks Shveta
Hi, On Fri, Sep 19, 2025 at 08:11:23PM +0530, Ashutosh Bapat wrote: > On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote: > > > 0001 is the previous patch > 0002 changes addressing your and Bertrand's comments. Thanks for the new patch version! I did not look closely to the code yet but did some testing and I've one remark regarding plugin_filtered_bytes: It looks ok when a publication is doing rows filtering but when I: - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '0') then I see plugin_sent_bytes increasing (which makes sense). - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1') then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't see plugin_filtered_bytes increasing. I think that would make sense to also increase plugin_filtered_bytes in this case (and for the other options that would skip sending data). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Sep 22, 2025 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote: > > Few trivial comments: > > 1) > Currently the doc says: > > sentTxns is the number of transactions sent downstream by the output > plugin. sentBytes is the amount of data, in bytes, sent downstream by > the output plugin. OutputPluginWrite will update this counter if > ctx->stats is initialized by the output plugin. filteredBytes is the > size of changes, in bytes, that are filtered out by the output plugin. > Function ReorderBufferChangeSize may be used to find the size of > filtered ReorderBufferChange. > > Shall we rearrange it to: > > sentTxns is the number of transactions sent downstream by the output > plugin. sentBytes is the amount of data, in bytes, sent downstream by > the output plugin. filteredBytes is the size of changes, in bytes, > that are filtered out by the output plugin. OutputPluginWrite will > update these counters if ctx->stats is initialized by the output > plugin. > The function ReorderBufferChangeSize can be used to compute the size > of a filtered ReorderBufferChange, i.e., the filteredBytes. > Only sentBytes is incremented by OutputPluginWrite(), so saying that it will update counters is not correct. But I think you intend to keep description of all the fields together followed by any additional information. How about the following <literal>sentTxns</literal> is the number of transactions sent downstream by the output plugin. <literal>sentBytes</literal> is the amount of data, in bytes, sent downstream by the output plugin. <literal>filteredBytes</literal> is the size of changes, in bytes, that are filtered out by the output plugin. <function>OutputPluginWrite</function> will update <literal>sentBytes</literal> if <literal>ctx->stats</literal> is initialized by the output plugin. Function <literal>ReorderBufferChangeSize</literal> may be used to find the size of filtered <literal>ReorderBufferChange</literal>. > 2) > My preference will be to rename the fields 'total_txns' and > 'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and > 'total_wal_bytes' for better clarity. Additionally, upon rethinking, > it seems better to me that plugin-related fields are also named as > plugin_* to clearly indicate their association. OTOH, in > OutputPluginStats, the field names are fine as is, since the structure > name itself clearly indicates these are plugin-related fields. > PgStat_StatReplSlotEntry lacks such context and thus using full > descriptive names there would improve clarity. Ok. Done. > > 3) > LogicalOutputWrite: > + if (ctx->stats) > + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) + > sizeof(TransactionId); > p->returned_rows++; > > A blank line after the new change will increase readability. > Ok. > ~~ > > In my testing, the patch works as expected. Thanks! Thanks for testing. Can we include any of your tests in the patch? Are the tests in patch enough? Applied those suggestions in my repository. Do you have any further review comments? -- Best Wishes, Ashutosh Bapat
On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Fri, Sep 19, 2025 at 08:11:23PM +0530, Ashutosh Bapat wrote: > > On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > 0001 is the previous patch > > 0002 changes addressing your and Bertrand's comments. > > Thanks for the new patch version! > > I did not look closely to the code yet but did some testing and I've one remark > regarding plugin_filtered_bytes: It looks ok when a publication is doing rows > filtering but when I: > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '0') > then I see plugin_sent_bytes increasing (which makes sense). > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1') > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't > see plugin_filtered_bytes increasing. I think that would make sense to also increase > plugin_filtered_bytes in this case (and for the other options that would skip > sending data). Thoughts? Thanks for bringing this up. I don't think we discussed this explicitly in the thread. The changes which are filtered out by the core itself e.g. changes to the catalogs or changes to other databases or changes from undesired origins are not added to the reorder buffer. They are not counted in total_bytes. The transactions containing only such changes are not added to reorder buffer, so even total_txns does not count such empty transactions. If we count these changes and transactions in plugin_filtered_bytes, and plugin_filtered_txns, that would create an anomaly - filtered counts being higher than total counts. Further since core does not add these changes and transactions to the reorder buffer, there is no way for a plugin to know about their existence and hence count them. Does that make sense? -- Best Wishes, Ashutosh Bapat
> 0001 is the previous patch > 0002 changes addressing your and Bertrand's comments. > @@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len); + /* + * If output plugin maintains statistics, update the amount of data sent + * downstream. + */ + if (ctx->stats) + ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */ + Just a small observation: I think it’s actually pq_flush_if_writable() that writes the buffered data to the socket, not pq_putmessage_noblock (which is actually gathering data in the buffer and not sending). So it might make more sense to increment the sent pointer after the call to pq_flush_if_writable(). Should we also consider - pg_hton32((uint32) (len + 4)); -- the additional 4 bytes of data added to the send buffer. -- With Regards, Ashutosh Sharma.
On Tue, Sep 23, 2025 at 4:06 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Mon, Sep 22, 2025 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > Few trivial comments: > > > > 1) > > Currently the doc says: > > > > sentTxns is the number of transactions sent downstream by the output > > plugin. sentBytes is the amount of data, in bytes, sent downstream by > > the output plugin. OutputPluginWrite will update this counter if > > ctx->stats is initialized by the output plugin. filteredBytes is the > > size of changes, in bytes, that are filtered out by the output plugin. > > Function ReorderBufferChangeSize may be used to find the size of > > filtered ReorderBufferChange. > > > > Shall we rearrange it to: > > > > sentTxns is the number of transactions sent downstream by the output > > plugin. sentBytes is the amount of data, in bytes, sent downstream by > > the output plugin. filteredBytes is the size of changes, in bytes, > > that are filtered out by the output plugin. OutputPluginWrite will > > update these counters if ctx->stats is initialized by the output > > plugin. > > The function ReorderBufferChangeSize can be used to compute the size > > of a filtered ReorderBufferChange, i.e., the filteredBytes. > > > > Only sentBytes is incremented by OutputPluginWrite(), so saying that > it will update counters is not correct. But I think you intend to keep > description of all the fields together followed by any additional > information. How about the following > <literal>sentTxns</literal> is the number of transactions sent downstream > by the output plugin. <literal>sentBytes</literal> is the amount of data, > in bytes, sent downstream by the output plugin. > <literal>filteredBytes</literal> is the size of changes, in bytes, that > are filtered out by the output plugin. > <function>OutputPluginWrite</function> will update > <literal>sentBytes</literal> if <literal>ctx->stats</literal> is > initialized by the output plugin. Function > <literal>ReorderBufferChangeSize</literal> may be used to find the size of > filtered <literal>ReorderBufferChange</literal>. Yes, this looks good. > > > 2) > > My preference will be to rename the fields 'total_txns' and > > 'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and > > 'total_wal_bytes' for better clarity. Additionally, upon rethinking, > > it seems better to me that plugin-related fields are also named as > > plugin_* to clearly indicate their association. OTOH, in > > OutputPluginStats, the field names are fine as is, since the structure > > name itself clearly indicates these are plugin-related fields. > > PgStat_StatReplSlotEntry lacks such context and thus using full > > descriptive names there would improve clarity. > > Ok. Done. > > > > > 3) > > LogicalOutputWrite: > > + if (ctx->stats) > > + ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) + > > sizeof(TransactionId); > > p->returned_rows++; > > > > A blank line after the new change will increase readability. > > > > Ok. > > > ~~ > > > > In my testing, the patch works as expected. Thanks! > > Thanks for testing. Can we include any of your tests in the patch? Are > the tests in patch enough? I tested the flows with a) logical replication slot and get-changes. b) filtered data flows: pub-sub creation with row_filters, 'publish' options. I tried to verify plugin fields as compared to total_wal* fields. c) reset flow. While tests for a and c are present already. I don't see tests for b anywhere when it comes to stats. Do you think we shall add a test for filtered data using row-filter somewhere? > > Applied those suggestions in my repository. Do you have any further > review comments? > No, I think that is all. thanks Shveta
On Tue, Sep 23, 2025 at 6:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > 0001 is the previous patch > > 0002 changes addressing your and Bertrand's comments. > > > > @@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx, > XLogRecPtr lsn, TransactionId xid, > /* output previously gathered data in a CopyData packet */ > pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len); > > + /* > + * If output plugin maintains statistics, update the amount of data sent > + * downstream. > + */ > + if (ctx->stats) > + ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */ > + > > Just a small observation: I think it’s actually pq_flush_if_writable() > that writes the buffered data to the socket, not pq_putmessage_noblock > (which is actually gathering data in the buffer and not sending). So > it might make more sense to increment the sent pointer after the call > to pq_flush_if_writable(). That's a good point. I placed it after pq_putmessage_noblock() so that it's easy to link the increment to sentBytes and the actual bytes being sent. You are right that the bytes won't be sent unless pq_flush_if_writable() is called but it will be called for sure before the next UpdateDecodingStats(). So the reported bytes are never wrong. I would prefer readability over seeming accuracy. > > Should we also consider - pg_hton32((uint32) (len + 4)); -- the > additional 4 bytes of data added to the send buffer. > In WalSndWriteData() we can't rely on what happens in a low level API like socket_putmessage(). And we are counting the number of bytes in the logically decoded message. So, I actually wonder whether we should count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think? -- Best Wishes, Ashutosh Bapat
On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Tue, Sep 23, 2025 at 6:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > > 0001 is the previous patch > > > 0002 changes addressing your and Bertrand's comments. > > > > > > > @@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx, > > XLogRecPtr lsn, TransactionId xid, > > /* output previously gathered data in a CopyData packet */ > > pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len); > > > > + /* > > + * If output plugin maintains statistics, update the amount of data sent > > + * downstream. > > + */ > > + if (ctx->stats) > > + ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */ > > + > > > > Just a small observation: I think it’s actually pq_flush_if_writable() > > that writes the buffered data to the socket, not pq_putmessage_noblock > > (which is actually gathering data in the buffer and not sending). So > > it might make more sense to increment the sent pointer after the call > > to pq_flush_if_writable(). > > That's a good point. I placed it after pq_putmessage_noblock() so that > it's easy to link the increment to sentBytes and the actual bytes > being sent. You are right that the bytes won't be sent unless > pq_flush_if_writable() is called but it will be called for sure before > the next UpdateDecodingStats(). So the reported bytes are never wrong. > I would prefer readability over seeming accuracy. > > > > > Should we also consider - pg_hton32((uint32) (len + 4)); -- the > > additional 4 bytes of data added to the send buffer. > > > > In WalSndWriteData() we can't rely on what happens in a low level API > like socket_putmessage(). And we are counting the number of bytes in > the logically decoded message. So, I actually wonder whether we should > count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think? > If we are not counting all such metadata bytes ((or can't reliably do so), then IMO, we shall skip counting msgtype as well. thanks Shveta
Hi, On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote: > On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > In WalSndWriteData() we can't rely on what happens in a low level API > > like socket_putmessage(). And we are counting the number of bytes in > > the logically decoded message. So, I actually wonder whether we should > > count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think? > > > > If we are not counting all such metadata bytes ((or can't reliably do > so), then IMO, we shall skip counting msgtype as well. Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not taken into account? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Sep 23, 2025 at 04:15:14PM +0530, Ashutosh Bapat wrote: > On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1') > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't > > see plugin_filtered_bytes increasing. I think that would make sense to also increase > > plugin_filtered_bytes in this case (and for the other options that would skip > > sending data). Thoughts? > > Thanks for bringing this up. I don't think we discussed this > explicitly in the thread. The changes which are filtered out by the > core itself e.g. changes to the catalogs or changes to other databases > or changes from undesired origins are not added to the reorder buffer. > They are not counted in total_bytes. The transactions containing only > such changes are not added to reorder buffer, so even total_txns does > not count such empty transactions. If we count these changes and > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that > would create an anomaly - filtered counts being higher than total > counts. Further since core does not add these changes and transactions > to the reorder buffer, there is no way for a plugin to know about > their existence and hence count them. Does that make sense? Yes. Do you think that the doc in the patch is clear enough regarding this point? I mean the doc looks correct (mentioning the output plugin) but would that make sense to insist that core filtering is not taken into account? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote: > > I tested the flows with > a) logical replication slot and get-changes. > b) filtered data flows: pub-sub creation with row_filters, 'publish' > options. I tried to verify plugin fields as compared to total_wal* > fields. > c) reset flow. > > While tests for a and c are present already. I don't see tests for b > anywhere when it comes to stats. Do you think we shall add a test for > filtered data using row-filter somewhere? Added a test in 028_row_filter. Please find it in the attached patchset. I didn't find tests which test table level filtering or operation level filtering. Can you please point me to such tests. I will add similar test to other places. Once you review the test in 028_row_filter, I will replicate it to other places you point out. On Wed, Sep 24, 2025 at 12:12 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote: > > On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > In WalSndWriteData() we can't rely on what happens in a low level API > > > like socket_putmessage(). And we are counting the number of bytes in > > > the logically decoded message. So, I actually wonder whether we should > > > count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think? > > > > > > > If we are not counting all such metadata bytes ((or can't reliably do > > so), then IMO, we shall skip counting msgtype as well. > > Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not > taken into account? We are counting the sentBytes in central places through which all the logically decoded messages flow. So we are not missing on any metadata bytes. Given that these bytes are part of the logically decoded message itself, I think we should count them in the sentBytes. Now the question remains is whether to count 4 bytes for length in the message itself? The logical decoding code can not control that and thus should not account for it. So I am leaving bytes counted for pg_hton32((uint32) (len + 4)) out of sentBytes calculation. -- Best Wishes, Ashutosh Bapat
Вложения
On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Tue, Sep 23, 2025 at 04:15:14PM +0530, Ashutosh Bapat wrote: > > On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1') > > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't > > > see plugin_filtered_bytes increasing. I think that would make sense to also increase > > > plugin_filtered_bytes in this case (and for the other options that would skip > > > sending data). Thoughts? > > > > Thanks for bringing this up. I don't think we discussed this > > explicitly in the thread. The changes which are filtered out by the > > core itself e.g. changes to the catalogs or changes to other databases > > or changes from undesired origins are not added to the reorder buffer. > > They are not counted in total_bytes. The transactions containing only > > such changes are not added to reorder buffer, so even total_txns does > > not count such empty transactions. If we count these changes and > > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that > > would create an anomaly - filtered counts being higher than total > > counts. Further since core does not add these changes and transactions > > to the reorder buffer, there is no way for a plugin to know about > > their existence and hence count them. Does that make sense? > > Yes. Do you think that the doc in the patch is clear enough regarding this point? > I mean the doc looks correct (mentioning the output plugin) but would that make > sense to insist that core filtering is not taken into account? Do you mean, should we mention in the docs that core filtering is not taken into account? I would question whether that's called filtering at all, in the context of logical decoding. The view should be read in the context of logical decoding. For example, we aren't mentioning that total_bytes does not include changes from other database. -- Best Wishes, Ashutosh Bapat
Hi, On Wed, Sep 24, 2025 at 12:51:29PM +0530, Ashutosh Bapat wrote: > On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1') > > > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't > > > > see plugin_filtered_bytes increasing. I think that would make sense to also increase > > > > plugin_filtered_bytes in this case (and for the other options that would skip > > > > sending data). Thoughts? > > > > > > Thanks for bringing this up. I don't think we discussed this > > > explicitly in the thread. The changes which are filtered out by the > > > core itself e.g. changes to the catalogs or changes to other databases > > > or changes from undesired origins are not added to the reorder buffer. > > > They are not counted in total_bytes. The transactions containing only > > > such changes are not added to reorder buffer, so even total_txns does > > > not count such empty transactions. If we count these changes and > > > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that > > > would create an anomaly - filtered counts being higher than total > > > counts. Further since core does not add these changes and transactions > > > to the reorder buffer, there is no way for a plugin to know about > > > their existence and hence count them. Does that make sense? > > > > Yes. Do you think that the doc in the patch is clear enough regarding this point? > > I mean the doc looks correct (mentioning the output plugin) but would that make > > sense to insist that core filtering is not taken into account? > > Do you mean, should we mention in the docs that core filtering is not > taken into account? > I would question whether that's called filtering > at all, in the context of logical decoding. The view should be read in > the context of logical decoding. For example, we aren't mentioning > that total_bytes does not include changes from other database. Right. But, in the example above, do you consider "skip-empty-xacts" as "core" or "plugin" filtering? It's an option part of the "test_decoding" plugin, so it's the plugin choice to not display empty xacts (should the option be set accordingly). Then should it be reported in plugin_filtered_bytes? (one could write a plugin, decide to skip/filter empty xacts or whatever in the plugin callbacks: should that be reported as plugin_filtered_bytes?) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > I tested the flows with > > a) logical replication slot and get-changes. > > b) filtered data flows: pub-sub creation with row_filters, 'publish' > > options. I tried to verify plugin fields as compared to total_wal* > > fields. > > c) reset flow. > > > > While tests for a and c are present already. I don't see tests for b > > anywhere when it comes to stats. Do you think we shall add a test for > > filtered data using row-filter somewhere? > > Added a test in 028_row_filter. Please find it in the attached > patchset. Test looks good. > I didn't find tests which test table level filtering or > operation level filtering. Can you please point me to such tests. I > will add similar test to other places. Once you review the test in > 028_row_filter, I will replicate it to other places you point out. > I can see a few tests of operation level filtering present in 'subscription/t/001_rep_changes.pl' and 'subscription/t/010_truncate.pl' > On Wed, Sep 24, 2025 at 12:12 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Wed, Sep 24, 2025 at 11:38:30AM +0530, shveta malik wrote: > > > On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat > > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > In WalSndWriteData() we can't rely on what happens in a low level API > > > > like socket_putmessage(). And we are counting the number of bytes in > > > > the logically decoded message. So, I actually wonder whether we should > > > > count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think? > > > > > > > > > > If we are not counting all such metadata bytes ((or can't reliably do > > > so), then IMO, we shall skip counting msgtype as well. > > > > Agree. Maybe mention in the doc that metadata (including msgtype) bytes are not > > taken into account? > > We are counting the sentBytes in central places through which all the > logically decoded messages flow. So we are not missing on any metadata > bytes. Given that these bytes are part of the logically decoded > message itself, I think we should count them in the sentBytes. Now the > question remains is whether to count 4 bytes for length in the message > itself? The logical decoding code can not control that and thus should > not account for it. So I am leaving bytes counted for > pg_hton32((uint32) (len + 4)) out of sentBytes calculation. > -- Okay. thanks Shveta
On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Sep 24, 2025 at 12:51:29PM +0530, Ashutosh Bapat wrote: > > On Wed, Sep 24, 2025 at 12:32 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > - create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1') > > > > > then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't > > > > > see plugin_filtered_bytes increasing. I think that would make sense to also increase > > > > > plugin_filtered_bytes in this case (and for the other options that would skip > > > > > sending data). Thoughts? > > > > > > > > Thanks for bringing this up. I don't think we discussed this > > > > explicitly in the thread. The changes which are filtered out by the > > > > core itself e.g. changes to the catalogs or changes to other databases > > > > or changes from undesired origins are not added to the reorder buffer. > > > > They are not counted in total_bytes. The transactions containing only > > > > such changes are not added to reorder buffer, so even total_txns does > > > > not count such empty transactions. If we count these changes and > > > > transactions in plugin_filtered_bytes, and plugin_filtered_txns, that > > > > would create an anomaly - filtered counts being higher than total > > > > counts. Further since core does not add these changes and transactions > > > > to the reorder buffer, there is no way for a plugin to know about > > > > their existence and hence count them. Does that make sense? > > > > > > Yes. Do you think that the doc in the patch is clear enough regarding this point? > > > I mean the doc looks correct (mentioning the output plugin) but would that make > > > sense to insist that core filtering is not taken into account? > > > > Do you mean, should we mention in the docs that core filtering is not > > taken into account? > > I would question whether that's called filtering > > at all, in the context of logical decoding. The view should be read in > > the context of logical decoding. For example, we aren't mentioning > > that total_bytes does not include changes from other database. > > Right. But, in the example above, do you consider "skip-empty-xacts" as "core" > or "plugin" filtering? > > It's an option part of the "test_decoding" plugin, so it's the plugin choice to > not display empty xacts (should the option be set accordingly). Then should it > be reported in plugin_filtered_bytes? (one could write a plugin, decide to > skip/filter empty xacts or whatever in the plugin callbacks: should that be > reported as plugin_filtered_bytes?) If a transaction becomes empty because the plugin filtered all the changes then plugin_filtered_bytes will be incremented by the amount of filtered changes. If the transaction was empty because core didn't send any of the changes to the output plugin, there was nothing filtered by the output plugin so plugin_filtered_bytes will not be affected. skip_empty_xacts controls whether BEGIN and COMMIT are sent for an empty transaction or not. It does not filter "changes". It affects "sent_bytes". -- Best Wishes, Ashutosh Bapat
On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > I tested the flows with > > > a) logical replication slot and get-changes. > > > b) filtered data flows: pub-sub creation with row_filters, 'publish' > > > options. I tried to verify plugin fields as compared to total_wal* > > > fields. > > > c) reset flow. > > > > > > While tests for a and c are present already. I don't see tests for b > > > anywhere when it comes to stats. Do you think we shall add a test for > > > filtered data using row-filter somewhere? > > > > Added a test in 028_row_filter. Please find it in the attached > > patchset. > > Test looks good. Thanks. Added to three more files. I think we have covered all the cases where filtering can occur. PFA patches. -- Best Wishes, Ashutosh Bapat
Вложения
Hi, On Wed, Sep 24, 2025 at 03:37:07PM +0530, Ashutosh Bapat wrote: > On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > Right. But, in the example above, do you consider "skip-empty-xacts" as "core" > > or "plugin" filtering? > > > > It's an option part of the "test_decoding" plugin, so it's the plugin choice to > > not display empty xacts (should the option be set accordingly). Then should it > > be reported in plugin_filtered_bytes? (one could write a plugin, decide to > > skip/filter empty xacts or whatever in the plugin callbacks: should that be > > reported as plugin_filtered_bytes?) > > If a transaction becomes empty because the plugin filtered all the > changes then plugin_filtered_bytes will be incremented by the amount > of filtered changes. If the transaction was empty because core didn't > send any of the changes to the output plugin, there was nothing > filtered by the output plugin so plugin_filtered_bytes will not be > affected. > > skip_empty_xacts controls whether BEGIN and COMMIT are sent for an > empty transaction or not. It does not filter "changes". It affects > "sent_bytes". skip_empty_xacts was just an example. I mean a plugin could decide to filter all the inserts for example (not saying it makes sense). But I think we'are saying the same: say a plugin wants to filter the inserts then it's its responsability to increment ctx->stats->filteredBytes in its "change_cb" callback for the REORDER_BUFFER_CHANGE_INSERT action, right? If so, I wonder if it would make sense to provide an example in the test_decoding plugin (I can see it's done for pgoutput but that might sound more natural to look in contrib if one is searching for an example). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Sep 24, 2025 at 05:28:44PM +0530, Ashutosh Bapat wrote: > On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > I tested the flows with > > > > a) logical replication slot and get-changes. > > > > b) filtered data flows: pub-sub creation with row_filters, 'publish' > > > > options. I tried to verify plugin fields as compared to total_wal* > > > > fields. > > > > c) reset flow. > > > > > > > > While tests for a and c are present already. I don't see tests for b > > > > anywhere when it comes to stats. Do you think we shall add a test for > > > > filtered data using row-filter somewhere? > > > > > > Added a test in 028_row_filter. Please find it in the attached > > > patchset. > > > > Test looks good. > > Thanks. Added to three more files. I think we have covered all the > cases where filtering can occur. > > PFA patches. Thanks for the new version! A few random comments: === 1 + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>plugin_filtered_bytes</structfield> <type>bigint</type> + </para> + <para> + Amount of changes, from <structfield>total_wal_bytes</structfield>, filtered + out by the output plugin and not sent downstream. Please note that it + does not include the changes filtered before a change is sent to + the output plugin, e.g. the changes filtered by origin. The count is + maintained by the output plugin mentioned in + <structfield>plugin</structfield>. I found "The count" somehow ambiguous. What about "This statistic" instead? === 2 + subtransactions. These transactions are subset of transctions sent to s/transctions/transactions === 3 + the decoding plugin. Hence this count is expected to be lesser than or s/be lesser/be less/? (not 100% sure) === 4 +extern Size ReorderBufferChangeSize(ReorderBufferChange *change); Another approach could be to pass the change's size as an argument to the callbacks? That would avoid to expose ReorderBufferChangeSize publicly. === 5 ctx->output_plugin_private = data; + ctx->stats = palloc0(sizeof(OutputPluginStats)); I was wondering if we need to free this in pg_decode_shutdown, but it looks like it's done through FreeDecodingContext() anyway. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Sep 24, 2025 at 5:28 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > I tested the flows with > > > > a) logical replication slot and get-changes. > > > > b) filtered data flows: pub-sub creation with row_filters, 'publish' > > > > options. I tried to verify plugin fields as compared to total_wal* > > > > fields. > > > > c) reset flow. > > > > > > > > While tests for a and c are present already. I don't see tests for b > > > > anywhere when it comes to stats. Do you think we shall add a test for > > > > filtered data using row-filter somewhere? > > > > > > Added a test in 028_row_filter. Please find it in the attached > > > patchset. > > > > Test looks good. > > Thanks. Added to three more files. I think we have covered all the > cases where filtering can occur. > Yes. The test looks good now. thanks Shveta
On Wed, Sep 24, 2025 at 6:43 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Sep 24, 2025 at 03:37:07PM +0530, Ashutosh Bapat wrote: > > On Wed, Sep 24, 2025 at 1:55 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > Right. But, in the example above, do you consider "skip-empty-xacts" as "core" > > > or "plugin" filtering? > > > > > > It's an option part of the "test_decoding" plugin, so it's the plugin choice to > > > not display empty xacts (should the option be set accordingly). Then should it > > > be reported in plugin_filtered_bytes? (one could write a plugin, decide to > > > skip/filter empty xacts or whatever in the plugin callbacks: should that be > > > reported as plugin_filtered_bytes?) > > > > If a transaction becomes empty because the plugin filtered all the > > changes then plugin_filtered_bytes will be incremented by the amount > > of filtered changes. If the transaction was empty because core didn't > > send any of the changes to the output plugin, there was nothing > > filtered by the output plugin so plugin_filtered_bytes will not be > > affected. > > > > skip_empty_xacts controls whether BEGIN and COMMIT are sent for an > > empty transaction or not. It does not filter "changes". It affects > > "sent_bytes". > > skip_empty_xacts was just an example. I mean a plugin could decide to filter all > the inserts for example (not saying it makes sense). But I think we'are saying the > same: say a plugin wants to filter the inserts then it's its responsability to > increment ctx->stats->filteredBytes in its "change_cb" callback for the > REORDER_BUFFER_CHANGE_INSERT action, right? Right. > If so, I wonder if it would make > sense to provide an example in the test_decoding plugin (I can see it's done > for pgoutput but that might sound more natural to look in contrib if one is > searching for an example). test_decoding does not make use of publication at all. Publication controls filtering and so test_decoding does not have any examples of filtering code. Doesn't make sense to add code to manipulate filteredBytes there. -- Best Wishes, Ashutosh Bapat
On Wed, Sep 24, 2025 at 8:11 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Sep 24, 2025 at 05:28:44PM +0530, Ashutosh Bapat wrote: > > On Wed, Sep 24, 2025 at 2:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Wed, Sep 24, 2025 at 12:47 PM Ashutosh Bapat > > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > On Wed, Sep 24, 2025 at 10:12 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > I tested the flows with > > > > > a) logical replication slot and get-changes. > > > > > b) filtered data flows: pub-sub creation with row_filters, 'publish' > > > > > options. I tried to verify plugin fields as compared to total_wal* > > > > > fields. > > > > > c) reset flow. > > > > > > > > > > While tests for a and c are present already. I don't see tests for b > > > > > anywhere when it comes to stats. Do you think we shall add a test for > > > > > filtered data using row-filter somewhere? > > > > > > > > Added a test in 028_row_filter. Please find it in the attached > > > > patchset. > > > > > > Test looks good. > > > > Thanks. Added to three more files. I think we have covered all the > > cases where filtering can occur. > > > > PFA patches. > > Thanks for the new version! > > A few random comments: > > === 1 > > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>plugin_filtered_bytes</structfield> <type>bigint</type> > + </para> > + <para> > + Amount of changes, from <structfield>total_wal_bytes</structfield>, filtered > + out by the output plugin and not sent downstream. Please note that it > + does not include the changes filtered before a change is sent to > + the output plugin, e.g. the changes filtered by origin. The count is > + maintained by the output plugin mentioned in > + <structfield>plugin</structfield>. > > I found "The count" somehow ambiguous. What about "This statistic" instead? Existing fields use term "The counter". Changed "The count" to "The counter". > > === 2 > > + subtransactions. These transactions are subset of transctions sent to > > s/transctions/transactions Done. > > === 3 > > + the decoding plugin. Hence this count is expected to be lesser than or > > s/be lesser/be less/? (not 100% sure) Less than is correct. Fixed. > > === 4 > > +extern Size ReorderBufferChangeSize(ReorderBufferChange *change); > > Another approach could be to pass the change's size as an argument to the > callbacks? That would avoid to expose ReorderBufferChangeSize publicly. Do you see any problem in exposing ReorderBufferChangeSize(). It's a pretty small function and may be quite handy to output plugins otherwise as well. And we expose many ReorderBuffer related functions; so this isn't the first. If we were to do as you say, it will change other external facing APIs like change_cb(). Output plugins will need to change their code accordingly even when they don't want to support plugin statistics. Given that we have made maintaining plugin statistics optional, forcing API change does not make sense. For example, test_decoding which does not filter anything would unnecessarily have to change its code. I considered adding a field size to ReorderBufferChange itself. But that means we increase the amount of memory used in the reorder buffer, which seems to have become prime estate these days. So rejected that idea as well. Advantage of this change is that the minimal cost of calculating the size and maintaining the code change is incurred only when filtering happens, by the plugins which want to filter and maintain statistics. > > === 5 > > ctx->output_plugin_private = data; > + ctx->stats = palloc0(sizeof(OutputPluginStats)); > > I was wondering if we need to free this in pg_decode_shutdown, but it looks > like it's done through FreeDecodingContext() anyway. That's correct. Even output_plugin_private is freed when the decoding memory context is freed. Thanks for the review comments. I have addressed the comments in my repository and the changes will be included in the next set of patches. Do you have any further review comments? -- Best Wishes, Ashutosh Bapat
Hi, On Thu, Sep 25, 2025 at 10:16:35AM +0530, Ashutosh Bapat wrote: > On Wed, Sep 24, 2025 at 8:11 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > === 4 > > > > +extern Size ReorderBufferChangeSize(ReorderBufferChange *change); > > > > Another approach could be to pass the change's size as an argument to the > > callbacks? That would avoid to expose ReorderBufferChangeSize publicly. > > Do you see any problem in exposing ReorderBufferChangeSize(). It's a > pretty small function and may be quite handy to output plugins > otherwise as well. And we expose many ReorderBuffer related functions; > so this isn't the first. Right. I don't see a problem per say, just thinking that the less we expose publicly to be used by extensions/plugins, the better. > If we were to do as you say, it will change other external facing APIs > like change_cb(). Output plugins will need to change their code > accordingly even when they don't want to support plugin statistics. Correct. > Given that we have made maintaining plugin statistics optional, > forcing API change does not make sense. For example, test_decoding > which does not filter anything would unnecessarily have to change its > code. That's right. > I considered adding a field size to ReorderBufferChange itself. But > that means we increase the amount of memory used in the reorder > buffer, which seems to have become prime estate these days. So > rejected that idea as well. > > Advantage of this change is that the minimal cost of calculating the > size and maintaining the code change is incurred only when filtering > happens, by the plugins which want to filter and maintain statistics. Yes, anyway as it's unlikely that we have to fix a bug in a minor release that would need a signature change to ReorderBufferChangeSize(), I think that's fine as proposed. > > > > === 5 > > > > ctx->output_plugin_private = data; > > + ctx->stats = palloc0(sizeof(OutputPluginStats)); > > > > I was wondering if we need to free this in pg_decode_shutdown, but it looks > > like it's done through FreeDecodingContext() anyway. > > That's correct. Even output_plugin_private is freed when the decoding > memory context is freed. > > Thanks for the review comments. I have addressed the comments in my > repository and the changes will be included in the next set of > patches. Thanks! > Do you have any further review comments? Not right now. I'll give it another look by early next week the latest. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Sep 25, 2025 at 01:01:55PM +0000, Bertrand Drouvot wrote: > Hi, > > On Thu, Sep 25, 2025 at 10:16:35AM +0530, Ashutosh Bapat wrote: > > Do you have any further review comments? > > Not right now. I'll give it another look by early next week the latest. > === 1 @@ -173,6 +173,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, data->only_local = false; ctx->output_plugin_private = data; + ctx->stats = palloc0(sizeof(OutputPluginStats)); I was not sure where it's allocated, but looking at: Breakpoint 1, pg_decode_startup (ctx=0x1ba853a0, opt=0x1ba85478, is_init=false) at test_decoding.c:164 164 bool enable_streaming = false; (gdb) n 166 data = palloc0(sizeof(TestDecodingData)); (gdb) 167 data->context = AllocSetContextCreate(ctx->context, (gdb) 170 data->include_xids = true; (gdb) 171 data->include_timestamp = false; (gdb) 172 data->skip_empty_xacts = false; (gdb) 173 data->only_local = false; (gdb) 175 ctx->output_plugin_private = data; (gdb) 176 ctx->stats = palloc0(sizeof(OutputPluginStats)); (gdb) 178 opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT; (gdb) p CurrentMemoryContext $7 = (MemoryContext) 0x1ba852a0 (gdb) p (*CurrentMemoryContext).name $8 = 0xe4057d "Logical decoding context" (gdb) p ctx->context $9 = (MemoryContext) 0x1ba852a0 I can see that CurrentMemoryContext is "ctx->context" so the palloc0 done here are done in the right context. === 2 Playing with "has stats" a bit. -- Issue 1: Say, plugin has stats enabled and I get: postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; plugin | plugin_sent_txns ----------------+------------------ pg_commit_info | 9 (1 row) If the engine is shutdown and the plugin is now replaced by a version that does not provide stats, then, right after startup, I still get: postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; plugin | plugin_sent_txns ----------------+------------------ pg_commit_info | 9 (1 row) And that will be the case until the plugin decodes something (so that statent->plugin_has_stats gets replaced in pgstat_report_replslot()). That's because plugin_has_stats is stored in PgStat_StatReplSlotEntry and so it's restored from the stat file when the engine starts. Now, let's do some inserts and decode: postgres=# insert into t1 values ('a'); INSERT 0 1 postgres=# insert into t1 values ('a'); INSERT 0 1 postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL); lsn | xid | data ------------+-----+----------------------------------------------------------------------------------------- 0/407121C0 | 766 | xid 766: lsn:0/40712190 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0 0/40712268 | 767 | xid 767: lsn:0/40712238 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0 (2 rows) postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; plugin | plugin_sent_txns ----------------+------------------ pg_commit_info | (1 row) All good. Issue 1 is that before any decoding happens, pg_stat_replication_slots is still showing stale plugin statistics from a plugin that may no longer support stats. I'm not sure how we could easily fix this issue, as we don't know the plugin's stats capability until we actually use it. -- Issue 2: Let's shutdown, replace the plugin with a version that has stats enabled and restart. Same behavior as before: postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; plugin | plugin_sent_txns ----------------+------------------ pg_commit_info | (1 row) Until pgstat_report_replslot() is not called, the statent->plugin_has_stats is not updated. So it displays the stats as they were before the shutdown. But that's not an issue in this case (when switching from non stats to stats). Now, let's do some inserts and decode: postgres=# insert into t1 values ('a'); INSERT 0 1 postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL); lsn | xid | data ------------+-----+----------------------------------------------------------------------------------------- 0/407125B0 | 768 | xid 768: lsn:0/40712580 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0 (1 row) and check the stats: postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; plugin | plugin_sent_txns ----------------+------------------ pg_commit_info | 10 (1 row) Now it reports 10, that's the 9 before we changed the plugin to not have stats enabled plus this new one. Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it shows accumulated values from before the non-stats switch. PFA attached a proposal to fix Issue 2. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Вложения
On Fri, Sep 26, 2025 at 4:43 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > === 2 > > Playing with "has stats" a bit. > > -- Issue 1: > Thanks for experiments! Thanks for bringing it up. > Say, plugin has stats enabled and I get: > > postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; > plugin | plugin_sent_txns > ----------------+------------------ > pg_commit_info | 9 > (1 row) > > If the engine is shutdown and the plugin is now replaced by a version that > does not provide stats, then, right after startup, I still get: > > postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; > plugin | plugin_sent_txns > ----------------+------------------ > pg_commit_info | 9 > (1 row) > > And that will be the case until the plugin decodes something (so that > statent->plugin_has_stats gets replaced in pgstat_report_replslot()). > > That's because plugin_has_stats is stored in PgStat_StatReplSlotEntry > and so it's restored from the stat file when the engine starts. > > Now, let's do some inserts and decode: > > postgres=# insert into t1 values ('a'); > INSERT 0 1 > postgres=# insert into t1 values ('a'); > INSERT 0 1 > postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL); > lsn | xid | data > ------------+-----+----------------------------------------------------------------------------------------- > 0/407121C0 | 766 | xid 766: lsn:0/40712190 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0 > 0/40712268 | 767 | xid 767: lsn:0/40712238 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0 > (2 rows) > > postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; > plugin | plugin_sent_txns > ----------------+------------------ > pg_commit_info | > (1 row) > > All good. > > Issue 1 is that before any decoding happens, pg_stat_replication_slots is still > showing stale plugin statistics from a plugin that may no longer support stats. > > I'm not sure how we could easily fix this issue, as we don't know the plugin's > stats capability until we actually use it. > I don't think this is an issue. There is no way for the core to tell whether the plugin will provide stats or not, unless it sets that ctx->stats which happens in the startup callback. Till then it is rightly providing the values accumulated so far. Once the decoding starts, we know that the plugin is not providing any stats and we don't display anything. > -- Issue 2: > > Let's shutdown, replace the plugin with a version that has stats enabled and > restart. > > Same behavior as before: > > postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; > plugin | plugin_sent_txns > ----------------+------------------ > pg_commit_info | > (1 row) > > Until pgstat_report_replslot() is not called, the statent->plugin_has_stats is > not updated. So it displays the stats as they were before the shutdown. But that's > not an issue in this case (when switching from non stats to stats). > > Now, let's do some inserts and decode: > > postgres=# insert into t1 values ('a'); > INSERT 0 1 > postgres=# select * from pg_logical_slot_get_changes('logical_slot',NULL,NULL); > lsn | xid | data > ------------+-----+----------------------------------------------------------------------------------------- > 0/407125B0 | 768 | xid 768: lsn:0/40712580 inserts:1 deletes:0 updates:0 truncates:0 relations truncated:0 > (1 row) > > and check the stats: > > postgres=# select plugin,plugin_sent_txns from pg_stat_replication_slots ; > plugin | plugin_sent_txns > ----------------+------------------ > pg_commit_info | 10 > (1 row) > > Now it reports 10, that's the 9 before we changed the plugin to not have stats > enabled plus this new one. > > Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it > shows accumulated values from before the non-stats switch. This too seems to be a non-issue to me. The stats in the view get reset only when a user resets them. So we shouldn't wipe out the already accumulated values just because the plugin stopped providing it. If the plugin keeps flip-flopping and only partial statistics provided by the plugin will be accumulated. That's the plugin's responsibility. Realistically a plugin will either decide to provide statistics in some version and then continue forever OR it will decide against it. Flip-flopping won't happen in practice. If at all we decide to reset the stats when the plugin does not provide them, I think a better fix is to set them to 0 in pgstat_report_replslot() independent of previous state of has_stats. It will be more or less same CPU instructions. like below if (repSlotStat->plugin_has_stats) { REPLSLOT_ACC(plugin_sent_txns); REPLSLOT_ACC(plugin_sent_bytes); REPLSLOT_ACC(plugin_filtered_bytes); } else { statent->plugin_sent_txns = 0; statent->plugin_sent_bytes = 0; statent->plugin_filtered_bytes = 0 } -- Best Wishes, Ashutosh Bapat
Hi, On Fri, Sep 26, 2025 at 06:14:28PM +0530, Ashutosh Bapat wrote: > On Fri, Sep 26, 2025 at 4:43 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > === 2 > > > > Issue 1 is that before any decoding happens, pg_stat_replication_slots is still > > showing stale plugin statistics from a plugin that may no longer support stats. > > > > I'm not sure how we could easily fix this issue, as we don't know the plugin's > > stats capability until we actually use it. > > > > I don't think this is an issue. There is no way for the core to tell > whether the plugin will provide stats or not, unless it sets that > ctx->stats which happens in the startup callback. Till then it is > rightly providing the values accumulated so far. Once the decoding > starts, we know that the plugin is not providing any stats and we > don't display anything. Yeah, I got the technical reasons, but I think there's a valid user experience concern here: seeing statistics for a plugin that doesn't actually support statistics is misleading. What we need is a call to pgstat_report_replslot() to display stats that reflect the current plugin behavior. We can't just call pgstat_report_replslot() in say RestoreSlotFromDisk() because we really need the decoding to start. So one idea could be to set a flag (per slot) when pgstat_report_replslot() has been called (for good reasons) and check for this flag in pg_stat_get_replication_slot(). If the flag is not set, then set the plugin fields to NULL. If the flag is set, then display their values (like now). And we should document that the plugin stats are not available (i.e are NULL) until the decoding has valid stats to report after startup. What do you think? > > > -- Issue 2: > > > > Now it reports 10, that's the 9 before we changed the plugin to not have stats > > enabled plus this new one. > > > > Issue 2: when switching from a non-stats plugin back to a stats-capable plugin, it > > shows accumulated values from before the non-stats switch. > > This too seems to be a non-issue to me. The stats in the view get > reset only when a user resets them. So we shouldn't wipe out the > already accumulated values just because the plugin stopped providing > it. If the plugin keeps flip-flopping and only partial statistics > provided by the plugin will be accumulated. That's the plugin's > responsibility. Okay but then I think that the plugin is missing some flexibility. For example, how could the plugin set ctx->stats->sentTxns to zero if it decides not to enable stats (while it was previously enable)? Indeed, not enabling stats, means not doing "ctx->stats = palloc0(sizeof(OutputPluginStats))" which means not having control over the stats anymore. So, with the current design, it has not other choice but having its previous stats not reset to zero. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Sep 26, 2025 at 10:28 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > I don't think this is an issue. There is no way for the core to tell > > whether the plugin will provide stats or not, unless it sets that > > ctx->stats which happens in the startup callback. Till then it is > > rightly providing the values accumulated so far. Once the decoding > > starts, we know that the plugin is not providing any stats and we > > don't display anything. > > Yeah, I got the technical reasons, but I think there's a valid user experience > concern here: seeing statistics for a plugin that doesn't actually support > statistics is misleading. > 1. If the plugin never supported statistics, we will never report stats. So nothing misleading there. 2. If the plugin starts supporting statistics and continues to do so, we will report the stats since the time they are made available and continue to do so. Nothing misleading there. 3. If the plugin starts supporting statistics and midway discontinues its support, it already has a problem with backward compatibility. Practically it would 1 or 2, which are working fine. I don't think we will encounter case 3 practically. Do you have a practical use case where a plugin would discontinue supporting stats? Even in case 3, I think we need to consider the fact that these stats are "cumulative". So if a plugin discontinues reporting stats, they should go NULL only when the next accumulation action happens, not before that. > What we need is a call to pgstat_report_replslot() to display stats that reflect > the current plugin behavior. We can't just call pgstat_report_replslot() > in say RestoreSlotFromDisk() because we really need the decoding to start. > > So one idea could be to set a flag (per slot) when pgstat_report_replslot() > has been called (for good reasons) and check for this flag in > pg_stat_get_replication_slot(). > > If the flag is not set, then set the plugin fields to NULL. > If the flag is set, then display their values (like now). This approach will have the same problem. Till pgstat_report_replslot() is called, the old statistics will continue to be shown. > > And we should document that the plugin stats are not available (i.e are NULL) > until the decoding has valid stats to report after startup. After "startup" would mislead users since then they will think that the statistics will be NULL just before the decoding (re)starts. The current documentation is " It is NULL when statistics is not initialized or immediately after a reset or when not maintained by the output plugin.". I think that covers all the cases. -- Best Wishes, Ashutosh Bapat
Hi, On Mon, Sep 29, 2025 at 12:54:24PM +0530, Ashutosh Bapat wrote: > On Fri, Sep 26, 2025 at 10:28 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > > > > I don't think this is an issue. There is no way for the core to tell > > > whether the plugin will provide stats or not, unless it sets that > > > ctx->stats which happens in the startup callback. Till then it is > > > rightly providing the values accumulated so far. Once the decoding > > > starts, we know that the plugin is not providing any stats and we > > > don't display anything. > > > > Yeah, I got the technical reasons, but I think there's a valid user experience > > concern here: seeing statistics for a plugin that doesn't actually support > > statistics is misleading. > > > > 3. If the plugin starts supporting statistics and midway discontinues > its support, it already has a problem with backward compatibility. > > Practically it would 1 or 2, which are working fine. > > I don't think we will encounter case 3 practically. Do you have a > practical use case where a plugin would discontinue supporting stats? Not that I can think of currently. That looks unlikely but wanted to raise the point though. Maybe others see a use case and/or have a different point of view. > > What we need is a call to pgstat_report_replslot() to display stats that reflect > > the current plugin behavior. We can't just call pgstat_report_replslot() > > in say RestoreSlotFromDisk() because we really need the decoding to start. > > > > So one idea could be to set a flag (per slot) when pgstat_report_replslot() > > has been called (for good reasons) and check for this flag in > > pg_stat_get_replication_slot(). > > > > If the flag is not set, then set the plugin fields to NULL. > > If the flag is set, then display their values (like now). > > This approach will have the same problem. Till > pgstat_report_replslot() is called, the old statistics will continue > to be shown. I don't think so because the flag would not be set. > > And we should document that the plugin stats are not available (i.e are NULL) > > until the decoding has valid stats to report after startup. > > The current documentation is " It is NULL when statistics is not > initialized or immediately after a reset or when not maintained by the > output plugin.". I think that covers all the cases. Do you think the doc covers the case we discussed above? i.e when a plugin discontinue supporting stats, it would display stats until the decoding actually starts. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com