Discussion: Report reorder buffer size
Hi All,

The question of how to tune logical_decoding_work_mem has come up a few times in the context of customer cases. If it is too low, changes are spilled to disk, slowing down replication and consuming disk space. In the case of streamed transactions, it won't consume space on the primary but will consume resources (transactional resources/disk space) on the replica. If decoding memory is too high, it may cause memory pressure, especially when there are many WAL senders and large amounts of changes are queued to reorder buffers frequently.

In order to tune logical_decoding_work_mem optimally, we need a measure of the size of the changes belonging to the transactions queued in the reorder buffer, whether spilled to disk, in memory, or sent downstream as part of an unfinished streamed transaction. Currently there is no way to determine that. pg_stat_replication_slots reports cumulative statistics about spilled or streamed transactions. It helps to know whether the logical decoding work memory is sufficient to hold all the changes, but it does not provide a way to estimate an optimal value, since it doesn't capture the state of the reorder buffer at a given point in time.

With the attached WIP patch, we can report, at a given point in time, the total size of the reorder buffer, i.e. the sum of the sizes of all the changes belonging to the transactions active in the reorder buffer, whether those changes are in memory, on disk, or sent downstream as part of an unfinished streamed transaction. By sampling this value at regular intervals, one can observe the reorder buffer trendline, as shown in the attached plots. In the plots, the Y-axis is the total size of the reorder buffer and the X-axis is the time of day. rbsize_trendlines_pgbench.png shows the trendline when just regular pgbench is run at 3K TPS, and rbsize_trendlines_large_transaction_spikes.png shows the trendline when there are long and large transactions in addition to pgbench. Using these trendlines, a user may decide to keep logical_decoding_work_mem at the minimum and thus let all the large transactions spill to disk or be streamed. Or they may decide to set it to 100MB, 200MB or 500MB, reducing or eliminating the disk spill or streaming. It makes it easy to quantify the trade-offs in tuning logical_decoding_work_mem.

Tracking the total size of the reorder buffer also helps to quantify the load on a WAL sender. It is an additional metric that can be useful to debug WAL buildup, slow replication, etc. For example, the plot clearly and directly shows when the WAL sender started processing the large transaction, when it finished, etc.

The patch is not complete. There are several things to be worked on:

1. Whether the metrics should be reported in pg_stat_replication or pg_stat_replication_slots. Given that it's a point-in-time metric, it fits pg_stat_replication better. But that view does not report logical decoding activity happening in backends other than the WAL senders. That's true of the other metrics in that view as well. pg_stat_replication_slots, however, covers all users of logical decoding. For now I have used pg_stat_replication, but it can be changed based on the discussion here.

2. Is the total size of the reorder buffer enough, or do we also want to track separately the size of changes on disk and the size of changes sent downstream as part of unfinished streamed transactions? And the number of transactions being tracked by the reorder buffer? If we are going to report that much about the contents of the reorder buffer, is it better to have a separate view, pg_stat_reorder_buffer, for it?

3. The patch doesn't cover all the places where the reorder buffer's total size should be maintained. I need to find all such places and add the required arithmetic.

But before I work on those, I would like to hear others' opinions on the usefulness and acceptability of this proposal.

FWIW, attached is a TAP test to test the changes.

--
Best Wishes,
Ashutosh Bapat
Attachments
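A minimal sketch of the sampling approach described above, assuming the WIP patch exposes a hypothetical reorder_buffer_size column in pg_stat_replication (the actual column name in the patch may differ):

    -- 'reorder_buffer_size' is hypothetical; substitute whatever column
    -- the WIP patch adds to pg_stat_replication.
    SELECT now() AS sample_time,
           pid,
           application_name,
           reorder_buffer_size
    FROM pg_stat_replication
    \watch 10
    -- In psql, \watch re-runs the query at a fixed interval (here 10s),
    -- producing the samples behind the trendline plots.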
Hi,

On Wed, Aug 13, 2025 at 06:09:47PM +0530, Ashutosh Bapat wrote:

> Hi All, The question of how to tune logical_decoding_work_mem has come up a few times in the context of customer cases.

Same here.

> If it is too low, changes are spilled to disk, slowing down replication and consuming disk space. [...]
>
> In order to tune logical_decoding_work_mem optimally, we need a measure of the size of the changes belonging to the transactions queued in the reorder buffer, whether spilled to disk, in memory, or sent downstream as part of an unfinished streamed transaction.

Fully agree, that's something I started to work on some time ago. Not only to tune logical_decoding_work_mem but also to help diagnose issues with high memory consumption and/or OOM due to logical decoding.

> Currently there is no way to determine that. pg_stat_replication_slots reports cumulative statistics about spilled or streamed transactions. [...] it doesn't capture the state of the reorder buffer at a given point in time.

Agree.

> With the attached WIP patch, we can report, at a given point in time, the total size of the reorder buffer [...] Using these trendlines, a user may decide to keep logical_decoding_work_mem at the minimum and thus let all the large transactions spill to disk or be streamed.

Right. Also, the issue with spill files is that when they are read back from disk, the reorder buffer can consume a lot of memory. That's why I think tracking the xid, sub-xid and LSNs could be useful too. That could help diagnose why it was using that memory.

Attached is what I prepared a couple of years ago. I did not share it until now because it "only" logs information in the log file and I wanted to do more. It's far from being polished and I'm not sure it's 100% correct. I share it just as food for thought about what information I thought could be useful to log. It applies on top of yours.

> Tracking the total size of the reorder buffer also helps to quantify the load on a WAL sender.

Yeah, and the memory being used.

> It is an additional metric that can be useful to debug WAL buildup, slow replication, etc. [...]

Agree that's useful info to have.

> The patch is not complete. There are several things to be worked on: 1. Whether the metrics should be reported in pg_stat_replication or pg_stat_replication_slots. Given that it's a point-in-time metric, it fits pg_stat_replication better.

I do think the same.

> But that view does not report logical decoding activity happening in backends other than the WAL senders. [...] For now I have used pg_stat_replication, but it can be changed based on the discussion here.

I think pg_stat_replication is a good place to record "real time" activities.

Maybe we could log two metrics: the reorder buffer size before creating the spill files and the reorder buffer size while reading back the spill files? That would help distinguish when logical_decoding_work_mem is involved (i.e. before creating the spill files) and when it is not (reading them back).

pg_stat_replication_slots could be used to add some counters too, like the number of times logical_decoding_work_mem has been reached while decoding from that slot. Also, maybe record the maximum value the reorder buffer reached and the associated xid, number of sub-xids, and LSN? That could be useful to diagnose high memory consumption and/or OOM issues due to logical decoding.

Also, should we log some of this information to the server log?

> But before I work on those, I would like to hear others' opinions on the usefulness and acceptability of this proposal.

Thanks for working on it. I like the idea and also think that this is an area where more details/metrics should be provided.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments
Hi Bertrand,

Thanks for your response. I am glad that you have found the proposal to be generally useful.

On Thu, Aug 14, 2025 at 3:50 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:

> > Using these trendlines, a user may decide to keep logical_decoding_work_mem at the minimum and thus let all the large transactions spill to disk or be streamed.
>
> Right. Also, the issue with spill files is that when they are read back from disk, the reorder buffer can consume a lot of memory. That's why I think tracking the xid, sub-xid and LSNs could be useful too. That could help diagnose why it was using that memory.

When replication is stuck or slow, or WAL builds up, a detailed look at the reorder buffer's contents would be useful. It will need to be done outside of pg_stat_replication or pg_stat_replication_slots. I think it will be useful to provide a function which will trigger a given WAL sender to dump information about the contents of the reorder buffer to the server error log, or send it to the requesting client.

> I think pg_stat_replication is a good place to record "real time" activities.

Ok. Thanks for confirming.

> Maybe we could log two metrics: the reorder buffer size before creating the spill files and the reorder buffer size while reading back the spill files? [...]
>
> pg_stat_replication_slots could be used to add some counters too, like the number of times logical_decoding_work_mem has been reached while decoding from that slot.

I think pg_stat_replication_slots::spill_count and pg_stat_replication_slots::stream_count give that value.

> Also, maybe record the maximum value the reorder buffer reached and the associated xid, number of sub-xids, and LSN? That could be useful to diagnose high memory consumption and/or OOM issues due to logical decoding.

A function which dumps information about reorder buffers can be used along with the trendline for this purpose. For example, if a user notices WAL accumulation and an upward trend in the metric exposed by my patch, they can trigger a dump of the reorder buffer contents at that time, diagnose the problem, and even fix it proactively.

> Also, should we log some of this information to the server log?

Information in server error logs is hard to access and analyse, especially in cloud environments. Having a SQL construct is better.

> > But before I work on those, I would like to hear others' opinions on the usefulness and acceptability of this proposal.
>
> Thanks for working on it. I like the idea and also think that this is an area where more details/metrics should be provided.

Thanks.

--
Best Wishes,
Ashutosh Bapat
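For reference, the cumulative counters mentioned above can be inspected directly; this query uses only columns that already exist in pg_stat_replication_slots (PostgreSQL 14 and later):

    SELECT slot_name,
           spill_txns,   -- transactions (and subtransactions) spilled to disk
           spill_count,  -- number of times changes were spilled
           spill_bytes,  -- total bytes of changes spilled
           stream_txns, stream_count, stream_bytes,
           total_txns, total_bytes
    FROM pg_stat_replication_slots;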
Hi Ashutosh,

On Thu, Aug 21, 2025 at 07:26:41PM +0530, Ashutosh Bapat wrote:

> When replication is stuck or slow, or WAL builds up, a detailed look at the reorder buffer's contents would be useful. It will need to be done outside of pg_stat_replication or pg_stat_replication_slots. I think it will be useful to provide a function which will trigger a given WAL sender to dump information about the contents of the reorder buffer to the server error log, or send it to the requesting client.

Maybe we could add such functionality in contrib/pg_logicalinspect?

> I think pg_stat_replication_slots::spill_count and pg_stat_replication_slots::stream_count give that value.

Yes, as long as subtransactions are not involved. But if, say, I have one transaction made of 1500 subtransactions, I'd get something like:

  slot_name    | spill_count | stream_count | total_txns
 --------------+-------------+--------------+------------
  logical_slot |        1501 |            0 |          1

So we don't know how many times logical_decoding_work_mem has been reached (except by looking at total_txns). But as soon as another transaction (that does not involve spill) is decoded:

  slot_name    | spill_count | stream_count | total_txns
 --------------+-------------+--------------+------------
  logical_slot |        1501 |            0 |          2

Then we don't know whether logical_decoding_work_mem has been reached one time or two.

> A function which dumps information about reorder buffers can be used along with the trendline for this purpose. For example, if a user notices WAL accumulation and an upward trend in the metric exposed by my patch, they can trigger a dump of the reorder buffer contents at that time, diagnose the problem, and even fix it proactively.

Right. That would work for cases where the issue is occurring right now and the system is still available (OOM not triggered, for example).

> Information in server error logs is hard to access and analyse, especially in cloud environments. Having a SQL construct is better.

That's right, as long as the system is still available. I think a mix of log and SQL APIs helps more use cases: real time, historical, and when the system is back to a usable state (if it was not any more).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
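A minimal sketch of how the 1500-subtransaction case above could be reproduced from psql, assuming a table t exists, an active logical slot is being decoded, and logical_decoding_work_mem is small enough that the transaction actually spills (in psql, \gexec executes each value of the query result as a statement):

    -- Assumes: CREATE TABLE t (i int); and an active logical slot.
    BEGIN;
    SELECT format('SAVEPOINT s%s', i) AS sp,              -- one subtransaction...
           format('INSERT INTO t VALUES (%s)', i) AS ins  -- ...with one change each
    FROM generate_series(1, 1500) AS i
    \gexec
    COMMIT;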
On Fri, Aug 22, 2025 at 11:43 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:

> Maybe we could add such functionality in contrib/pg_logicalinspect?

I don't have an opinion about where to place this. But if it is placed in core, there is a higher chance that it will be trusted and used.

> Yes, as long as subtransactions are not involved. But if, say, I have one transaction made of 1500 subtransactions, I'd get something like:
> [...]
> Then we don't know whether logical_decoding_work_mem has been reached one time or two.

I didn't know this is how it works. Thanks for correcting me. In that case, I think we should add a column in pg_stat_replication_slots reporting the number of times the memory limit has been reached since the last reset. I am +0.5 on it being useful.

> > A function which dumps information about reorder buffers can be used along with the trendline for this purpose. For example, if a user notices WAL accumulation and an upward trend in the metric exposed by my patch, they can trigger a dump of the reorder buffer contents at that time, diagnose the problem, and even fix it proactively.
>
> Right. That would work for cases where the issue is occurring right now and the system is still available (OOM not triggered, for example).
>
> That's right, as long as the system is still available. I think a mix of log and SQL APIs helps more use cases: real time, historical, and when the system is back to a usable state (if it was not any more).

The way I envision this is that users will sample the views periodically, or when certain thresholds (amount of WAL, size of reorder buffers) are crossed. That way they have historical data in an easy-to-query manner when the actual incident occurs.

--
Best Wishes,
Ashutosh Bapat
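For example, the WAL-amount threshold mentioned above can be watched with existing functions and catalogs; a minimal sketch:

    -- WAL retained by each logical slot, usable as a trigger for sampling.
    SELECT slot_name,
           pg_size_pretty(
               pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
    FROM pg_replication_slots
    WHERE slot_type = 'logical';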
Hi,

On Mon, Aug 25, 2025 at 01:48:42PM +0530, Ashutosh Bapat wrote:

> I didn't know this is how it works. Thanks for correcting me. In that case, I think we should add a column in pg_stat_replication_slots reporting the number of times the memory limit has been reached since the last reset. I am +0.5 on it being useful.

Thanks. With this in place, one could get a ratio like total_txns/total_txns_that_exceeds. That could help to see whether reaching logical_decoding_work_mem is rare or frequent. If frequent, then maybe there is a need to adjust logical_decoding_work_mem.

> The way I envision this is that users will sample the views periodically, or when certain thresholds (amount of WAL, size of reorder buffers) are crossed. That way they have historical data in an easy-to-query manner when the actual incident occurs.

Yeah. OTOH, having this information in the log could also help users who did not think about sampling the views and hit an incident.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
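A sketch of that ratio, assuming a hypothetical counter column (called mem_exceeded_count here; no such column exists yet) were added to pg_stat_replication_slots:

    -- 'mem_exceeded_count' is hypothetical: the number of times
    -- logical_decoding_work_mem was reached since the last stats reset.
    SELECT slot_name,
           mem_exceeded_count,
           total_txns,
           round(100.0 * mem_exceeded_count / NULLIF(total_txns, 0), 1)
               AS pct_txns_exceeding_work_mem
    FROM pg_stat_replication_slots;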
Hi,

On Wed, Aug 13, 2025 at 5:40 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

> Hi All, The question of how to tune logical_decoding_work_mem has come up a few times in the context of customer cases. [...]

Thank you for starting the thread about this topic. This is one of the topics that I've been eager to address. Here are some random comments:

> With the attached WIP patch, we can report, at a given point in time, the total size of the reorder buffer [...] By sampling this value at regular intervals, one can observe the reorder buffer trendline, as shown in the attached plots.

An alternative idea (or an additional metric) would be a high-watermark of memory usage. That way, users won't miss memory usage spikes that might be missed by sampling the total memory usage.

> Tracking the total size of the reorder buffer also helps to quantify the load on a WAL sender. It is an additional metric that can be useful to debug WAL buildup, slow replication, etc. [...]

Agreed.

> The patch is not complete. There are several things to be worked on: 1. Whether the metrics should be reported in pg_stat_replication or pg_stat_replication_slots. [...]
>
> 2. Is the total size of the reorder buffer enough, or do we also want to track separately the size of changes on disk and the size of changes sent downstream as part of unfinished streamed transactions? And the number of transactions being tracked by the reorder buffer?

For the purpose of tuning logical_decoding_work_mem, we might not need many additional metrics and statistics. But in order to make logical decoding more visible to users for debugging or diagnostic purposes, more statistics, like the number of transactions being tracked by the reorder buffer, might be required.

We need to note that the actual size of changes sent downstream depends on the logical decoding plugin. For instance, we have table, row, and column filters in logical replication. It might be worth considering providing such statistics too. Statistics like the number of changes filtered out by the table filters or the change-kind filter (e.g., only publishing INSERTs, etc.) might help users confirm the effectiveness of the filters they have set.

> If we are going to report that much about the contents of the reorder buffer, is it better to have a separate view, pg_stat_reorder_buffer, for it?

Given that logical decoding can also be used by regular backend processes, I guess that such dynamic metrics would fit a system view dedicated to logical decoding, say pg_stat_reorder_buffer or pg_stat_logical_decoding.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
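For context, the filters referred to above are part of standard logical replication DDL; a small example (table and publication names are illustrative):

    -- Row filter and column list (PostgreSQL 15+), plus a change-kind
    -- filter restricting which operations are published.
    CREATE PUBLICATION pub_orders
        FOR TABLE orders (order_id, status) WHERE (status <> 'draft')
        WITH (publish = 'insert, update');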
On Mon, Aug 25, 2025 at 6:28 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:

> Thanks. With this in place, one could get a ratio like total_txns/total_txns_that_exceeds. That could help to see whether reaching logical_decoding_work_mem is rare or frequent. If frequent, then maybe there is a need to adjust logical_decoding_work_mem.

We are discussing some improvements to pg_stat_replication_slots over at [1]. Would you be able to create a patch reporting the number of times logical_decoding_work_mem is reached, in the patchset being discussed there?

[1] https://www.postgresql.org/message-id/CAExHW5s6KntzUyUoMbKR5dgwRmdV2Ay_2+AnTgYGAzo=Qv61wA@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
Hi Masahiko,

Thanks for your inputs.

On Tue, Aug 26, 2025 at 2:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> Thank you for starting the thread about this topic. This is one of the topics that I've been eager to address. Here are some random comments:

Glad that more people think there is a need for improving the reorder buffer statistics.

> An alternative idea (or an additional metric) would be a high-watermark of memory usage. That way, users won't miss memory usage spikes that might be missed by sampling the total memory usage.

I think the trendline is important to make the right trade-offs in case setting logical_decoding_work_mem to the maximum is not possible. With a high watermark, the question is how long a high remains relevant: as the load changes, an old maximum soon becomes irrelevant. Probably we could reset the high watermark every time the view is sampled, so that we provide a trendline for the high watermark as well. But I want to be cautious about adding the complexity of tracking maxima accurately and then maintaining them forever. If sampling is frequent enough, it will usually capture maxima and minima well enough for practical purposes. Users will need to treat the trendline as approximate anyway, since the load will show slight variations over time.

Please share your idea of how the high watermark would be reported.

> For the purpose of tuning logical_decoding_work_mem, we might not need many additional metrics and statistics. But in order to make logical decoding more visible to users for debugging or diagnostic purposes, more statistics, like the number of transactions being tracked by the reorder buffer, might be required.

Ok. We could easily add the total number of transactions in the reorder buffer at a given point in time to the report. How about subtransactions? How about prepared but not yet committed/aborted transactions?

> We need to note that the actual size of changes sent downstream depends on the logical decoding plugin. For instance, we have table, row, and column filters in logical replication. It might be worth considering providing such statistics too. [...]

I have proposed this in [1]. Please check.

> Given that logical decoding can also be used by regular backend processes, I guess that such dynamic metrics would fit a system view dedicated to logical decoding, say pg_stat_reorder_buffer or pg_stat_logical_decoding.

Hmm. That means we will have to reserve some area in shared memory with as many slots as the number of replication slots (since that is the maximum number of reorder buffers that can exist in the system) and use each one to maintain the stats. I haven't yet checked whether we could use the existing statistics machinery for it, but it should be doable.

--
Best Wishes,
Ashutosh Bapat
Hi,

On Tue, Aug 26, 2025 at 02:56:14PM +0530, Ashutosh Bapat wrote:

> We are discussing some improvements to pg_stat_replication_slots over at [1]. Would you be able to create a patch reporting the number of times logical_decoding_work_mem is reached, in the patchset being discussed there?
>
> [1] https://www.postgresql.org/message-id/CAExHW5s6KntzUyUoMbKR5dgwRmdV2Ay_2+AnTgYGAzo=Qv61wA@mail.gmail.com

Thanks for pointing me to this thread. I'll look at it and propose a patch to report the number of times logical_decoding_work_mem is reached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
> > > If we are going to report that much about the contents of the reorder buffer, is it better to have a separate view, pg_stat_reorder_buffer, for it?
> >
> > Given that logical decoding can also be used by regular backend processes, I guess that such dynamic metrics would fit a system view dedicated to logical decoding, say pg_stat_reorder_buffer or pg_stat_logical_decoding.
>
> Hmm. That means we will have to reserve some area in shared memory with as many slots as the number of replication slots (since that is the maximum number of reorder buffers that can exist in the system) and use each one to maintain the stats. I haven't yet checked whether we could use the existing statistics machinery for it, but it should be doable.

I think the beauty of reporting the statistics only for WAL senders, as is done in the patch, is that we can piggyback the stats update on the current logic that updates WalSnd. If we want to do it for other backends, it would be extra overhead to lock and update the stats outside WalSnd. And as I said in my email, WAL senders are arguably the only serious users of logical decoding whose usage is worth monitoring; that's why pg_stat_replication reports stats only about WAL senders and not other backends. Even if we report stats about only WAL senders, it should be possible to report them through a view other than pg_stat_replication. What's your opinion about this?

--
Best Wishes,
Ashutosh Bapat
Hi,

On Tue, Aug 26, 2025 at 09:48:04AM +0000, Bertrand Drouvot wrote:

> Thanks for pointing me to this thread. I'll look at it and propose a patch to report the number of times logical_decoding_work_mem is reached.

I preferred to create a dedicated thread for it [1] (I think both may need dedicated discussions).

[1]: https://www.postgresql.org/message-id/flat/aK6zNcmb%2BPpySFvG%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Aug 26, 2025 at 2:45 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

> I think the trendline is important to make the right trade-offs in case setting logical_decoding_work_mem to the maximum is not possible. [...] Users will need to treat the trendline as approximate anyway, since the load will show slight variations over time.

Agreed on the importance of the trendline.

> Please share your idea of how the high watermark would be reported.

I was thinking of this high-watermark idea since it doesn't require any external system/tool to get the information for tuning logical_decoding_work_mem. It would be easy to use when users want to figure out the maximum memory usage of logical decoding for a particular workload.

> Ok. We could easily add the total number of transactions in the reorder buffer at a given point in time to the report. How about subtransactions? How about prepared but not yet committed/aborted transactions?

For debugging or diagnostic purposes, this information might be useful, but I'm not sure we need it for logical decoding sessions that are running.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu, Aug 28, 2025 at 5:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> > Please share your idea of how the high watermark would be reported.
>
> I was thinking of this high-watermark idea since it doesn't require any external system/tool to get the information for tuning logical_decoding_work_mem. It would be easy to use when users want to figure out the maximum memory usage of logical decoding for a particular workload.

Ok. I am interested in knowing how you think we should keep track of the high watermark.

Do we keep it updated as new maxima are found, and keep reporting the last high watermark seen till a new maximum is reached? With that, users would end up setting logical_decoding_work_mem (and hence provisioning for it) to a higher value, even if the workload changes so that the reorder buffer never reaches that high watermark again. So I think we should reset the high watermark. Do you agree? If yes, when do we do that? How often do we reset it?

Tracking maxima and minima of other parameters in pg_stat_replication, like replication lag, might also help users, e.g. to tune their networks. But we don't track their watermarks; external tools are used for that. I was envisioning something like that for the reorder buffer size as well.

> For debugging or diagnostic purposes, this information might be useful, but I'm not sure we need it for logical decoding sessions that are running.

I was not clear, sorry. I meant reporting the number of subtransactions and prepared transactions in the reorder buffer (not the actual subtransactions or prepared transactions themselves). Do you think that reporting just the number of transactions in the reorder buffer is enough, and that reporting the number of subtransactions and the number of prepared transactions is not required?

--
Best Wishes,
Ashutosh Bapat
On Wed, Aug 27, 2025 at 9:36 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

> Ok. I am interested in knowing how you think we should keep track of the high watermark.
>
> Do we keep it updated as new maxima are found, and keep reporting the last high watermark seen till a new maximum is reached? [...] So I think we should reset the high watermark. Do you agree? If yes, when do we do that? How often do we reset it?

I think it ultimately depends on use cases, but I imagine that users can reset the high watermark value after adjusting logical_decoding_work_mem, see how the new value works, then adjust the parameter and reset the value again, and so on.

> Tracking maxima and minima of other parameters in pg_stat_replication, like replication lag, might also help users, e.g. to tune their networks. But we don't track their watermarks; external tools are used for that. I was envisioning something like that for the reorder buffer size as well.

I think that a high watermark value would be a new type of statistic to collect. As far as I know, pg_stat_statements collects similar values, but there is no precedent in core.

> I was not clear, sorry. I meant reporting the number of subtransactions and prepared transactions in the reorder buffer (not the actual subtransactions or prepared transactions themselves). Do you think that reporting just the number of transactions in the reorder buffer is enough, and that reporting the number of subtransactions and the number of prepared transactions is not required?

Given that collecting new statistics and updating them in real time has a cost, it's ultimately rewards versus costs. In terms of additional cost, I guess tracking the number of prepared transactions in the reorder buffer would not be costly, but I'm not sure about the number of subtransactions. I personally would avoid collecting these statistics unless there are explicit use cases.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue, Sep 23, 2025 at 11:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> I think it ultimately depends on use cases, but I imagine that users can reset the high watermark value after adjusting logical_decoding_work_mem, see how the new value works, then adjust the parameter and reset the value again, and so on.
>
> I think that a high watermark value would be a new type of statistic to collect. As far as I know, pg_stat_statements collects similar values, but there is no precedent in core.

Thanks for pointing me to pg_stat_statements. I see it stores cumulative statistics like the minimum and maximum plan and execution times for queries. We maintain cumulative statistics about logical decoding/replication in pg_stat_replication_slots; I think the high watermark fits there. That view also has functionality to reset the statistics, which can be used to reset the high watermark along with the other statistics.

> Given that collecting new statistics and updating them in real time has a cost, it's ultimately rewards versus costs. In terms of additional cost, I guess tracking the number of prepared transactions in the reorder buffer would not be costly, but I'm not sure about the number of subtransactions. I personally would avoid collecting these statistics unless there are explicit use cases.

I agree. It's not clear whether statistics about the number of transactions currently active in the reorder buffer are useful, so let's leave them out. If needed, we can add them at a later point in time.

I will submit a complete patch soon.

--
Best Wishes,
Ashutosh Bapat
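For reference, the pg_stat_statements precedent mentioned above exposes per-statement minima and maxima that are reset together with the rest of the entry's counters (columns as of PostgreSQL 13; plan times are populated when pg_stat_statements.track_planning is enabled):

    SELECT query,
           min_exec_time, max_exec_time,  -- per-statement low/high watermarks
           min_plan_time, max_plan_time
    FROM pg_stat_statements
    ORDER BY max_exec_time DESC
    LIMIT 5;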
Hi,

On 2025-09-23 17:45:19 +0530, Ashutosh Bapat wrote:

> Thanks for pointing me to pg_stat_statements. I see it stores cumulative statistics like the minimum and maximum plan and execution times for queries. We maintain cumulative statistics about logical decoding/replication in pg_stat_replication_slots; I think the high watermark fits there. That view also has functionality to reset the statistics, which can be used to reset the high watermark along with the other statistics.

I think pg_stat_statements is an anti-example, if anything. It's been so overloaded with barely useful fields that it has become so slow that it starts to cause contention even in just moderately busy workloads.

Let's make this a minimal change, instead of over-complicating it beyond usefulness.

Greetings,

Andres Freund
On Tue, Sep 23, 2025 at 6:04 PM Andres Freund <andres@anarazel.de> wrote:

> I think pg_stat_statements is an anti-example, if anything. It's been so overloaded with barely useful fields that it has become so slow that it starts to cause contention even in just moderately busy workloads.
>
> Let's make this a minimal change, instead of over-complicating it beyond usefulness.

Thanks Bertrand, Masahiko and Andres for your thoughts.

Summarising the discussion so far: it seems we agree that reporting the current size of the reorder buffer (including the spilled and streamed transactions) in pg_stat_replication is useful. Including the sizes of changes from spilled as well as streamed transactions lets users see how the reorder buffer size would trend if logical_decoding_work_mem were infinite. If it is sampled frequently enough, the trendline can be used to find an optimal value of logical_decoding_work_mem considering the various trade-offs.

We did consider other metrics, like the number of transactions in the reorder buffer and the highest size reached by the reorder buffer, but opinions about their usefulness differ, hence we are deferring them for now.

I will work on producing a complete patch next.

--
Best Wishes,
Ashutosh Bapat