Thread: Add memory_limit_hits to pg_stat_replication_slots
Hi hackers,

I think that it's currently not always possible to determine how many times
logical_decoding_work_mem has been reached.

For example, say a transaction is made of 40 subtransactions, and I get:

  slot_name   | spill_txns | spill_count | total_txns
--------------+------------+-------------+------------
 logical_slot |         40 |          41 |          1
(1 row)

Then I know that logical_decoding_work_mem has been reached one time
(total_txns).

But as soon as another transaction is decoded (one that does not involve
spilling):

  slot_name   | spill_txns | spill_count | total_txns
--------------+------------+-------------+------------
 logical_slot |         40 |          41 |          2
(1 row)

then we don't know whether logical_decoding_work_mem has been reached one or
two times.

Please find attached a patch to $SUBJECT, to report the number of times
logical_decoding_work_mem has been reached.

With such a counter one could compute a ratio like
total_txns/memory_limit_hits. That could help to see whether reaching
logical_decoding_work_mem is rare or frequent. If frequent, then maybe there
is a need to adjust logical_decoding_work_mem.

Based on my simple example above, one could say that it might be possible to
get the same with:

(spill_count - spill_txns) + (stream_count - stream_txns)

but that doesn't appear to be the case with a more complicated example
(277 vs 247):

  slot_name   | spill_txns | spill_count | total_txns | stream_txns | stream_count | memory_limit_hits | (spc-spct)+(strc-strt)
--------------+------------+-------------+------------+-------------+--------------+-------------------+------------------------
 logical_slot |        405 |         552 |         19 |           5 |          105 |               277 |                    247
(1 row)

Not sure I like memory_limit_hits that much; maybe work_mem_exceeded is
better?

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments
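The arithmetic from the more complicated example above can be sanity-checked on its own. The following is an illustrative sketch in plain Python (not PostgreSQL code); the counter values are the ones reported in the thread.

```python
# Derive an approximation of memory-limit hits from the existing
# pg_stat_replication_slots counters, and compare it against the
# proposed memory_limit_hits value from the example above.

def derived_hits(spill_txns, spill_count, stream_txns, stream_count):
    """Spill/stream invocations beyond the first one per transaction."""
    return (spill_count - spill_txns) + (stream_count - stream_txns)

# Counter values from the "more complicated example" in the thread.
row = {
    "spill_txns": 405, "spill_count": 552,
    "stream_txns": 5, "stream_count": 105,
    "memory_limit_hits": 277,
}

approx = derived_hits(row["spill_txns"], row["spill_count"],
                      row["stream_txns"], row["stream_count"])
print(approx)                             # 247: the derived value
print(row["memory_limit_hits"] - approx)  # 30 hits are not accounted for
```

The mismatch (277 vs 247) is exactly why the derived expression cannot stand in for a dedicated counter.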
On Wed, Aug 27, 2025 at 12:26 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> Hi hackers,
>
> I think that it's currently not always possible to determine how many times
> logical_decoding_work_mem has been reached.
>
> [...]
>
> Not sure I like memory_limit_hits that much, maybe work_mem_exceeded is better?
>
> Looking forward to your feedback,

Yes, it's a quite different situation in the two cases: spilling 100
transactions in one ReorderBufferCheckMemoryLimit() call, and spilling 1
transaction in each of 100 ReorderBufferCheckMemoryLimit() calls, even though
spill_txns is 100 in both cases. And we don't have any statistics to
distinguish between these cases. I agree with the statistics.

One minor comment:

@@ -1977,6 +1978,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
     repSlotStat.stream_bytes = rb->streamBytes;
     repSlotStat.total_txns = rb->totalTxns;
     repSlotStat.total_bytes = rb->totalBytes;
+    repSlotStat.memory_limit_hits = rb->memory_limit_hits;

Since the other statistics counter names are camel case, I think it's better
to follow that for the new counter.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

On Thu, Sep 11, 2025 at 03:24:54PM -0700, Masahiko Sawada wrote:
> On Wed, Aug 27, 2025 at 12:26 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Looking forward to your feedback,
>
> Yes,

Thanks for looking at it!

> it's a quite different situation in two cases: spilling 100
> transactions in one ReorderBufferCheckMemoryLimit() call and spilling
> 1 transaction in each 100 ReorderBufferCheckMemoryLimit() calls, even
> though spill_txns is 100 in both cases. And we don't have any
> statistics to distinguish between these cases.

Right.

> One minor comment is:
>
> @@ -1977,6 +1978,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
>      repSlotStat.stream_bytes = rb->streamBytes;
>      repSlotStat.total_txns = rb->totalTxns;
>      repSlotStat.total_bytes = rb->totalBytes;
> +    repSlotStat.memory_limit_hits = rb->memory_limit_hits;
>
> Since other statistics counter names are camel cases I think it's
> better to follow that for the new counter.

Makes sense, done with memoryLimitHits in v2 attached (that's the only change
as compared with v1).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments
On Mon, Sep 22, 2025 at 1:41 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> > Since other statistics counter names are camel cases I think it's
> > better to follow that for the new counter.
>
> Makes sense, done with memoryLimitHits in v2 attached (that's the only change
> as compared with v1).
>

The memory_limit_hits doesn't go well with the other names in the view. Can we
consider memory_exceeded_count? I find memory_exceeded_count (or
memory_exceeds_count) more clear and matching with the existing counters.
Also, how about keeping it immediately after slot_name in the view? Keeping it
at the end after total_bytes seems out of place.

--
With Regards,
Amit Kapila.
On Mon, Sep 22, 2025 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [...]
> The memory_limit_hits doesn't go well with the other names in the
> view. Can we consider memory_exceeded_count? I find
> memory_exceeded_count (or memory_exceeds_count) more clear and
> matching with the existing counters. Also, how about keeping it
> immediately after slot_name in the view? Keeping it in the end after
> total_bytes seems out of place.
>

Since fields like spill_txns, spill_bytes, and stream_txns also talk about
exceeding 'logical_decoding_work_mem', my preference would be to place this
new field immediately after these spill and stream fields (and before
total_bytes). If not this, then, as Amit suggested, immediately before all
these fields.

Other options for the name could be 'mem_limit_exceeded_count' or
'mem_limit_hit_count'.

thanks
Shveta
On Mon, Sep 22, 2025 at 05:21:35PM +0530, shveta malik wrote:
> [...]
> Since fields like spill_txns, spill_bytes, and stream_txns also talk
> about exceeding 'logical_decoding_work_mem', my preference would be to
> place this new field immediately after these spill and stream fields
> (and before total_bytes). If not this, then as Amit suggested,
> immediately before all these fields.
> Other options for name could be 'mem_limit_exceeded_count' or
> 'mem_limit_hit_count'

Thank you, Shveta and Amit, for looking at it. Since we already use txns as an
abbreviation for transactions, I think it's OK to use "mem". So I'm using a
mix of your proposals, "mem_exceeded_count", in v3 attached. Regarding the
field position, I like Shveta's proposal and did it that way.

However, technically speaking, "exceeded" is not the perfect wording, since
the code was doing (and is still doing something similar to):

 if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_BUFFERED &&
-    rb->size < logical_decoding_work_mem * (Size) 1024)
+    !memory_limit_reached)
     return;

as the comment describes correctly using "reached":

/*
 * Check whether the logical_decoding_work_mem limit was reached, and if yes
 * pick the largest (sub)transaction at-a-time to evict and spill its changes to
 * disk or send to the output plugin until we reach under the memory limit.

So I think that "reached" or "hit" would be better wording. However, the
documentation for spill_txns and stream_txns already uses "exceeded" (and not
"reached"), so I went with "exceeded" for consistency. I think that's fine;
if not, we may want to use "reached" for those 3 stats descriptions.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments
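The "reached" vs "exceeded" distinction discussed above comes down to the comparison operator. A minimal sketch of the boundary case, in plain Python with a hypothetical work-mem value (not PostgreSQL code):

```python
# The reorderbuffer check keeps buffering only while
# rb->size < logical_decoding_work_mem * 1024, so the counter fires as
# soon as the limit is *reached* (>=), not only when strictly exceeded.

WORK_MEM_KB = 64 * 1024            # hypothetical logical_decoding_work_mem (KB)
LIMIT_BYTES = WORK_MEM_KB * 1024   # the limit as compared against rb->size

def limit_reached(size_bytes):
    # mirrors !(rb->size < limit), i.e. rb->size >= limit
    return size_bytes >= LIMIT_BYTES

def limit_exceeded(size_bytes):
    # what the word "exceeded" literally means: strictly greater
    return size_bytes > LIMIT_BYTES

print(limit_reached(LIMIT_BYTES))   # True: exactly at the limit counts
print(limit_exceeded(LIMIT_BYTES))  # False: "exceeded" would miss this case
```

Only the exact-equality case differs, which is why "reaches or exceeds" would be the fully accurate wording.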
On Tue, Sep 23, 2025 at 1:52 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> [...]
> Thank you, Shveta and Amit, for looking at it. Since we already use txns as
> abbreviation for transactions then I think it's ok to use "mem". Then I'm using
> a mix of your proposals with "mem_exceeded_count" in v3 attached. Regarding the
> field position, I like Shveta's proposal and did it that way.
>
> However, technically speaking, "exceeded" is not the perfect wording since
> the code was doing (and is still doing something similar to):
> [...]
> So I think that "reached" or "hit" would be better wording. However, the
> documentation for spill_txns, stream_txns already use "exceeded" (and not "reached")
> so I went with "exceeded" for consistency. I think that's fine, if not we may want
> to use "reached" for those 3 stats descriptions.

I find "exceeded" is fine, as the documentation for logical decoding also
uses it [1]:

    Similar to spill-to-disk behavior, streaming is triggered when the
    total amount of changes decoded from the WAL (for all in-progress
    transactions) exceeds the limit defined by logical_decoding_work_mem
    setting.

One comment for the v3 patch:

+      <para>
+       Number of times <literal>logical_decoding_work_mem</literal> has been
+       exceeded while decoding changes from WAL for this slot.
+      </para>

How about rewording it to something like:

    Number of times the memory used by logical decoding has exceeded
    logical_decoding_work_mem.

Regards,

[1] https://www.postgresql.org/docs/devel/logicaldecoding-streaming.html

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

On Tue, Sep 23, 2025 at 11:39:22AM -0700, Masahiko Sawada wrote:
> [...]
> I find "exceeded" is fine as the documentation for logical decoding
> also uses it[1]:
>
> Similar to spill-to-disk behavior, streaming is triggered when the
> total amount of changes decoded from the WAL (for all in-progress
> transactions) exceeds the limit defined by logical_decoding_work_mem
> setting.

Yes, it also uses "exceeds", but I think it's not 100% accurate. It would be
if, in ReorderBufferCheckMemoryLimit(), we were using "<=" instead of "<" in:

 if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_BUFFERED &&
     rb->size < logical_decoding_work_mem * (Size) 1024)

I think an accurate wording would be "reaches or exceeds" in all those places,
but just using "exceeds" looks good enough.

> One comment for the v3 patch:
>
> +      <para>
> +       Number of times <literal>logical_decoding_work_mem</literal> has been
> +       exceeded while decoding changes from WAL for this slot.
> +      </para>
>
> How about rewording it to like:
>
> Number of times the memory used by logical decoding has exceeded
> logical_decoding_work_mem.

That sounds better, thanks! Used this wording in v4 attached (that's the only
change as compared to v3).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments
On Tue, Sep 23, 2025 at 11:31 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> [...]
> Yes it also uses "exceeds" but I think it's not 100% accurate. It would be
> if, in ReorderBufferCheckMemoryLimit, we were using "<=" instead of "<" in:
>
>  if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_BUFFERED &&
>      rb->size < logical_decoding_work_mem * (Size) 1024)
>
> I think an accurate wording would be "reaches or exceeds" in all those places,
> but just using "exceeds" looks good enough.

Thank you for updating the patch! Here are some comments:

---
+    bool    memory_limit_reached = (rb->size >=
                 logical_decoding_work_mem * (Size) 1024);
+
+    if (memory_limit_reached)
+        rb->memExceededCount += 1;

Do we want to use 'exceeded' for the variable too, for better consistency?

---
One thing I want to clarify is that even if the memory usage exceeds
logical_decoding_work_mem, it doesn't necessarily mean we serialize or stream
transactions, because of ReorderBufferCheckAndTruncateAbortedTXN(). For
example, in a situation where many large already-aborted transactions are
truncated by ReorderBufferCheckAndTruncateAbortedTXN(), users would see a high
number in the mem_exceeded_count column, but it might not actually require any
adjustment of logical_decoding_work_mem.

One idea is to increment that counter only if exceeding the memory usage
causes us to serialize or stream any transactions. On the other hand, it might
make sense, and be straightforward too, to show a pure statistic that the
memory usage exceeded logical_decoding_work_mem. What do you think?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

On Wed, Sep 24, 2025 at 10:11:20AM -0700, Masahiko Sawada wrote:
> Thank you for updating the patch! Here are some comments:
>
> ---
> +    bool    memory_limit_reached = (rb->size >=
>                  logical_decoding_work_mem * (Size) 1024);
> +
> +    if (memory_limit_reached)
> +        rb->memExceededCount += 1;
>
> Do we want to use 'exceeded' for the variable too for better consistency?

I thought about it, but since we use ">=" I think that "reached" is more
accurate. So I went for "reached" for this one and "exceeded" for the
"user facing" ones. That said, I don't have a strong opinion about it, and
I'd be OK to use "exceeded" if you feel strongly about it.

> ---
> One thing I want to clarify is that even if the memory usage exceeds
> the logical_decoding_work_mem it doesn't necessarily mean we serialize
> or stream transactions because of
> ReorderBufferCheckAndTruncateAbortedTXN().

Right.

> For example, in a situation
> where many large already-aborted transactions are truncated by
> ReorderBufferCheckAndTruncateAbortedTXN(), users would see
> a high number in mem_exceeded_count column but it might not actually
> require any adjustment for logical_decoding_work_mem.

Yes, but in that case mem_exceeded_count would be high compared to spill_txns
and stream_txns, no?

> One idea is to
> increment that counter if exceeding memory usage is caused to
> serialize or stream any transactions. On the other hand, it might make
> sense and be straightforward too to show a pure statistic that the
> memory usage exceeded the logical_decoding_work_mem. What do you
> think?

The new counter, as it is proposed, helps to see whether the workload hits
logical_decoding_work_mem frequently or not. I think it's valuable information
to have on its own.

Now, to check whether logical_decoding_work_mem needs adjustment, one could
compare mem_exceeded_count with the existing spill_txns and stream_txns.

For example, if I abort 20 transactions that exceeded
logical_decoding_work_mem, I'd get:

postgres=# select spill_txns,stream_txns,mem_exceeded_count from pg_stat_replication_slots ;
 spill_txns | stream_txns | mem_exceeded_count
------------+-------------+--------------------
          0 |           0 |                 20
(1 row)

That way I could figure out that mem_exceeded_count has been reached for
aborted transactions.

OTOH, if one sees spill_txns + stream_txns close to mem_exceeded_count, like:

postgres=# select spill_txns,stream_txns,mem_exceeded_count from pg_stat_replication_slots ;
 spill_txns | stream_txns | mem_exceeded_count
------------+-------------+--------------------
         38 |          20 |                 58
(1 row)

that probably means that logical_decoding_work_mem would need to be increased.

What do you think?

BTW, while doing some tests for the above examples, I noticed that the patch
was missing a check on memExceededCount in UpdateDecodingStats() (that
produced mem_exceeded_count being 0 for one of the new tests in
test_decoding). Fixed in v5 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments
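The comparison Bertrand describes can be expressed as a small rule of thumb. This is an illustrative sketch only (the 0.5 threshold is an arbitrary assumption for illustration, not anything from the patch); the two inputs are the examples shown above.

```python
# Compare mem_exceeded_count against spill_txns + stream_txns to tell
# aborted-transaction truncation apart from genuine spill/stream pressure.

def interpret(spill_txns, stream_txns, mem_exceeded_count, threshold=0.5):
    if mem_exceeded_count == 0:
        return "limit never reached"
    handled = spill_txns + stream_txns
    if handled / mem_exceeded_count >= threshold:
        # most limit hits led to spilling or streaming
        return "consider raising logical_decoding_work_mem"
    # most limit hits were resolved by truncating aborted transactions
    return "mostly aborted transactions; limit likely fine"

print(interpret(0, 0, 20))    # first example: aborted transactions
print(interpret(38, 20, 58))  # second example: real spill/stream pressure
```

With the raw counter, this kind of computation stays in the hands of the user, which is the trade-off discussed below.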
On Thu, Sep 25, 2025 at 3:17 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> [...]
> I thought about it, but since we use ">=" I think that "reached" is more
> accurate. So I went for "reached" for this one and "exceeded" for "user facing"
> ones. That said I don't have a strong opinion about it, and I'd be ok to use
> "exceeded" if you feel strong about it.

Agreed with the current style. Thank you for the explanation.

> > For example, in a situation
> > where many large already-aborted transactions are truncated by
> > ReorderBufferCheckAndTruncateAbortedTXN(), users would see
> > a high number in mem_exceeded_count column but it might not actually
> > require any adjustment for logical_decoding_work_mem.
>
> Yes, but in that case mem_exceeded_count would be high compared to spill_txns,
> stream_txns, no?

Right. I think only mem_exceeded_count has a high number while spill_txns and
stream_txns have lower numbers in this case (like you showed in your first
example below).

> [...]
> OTOH, If one see spill_txns + stream_txns close to mem_exceeded_count, like:
>
> postgres=# select spill_txns,stream_txns,mem_exceeded_count from pg_stat_replication_slots ;
>  spill_txns | stream_txns | mem_exceeded_count
> ------------+-------------+--------------------
>          38 |          20 |                 58
> (1 row)
>
> That probably means that mem_exceeded_count would need to be increased.
>
> What do you think?

Right. But one might argue that if we increment mem_exceeded_count only when
serializing or streaming is actually performed, mem_exceeded_count would be 0
in the first example, and therefore users would be able to simply check
mem_exceeded_count without any computation.

> BTW, while doing some tests for the above examples, I noticed that the patch
> was missing a check on memExceededCount in UpdateDecodingStats() (that produced
> mem_exceeded_count being 0 for one of the new test in test_decoding): Fixed in
> v5 attached.

Thank you for updating the patch!

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

On Thu, Sep 25, 2025 at 12:14:04PM -0700, Masahiko Sawada wrote:
> Right. But one might argue that if we increment mem_exceeded_count
> only when serializing or streaming is actually performed,
> mem_exceeded_count would be 0 in the first example and therefore users
> would be able to simply check mem_exceeded_count without any
> computation.

Right, but then we'd not be able to see when the memory limit has been reached
in all cases (that would hide the aborted-transactions case). I think that
with the current approach we have the best of both worlds (even if it requires
some computation).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi Bertrand,
Thanks for the patch. The patch overall looks good to me. Just a few small comments:
On Sep 25, 2025, at 18:17, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
<v5-0001-Add-mem_exceeded_count-to-pg_stat_replication_slo.patch>
1.
```
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -690,6 +690,9 @@ struct ReorderBuffer
int64 streamCount; /* streaming invocation counter */
int64 streamBytes; /* amount of data decoded */
+ /* Number of times logical_decoding_work_mem has been reached */
+ int64 memExceededCount;
```
For the other metrics, there is a "Statistics about xxx" comment above the group, and a line comment after every metric. Maybe use the same style, so that it would be easy to add new metrics in the future.
2.
```
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2100,7 +2100,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
Datum
pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 11
text *slotname_text = PG_GETARG_TEXT_P(0);
NameData slotname;
TupleDesc tupdesc;
@@ -2125,11 +2125,13 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 8, "mem_exceeded_count",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_txns",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 10, "total_bytes",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
```
Is it better to add the new field in the last place?
Say a client does "select * from pg_stat_get_replication_slot()": it will then just get an extra column instead of mis-ordered columns.
3.
```
+ <para>
+ Number of times the memory used by logical decoding has exceeded
+ <literal>logical_decoding_work_mem</literal>.
+ </para>
```
Feels like “has” is not needed.
Maybe the wording can be simplified as:
Number of times logical decoding exceeded <literal>logical_decoding_work_mem</literal>.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi Evan,

On Fri, Sep 26, 2025 at 02:34:58PM +0800, Chao Li wrote:
> Hi Bertrand,
>
> Thanks for the patch. The patch overall looks good to me. Just a few small
> comments:

Thanks for looking at it!

> 1.
> [...]
> For other metrics, there is a "Statistics about xxx" comment above, and a
> line comment after every metric. Maybe use the same style, so that it would
> be easy to add new metrics in future.

I'm not sure: for the moment we have only this stat related to
logical_decoding_work_mem memory usage. If we add other stats in this area
later, we could add a comment "section" as you suggest.

> 2.
> [...]
> Is it better to add the new field in the last place?
>
> Say if a client does "select * from pg_stat_get_replication_slot()", it
> will just get an extra column instead of mis-ordered columns.

I think it's good to have the function fields "ordering" matching the view
fields ordering. FWIW, the ordering has been discussed in [1].

> 3.
> ```
> +       <para>
> +        Number of times the memory used by logical decoding has exceeded
> +        <literal>logical_decoding_work_mem</literal>.
> +       </para>
> ```
>
> Feels like "has" is not needed.

It's already done that way in other parts of the documentation:

$ git grep "has exceeded" "*sgml"
doc/src/sgml/maintenance.sgml:    vacuum has exceeded the defined insert threshold, which is defined as:
doc/src/sgml/monitoring.sgml:        logical decoding to decode changes from WAL has exceeded
doc/src/sgml/monitoring.sgml:        from WAL for this slot has exceeded
doc/src/sgml/monitoring.sgml:        Number of times the memory used by logical decoding has exceeded
doc/src/sgml/ref/create_subscription.sgml:   retention duration has exceeded the

So that looks OK to me (I'm not a native English speaker though).

[1]: https://www.postgresql.org/message-id/CAJpy0uBskXMq65rvWm8a-KR7cSb_sZH9CPRCnWAQrTOF5fciGw%40mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Sep 25, 2025 at 10:26 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> [...]
> Right but we'd not be able to see when the memory limit has been reached for all
> the cases (that would hide the aborted transactions case). I think that with
> the current approach we have the best of both world (even if it requires some
> computations).

Agreed. It would be better to show a raw statistic so that users can use the
number as they want.

I've made a small comment change and added the commit message to the v5 patch.
I'm going to push the attached patch barring any objection or review comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com