Обсуждение: [Proposal] Expose internal MultiXact member count function for efficient monitoring
[Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Naga Appani
Дата:
Hi,
I would like to propose exposing an internal PostgreSQL function called
By exposing
I believe exposing
Best regards,
I would like to propose exposing an internal PostgreSQL function called
ReadMultiXactCounts()
to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.Current Challenges: The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:
- Filesystem scanning overhead: These functions recursively scan the
pg_multixact
directory, iterating over potentially thousands or millions of files, and retrieving file sizes usingstat()
calls, which introduces significant I/O overhead. - Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of
stat()
calls, especially on network-based filesystems like RDS/Aurora. - Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.
ReadMultiXactCounts()
function, implemented in multixact.c
, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.By exposing
ReadMultiXactCounts()
for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.The performance comparison between the current and proposed approaches shows a significant improvement, with the proposed solution taking only a fraction of a millisecond to retrieve the MultiXact member count, compared to tens or hundreds of milliseconds for the current filesystem-based approach.
Following is the comparison of performance between calculating storage of MultiXact members directory and retrieving the count of members.
Implementation | Used size | MultiXact members (approx) | Time taken (ms) | Time factor (vs Baseline) |
EC2 community (RDS version of pg_ls_multixactdir) | 8642 MB | 1.8 billion | 96.879 | 1.00 |
Linux du command | 8642 MB | 1.8 billion | 96 | NA |
Proposal (ReadMultiXactCounts) | N/A | 1.99 billion | 0.167 | 580 times faster |
I believe exposing
ReadMultiXactCounts()
would be a valuable addition to the PostgreSQL ecosystem, providing users with a more reliable and efficient way to monitor MultiXact usage. Appreciate your feedback or discussion on this proposal. Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
Thank you!
Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services
On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
Hi,
I would like to propose exposing an internal PostgreSQL function calledReadMultiXactCounts()
to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.Current Challenges: The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:Proposed SolutionThe internal
- Filesystem scanning overhead: These functions recursively scan the
pg_multixact
directory, iterating over potentially thousands or millions of files, and retrieving file sizes usingstat()
calls, which introduces significant I/O overhead.- Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of
stat()
calls, especially on network-based filesystems like RDS/Aurora.- Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.
ReadMultiXactCounts()
function, implemented inmultixact.c
, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods................................
**************************************************************************************************************************************************************
I would like to propose exposing an internal PostgreSQL function called ReadMultiXactCounts()[1] to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.
================
Current Challenges
================
The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:
- Filesystem scanning overhead: These functions recursively scan the pg_multixact directory, iterating over potentially thousands or millions of files, and retrieving file sizes using stat() calls, which introduces significant I/O overhead.
- Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of stat() calls, especially on network-based filesystems like RDS/Aurora.
- Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.
=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.
By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.
=========================
Performance comparison
=========================
The performance comparison between the current and proposed approaches shows a significant improvement, with the proposed solution taking only a fraction of a millisecond to retrieve the MultiXact member count, compared to tens or hundreds of milliseconds for the current filesystem-based approach. And as more members are generated, the gap widens.
Following is the comparison of performance between calculating storage of MultiXact members directory and retrieving the count of members.
Implementation | Used size | MultiXact members
----------------------------------------------------+-------------+------------------
EC2 community (RDS version of pg_ls_multixactdir) | 8642 MB | 1.8 billion
Linux du command | 8642 MB | 1.8 billion
Proposal (ReadMultiXactCounts) | 8642 MB | 1.8 billion
============================================================================================
Sample runs
============================================================================================
Using "du -h"
--------------------
postgres=# \! time du -h /rdsdbdata/db/17.4/data/pg_multixact/members
13G /rdsdbdata/db/17.4/data/pg_multixact/members
real 0m0.285s <============================= time taken
user 0m0.050s <============================= time taken
sys 0m0.140s
Using RDS's pg_ls_multixactdir ():
------------------------------------------------------------
postgres=# SELECT
pg_size_pretty(coalesce(sum(size), 0)) AS members_size
FROM
pg_ls_multixactdir ()
WHERE
name LIKE 'pg_multixact/members%';
members_size
--------------
13 GB
(1 row)
Time: 226.533 ms <============================= time taken
Using proposed function:
----------------------------------------
postgres=# SELECT to_char(pg_get_multixact_members_count(), '999,999,999,999') AS members_count;
members_count
------------------
2,745,823,171
(1 row)
Time: 0.142 ms <============================= time taken
============================================================================================
I believe exposing ReadMultiXactCounts() would be a valuable addition to the PostgreSQL ecosystem, providing users with a more reliable and efficient way to monitor MultiXact usage. Appreciate your feedback or discussion on this proposal.
Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
Thank you!
References:
Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
Thank you!
References:
[1] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2925-L2948
Thank you!
Best regards,Naga AppaniPostgres Database EngineerAmazon Web Services
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Kirill Reshke
Дата:
On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote: > > > > On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote: >> >> Hi, >> Hi > ================= > Proposal > ================= > The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact membersby reading live state from shared memory. This approach avoids the performance issues of the current filesystem-basedestimation methods. This proposal looks sane. It is indeed helpful to keep an eye out for multixact usage in systems that are heavily loaded. > By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXactmember usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. > > Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch. Let's give it a try! -- Best regards, Kirill Reshke
On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote: > > On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote: > > > > > > > > On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote: > >> > >> Hi, > >> > > Hi > > > ================= > > Proposal > > ================= > > The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXactmembers by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-basedestimation methods. > > This proposal looks sane. It is indeed helpful to keep an eye out for > multixact usage in systems that are heavily loaded. > > > By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitorMultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. > > > > Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch. > > Let's give it a try! Hi, As a follow-up, I’m submitting a patch that introduces a SQL-callable function to retrieve MultiXact usage metrics. Although the motivation has been discussed earlier in this thread, I’m including a brief recap below to provide context for the patch itself. While wraparound due to MultiXacts (MXID) is less frequent than XID wraparound, it can still lead to aggressive/wraparound vacuum behavior or downtime in certain workloads — especially those involving foreign keys, shared row locks, or long-lived transactions. Currently, users have no SQL-level visibility into MultiXact member consumption, which makes it hard to proactively respond before issues arise. The only workaround today involves scanning the pg_multixact/members directory on disk, current workaround uses stat() calls over potentially millions of small segment files, adds I/O overhead, and is unsuitable for periodic monitoring or integration into observability platforms. Unlike the approach originally proposed or discussed in this thread, this patch does not expose the internal ReadMultiXactCounts() function directly. Instead, it wraps it internally (without changing its visibility) to make the data available via a new SQL function. This patch adds: pg_get_multixact_count() It returns a composite of: - multixacts: number of MultiXact IDs that currently exist - members: number of MultiXact member entries currently exist Implementation -------------- - Defined in multixact.c - Calls ReadMultiXactCounts() - Returns a composite record (multixacts, members) - Includes documentation Use cases --------- This function enables users to: - Monitor member usage to anticipate aggressive vacuum and avoid wraparound risk - Track long-lived workloads that accumulate MultiXacts - Power lightweight monitoring/diagnostics tools without scanning the filesystem - Log and analyze MultiXact growth over time Sample output ------------- multixacts | members ------------+------------ 182371396 | 2826221174 (1 row) Performance comparison ---------------------- While performance is not the primary motivation for this patch, it becomes important in monitoring scenarios where frequent polling is expected. The proposed function executes in sub-millisecond time and avoids any filesystem I/O, making it well-suited for lightweight, periodic monitoring. Implementation | Used size | MultiXact members | Time (ms) | Relative cost -------------------------------------+-----------+-------------------+-----------+---------------- Community (pg_ls_multixactdir) | 8642 MB | 1.8 billion | 96.879 | 1.00 (baseline) Linux (du command) | 8642 MB | 1.8 billion | 96 | 1.00 Proposal (ReadMultiXactCounts-based) | N/A | 1.99 billion | 0.167 | ~580x faster Documentation ------------- - A new section is added to func.sgml to group multixact-related functions - A reference to this new function is included in the "Multixacts and Wraparound" subsection of maintenance.sgml To keep related functions grouped together, we can consider moving mxid_age() into the new section as well unless there are objections to relocating it from the current section. This patch aims to fill a long-standing observability gap. Patch attached. Best regards, Naga Appani Postgres Database Engineer Amazon Web Services > > > > -- > Best regards, > Kirill Reshke
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Ashutosh Bapat
Дата:
On Tue, Jun 10, 2025 at 7:50 PM Naga Appani <nagnrik@gmail.com> wrote: > > On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote: > > > > On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote: > > > > > > > > > > > > On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote: > > >> > > >> Hi, > > >> > > > > Hi > > > > > ================= > > > Proposal > > > ================= > > > The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXactmembers by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-basedestimation methods. > > > > This proposal looks sane. It is indeed helpful to keep an eye out for > > multixact usage in systems that are heavily loaded. > > > > > By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitorMultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. > > > > > > Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch. > > > > Let's give it a try! > > Hi, > > As a follow-up, I’m submitting a patch that introduces a SQL-callable > function to retrieve MultiXact usage metrics. Although the motivation > has been discussed earlier in this thread, I’m including a brief recap > below to provide context for the patch itself. > > While wraparound due to MultiXacts (MXID) is less frequent than XID > wraparound, it can still lead to aggressive/wraparound vacuum behavior > or downtime in certain workloads — especially those involving foreign > keys, shared row locks, or long-lived transactions. Currently, users > have no SQL-level visibility into MultiXact member consumption, which > makes it hard to proactively respond before issues arise. I see mxid_age() will just give mxid consumption but not members consumption. So just that function is not enough. > > Sample output > ------------- > multixacts | members > ------------+------------ > 182371396 | 2826221174 > (1 row) > > Performance comparison > ---------------------- > While performance is not the primary motivation for this patch, it > becomes important in monitoring scenarios where frequent polling is > expected. The proposed function executes in sub-millisecond time and > avoids any filesystem I/O, making it well-suited for lightweight, > periodic monitoring. > > Implementation | Used size | MultiXact members > | Time (ms) | Relative cost > -------------------------------------+-----------+-------------------+-----------+---------------- > Community (pg_ls_multixactdir) | 8642 MB | 1.8 billion | > 96.879 | 1.00 (baseline) > Linux (du command) | 8642 MB | 1.8 billion | > 96 | 1.00 > Proposal (ReadMultiXactCounts-based) | N/A | 1.99 billion | > 0.167 | ~580x faster > > Documentation > ------------- > - A new section is added to func.sgml to group multixact-related functions > - A reference to this new function is included in the "Multixacts and > Wraparound" subsection of maintenance.sgml > > To keep related functions grouped together, we can consider moving > mxid_age() into the new section as well unless there are objections to > relocating it from the current section. In [1], we decided to document pg_get_multixact_member() in section "Transaction ID and Snapshot Information Functions". I think the discussion in the email thread applies to this function as well. + <sect2 id="functions-info-multixact-information"> + <title>MultiXact Information Functions</title> + + <entry role="func_table_entry"> + <para role="func_signature"> + <indexterm><primary>pg_get_multixact_count</primary></indexterm> + <function>pg_get_multixact_count</function> () + <returnvalue>record</returnvalue> + </para> + <para> + Returns a record with the fields <structfield>multixacts</structfield> and <structfield>members</structfield>: + <itemizedlist> + <listitem> + <para><structfield>multixacts</structfield>: Number of MultiXacts assigned. + PostgreSQL initiates aggressive autovacuum when this value grows beyond the threshold + defined by <varname>autovacuum_multixact_freeze_max_age</varname>, which is based on + the age of <literal>datminmxid</literal>. For more details, see + <ulink url="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND"> + Routine Vacuuming: Multixact Wraparound</ulink>.</para> + </listitem> + <listitem> + <para><structfield>members</structfield>: Number of MultiXact member entries created. + These are stored in files under the <filename>pg_multixact/members</filename> subdirectory. + Wraparound occurs after approximately 4.29 billion entries (~20 GiB). PostgreSQL initiates + aggressive autovacuum when the number of members created exceeds approximately 2.145 billion + or when storage consumption in <filename>pg_multixact/members</filename> approaches 10 GiB.</para> + </listitem> + </itemizedlist> + </para> + </entry> The description here doesn't follow the format of the other functions in this section. We usually explain the inputs and outputs of the function but not how to use the outputs. In this case, you might want to just refer to Multixact Wraparound section under Routine Vacuuming chapter rather than describing the autovacuum behaviour. You can do that by inserting <xref linkend="vacuum-for-multixact-wraparound"/> instead of a full URL. These links are appropriately resolved when creating HTML to version specific links. The URL you have used will always point to "Current" version. + <para> + The <function><link linkend="functions-multixact-information">pg_get_multixact_count</link></function> + function provides a way to check how many multixacts and member entries have been allocated. This can + be useful for identifying unusual multixact activity, monitoring progress toward wraparound, anticipating + system-wide aggressive autovacuum as usage approaches critical thresholds, or verifying whether autovacuum + is keeping up with demand. + </para> + This is the right place to go in details of how the function can be used; not the function documentation itself. I am yet to make up whether we need the whole description. I think the first line is enough and goes well with the rest of the section. + + if (!ReadMultiXactCounts(&multixacts, &members)) + ereport(ERROR, + (errmsg("could not read multixact counts"))); Throwing an error causes the surrounding transaction to abort, so it should be avoided in a monitoring/reporting function if possible. In this case for example, we could throw a warning instead or report NULL values. If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0, which will cause the autovacuum to be more aggressive. I think it will be good to highlight that in the function description since that's one of the objectives of this function: to know when the autovacuum is going to be more aggressive. + + values[0] = UInt32GetDatum(multixacts); + values[1] = UInt32GetDatum(members); + + tuple = heap_form_tuple(tupdesc, values, nulls); + PG_RETURN_DATUM(HeapTupleGetDatum(tuple)); +} In PG14+, the transaction wraparound is triggered if the size of the directory exceeds 10GB. This function does not help monitoring that condition. So a user will need to use du or pg_ls_multixactdir() anyway, which defeats the purpose of this function being more efficient than those methods. Am I correct? Can we also report the size of the directory in this function? The patch needs tests. [1] https://www.postgresql.org/message-id/aF8b_fp_9Va58vB9%40nathan -- Best Wishes, Ashutosh Bapat
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Michael Paquier
Дата:
On Fri, Jul 25, 2025 at 04:27:37PM +0530, Ashutosh Bapat wrote: > In [1], we decided to document pg_get_multixact_member() in section > "Transaction ID and Snapshot Information Functions". I think the > discussion in the email thread applies to this function as well. Yep, let's be consistent. > Throwing an error causes the surrounding transaction to abort, so it > should be avoided in a monitoring/reporting function if possible. In > this case for example, we could throw a warning instead or report NULL > values. Most likely returning NULL is the best thing we can do, as a safe fallback. > The patch needs tests. Indeed. May I also suggest a split of the multixact SQL functions into a separate file, a src/backend/utils/adt/multixactfuncs.c? The existing pg_get_multixact_members() relies on GetMultiXactIdMembers(), available in multixact.h. The new function pg_get_multixact_count() relies on ReadMultiXactCounts(), which would mean adding it in multixact.h. Even if we finish without an agreement about the SQL function and the end, publishing ReadMultiXactCounts() would give an access to the internals to external code. +PG_FUNCTION_INFO_V1(pg_get_multixact_count); There should be no need for that, pg_proc.dat handling the declaration AFAIK. FWIW, these functions are always kind of hard to use for the end-user without proper documentation. You may want to add an example of how one can use it for monitoring in the docs. -- Michael
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Ashutosh Bapat
Дата:
On Mon, Jul 28, 2025 at 9:52 AM Michael Paquier <michael@paquier.xyz> wrote: > > May I also suggest a split of the multixact SQL functions into a > separate file, a src/backend/utils/adt/multixactfuncs.c? The existing > pg_get_multixact_members() relies on GetMultiXactIdMembers(), > available in multixact.h. The new function pg_get_multixact_count() > relies on ReadMultiXactCounts(), which would mean adding it in > multixact.h. Even if we finish without an agreement about the SQL > function and the end, publishing ReadMultiXactCounts() would give an > access to the internals to external code. > > +PG_FUNCTION_INFO_V1(pg_get_multixact_count); > > There should be no need for that, pg_proc.dat handling the > declaration AFAIK. > > FWIW, these functions are always kind of hard to use for the end-user > without proper documentation. You may want to add an example of how > one can use it for monitoring in the docs. +1. Let's say if the user knows that the counts are so high that a wraparound is imminent, but vacuuming isn't solving the problem, they would like to know which transactions are holding it back. pg_get_multixact_members() can be used to get the members of the oldest multixact if it's reported and then the user can deal with those transactions. However, the oldest multixact is not reported anywhere, AFAIK. It's also part of MultiXactState, so can be extracted via ReadMultiXactCounts(). We could report it through pg_get_multixact_counts - after renaming it and ReadMultiXactCounts to pg_get_multixact_stats() and ReadMultiXactStats() respectively. Or we could write another function to do so. But it comes handy using query like below #select oldestmultixact, pg_get_multixact_members(oldestmultixact::text::xid) from pg_get_multixact_count(); oldestmultixact | pg_get_multixact_members ------------------+-------------------------- 1 | (757,sh) 1 | (768,sh) (2 rows) Here's a quick patch implementing the same. Please feel free to incorporate and refine it in your patch if you like it. -- Best Wishes, Ashutosh Bapat
Вложения
Hi Ashutosh, Michael, Thanks for the detailed reviews. I have incorporated the feedback; please find attached v2 and my responses inline below. On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > In [1], we decided to document pg_get_multixact_member() in section > "Transaction ID and Snapshot Information Functions". I think the > discussion in the email thread applies to this function as well. Done -- the function is now documented under “Transaction ID and Snapshot Information Functions” for consistency. > The description here doesn't follow the format of the other functions > in this section. Updated the description in func.sgml to match the style of other functions. Extended usage guidance is now in maintenance.sgml. > Throwing an error causes the surrounding transaction to abort, so it > should be avoided in a monitoring/reporting function if possible. The function now returns NULL instead of throwing an error when counts can’t be read. > If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0... Noted -- the docs now mention that the function can be used to anticipate more aggressive autovacuum behavior in such cases. > In PG14+, the transaction wraparound is triggered if the size of the > directory exceeds 10GB. This function does not help monitoring that > condition. So a user will need to use du or pg_ls_multixactdir() > anyway, which defeats the purpose of this function being more > efficient than those methods. Am I correct? Can we also report the > size of the directory in this function? Correct, that is the intent of the function. The members count returned by this function already provides the necessary information to determine the directory size, since each member entry has a fixed size. The constants and formulas in [0] and discussed in [1] show that each group stores four bytes of flags plus four TransactionIds (20 bytes total), yielding 409 groups per 8 KB page and a fixed members‑to‑bytes ratio. This means ~2 billion members corresponds to ~10 GB (aggressive autovacuum threshold) and ~4 billion members corresponds to ~20 GB (wraparound). Since the function already provides the member count, including the physical size in its output would duplicate information and add no extra benefit. > The patch needs tests. Added an isolation test to cover initial state, MultiXact creation, counts, and oldest MultiXact reporting. On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > Let's say if the user knows that the counts are so high that a > wraparound is imminent, but vacuuming isn't solving the problem... > Here's a quick patch implementing the same. Please feel free to > incorporate and refine it in your patch if you like it. Thank you for sharing the patch. I have incorporated it into this version with minor adjustments, and it fits well with the overall design of the function. On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote: > Yep, let's be consistent. Done -- placed in “Transaction ID and Snapshot Information Functions” for consistency. > Most likely returning NULL is the best thing we can do, as a safe fallback. Implemented -- the function now returns NULL if counts can’t be read. > The patch needs tests. Isolation tests have been added as described above. > May I also suggest a split of the multixact SQL functions into a > separate file, a src/backend/utils/adt/multixactfuncs.c? I agree that would be cleaner, but I’d prefer to keep the implementation in multixact.c for now to maintain focus on this patch and revisit the refactoring later. > +PG_FUNCTION_INFO_V1(pg_get_multixact_count); Removed -- now handled entirely by pg_proc.dat. > ...You may want to add an example of how one can use it for monitoring in the docs. I’ve added a usage example with sample output in the docs. If you had a different kind of demo in mind (e.g., creating multixacts manually and showing the output), please let me know. References: [0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156 [1] https://www.postgresql.org/message-id/flat/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg%40mail.gmail.com Best regards, Naga Appani
Вложения
On Mon, Aug 4, 2025 at 1:16 AM Naga Appani <nagnrik@gmail.com> wrote: > > Hi Ashutosh, Michael, > > Thanks for the detailed reviews. I have incorporated the feedback; > please find attached v2 and my responses inline below. > > On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > In [1], we decided to document pg_get_multixact_member() in section > > "Transaction ID and Snapshot Information Functions". I think the > > discussion in the email thread applies to this function as well. > > Done -- the function is now documented under “Transaction ID and > Snapshot Information Functions” for consistency. > > > The description here doesn't follow the format of the other functions > > in this section. > > Updated the description in func.sgml to match the style of other > functions. Extended usage guidance is now in maintenance.sgml. > > > Throwing an error causes the surrounding transaction to abort, so it > > should be avoided in a monitoring/reporting function if possible. > > The function now returns NULL instead of throwing an error when counts > can’t be read. > > > If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0... > > Noted -- the docs now mention that the function can be used to > anticipate more aggressive autovacuum behavior in such cases. > > > In PG14+, the transaction wraparound is triggered if the size of the > > directory exceeds 10GB. This function does not help monitoring that > > condition. So a user will need to use du or pg_ls_multixactdir() > > anyway, which defeats the purpose of this function being more > > efficient than those methods. Am I correct? Can we also report the > > size of the directory in this function? > > Correct, that is the intent of the function. The members count > returned by this function already provides the necessary information > to determine the directory size, since each member entry has a fixed > size. The constants and formulas in [0] and discussed in [1] show that > each group stores four bytes of flags plus four TransactionIds (20 > bytes total), yielding 409 groups per 8 KB page and a fixed > members‑to‑bytes ratio. This means ~2 billion members corresponds to > ~10 GB (aggressive autovacuum threshold) and ~4 billion members > corresponds to ~20 GB (wraparound). > > Since the function already provides the member count, including the > physical size in its output would duplicate information and add no > extra benefit. > > > The patch needs tests. > > Added an isolation test to cover initial state, MultiXact creation, > counts, and oldest MultiXact reporting. > > On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > Let's say if the user knows that the counts are so high that a > > wraparound is imminent, but vacuuming isn't solving the problem... > > Here's a quick patch implementing the same. Please feel free to > > incorporate and refine it in your patch if you like it. > > Thank you for sharing the patch. I have incorporated it into this > version with minor adjustments, and it fits well with the overall > design of the function. > > On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote: > > > Yep, let's be consistent. > > Done -- placed in “Transaction ID and Snapshot Information Functions” > for consistency. > > > Most likely returning NULL is the best thing we can do, as a safe fallback. > > Implemented -- the function now returns NULL if counts can’t be read. > > > The patch needs tests. > > Isolation tests have been added as described above. > > > May I also suggest a split of the multixact SQL functions into a > > separate file, a src/backend/utils/adt/multixactfuncs.c? > > I agree that would be cleaner, but I’d prefer to keep the > implementation in multixact.c for now to maintain focus on this patch > and revisit the refactoring later. > > > +PG_FUNCTION_INFO_V1(pg_get_multixact_count); > > Removed -- now handled entirely by pg_proc.dat. > > > ...You may want to add an example of how one can use it for monitoring in the docs. > > I’ve added a usage example with sample output in the docs. If you had > a different kind of demo in mind (e.g., creating multixacts manually > and showing the output), please let me know. > > References: > [0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156 > [1] https://www.postgresql.org/message-id/flat/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg%40mail.gmail.com > > Best regards, > Naga Appani Following up on my v2 from yesterday — the recent commit [0] changed the directory layout, which broke the patch (v2). This v3 updates the code to work with the new structure and also fixes some formatting issues I noticed while revisiting the changes. The rest of the patch remains the same as v2, which incorporated feedback from Ashutosh and Michael (see my previous email for details). Please find v3 attached. References: [0] https://github.com/postgres/postgres/commit/4e23c9ef65accde7eb3e56aa28d50ae5cf79b64b Best regards, Naga Appani
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Michael Paquier
Дата:
On Mon, Aug 04, 2025 at 04:51:30PM -0500, Naga Appani wrote: > The rest of the patch remains the same as v2, which incorporated > feedback from Ashutosh and Michael (see my previous email for > details). > > Please find v3 attached. I am reading again what you have here, and I really think that we should move the SQL function parts of multixact.c into their own new file, exposing ReadMultiXactCounts() in multixact.h, because I also suspect that this can become really useful for extensions that aim at doing things similar to your proposal in terms of data monitoring for autovacuum wraparound. This means two refactoring patches: - One to expose the new routine in multixact.h. - One to move the existing SQL code to its new file. ReadMultiXactCounts() is also incorrectly named with your proposal to expose oldestMultiXactId in the information returned to the caller, where the key point is to make sure that the information retrieved is consistent across a single LWLock acquisition. So perhaps this should be named GetMultiXactInformation() or something similar? The top of ReadMultiXactCounts() (or whatever its new name) should also document the information returned across a single call. It looks inconsistent to return oldestMultiXactId if the oldestOffsetKnown is false. What about oldestOffset itself? Should it be returned for consistency with the counts and oldestMultiXactId? -- Michael
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Ashutosh Bapat
Дата:
On Mon, Aug 4, 2025 at 11:46 AM Naga Appani <nagnrik@gmail.com> wrote: > > In PG14+, the transaction wraparound is triggered if the size of the > > directory exceeds 10GB. This function does not help monitoring that > > condition. So a user will need to use du or pg_ls_multixactdir() > > anyway, which defeats the purpose of this function being more > > efficient than those methods. Am I correct? Can we also report the > > size of the directory in this function? > > Correct, that is the intent of the function. The members count > returned by this function already provides the necessary information > to determine the directory size, since each member entry has a fixed > size. The constants and formulas in [0] and discussed in [1] show that > each group stores four bytes of flags plus four TransactionIds (20 > bytes total), yielding 409 groups per 8 KB page and a fixed > members‑to‑bytes ratio. This means ~2 billion members corresponds to > ~10 GB (aggressive autovacuum threshold) and ~4 billion members > corresponds to ~20 GB (wraparound). Would it be better to do that math in the function and output the result? Users may not be able to read and understand the PostgreSQL code or pgsql-hackers threads Or the constants may change across versions. It will be more convenient for users if they get the output from the function itself. On Fri, Aug 8, 2025 at 6:05 AM Michael Paquier <michael@paquier.xyz> wrote: > > ReadMultiXactCounts() is also incorrectly named with your proposal to > expose oldestMultiXactId in the information returned to the caller, > where the key point is to make sure that the information retrieved is > consistent across a single LWLock acquisition. So perhaps this should > be named GetMultiXactInformation() or something similar? +1 > > The top of ReadMultiXactCounts() (or whatever its new name) should > also document the information returned across a single call. It looks > inconsistent to return oldestMultiXactId if the oldestOffsetKnown is > false. What about oldestOffset itself? Should it be returned for > consistency with the counts and oldestMultiXactId? +1 Some more comments on the patch + <literal>multixacts</literal> is the number of multixact IDs assigned, + <literal>members</literal> is the number of multixact member entries created, + and <literal>oldest_multixact</literal> is the oldest MultiXact ID still in use. Now that the name of the function is changed, we need the names to indicate that they are counts e.g. num_mxids, num_members. + These values can be used to monitor multixact consumption and anticipate + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/> + for further details on multixact wraparound. + </para> + + <para> + <literal>SELECT * FROM pg_get_multixact_stats();</literal> +<programlisting> + multixacts | members | oldest_multixact +------------+-------------+------------------ + 182371396 | 2826221174 | 754321 +</programlisting> This file doesn't provide usage examples of other functions. This function doesn't seem to be an exception. I think we should mention that the statistics may get stale as soon as it's fetched, even with REPEATABLE READ isolation level. + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if either + the storage occupied by multixact members exceeds about 10GB or the number + of members created exceeds approximately 2 billion entries, aggressive vacuum In case each member starts consuming more or less space than it does today what would be the basis of triggering workaround? Space or number of members? I think we should mention only that. scans will occur more often for all tables, starting with those that - have the oldest multixact-age. Both of these kinds of aggressive + have the oldest multixact-age. Both of these kinds of aggressive scans will occur even if autovacuum is nominally disabled. The members storage - area can grow up to about 20GB before reaching wraparound. + area can grow up to about 20GB or approximately 4 billion entries before + reaching wraparound. Similar to above. + </para> + + <para> + The <function>pg_get_multixact_stats()</function> function provides a way + to monitor multixact allocation and usage patterns in real time. By exposing This is the right place to elaborate the usage of this function with an example. + counts of multixacts, member entries, and the oldest multixact ID, it helps: + <orderedlist> + <listitem> + <simpara> + Identify unusual multixact activity from concurrent row-level locks + or foreign key operations + </simpara> + </listitem> + <listitem> + <simpara> + Monitor progress toward wraparound thresholds that trigger aggressive + autovacuum (approximately 2 billion members or 10GB storage) + </simpara> + </listitem> + <listitem> + <simpara> + Verify whether autovacuum is effectively managing multixact cleanup + before reaching critical thresholds + </simpara> + </listitem> + </orderedlist> + See <xref linkend="functions-info-snapshot"/> for details. I think the second point here repeats what's already mentioned earlier. It will be good to elaborate each point with an example instead of just narration. +/* + * pg_get_multixact_stats + * + * SQL-callable function to retrieve MultiXact statistics. + * + * Returns a composite row containing: + * - total number of MultiXact IDs created since startup, + * - total number of MultiXact members created, ... since startup or the number of existing members? + * - the oldest existing MultiXact ID. + * + * This is primarily useful for monitoring MultiXact usage and ensuring + * appropriate wraparound protection. The last two lines are not required, I think. One of its usage is monitoring but users may find other usages. + +step commit1: COMMIT; +step commit2: COMMIT; +step check: + SELECT + multixacts, + members, + oldest_multixact + FROM pg_get_multixact_stats(); + +multixacts|members|oldest_multixact +----------+-------+---------------- + 1| 3| 1 +(1 row) Vacuum may clean the multixact between commit2 and check, in which case the result won't be stable. -- Best Wishes, Ashutosh Bapat
Hi Michael, Ashutosh, Thanks a lot for the detailed reviews and feedback. Please find attached v4 of the patchset. Summary of changes in v4: - Split into two patches as suggested: 1. Expose and rename ReadMultiXactCounts() -> GetMultiXactInfo() in multixact.h with clearer comments. 2. Add pg_get_multixact_stats() as a SQL-callable function in a new file (multixactfuncs.c), with docs and tests. - Function now also returns oldestOffset for consistency. - Field names updated to num_mxids, num_members, oldest_multixact, oldest_offset. - Documentation revised to describe thresholds only in terms of member counts (disk size wording removed). - Added a minimal example in maintenance.sgml where multixact wraparound is already described. - Isolation tests are rewritten so they no longer depend on exact counts, but only on monotonic properties guaranteed while a multixact is pinned. Replies inline below: On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote: > > I really think that we should move the SQL function parts of multixact.c > into their own new file, exposing ReadMultiXactCounts() in multixact.h... Done. The SQL-callable code now lives in src/backend/utils/adt/multixactfuncs.c and the accessor is declared in src/include/access/multixact.h. > ReadMultiXactCounts() is also incorrectly named with your proposal to > expose oldestMultiXactId in the information returned to the caller... > So perhaps this should be named GetMultiXactInformation() or something > similar? Renamed to GetMultiXactInfo(). > The top of ReadMultiXactCounts() (or whatever its new name) should > also document the information returned across a single call. Added detailed comments about consistency under a single LWLock and the meaning of each field. > It looks inconsistent to return oldestMultiXactId if the > oldestOffsetKnown is false. What about oldestOffset itself? GetMultiXactInfo() now returns oldestOffset as well. If the oldest offset isn’t currently known, the function returns false and clears all outputs, so callers don’t see a partially valid struct. --- On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Would it be better to do that math in the function and output the result? That’s a cool idea, thanks for pointing it out. For now I have kept the SQL function focused only on exposing the raw counts (num_mxids, num_members, oldest IDs). My thought was that keeping the API lean makes it easier to maintain across versions, while leaving any derived calculations like approximate storage size to SQL or external tooling. This way the function remains simple and future-proof, while still giving users the building blocks to get the view they need. I’m happy to revisit this if others feel it would be better for the function to provide an approximate size directly — I wanted to start with the simplest surface and gather feedback first. > Now that the name of the function is changed, we need the names to > indicate that they are counts e.g. num_mxids, num_members. Adjusted. The SQL function returns: num_mxids, num_members, oldest_multixact, oldest_offset. > This file doesn't provide usage examples of other functions. This > function doesn't seem to be an exception. Earlier I thought it was fine to add an example since pg_input_error_info() also has one, so in this version I placed the example in maintenance.sgml, where we already discuss multixact wraparound. That seemed like the most natural place for it. I agree with your point about consistency, though, so I kept the style minimal and aligned with the surrounding text. > I think we should mention that the statistics may get stale as soon as > it's fetched, even with REPEATABLE READ isolation level. Added a note that values are a live snapshot and can change immediately. > In case each member starts consuming more or less space than it does > today what would be the basis of triggering wraparound? Space or > number of members? I think we should mention only that. I updated the docs to describe wraparound in terms of member counts only. The earlier mention of disk size has been dropped, since the thresholds are defined by counts. > This is the right place to elaborate the usage of this function with an > example. Expanded with a short example, while keeping it consistent with nearby entries. > ... since startup or the number of existing members? Clarified that the values reflect what’s *currently in use* (i.e., derived from next/oldest) and that NULLs are returned if the multixact subsystem has not been initialized yet. > The last two lines are not required, I think. One of its usage is > monitoring but users may find other usages. Dropped those lines. > Vacuum may clean the multixact between commit2 and check, in which > case the result won't be stable. Right, the earlier version of the test assumed stable counts, which could fail if autovacuum or background cleanup removed entries in between steps. In v4 the isolation test no longer relies on exact numbers. Instead it asserts only the monotonic properties that are guaranteed while a multixact is pinned, and avoids assumptions once locks are released. That makes the test robust against concurrent vacuum activity. --- Thanks again for the thoughtful reviews and guidance. Please let me know if you see further adjustments needed. Best regards, Naga
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Michael Paquier
Дата:
On Sun, Aug 17, 2025 at 01:27:29AM -0500, Naga Appani wrote: > On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote: >> >> I really think that we should move the SQL function parts of multixact.c >> into their own new file, exposing ReadMultiXactCounts() in multixact.h... > > Done. The SQL-callable code now lives in > src/backend/utils/adt/multixactfuncs.c > and the accessor is declared in > src/include/access/multixact.h. My point was a bit different: multixactfuncs.c should be created first because we already have one SQL function in multixact.c that can be moved inside it, with the declarations it requires added to multixact.h. I've extracted what you did, moved the existing pg_get_multixact_members() inside the new file, and applied the result. >> ReadMultiXactCounts() is also incorrectly named with your proposal to >> expose oldestMultiXactId in the information returned to the caller... >> So perhaps this should be named GetMultiXactInformation() or something >> similar? > > Renamed to GetMultiXactInfo(). + * Returns information about current MultiXact state in a single atomic read: This comment is incorrect. This is not an atomic read, grabbing a consistent state of the data across one single lock acquisition. Except for this comment, this looks pretty much OK. Ashutosh, any comments? I have not looked at the rest. -- Michael
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Ashutosh Bapat
Дата:
On Sun, Aug 17, 2025 at 11:57 AM Naga Appani <nagnrik@gmail.com> wrote: > On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > In case each member starts consuming more or less space than it does > > today what would be the basis of triggering wraparound? Space or > > number of members? I think we should mention only that. > > I updated the docs to describe wraparound in terms of member counts only. > The earlier mention of disk size has been dropped, since the thresholds > are defined by counts. The current document says "Also, if the storage occupied by multixacts members exceeds about 10GB, aggressive vacuum scans will occur more often for all tables, starting with those that have the oldest multixact-age." - do you mean that it's wrong. Instead of checking 10GB threashold, is the code checking the equivalent member count? If so, I think we need a separate patch to correct the documentation first. Can you please point me to the code? Documentation should reflect the code. > > That’s a cool idea, thanks for pointing it out. For now I have kept the > SQL function focused only on exposing the raw counts (num_mxids, > num_members, oldest IDs). My thought was that keeping the API lean makes > it easier to maintain across versions, while leaving any derived > calculations like approximate storage size to SQL or external tooling. > This way the function remains simple and future-proof, while still > giving users the building blocks to get the view they need. > > I’m happy to revisit this if others feel it would be better for the > function to provide an approximate size directly — I wanted to start > with the simplest surface and gather feedback first. The constant multiplier which converts a count into the disk size is in the server code. Duplicating it outside the server code, even in documentation, would require maintenance. GetMultiXactInfo() may not do the arithmetic but pg_get_multixact_stats() is lean enough to add a couple computations. If size is being used as a threshold, reporting count is useless because user wouldn't know the relation easily. If count is used as a threshold, reporting count makes sense. -- Best Wishes, Ashutosh Bapat
Hi Michael, Ashutosh, Thanks a lot for taking the time to review this patch and share your thoughts. Here’s a short summary of what has changed in v5: - Added the new pg_get_multixact_stats() function in multixactfuncs.c. - Fixed the misleading “atomic read” comment in the accessor. - Clarified documentation: thresholds are described in terms of counts, since that’s what the code uses. - Added a members_bytes column in pg_get_multixact_stats() to give users a rough size estimate (num_members * 5), while making it clear this is layout-dependent. Please see my in-line replies below. --- On Mon, Aug 18, 2025 at 1:49 AM Michael Paquier <michael@paquier.xyz> wrote: > My point was a bit different: multixactfuncs.c should be created first > because we already have one SQL function in multixact.c that can be > moved inside it, with the declarations it requires added to > multixact.h. I've extracted what you did, moved the existing > pg_get_multixact_members() inside the new file, and applied the > result. > Really appreciate your clarification and for making that change. I misunderstood your earlier point. > + * Returns information about current MultiXact state in a single atomic read: > > This comment is incorrect. This is not an atomic read, grabbing a > consistent state of the data across one single lock acquisition. > Fixed and adjusted wording. --- On Mon, Aug 18, 2025 at 6:56 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > The current document says > "Also, if the storage occupied by multixacts members exceeds about > 10GB, aggressive vacuum scans will occur more often for all tables, > starting with those that have the oldest multixact-age." - do you mean > that it's wrong. Instead of checking 10GB threashold, is the code > checking the equivalent member count? If so, I think we need a > separate patch to correct the documentation first. Can you please > point me to the code? Documentation should reflect the code. > The decision is made in MultiXactMemberFreezeThreshold() [0], and it is entirely count-based: if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD) return autovacuum_multixact_freeze_max_age; fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) / (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD); MaxMultiXactOffset is defined in multixact.h [1]: #define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF) Thresholds are defined in multixact.c [2] #define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2) #define MULTIXACT_MEMBER_DANGER_THRESHOLD \ (MaxMultiXactOffset - MaxMultiXactOffset / 4) These translate to: - MaxMultiXactOffset: ~4.29 billion (2^32 - 1) - MULTIXACT_MEMBER_SAFE_THRESHOLD: ~2.14 billion (2^31 - 1) - MULTIXACT_MEMBER_DANGER_THRESHOLD: ~3.22 billion (3/4 * 2^32) So the code path is count-driven. Regarding docs: For earlier versions (18 and before), the storage-size approximation remains relevant because users don’t have direct access to member count information. Since we don’t plan to backpatch (I assume so) this new function, the documentation for older branches should continue to rely on the storage-based approximation. Now that pg_get_multixact_stats() exposes num_members, the HEAD branch docs can describe the thresholds in terms of counts directly. For older branches, the storage approximation still provides users with a practical way to reason about wraparound risk. > The constant multiplier which converts a count into the disk size is > in the server code. Duplicating it outside the server code, even in > documentation, would require maintenance. GetMultiXactInfo() may not > do the arithmetic but pg_get_multixact_stats() is lean enough to add a > couple computations. > Thanks for suggesting this — it makes sense, especially for users upgrading from earlier versions to 19 and higher. I’ve added a members_bytes column to pg_get_multixact_stats(), computed as num_members * 5. This respects the existing server-side logic while also giving those users a familiar reference point, helping them connect the older size-based guidance with the new count-based view. --- References: [0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2916 [1] https://github.com/postgres/postgres/blob/master/src/include/access/multixact.h#L31 [2] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L216-L218 Patch v5 is attached. Thanks again for the thoughtful reviews — I really appreciate the guidance and look forward to further feedback. Best regards, Naga
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Michael Paquier
Дата:
On Mon, Aug 18, 2025 at 08:32:39PM -0500, Naga Appani wrote: > Thanks a lot for taking the time to review this patch and share your thoughts. > > Here’s a short summary of what has changed in v5: > - Added the new pg_get_multixact_stats() function in multixactfuncs.c. > - Fixed the misleading “atomic read” comment in the accessor. > - Clarified documentation: thresholds are described in terms of > counts, since that’s what the code uses. > - Added a members_bytes column in pg_get_multixact_stats() to give > users a rough size estimate (num_members * 5), while making it clear > this is layout-dependent. > > Please see my in-line replies below. FWIW, I think that you should be a bit more careful before sending updated patch sets. You have missed an extra point I have raised upthread about the refactoring pieces: the switch from ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of its own. So I have extracted this part from your latest patch, and applied it independently of the SQL function business. Now we are in an advantageous position on HEAD: even if we do not conclude about the SQL function to show the mxact numbers and offsets, we have the function that gives an access to the data you are looking for. In short, it is now possible to provide an equivalent of the feature you want outside of core. Not saying that the patch cannot be useful, but such refactoring pieces open more possibilities, and offer a cleaner commit history with less churn in the main patches. -- Michael
Вложения
On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz> wrote: > FWIW, I think that you should be a bit more careful before sending > updated patch sets. You have missed an extra point I have raised > upthread about the refactoring pieces: the switch from > ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of > its own. > > So I have extracted this part from your latest patch, and applied it > independently of the SQL function business. Now we are in an > advantageous position on HEAD: even if we do not conclude about the > SQL function to show the mxact numbers and offsets, we have the > function that gives an access to the data you are looking for. In > short, it is now possible to provide an equivalent of the feature you > want outside of core. Not saying that the patch cannot be useful, but > such refactoring pieces open more possibilities, and offer a cleaner > commit history with less churn in the main patches. > -- Thanks for the review and separating the refactoring into its own commit. Point taken on being more careful when sending updated patch sets. I’ll make sure to keep refactoring and SQL layer changes clearly separated going forward. Attached is v6, rebased on top of HEAD. This version is limited to the SQL function only. Changes since v5: - Removed the refactoring, as GetMultiXactInfo() is already committed. - Documentation revised to describe thresholds in terms of raw counts. Hopefully this makes the proposal easier to evaluate on its own merits.
Вложения
On 2025-08-20 13:27, Naga Appani wrote: Thanks for working on this! > On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz> > wrote: >> FWIW, I think that you should be a bit more careful before sending >> updated patch sets. You have missed an extra point I have raised >> upthread about the refactoring pieces: the switch from >> ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of >> its own. >> >> So I have extracted this part from your latest patch, and applied it >> independently of the SQL function business. Now we are in an >> advantageous position on HEAD: even if we do not conclude about the >> SQL function to show the mxact numbers and offsets, we have the >> function that gives an access to the data you are looking for. In >> short, it is now possible to provide an equivalent of the feature you >> want outside of core. Not saying that the patch cannot be useful, but >> such refactoring pieces open more possibilities, and offer a cleaner >> commit history with less churn in the main patches. >> -- > > Thanks for the review and separating the refactoring into its own > commit. > Point taken on being more careful when sending updated patch sets. > I’ll make sure to keep > refactoring and SQL layer changes clearly separated going forward. > > Attached is v6, rebased on top of HEAD. This version is limited to the > SQL function only. > diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index e7a9f58c015..6f0e8d7c10a 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -813,12 +813,56 @@ HINT: Execute a database-wide VACUUM in that database. <para> As a safety device, an aggressive vacuum scan will occur for any table whose multixact-age is greater than <xref - linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the - storage occupied by multixacts members exceeds about 10GB, aggressive vacuum + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number + of members created exceeds approximately 2^31 entries, aggressive vacuum scans will occur more often for all tables, starting with those that Looking at commit ff20ccae9fdb, it seems that the documentation was intentionally written in terms of gigabytes rather than the number: > The threshold is two billion members, which was interpreted as 2GB > in the documentation. Fix to reflect that each member takes up five > bytes, which translates to about 10GB. This is not exact, because of > page boundaries. While at it, mention the maximum size 20GB. Anyway, I also think, as Ashutosh suggested, that if we want to fix this documentation, it would be better to do so in a separate patch. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On 2025-08-22 09:28, torikoshia wrote: > On 2025-08-20 13:27, Naga Appani wrote: > > Thanks for working on this! > >> On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz> >> wrote: >>> FWIW, I think that you should be a bit more careful before sending >>> updated patch sets. You have missed an extra point I have raised >>> upthread about the refactoring pieces: the switch from >>> ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of >>> its own. >>> >>> So I have extracted this part from your latest patch, and applied it >>> independently of the SQL function business. Now we are in an >>> advantageous position on HEAD: even if we do not conclude about the >>> SQL function to show the mxact numbers and offsets, we have the >>> function that gives an access to the data you are looking for. In >>> short, it is now possible to provide an equivalent of the feature you >>> want outside of core. Not saying that the patch cannot be useful, >>> but >>> such refactoring pieces open more possibilities, and offer a cleaner >>> commit history with less churn in the main patches. >>> -- >> >> Thanks for the review and separating the refactoring into its own >> commit. >> Point taken on being more careful when sending updated patch sets. >> I’ll make sure to keep >> refactoring and SQL layer changes clearly separated going forward. >> >> Attached is v6, rebased on top of HEAD. This version is limited to the >> SQL function only. >> > > diff --git a/doc/src/sgml/maintenance.sgml > b/doc/src/sgml/maintenance.sgml > index e7a9f58c015..6f0e8d7c10a 100644 > --- a/doc/src/sgml/maintenance.sgml > +++ b/doc/src/sgml/maintenance.sgml > @@ -813,12 +813,56 @@ HINT: Execute a database-wide VACUUM in that > database. > <para> > As a safety device, an aggressive vacuum scan will > occur for any table whose multixact-age is greater than <xref > - linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if > the > - storage occupied by multixacts members exceeds about 10GB, > aggressive vacuum > + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the > number > + of members created exceeds approximately 2^31 entries, aggressive > vacuum > scans will occur more often for all tables, starting with those > that > > Looking at commit ff20ccae9fdb, it seems that the documentation was > intentionally written in terms of gigabytes rather than the number: > >> The threshold is two billion members, which was interpreted as 2GB >> in the documentation. Fix to reflect that each member takes up five >> bytes, which translates to about 10GB. This is not exact, because of >> page boundaries. While at it, mention the maximum size 20GB. > > Anyway, I also think, as Ashutosh suggested, that if we want to fix > this documentation, it would be better to do so in a separate patch. Ah, I've found why you choose to add this doc modification in this patch in the thread, sorry for skipping over the part: | For earlier versions (18 and before), the storage-size approximation | remains relevant because users don’t have direct access to member | count information. Since we don’t plan to backpatch (I assume so) this | new function, the documentation for older branches should continue to | rely on the storage-based approximation. | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch | docs can describe the thresholds in terms of counts directly. Personally, I think it might be fine to keep the gigabyte-based description, and perhaps we could show both the number of members and gigabytes, since it'd be also helpful to have a sense of the scale. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Ashutosh Bapat
Дата:
On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch > | docs can describe the thresholds in terms of counts directly. > > Personally, I think it might be fine to keep the gigabyte-based > description, and perhaps we could show both the number of members and > gigabytes, since it'd be also helpful to have a sense of the scale. > Those who have grown their own utilities to monitor the on-disk usage will not be able to use the count based thresholds and might take some time for them to starting using pg_get_multixact_stats(). It makes sense to mention both the count and the corresponding disk usage threshold. Something like "Also, if the number of multixact members exceeds approximately 2^31 entries (occupying roughly more than 10GB in storage) ... ". Users can choose which threshold they want to use. Adding disk storage threshold in parenthesis indicates that the count is more accurate and more useful. Here's detailed review of the patch + Returns statistics about current multixact usage: + <literal>num_mxids</literal> is the number of multixact IDs assigned, + <literal>num_members</literal> is the number of multixact member entries created, + <literal>members_bytes</literal> is the storage occupied by <literal>num_members</literal> I thought mentioning bytes, a unit, in column name members_bytes would not be appropriate in case we start reporting it in a different unit like kB in future. But we already have pg_stat_replication_slots::spill_bytes with similar naming. So may be it's okay. But I would prefer members_size or members_storage or some such units-free name. + in <literal>pg_multixact/members</literal> directory, + <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and + <literal>oldest_offset</literal> is the oldest member offset still in use. I am not sure whether oldest_offset is worth exposing. It is an implementation detail. Upthread, Michael suggested to expose oldest offset from GetMultiXactInfo(), but I don't see him explicitly saying that we should expose it through this function as well. Michael what do you think? + These values can be used to monitor multixact consumption and anticipate + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/> + for further details on multixact wraparound. I still think that this is not needed. There is no reason to restrict how users want to use this function. We usually don't do that unless there is a hazard associated with it. + <para> + This is a live snapshot of shared counters; the numbers can change between calls, + even within the same transaction. + </para></entry> I have not seen the phrase "live snapshot" being used in the documentation before. How about "The function reports the statistics at the time of invoking the function. They may vary between calls even within the same transaction."? + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number + of members created exceeds approximately 2^31 entries, aggressive vacuum a member means the transaction participating in a multixact. What you intend to say is "if the number of multixacts member entries created ...", right? + <para> + The <function>pg_get_multixact_stats()</function> function, described in unnecessary pair of commas. + This output shows a system with significant multixact activity: about ~100 million + multixact IDs and ~773 million member entries have been created since the oldest + surviving multixact (ID 39974368). By leveraging this information, the function helps: + <orderedlist> + <listitem> ... snip ... + Detect potential performance impacts before they become critical. + For instance, high multixact usage from frequent row-level locking or + foreign key operations can lead to increased I/O and CPU overhead during + vacuum operations. Monitoring these stats helps tune autovacuum frequency + and transaction patterns. + </simpara> + </listitem> + </orderedlist> I am unsure whether we should be mentioning use cases in such detail. Users may find other ways to use those counts. I think the following paragraph should be placed here. + These values can be used to monitor multixact consumption and anticipate + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/> + for further details on multixact wraparound. But others may have different opinions. Maybe you could further write in your example that an aggressive autovacuum will be triggered in another 10 seconds (or so) if the number of member entries continues to double every 5 seconds. Or some practical "usage example" like that. + * Returns statistics about current MultiXact usage: + * - num_mxids: Number of MultiXact IDs in use + * - num_members: Total number of member entries + * - oldest_multixact: Oldest MultiXact ID still needed + * - oldest_offset: Oldest offset still in use We don't need to mention each column here, it's evident from the function body and also from the user facing documentation. Just the first line is ok. + * + * Returns a row of NULLs if the MultiXact system is not yet initialized. tuple or record instead of row. In the earlier patch you were calling PG_RETURN_NULL(), which I thought was better. It would get converted into a record of NULLs if someone is to do SELECT * FROM pg_get_multixact_stats(). I don't think "the MultiXact system is not yet initialized" is the right description of that condition. GetMultiXactInfo() prologue says " Returns false if unable to determine, the oldest offset being unknown." MultiXactStatData has following comment for oldest offset. /* * Oldest multixact offset that is potentially referenced by a multixact * referenced by a relation. We don't always know this value, so there's * a flag here to indicate whether or not we currently do. */ And also /* Have we completed multixact startup? */ bool finishedStartup; I think we need to define this condition more accurately. And include it in the documentation as well. + * Calculate approximate storage space: + * - Members are stored in groups of 4 + * - Each group takes 20 bytes (5 bytes per member) + * Note: This ignores small page overhead (12 bytes per 8KB) + */ + membersBytes = (int64) members * 5; Do we have some constant macros or sizeof(some structure) defined for 5 and 4? That way this computation will be self maintaining and self documenting. + nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false; memset(nulls, false, sizeof(nulls)); is better and used everywhere. In fact, instead of initializing it all to true first and then setting all to false here, we should memset here and set it to true in else block. +++ b/src/test/isolation/specs/multixact_stats.spec I have not an seen an isolation test being used for testing a stats function. But I find it useful. Let's see what others think. @@ -0,0 +1,127 @@ +# High-signal invariants for pg_get_multixact_stats() +# We create exactly one fresh MultiXact on a brand-new table. While it is pinned +# by two open transactions, we assert only invariants that background VACUUM/FREEZE +# cannot violate: +# • members increased by ≥ 1 when the second locker arrived, +# • num_mxids / num_members did not decrease vs earlier snapshots, +# • oldest_* never decreases. +# We make NO assertions after releasing locks (freezing/truncation may shrink deltas). +# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver +# transaction to narrow the window for unrelated truncation between snapshots. What's a driver transaction? +# +# Terminology (global counters): +# num_mxids, num_members : “in-use” deltas derived from global horizons +# oldest_multixact, offset : oldest horizons; they move forward, never backward +# +# All assertions execute while our multixact is pinned by open txns, which protects +# the truncation horizon (VACUUM can’t advance past our pinned multi). Probably this comment is not needed. But from the sequence of steps executed, the data is collected when multixact is pinned (what does that mean?) but the assertions are executed at the end when all the transactions are committed. Am I correct? +step snap0 { + CREATE TEMP TABLE snap0 AS + SELECT num_mxids, num_members, oldest_multixact, oldest_offset + FROM pg_get_multixact_stats(); +} You could use a single table with a primary key column to distinguish snaps which can be used for joining the rows. Why use a temporary table? Just setup and tear down the snap table as well? + +# Pretty, deterministic key/value output of boolean checks. +# Keys: ... snip ... + (s1.num_mxids >= COALESCE(s0.num_mxids, 0)), + (s2.num_mxids >= COALESCE(s1.num_mxids, 0)), + (s1.num_members >= COALESCE(s0.num_members, 0)), + (s2.num_members >= COALESCE(s1.num_members, 0)) + ] This is getting too complex to follow. It produces pretty output but the query is complex. Instead just let keys as the columns in the query. Maybe you could print expanded output if that's possible in an isolation test. -- Best Wishes, Ashutosh Bapat
Hi Atsushi and Ashutosh, Thank you for reviewing the patch. Attached is v7, incorporating the feedback. Please see my responses in-line below. On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > > > | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch > > | docs can describe the thresholds in terms of counts directly. > > > > Personally, I think it might be fine to keep the gigabyte-based > > description, and perhaps we could show both the number of members and > > gigabytes, since it'd be also helpful to have a sense of the scale. > > > > Those who have grown their own utilities to monitor the on-disk usage > will not be able to use the count based thresholds and might take some > time for them to starting using pg_get_multixact_stats(). It makes > sense to mention both the count and the corresponding disk usage > threshold. Something like "Also, if the number of multixact members > exceeds approximately 2^31 entries (occupying roughly more than 10GB > in storage) ... ". Users can choose which threshold they want to use. > Adding disk storage threshold in parenthesis indicates that the count > is more accurate and more useful. Updated docs to include both counts and approximate storage. > I thought mentioning bytes, a unit, in column name members_bytes would > not be appropriate in case we start reporting it in a different unit > like kB in future. But we already have > pg_stat_replication_slots::spill_bytes with similar naming. So may be > it's okay. But I would prefer members_size or members_storage or some > such units-free name. Good point! Adjusted to a units-free name: members_size. > + in <literal>pg_multixact/members</literal> directory, > + <literal>oldest_multixact</literal> is the oldest multixact ID still > in use, and > + <literal>oldest_offset</literal> is the oldest member offset still in use. > > I am not sure whether oldest_offset is worth exposing. It is an > implementation detail. Upthread, Michael suggested to expose oldest > offset from GetMultiXactInfo(), but I don't see him explicitly saying > that we should expose it through this function as well. Michael what > do you think? IMHO, exposing oldest_offset gives a full picture of multixact state. It complements oldest_multixact, and including it won’t hurt. That said, if consensus is against it, I’m happy to drop it. > + These values can be used to monitor multixact consumption and anticipate > + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/> > + for further details on multixact wraparound. > > I still think that this is not needed. There is no reason to restrict > how users want to use this function. We usually don't do that unless > there is a hazard associated with it. Removed. > + <para> > + This is a live snapshot of shared counters; the numbers can change > between calls, > + even within the same transaction. > + </para></entry> > > I have not seen the phrase "live snapshot" being used in the > documentation before. How about "The function reports the statistics > at the time of invoking the function. They may vary between calls even > within the same transaction."? Updated wording. > + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number > + of members created exceeds approximately 2^31 entries, aggressive vacuum > > a member means the transaction participating in a multixact. What you > intend to say is "if the number of multixacts member entries created > ...", right? Correct, updated. > + <para> > + The <function>pg_get_multixact_stats()</function> function, described in > > unnecessary pair of commas. Fixed. > + This output shows a system with significant multixact activity: > about ~100 million > + multixact IDs and ~773 million member entries have been created > since the oldest > + surviving multixact (ID 39974368). By leveraging this information, > the function helps: > + <orderedlist> > + <listitem> > ... snip ... > + Detect potential performance impacts before they become critical. > + For instance, high multixact usage from frequent row-level locking or > + foreign key operations can lead to increased I/O and CPU overhead during > + vacuum operations. Monitoring these stats helps tune autovacuum frequency > + and transaction patterns. > + </simpara> > + </listitem> > + </orderedlist> > > I am unsure whether we should be mentioning use cases in such detail. > Users may find other ways to use those counts. I think the following > paragraph should be placed here. > > + These values can be used to monitor multixact consumption and anticipate > + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/> > + for further details on multixact wraparound. > > But others may have different opinions. > > Maybe you could further write in your example that an aggressive > autovacuum will be triggered in another 10 seconds (or so) if the > number of member entries continues to double every 5 seconds. Or some > practical "usage example" like that. Ack. I believe keeping the example with a short list is helpful for users to navigate and interpret the stats. If preferred, I can trim it to a brief paragraph with just the query in the next rev. > + * Returns statistics about current MultiXact usage: > + * - num_mxids: Number of MultiXact IDs in use > + * - num_members: Total number of member entries > + * - oldest_multixact: Oldest MultiXact ID still needed > + * - oldest_offset: Oldest offset still in use > > We don't need to mention each column here, it's evident from the > function body and also from the user facing documentation. Just the > first line is ok. Updated - kept only the high-level description. > + * > + * Returns a row of NULLs if the MultiXact system is not yet initialized. > > tuple or record instead of row. > > In the earlier patch you were calling PG_RETURN_NULL(), which I > thought was better. It would get converted into a record of NULLs if > someone is to do SELECT * FROM pg_get_multixact_stats(). > > I don't think "the MultiXact system is not yet initialized" is the > right description of that condition. GetMultiXactInfo() prologue says > " > Returns false if unable to determine, the oldest offset being > unknown." MultiXactStatData has following comment for oldest offset. > /* > * Oldest multixact offset that is potentially referenced by a multixact > * referenced by a relation. We don't always know this value, so there's > * a flag here to indicate whether or not we currently do. > */ Switched to PG_RETURN_NULL() and rephrased both code comment and docs. > + * Calculate approximate storage space: > + * - Members are stored in groups of 4 > + * - Each group takes 20 bytes (5 bytes per member) > + * Note: This ignores small page overhead (12 bytes per 8KB) > + */ > + membersBytes = (int64) members * 5; > > Do we have some constant macros or sizeof(some structure) defined for > 5 and 4? That way this computation will be self maintaining and self > documenting. Those macros are already defined in multixact.c - for example, MULTIXACT_MEMBERS_PER_MEMBERGROUP and MULTIXACT_MEMBERGROUP_SIZE encode the 4-per-group and 20-byte layout. They are local today, and I’m not sure why they were never exposed. Rather than moving them into a header and creating wider changes, v7 retains the explicit 5-bytes/member estimate with an explanatory comment to stay consistent with existing guidance. If we feel these macros should be promoted to a header, I think that would be best handled as a small, separate patch, and I’d be happy to help with that. > + nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false; > > memset(nulls, false, sizeof(nulls)); is better and used everywhere. > > In fact, instead of initializing it all to true first and then setting > all to false here, we should memset here and set it to true in else > block. Updated. v7 uses memset(false) and only sets true where needed. > +++ b/src/test/isolation/specs/multixact_stats.spec > > I have not an seen an isolation test being used for testing a stats > function. But I find it useful. Let's see what others think. > > @@ -0,0 +1,127 @@ > +# High-signal invariants for pg_get_multixact_stats() > +# We create exactly one fresh MultiXact on a brand-new table. While > it is pinned > +# by two open transactions, we assert only invariants that background > VACUUM/FREEZE > +# cannot violate: > +# • members increased by ≥ 1 when the second locker arrived, > +# • num_mxids / num_members did not decrease vs earlier snapshots, > +# • oldest_* never decreases. > +# We make NO assertions after releasing locks (freezing/truncation > may shrink deltas). > +# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver > +# transaction to narrow the window for unrelated truncation between snapshots. > > What's a driver transaction? A driver transaction is simply the controlling session that stays open while snapshots are taken. > +# > +# Terminology (global counters): > +# num_mxids, num_members : “in-use” deltas derived from global horizons > +# oldest_multixact, offset : oldest horizons; they move forward, never backward > +# > +# All assertions execute while our multixact is pinned by open txns, > which protects > +# the truncation horizon (VACUUM can’t advance past our pinned multi). > > Probably this comment is not needed. But from the sequence of steps > executed, the data is collected when multixact is pinned (what does > that mean?) but the assertions are executed at the end when all the > transactions are committed. Am I correct? You are correct — the assertions are executed at the end, after the commits. The key point is that all snapshots are taken while the multixact is pinned by open transactions, so the invariants hold despite the final check happening later. > +step snap0 { > + CREATE TEMP TABLE snap0 AS > + SELECT num_mxids, num_members, oldest_multixact, oldest_offset > + FROM pg_get_multixact_stats(); > +} > > You could use a single table with a primary key column to distinguish > snaps which can be used for joining the rows. Why use a temporary > table? Just setup and tear down the snap table as well? I kept separate temp tables to keep each snapshot isolated and easy to read in the spec. A single table with a PK would work too, but I felt temp tables made the sequence clearer. > + > +# Pretty, deterministic key/value output of boolean checks. > +# Keys: > ... snip ... > + (s1.num_mxids >= COALESCE(s0.num_mxids, 0)), > + (s2.num_mxids >= COALESCE(s1.num_mxids, 0)), > + (s1.num_members >= COALESCE(s0.num_members, 0)), > + (s2.num_members >= COALESCE(s1.num_members, 0)) > + ] > > This is getting too complex to follow. It produces pretty output but > the query is complex. Instead just let keys as the columns in the > query. Maybe you could print expanded output if that's possible in an > isolation test. > I used the labeled key/value array to mimic \x-style readability while keeping the output deterministic for isolation’s text diffs. It clearly names each invariant and avoids formatter-dependent width/spacing. Thanks again for the thoughtful reviews. I really appreciate the guidance and will be glad to adjust further if needed. Best regards, Naga
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Ashutosh Bapat
Дата:
On Thu, Sep 4, 2025 at 2:41 AM Naga Appani <nagnrik@gmail.com> wrote: > > On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > > > Updated docs to include both counts and approximate storage. > This one is remaining. + up to approximately 2^32 entries before reaching wraparound. ... 2^32 entries (occupying roughly 20GB in the <literal>pg_multixact/members</literal> directory) before reaching wraparound. ... + See <xref linkend="vacuum-for-multixact-wraparound"/> for further details on multixact wraparound. I don't think we need this reference here. Reference back from that section is enough. + * Returns NULL if the oldest referenced offset is unknown, which happens during + * system startup or when no MultiXact references exist in any relation. If no MultiXact references exist, and GetMultiXactInfo() returns false, MultiXactMemberFreezeThreshold() will assume the worst, which I take as meaning that it will trigger aggressive autovacuum. No MultiXact references existing is a common case which shouldn't be assumed as the worst case. The comment I quoted means "the oldest value of the offset referenced by any multi-xact referenced by a relation *may not be always known". You seem to have interpreted "may not be known" as "does not exist" That's not right. I would write this as "Returns NULL if the oldest referenced offset is unknown which happens during system startup". Similarly I would rephrase the following docs as + <para> + The function returns <literal>NULL</literal> when multixact statistics are unavailable. + For example, during startup before multixact initialization completes or when + the oldest member offset cannot be determined. "The function returns <literal>NULL</literal> when multixact statistics when the oldest multixact offset corresponding to a multixact referenced by a relation is not known after starting the system." > > > > @@ -0,0 +1,127 @@ > > +# High-signal invariants for pg_get_multixact_stats() What does "High-signal" mean here? Is that term defined somewhere? Using terms that most of the contributors are familiar with improves readability. If a new term is required, it needs to be defined first. But I doubt something here requires defining a new term. > > What's a driver transaction? > A driver transaction is simply the controlling session that stays open > while snapshots are taken. I still don't understand the purpose of this transaction. pg_get_multixact_stats() isn't transactional so the driver transaction isn't holding any "snapshot" of the stats. It's also not creating any multixact and hence does not contribute to testing the output of pg_get_multixact_stats. Whatever this session is doing, can be done outside a transaction too. Which step in this session requires an outer transaction? Some more comments + Returns statistics about current multixact usage: + <literal>num_mxids</literal> is the number of multixact IDs assigned, Is this the number of multixact IDs assigned till now (since whatever time) or the number of multixact IDs currently in the system? + <literal>num_members</literal> is the number of multixact member entries created, Similarly this. + multixact allocation and usage patterns in real time. For example: suggestion: ... real time, for example: ... Otherwise the sentence started by "For example" is not a complete sentence. + Returns statistics about current multixact usage: + <literal>num_mxids</literal> is the number of multixact IDs assigned, Is this the number of multixact IDs assigned till now (since whatever time) or the number of multixact IDs currently in the system? + <literal>num_members</literal> is the number of multixact member entries created, Similarly this. + multixact allocation and usage patterns in real time. For example: suggestion: ... real time, for example: ... Otherwise the sentence started by "For example" is not a complete sentence. + + values[0] = Int32GetDatum(multixacts); This should be UInt32GetDatum() multixacts is uint32. + values[1] = Int64GetDatum(members); Similarly this since MultiXactOffset is uint32. + values[4] = Int64GetDatum(oldestOffset); Similarly this since MultiXactOffset is uint32. +# Get MultiXact state +{ + oid => '9001', + descr => 'get current multixact member and multixact ID counts and oldest values', suggestion: get current multixact usage statistics. + proname => 'pg_get_multixact_stats', + prorettype => 'record', + proargtypes => '', + proallargtypes => '{int4,int8,int8,xid,int8}', + proargmodes => '{o,o,o,o,o}', + proargnames => '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}', + provolatile => 'v', + proparallel => 's', + prosrc => 'pg_get_multixact_stats' +}, I like the way you have formatted the new entry, but other entries in this file are not formatted this way. It would be good to format it like other entries but if other reviewers prefer this way, we can go with this too. -- Best Wishes, Ashutosh Bapat
Hi Ashutosh, Thank you for continuing to review the patch. Attached is v8, incorporating the feedback. Please see my responses inline below. On Fri, Sep 5, 2025 at 6:27 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > This one is remaining. > + up to approximately 2^32 entries before reaching wraparound. > > ... 2^32 entries (occupying roughly 20GB in the > <literal>pg_multixact/members</literal> directory) before reaching > wraparound. ... Done. > + See <xref linkend="vacuum-for-multixact-wraparound"/> for further > details on multixact wraparound. > > I don't think we need this reference here. Reference back from that > section is enough. I kept the cross-reference for now since other multixact function docs (such as pg_get_multixact_members()) already use this style, and it helps readers who land directly on the function reference page. Please let me know if you would prefer that I remove it. > + * Returns NULL if the oldest referenced offset is unknown, which > happens during > + * system startup or when no MultiXact references exist in any relation. > > If no MultiXact references exist, and GetMultiXactInfo() returns > false, MultiXactMemberFreezeThreshold() will assume the worst, which I > take as meaning that it will trigger aggressive autovacuum. No > MultiXact references existing is a common case which shouldn't be > assumed as the worst case. The comment I quoted means "the oldest > value of the offset referenced by any multi-xact referenced by a > relation *may not be always known". You seem to have interpreted "may > not be known" as "does not exist" That's not right. I would write this > as "Returns NULL if the oldest referenced offset is unknown which > happens during system startup". > > Similarly I would rephrase the following docs as > + <para> > + The function returns <literal>NULL</literal> when multixact > statistics are unavailable. > + For example, during startup before multixact initialization completes or when > + the oldest member offset cannot be determined. > > "The function returns <literal>NULL</literal> when multixact > statistics when the oldest multixact offset corresponding to a > multixact referenced by a relation is not known after starting the > system." > Updated. > > > > > > @@ -0,0 +1,127 @@ > > > +# High-signal invariants for pg_get_multixact_stats() > > What does "High-signal" mean here? Is that term defined somewhere? > Using terms that most of the contributors are familiar with improves > readability. If a new term is required, it needs to be defined first. > But I doubt something here requires defining a new term. Dropped that wording and simplified the isolation test. > > > What's a driver transaction? > > A driver transaction is simply the controlling session that stays open > > while snapshots are taken. > > I still don't understand the purpose of this transaction. > pg_get_multixact_stats() isn't transactional so the driver transaction > isn't holding any "snapshot" of the stats. It's also not creating any > multixact and hence does not contribute to testing the output of > pg_get_multixact_stats. Whatever this session is doing, can be done > outside a transaction too. Which step in this session requires an > outer transaction? Removed this mention; the test now only checks monotonicity without extra transaction scaffolding. > Some more comments > + Returns statistics about current multixact usage: > + <literal>num_mxids</literal> is the number of multixact IDs assigned, > > Is this the number of multixact IDs assigned till now (since whatever > time) or the number of multixact IDs currently in the system? > > + <literal>num_members</literal> is the number of multixact member > entries created, Updated. > + Returns statistics about current multixact usage: > + <literal>num_mxids</literal> is the number of multixact IDs assigned, > > Is this the number of multixact IDs assigned till now (since whatever > time) or the number of multixact IDs currently in the system? > > + <literal>num_members</literal> is the number of multixact member > entries created, Updated. > + multixact allocation and usage patterns in real time. For example: > > suggestion: ... real time, for example: ... Otherwise the sentence > started by "For example" is not a complete sentence. Updated. > + values[0] = Int32GetDatum(multixacts); > > This should be UInt32GetDatum() multixacts is uint32. > > + values[1] = Int64GetDatum(members); > > Similarly this since MultiXactOffset is uint32. > > + values[4] = Int64GetDatum(oldestOffset); > > Similarly this since MultiXactOffset is uint32. Thanks for pointing this out. I had originally followed the existing types but drifted, fixed now. > +# Get MultiXact state > +{ > + oid => '9001', > + descr => 'get current multixact member and multixact ID counts and > oldest values', > > suggestion: get current multixact usage statistics. Updated > + proname => 'pg_get_multixact_stats', > + prorettype => 'record', > + proargtypes => '', > + proallargtypes => '{int4,int8,int8,xid,int8}', > + proargmodes => '{o,o,o,o,o}', > + proargnames => > '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}', > + provolatile => 'v', > + proparallel => 's', > + prosrc => 'pg_get_multixact_stats' > +}, > > I like the way you have formatted the new entry, but other entries in > this file are not formatted this way. It would be good to format it > like other entries but if other reviewers prefer this way, we can go > with this too. I reformatted the pg_proc.dat entry to match the surrounding style. Best regards, Naga
Вложения
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
От
Ashutosh Bapat
Дата:
On Fri, Sep 12, 2025 at 9:03 AM Naga Appani <nagnrik@gmail.com> wrote: > + </para> + <para> + Returns statistics about current multixact usage: + <literal>num_mxids</literal> is the total number of multixact IDs assigned since startup, + <literal>num_members</literal> is the total number of multixact member entries created since startup, GetMultiXactInfo() returns following *members = nextOffset - *oldestOffset; *multixacts = nextMultiXactId - *oldestMultiXactId; They seem to be the numbers that exist in the system at the time of the call and not since the startup. Am I missing something? + up to approximately 2^32 entries(occupying roughly 20GB in the space between s and ( + proallargtypes => '{int4,int8,int8,xid,int8}', I think the first parameter should also be int8 since uint32 won't fit into int4. > > > + See <xref linkend="vacuum-for-multixact-wraparound"/> for further > > details on multixact wraparound. > > > > I don't think we need this reference here. Reference back from that > > section is enough. > I kept the cross-reference for now since other multixact function docs > (such as pg_get_multixact_members()) already use this style, and it helps > readers who land directly on the function reference page. Please let me > know if you would prefer that I remove it. None of the write up there talks about multixact wraparound so the reference seems arbitrary to me. I would remove it. > > > + * Returns NULL if the oldest referenced offset is unknown, which > > happens during > > + * system startup or when no MultiXact references exist in any relation. > > > > If no MultiXact references exist, and GetMultiXactInfo() returns > > false, MultiXactMemberFreezeThreshold() will assume the worst, which I > > take as meaning that it will trigger aggressive autovacuum. No > > MultiXact references existing is a common case which shouldn't be > > assumed as the worst case. The comment I quoted means "the oldest > > value of the offset referenced by any multi-xact referenced by a > > relation *may not be always known". You seem to have interpreted "may > > not be known" as "does not exist" That's not right. I would write this > > as "Returns NULL if the oldest referenced offset is unknown which > > happens during system startup". > > > > Similarly I would rephrase the following docs as > > + <para> > > + The function returns <literal>NULL</literal> when multixact > > statistics are unavailable. > > + For example, during startup before multixact initialization completes or when > > + the oldest member offset cannot be determined. > > > > "The function returns <literal>NULL</literal> when multixact > > statistics when the oldest multixact offset corresponding to a > > multixact referenced by a relation is not known after starting the > > system." > > > Updated. Thanks for updating the documentation. But the comment in prologue of pg_get_multixact_stats is not completely correct as mentioned in my previous reply. I would just say "Returns NULL if the oldest referenced offset is unknown". Anybody who wants to know when can that happen, may search relevant code by looking at GetMultiXactInfo(). I still find the comment at the start of the isolation test a bit verbose. But I think it's best to leave that to a committer's judgement. However, the word "locker" is unusual. You want to say the session that locks a row (or something similar). We may leave exact words to a committer's judgement. I still find think that the specific usage scenarios described in the documentation are not required. And the computation in pg_get_multixact_stats() should use macros instead of bare numbers. But we can leave that for a committer to decide. Once you address above comments, we may mark the CF entry as RFC. -- Best Wishes, Ashutosh Bapat
Hi Ashutosh, Thank you for continuing to review the patch. Attached is v9, incorporating the feedback. Please see my responses inline below. On Fri, Sep 12, 2025 at 5:34 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > + Returns statistics about current multixact usage: > + <literal>num_mxids</literal> is the total number of multixact IDs > assigned since startup, > + <literal>num_members</literal> is the total number of multixact > member entries created since startup, > > GetMultiXactInfo() returns following > *members = nextOffset - *oldestOffset; > *multixacts = nextMultiXactId - *oldestMultiXactId; > They seem to be the numbers that exist in the system at the time of > the call and not since the startup. Am I missing something? You are right, these counts reflect the numbers currently present in the system, not cumulative totals since startup. I have reworded the docs to say “currently present”. > + up to approximately 2^32 entries(occupying roughly 20GB in the > > space between s and ( Fixed. > + proallargtypes => '{int4,int8,int8,xid,int8}', > > I think the first parameter should also be int8 since uint32 won't fit > into int4. Updated. > > > + See <xref linkend="vacuum-for-multixact-wraparound"/> for further > > > details on multixact wraparound. > > > > > > I don't think we need this reference here. Reference back from that > > > section is enough. > > I kept the cross-reference for now since other multixact function docs > > (such as pg_get_multixact_members()) already use this style, and it helps > > readers who land directly on the function reference page. Please let me > > know if you would prefer that I remove it. > > None of the write up there talks about multixact wraparound so the > reference seems arbitrary to me. I would remove it. Removed. > Thanks for updating the documentation. But the comment in prologue of > pg_get_multixact_stats is not completely correct as mentioned in my > previous reply. I would just say "Returns NULL if the oldest > referenced offset is unknown". Anybody who wants to know when can that > happen, may search relevant code by looking at GetMultiXactInfo(). Simplified the prologue comment as suggested. > I still find the comment at the start of the isolation test a bit > verbose. But I think it's best to leave that to a committer's > judgement. However, the word "locker" is unusual. You want to say the > session that locks a row (or something similar). We may leave exact > words to a committer's judgement. Reworded to remove "locker" and simplified. > I still find think that the specific usage scenarios described in the > documentation are not required. And the computation in > pg_get_multixact_stats() should use macros instead of bare numbers. > But we can leave that for a committer to decide. Once you address > above comments, we may mark the CF entry as RFC. Sounds good! With these updates in v9, I believe the patch is now in good shape to be marked RFC. I’ll go ahead and update the CommitFest entry. Thank you again for your thorough reviews and thoughtful guidance on this patch — it has been very helpful. Best regards, Naga