Обсуждение: [Proposal] Expose internal MultiXact member count function for efficient monitoring

Поиск
Список
Период
Сортировка
Hi,

I would like to propose exposing an internal PostgreSQL function called ReadMultiXactCounts() to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.

Current Challenges: The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:
  • Filesystem scanning overhead: These functions recursively scan the pg_multixact directory, iterating over potentially thousands or millions of files, and retrieving file sizes using stat() calls, which introduces significant I/O overhead.
  • Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of stat() calls, especially on network-based filesystems like RDS/Aurora.
  • Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.
Proposed SolutionThe internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.

By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.

The performance comparison between the current and proposed approaches shows a significant improvement, with the proposed solution taking only a fraction of a millisecond to retrieve the MultiXact member count, compared to tens or hundreds of milliseconds for the current filesystem-based approach. 

Following is the comparison of performance between calculating storage of MultiXact members directory and retrieving the count of members. 

ImplementationUsed sizeMultiXact members (approx)Time taken (ms)Time factor (vs Baseline)
EC2 community (RDS version of pg_ls_multixactdir)8642 MB1.8 billion96.8791.00
Linux du command8642 MB1.8 billion96NA
Proposal (ReadMultiXactCounts)N/A1.99 billion0.167580 times faster

I believe exposing ReadMultiXactCounts() would be a valuable addition to the PostgreSQL ecosystem, providing users with a more reliable and efficient way to monitor MultiXact usage. Appreciate your feedback or discussion on this proposal. 

Please let me know if this approach is acceptable,  so I’ll go ahead and submit a patch.

Thank you!

Best regards, 
Naga Appani
Postgres Database Engineer
Amazon Web Services


On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
Hi,

I would like to propose exposing an internal PostgreSQL function called ReadMultiXactCounts() to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.

Current Challenges: The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:
  • Filesystem scanning overhead: These functions recursively scan the pg_multixact directory, iterating over potentially thousands or millions of files, and retrieving file sizes using stat() calls, which introduces significant I/O overhead.
  • Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of stat() calls, especially on network-based filesystems like RDS/Aurora.
  • Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.
Proposed SolutionThe internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.
................
...............


My apologies for re-posting. This is my first time writing to the hackers list, and I accidentally used HTML formatting. Below is the original request in plain text:

**************************************************************************************************************************************************************
I would like to propose exposing an internal PostgreSQL function called ReadMultiXactCounts()[1] to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.

================
Current Challenges
================
The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:
- Filesystem scanning overhead: These functions recursively scan the pg_multixact directory, iterating over potentially thousands or millions of files, and retrieving file sizes using stat() calls, which introduces significant I/O overhead.
- Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of stat() calls, especially on network-based filesystems like RDS/Aurora.
- Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.

=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.

By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.

=========================
Performance comparison
=========================
The performance comparison between the current and proposed approaches shows a significant improvement, with the proposed solution taking only a fraction of a millisecond to retrieve the MultiXact member count, compared to tens or hundreds of milliseconds for the current filesystem-based approach.  And as more members are generated, the gap widens.

Following is the comparison of performance between calculating storage of MultiXact members directory and retrieving the count of members.

Implementation                                       |  Used size  | MultiXact members
----------------------------------------------------+-------------+------------------
EC2 community (RDS version of pg_ls_multixactdir)   |   8642 MB  | 1.8 billion
Linux du command                                    |   8642 MB  | 1.8 billion
Proposal (ReadMultiXactCounts)                      |   8642 MB  | 1.8 billion

============================================================================================
Sample runs
============================================================================================
Using "du -h"
--------------------
postgres=# \! time du -h /rdsdbdata/db/17.4/data/pg_multixact/members
13G     /rdsdbdata/db/17.4/data/pg_multixact/members

real    0m0.285s    <=============================  time taken
user    0m0.050s    <=============================  time taken
sys     0m0.140s

Using RDS's  pg_ls_multixactdir ():
------------------------------------------------------------
postgres=# SELECT
    pg_size_pretty(coalesce(sum(size), 0)) AS members_size
FROM
    pg_ls_multixactdir ()
WHERE
    name LIKE 'pg_multixact/members%';
 members_size
--------------
 13 GB
(1 row)

Time: 226.533 ms <=============================  time taken

Using proposed function:
----------------------------------------
postgres=# SELECT to_char(pg_get_multixact_members_count(), '999,999,999,999') AS members_count;
  members_count
------------------
    2,745,823,171
(1 row)

Time: 0.142 ms   <=============================  time taken
============================================================================================

I believe exposing ReadMultiXactCounts() would be a valuable addition to the PostgreSQL ecosystem, providing users with a more reliable and efficient way to monitor MultiXact usage. Appreciate your feedback or discussion on this proposal.

Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.

Thank you!

References:

Thank you!

Best regards, 
Naga Appani
Postgres Database Engineer
Amazon Web Services
On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
>
>
>
> On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
>>
>> Hi,
>>

Hi

> =================
> Proposal
> =================
> The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact
membersby reading live state from shared memory. This approach avoids the performance issues of the current
filesystem-basedestimation methods. 

This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.

> By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor
MultiXactmember usage. This could be particularly useful for integrating with tools like Amazon RDS Performance
Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. 
>
> Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.

Let's give it a try!



--
Best regards,
Kirill Reshke



On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
> >
> >
> >
> > On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
> >>
> >> Hi,
> >>
>
> Hi
>
> > =================
> > Proposal
> > =================
> > The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of
MultiXactmembers by reading live state from shared memory. This approach avoids the performance issues of the current
filesystem-basedestimation methods. 
>
> This proposal looks sane. It is indeed helpful to keep an eye out for
> multixact usage in systems that are heavily loaded.
>
> > By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to
monitorMultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance
Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. 
> >
> > Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
>
> Let's give it a try!

Hi,

As a follow-up, I’m submitting a patch that introduces a SQL-callable
function to retrieve MultiXact usage metrics. Although the motivation
has been discussed earlier in this thread, I’m including a brief recap
below to provide context for the patch itself.

While wraparound due to MultiXacts (MXID) is less frequent than XID
wraparound, it can still lead to aggressive/wraparound vacuum behavior
or downtime in certain workloads — especially those involving foreign
keys, shared row locks, or long-lived transactions. Currently, users
have no SQL-level visibility into MultiXact member consumption, which
makes it hard to proactively respond before issues arise. The only
workaround today involves scanning the pg_multixact/members directory
on disk, current workaround uses stat() calls over potentially
millions of small segment files, adds I/O overhead, and is unsuitable
for periodic monitoring or integration into observability platforms.

Unlike the approach originally proposed or discussed in this thread,
this patch does not expose the internal ReadMultiXactCounts() function
directly. Instead, it wraps it internally (without changing its
visibility) to make the data available via a new SQL function.

This patch adds:

    pg_get_multixact_count()

It returns a composite of:
    - multixacts: number of MultiXact IDs that currently exist
    - members:    number of MultiXact member entries currently exist

Implementation
--------------
- Defined in multixact.c
- Calls ReadMultiXactCounts()
- Returns a composite record (multixacts, members)
- Includes documentation

Use cases
---------
This function enables users to:
- Monitor member usage to anticipate aggressive vacuum and avoid wraparound risk
- Track long-lived workloads that accumulate MultiXacts
- Power lightweight monitoring/diagnostics tools without scanning the filesystem
- Log and analyze MultiXact growth over time

Sample output
-------------
 multixacts |  members
------------+------------
 182371396  | 2826221174
(1 row)

Performance comparison
----------------------
While performance is not the primary motivation for this patch, it
becomes important in monitoring scenarios where frequent polling is
expected. The proposed function executes in sub-millisecond time and
avoids any filesystem I/O, making it well-suited for lightweight,
periodic monitoring.

Implementation                        | Used size | MultiXact members
| Time (ms) | Relative cost
-------------------------------------+-----------+-------------------+-----------+----------------
Community (pg_ls_multixactdir)       | 8642 MB   | 1.8 billion       |
96.879    | 1.00 (baseline)
Linux (du command)                   | 8642 MB   | 1.8 billion       |
96        | 1.00
Proposal (ReadMultiXactCounts-based) | N/A       | 1.99 billion      |
0.167     | ~580x faster

Documentation
-------------
- A new section is added to func.sgml to group multixact-related functions
- A reference to this new function is included in the "Multixacts and
Wraparound" subsection of maintenance.sgml

To keep related functions grouped together, we can consider moving
mxid_age() into the new section as well unless there are objections to
relocating it from the current section.

This patch aims to fill a long-standing observability gap.

Patch attached.

Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services

>
>
>
> --
> Best regards,
> Kirill Reshke

Вложения
On Tue, Jun 10, 2025 at 7:50 PM Naga Appani <nagnrik@gmail.com> wrote:
>
> On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
> >
> > On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
> > >
> > >
> > >
> > > On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> >
> > Hi
> >
> > > =================
> > > Proposal
> > > =================
> > > The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of
MultiXactmembers by reading live state from shared memory. This approach avoids the performance issues of the current
filesystem-basedestimation methods. 
> >
> > This proposal looks sane. It is indeed helpful to keep an eye out for
> > multixact usage in systems that are heavily loaded.
> >
> > > By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to
monitorMultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance
Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. 
> > >
> > > Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
> >
> > Let's give it a try!
>
> Hi,
>
> As a follow-up, I’m submitting a patch that introduces a SQL-callable
> function to retrieve MultiXact usage metrics. Although the motivation
> has been discussed earlier in this thread, I’m including a brief recap
> below to provide context for the patch itself.
>
> While wraparound due to MultiXacts (MXID) is less frequent than XID
> wraparound, it can still lead to aggressive/wraparound vacuum behavior
> or downtime in certain workloads — especially those involving foreign
> keys, shared row locks, or long-lived transactions. Currently, users
> have no SQL-level visibility into MultiXact member consumption, which
> makes it hard to proactively respond before issues arise.

I see mxid_age() will just give mxid consumption but not members
consumption. So just that function is not enough.

>
> Sample output
> -------------
>  multixacts |  members
> ------------+------------
>  182371396  | 2826221174
> (1 row)
>
> Performance comparison
> ----------------------
> While performance is not the primary motivation for this patch, it
> becomes important in monitoring scenarios where frequent polling is
> expected. The proposed function executes in sub-millisecond time and
> avoids any filesystem I/O, making it well-suited for lightweight,
> periodic monitoring.
>
> Implementation                        | Used size | MultiXact members
> | Time (ms) | Relative cost
> -------------------------------------+-----------+-------------------+-----------+----------------
> Community (pg_ls_multixactdir)       | 8642 MB   | 1.8 billion       |
> 96.879    | 1.00 (baseline)
> Linux (du command)                   | 8642 MB   | 1.8 billion       |
> 96        | 1.00
> Proposal (ReadMultiXactCounts-based) | N/A       | 1.99 billion      |
> 0.167     | ~580x faster
>
> Documentation
> -------------
> - A new section is added to func.sgml to group multixact-related functions
> - A reference to this new function is included in the "Multixacts and
> Wraparound" subsection of maintenance.sgml
>
> To keep related functions grouped together, we can consider moving
> mxid_age() into the new section as well unless there are objections to
> relocating it from the current section.

In [1], we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.


+  <sect2 id="functions-info-multixact-information">
+   <title>MultiXact Information Functions</title>
+

+       <entry role="func_table_entry">
+        <para role="func_signature">
+         <indexterm><primary>pg_get_multixact_count</primary></indexterm>
+         <function>pg_get_multixact_count</function> ()
+         <returnvalue>record</returnvalue>
+        </para>
+        <para>
+         Returns a record with the fields
<structfield>multixacts</structfield> and
<structfield>members</structfield>:
+         <itemizedlist>
+          <listitem>
+           <para><structfield>multixacts</structfield>: Number of
MultiXacts assigned.
+            PostgreSQL initiates aggressive autovacuum when this
value grows beyond the threshold
+            defined by
<varname>autovacuum_multixact_freeze_max_age</varname>, which is based
on
+            the age of <literal>datminmxid</literal>. For more details, see
+            <ulink
url="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND">
+            Routine Vacuuming: Multixact Wraparound</ulink>.</para>
+          </listitem>
+          <listitem>
+           <para><structfield>members</structfield>: Number of
MultiXact member entries created.
+            These are stored in files under the
<filename>pg_multixact/members</filename> subdirectory.
+            Wraparound occurs after approximately 4.29 billion
entries (~20 GiB). PostgreSQL initiates
+            aggressive autovacuum when the number of members created
exceeds approximately 2.145 billion
+            or when storage consumption in
<filename>pg_multixact/members</filename> approaches 10 GiB.</para>
+          </listitem>
+         </itemizedlist>
+        </para>
+       </entry>

The description here doesn't follow the format of the other functions
in this section. We usually explain the inputs and outputs of the
function but not how to use the outputs. In this case, you might want
to just refer to Multixact Wraparound section under Routine Vacuuming
chapter rather than describing the autovacuum behaviour. You can do
that by inserting <xref linkend="vacuum-for-multixact-wraparound"/>
instead of a full URL. These links are appropriately resolved when
creating HTML to version specific links. The URL you have used will
always point to "Current" version.

+    <para>
+     The <function><link
linkend="functions-multixact-information">pg_get_multixact_count</link></function>
+     function provides a way to check how many multixacts and member
entries have been allocated. This can
+     be useful for identifying unusual multixact activity, monitoring
progress toward wraparound, anticipating
+     system-wide aggressive autovacuum as usage approaches critical
thresholds, or verifying whether autovacuum
+     is keeping up with demand.
+    </para>
+

This is the right place to go in details of how the function can be
used; not the function documentation itself. I am yet to make up
whether we need the whole description. I think the first line is
enough and goes well with the rest of the section.

+
+ if (!ReadMultiXactCounts(&multixacts, &members))
+ ereport(ERROR,
+ (errmsg("could not read multixact counts")));

Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible. In
this case for example, we could throw a warning instead or report NULL
values.

If ReadMultiXactCounts() returns false,
MultiXactMemberFreezeThreshold() returns 0, which will cause the
autovacuum to be more aggressive. I think it will be good to highlight
that in the function description since that's one of the objectives of
this function: to know when the autovacuum is going to be more
aggressive.

+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}

In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?

The patch needs tests.

[1] https://www.postgresql.org/message-id/aF8b_fp_9Va58vB9%40nathan

--
Best Wishes,
Ashutosh Bapat



On Fri, Jul 25, 2025 at 04:27:37PM +0530, Ashutosh Bapat wrote:
> In [1], we decided to document pg_get_multixact_member() in section
> "Transaction ID and Snapshot Information Functions". I think the
> discussion in the email thread applies to this function as well.

Yep, let's be consistent.

> Throwing an error causes the surrounding transaction to abort, so it
> should be avoided in a monitoring/reporting function if possible. In
> this case for example, we could throw a warning instead or report NULL
> values.

Most likely returning NULL is the best thing we can do, as a safe
fallback.

> The patch needs tests.

Indeed.

May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c?  The existing
pg_get_multixact_members() relies on GetMultiXactIdMembers(),
available in multixact.h.  The new function pg_get_multixact_count()
relies on ReadMultiXactCounts(), which would mean adding it in
multixact.h.  Even if we finish without an agreement about the SQL
function and the end, publishing ReadMultiXactCounts() would give an
access to the internals to external code.

+PG_FUNCTION_INFO_V1(pg_get_multixact_count);

There should be no need for that, pg_proc.dat handling the
declaration AFAIK.

FWIW, these functions are always kind of hard to use for the end-user
without proper documentation.  You may want to add an example of how
one can use it for monitoring in the docs.
--
Michael

Вложения
On Mon, Jul 28, 2025 at 9:52 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> May I also suggest a split of the multixact SQL functions into a
> separate file, a src/backend/utils/adt/multixactfuncs.c?  The existing
> pg_get_multixact_members() relies on GetMultiXactIdMembers(),
> available in multixact.h.  The new function pg_get_multixact_count()
> relies on ReadMultiXactCounts(), which would mean adding it in
> multixact.h.  Even if we finish without an agreement about the SQL
> function and the end, publishing ReadMultiXactCounts() would give an
> access to the internals to external code.
>
> +PG_FUNCTION_INFO_V1(pg_get_multixact_count);
>
> There should be no need for that, pg_proc.dat handling the
> declaration AFAIK.
>
> FWIW, these functions are always kind of hard to use for the end-user
> without proper documentation.  You may want to add an example of how
> one can use it for monitoring in the docs.

+1.

Let's say if the user knows that the counts are so high that a
wraparound is imminent, but vacuuming isn't solving the problem, they
would like to know which transactions are holding it back.
pg_get_multixact_members() can be used to get the members of the
oldest multixact if it's reported and then the user can deal with
those transactions. However, the oldest multixact is not reported
anywhere, AFAIK. It's also part of MultiXactState, so can be extracted
via ReadMultiXactCounts(). We could report it through
pg_get_multixact_counts - after renaming it and ReadMultiXactCounts to
pg_get_multixact_stats() and ReadMultiXactStats() respectively. Or we
could write another function to do so. But it comes handy using query
like below
#select oldestmultixact,
pg_get_multixact_members(oldestmultixact::text::xid) from
pg_get_multixact_count();
 oldestmultixact | pg_get_multixact_members
------------------+--------------------------
                1 | (757,sh)
                1 | (768,sh)
(2 rows)

Here's a quick patch implementing the same. Please feel free to
incorporate and refine it in your patch if you like it.

--
Best Wishes,
Ashutosh Bapat

Вложения
Hi Ashutosh, Michael,

Thanks for the detailed reviews. I have incorporated the feedback;
please find attached v2 and my responses inline below.

On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

> In [1], we decided to document pg_get_multixact_member() in section
> "Transaction ID and Snapshot Information Functions". I think the
> discussion in the email thread applies to this function as well.

Done -- the function is now documented under “Transaction ID and
Snapshot Information Functions” for consistency.

> The description here doesn't follow the format of the other functions
> in this section.

Updated the description in func.sgml to match the style of other
functions. Extended usage guidance is now in maintenance.sgml.

> Throwing an error causes the surrounding transaction to abort, so it
> should be avoided in a monitoring/reporting function if possible.

The function now returns NULL instead of throwing an error when counts
can’t be read.

> If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0...

Noted -- the docs now mention that the function can be used to
anticipate more aggressive autovacuum behavior in such cases.

> In PG14+, the transaction wraparound is triggered if the size of the
> directory exceeds 10GB. This function does not help monitoring that
> condition. So a user will need to use du or pg_ls_multixactdir()
> anyway, which defeats the purpose of this function being more
> efficient than those methods. Am I correct? Can we also report the
> size of the directory in this function?

Correct, that is the intent of the function. The members count
returned by this function already provides the necessary information
to determine the directory size, since each member entry has a fixed
size. The constants and formulas in [0] and discussed in [1] show that
each group stores four bytes of flags plus four TransactionIds (20
bytes total), yielding 409 groups per 8 KB page and a fixed
members‑to‑bytes ratio. This means ~2 billion members corresponds to
~10 GB (aggressive autovacuum threshold) and ~4 billion members
corresponds to ~20 GB (wraparound).

Since the function already provides the member count, including the
physical size in its output would duplicate information and add no
extra benefit.

> The patch needs tests.

Added an isolation test to cover initial state, MultiXact creation,
counts, and oldest MultiXact reporting.

On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

> Let's say if the user knows that the counts are so high that a
> wraparound is imminent, but vacuuming isn't solving the problem...
> Here's a quick patch implementing the same. Please feel free to
> incorporate and refine it in your patch if you like it.

Thank you for sharing the patch. I have incorporated it into this
version with minor adjustments, and it fits well with the overall
design of the function.

On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:

> Yep, let's be consistent.

Done -- placed in “Transaction ID and Snapshot Information Functions”
for consistency.

> Most likely returning NULL is the best thing we can do, as a safe fallback.

Implemented -- the function now returns NULL if counts can’t be read.

> The patch needs tests.

Isolation tests have been added as described above.

> May I also suggest a split of the multixact SQL functions into a
> separate file, a src/backend/utils/adt/multixactfuncs.c?

I agree that would be cleaner, but I’d prefer to keep the
implementation in multixact.c for now to maintain focus on this patch
and revisit the refactoring later.

> +PG_FUNCTION_INFO_V1(pg_get_multixact_count);

Removed -- now handled entirely by pg_proc.dat.

> ...You may want to add an example of how one can use it for monitoring in the docs.

I’ve added a usage example with sample output in the docs. If you had
a different kind of demo in mind (e.g., creating multixacts manually
and showing the output), please let me know.

References:
[0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156
[1] https://www.postgresql.org/message-id/flat/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg%40mail.gmail.com

Best regards,
Naga Appani

Вложения
On Mon, Aug 4, 2025 at 1:16 AM Naga Appani <nagnrik@gmail.com> wrote:
>
> Hi Ashutosh, Michael,
>
> Thanks for the detailed reviews. I have incorporated the feedback;
> please find attached v2 and my responses inline below.
>
> On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
>
> > In [1], we decided to document pg_get_multixact_member() in section
> > "Transaction ID and Snapshot Information Functions". I think the
> > discussion in the email thread applies to this function as well.
>
> Done -- the function is now documented under “Transaction ID and
> Snapshot Information Functions” for consistency.
>
> > The description here doesn't follow the format of the other functions
> > in this section.
>
> Updated the description in func.sgml to match the style of other
> functions. Extended usage guidance is now in maintenance.sgml.
>
> > Throwing an error causes the surrounding transaction to abort, so it
> > should be avoided in a monitoring/reporting function if possible.
>
> The function now returns NULL instead of throwing an error when counts
> can’t be read.
>
> > If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0...
>
> Noted -- the docs now mention that the function can be used to
> anticipate more aggressive autovacuum behavior in such cases.
>
> > In PG14+, the transaction wraparound is triggered if the size of the
> > directory exceeds 10GB. This function does not help monitoring that
> > condition. So a user will need to use du or pg_ls_multixactdir()
> > anyway, which defeats the purpose of this function being more
> > efficient than those methods. Am I correct? Can we also report the
> > size of the directory in this function?
>
> Correct, that is the intent of the function. The members count
> returned by this function already provides the necessary information
> to determine the directory size, since each member entry has a fixed
> size. The constants and formulas in [0] and discussed in [1] show that
> each group stores four bytes of flags plus four TransactionIds (20
> bytes total), yielding 409 groups per 8 KB page and a fixed
> members‑to‑bytes ratio. This means ~2 billion members corresponds to
> ~10 GB (aggressive autovacuum threshold) and ~4 billion members
> corresponds to ~20 GB (wraparound).
>
> Since the function already provides the member count, including the
> physical size in its output would duplicate information and add no
> extra benefit.
>
> > The patch needs tests.
>
> Added an isolation test to cover initial state, MultiXact creation,
> counts, and oldest MultiXact reporting.
>
> On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
>
> > Let's say if the user knows that the counts are so high that a
> > wraparound is imminent, but vacuuming isn't solving the problem...
> > Here's a quick patch implementing the same. Please feel free to
> > incorporate and refine it in your patch if you like it.
>
> Thank you for sharing the patch. I have incorporated it into this
> version with minor adjustments, and it fits well with the overall
> design of the function.
>
> On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> > Yep, let's be consistent.
>
> Done -- placed in “Transaction ID and Snapshot Information Functions”
> for consistency.
>
> > Most likely returning NULL is the best thing we can do, as a safe fallback.
>
> Implemented -- the function now returns NULL if counts can’t be read.
>
> > The patch needs tests.
>
> Isolation tests have been added as described above.
>
> > May I also suggest a split of the multixact SQL functions into a
> > separate file, a src/backend/utils/adt/multixactfuncs.c?
>
> I agree that would be cleaner, but I’d prefer to keep the
> implementation in multixact.c for now to maintain focus on this patch
> and revisit the refactoring later.
>
> > +PG_FUNCTION_INFO_V1(pg_get_multixact_count);
>
> Removed -- now handled entirely by pg_proc.dat.
>
> > ...You may want to add an example of how one can use it for monitoring in the docs.
>
> I’ve added a usage example with sample output in the docs. If you had
> a different kind of demo in mind (e.g., creating multixacts manually
> and showing the output), please let me know.
>
> References:
> [0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156
> [1] https://www.postgresql.org/message-id/flat/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg%40mail.gmail.com
>
> Best regards,
> Naga Appani

Following up on my v2 from yesterday — the recent commit [0] changed
the directory layout, which broke the patch (v2). This v3 updates the
code to work with the new structure and also fixes some formatting
issues I noticed while revisiting the changes.

The rest of the patch remains the same as v2, which incorporated
feedback from Ashutosh and Michael (see my previous email for
details).

Please find v3 attached.

References:
[0] https://github.com/postgres/postgres/commit/4e23c9ef65accde7eb3e56aa28d50ae5cf79b64b

Best regards,
Naga Appani

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Michael Paquier
Дата:
On Mon, Aug 04, 2025 at 04:51:30PM -0500, Naga Appani wrote:
> The rest of the patch remains the same as v2, which incorporated
> feedback from Ashutosh and Michael (see my previous email for
> details).
>
> Please find v3 attached.

I am reading again what you have here, and I really think that we
should move the SQL function parts of multixact.c into their own new
file, exposing ReadMultiXactCounts() in multixact.h, because I also
suspect that this can become really useful for extensions that aim at
doing things similar to your proposal in terms of data monitoring for
autovacuum wraparound.  This means two refactoring patches:
- One to expose the new routine in multixact.h.
- One to move the existing SQL code to its new file.

ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller,
where the key point is to make sure that the information retrieved is
consistent across a single LWLock acquisition.  So perhaps this should
be named GetMultiXactInformation() or something similar?

The top of ReadMultiXactCounts() (or whatever its new name) should
also document the information returned across a single call.  It looks
inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
false.  What about oldestOffset itself?  Should it be returned for
consistency with the counts and oldestMultiXactId?
--
Michael

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Ashutosh Bapat
Дата:
On Mon, Aug 4, 2025 at 11:46 AM Naga Appani <nagnrik@gmail.com> wrote:
> > In PG14+, the transaction wraparound is triggered if the size of the
> > directory exceeds 10GB. This function does not help monitoring that
> > condition. So a user will need to use du or pg_ls_multixactdir()
> > anyway, which defeats the purpose of this function being more
> > efficient than those methods. Am I correct? Can we also report the
> > size of the directory in this function?
>
> Correct, that is the intent of the function. The members count
> returned by this function already provides the necessary information
> to determine the directory size, since each member entry has a fixed
> size. The constants and formulas in [0] and discussed in [1] show that
> each group stores four bytes of flags plus four TransactionIds (20
> bytes total), yielding 409 groups per 8 KB page and a fixed
> members‑to‑bytes ratio. This means ~2 billion members corresponds to
> ~10 GB (aggressive autovacuum threshold) and ~4 billion members
> corresponds to ~20 GB (wraparound).

Would it be better to do that math in the function and output the
result? Users may not be able to read and understand the PostgreSQL
code or pgsql-hackers threads Or the constants may change across
versions. It will be more convenient for users if they get the output
from the function itself.

On Fri, Aug 8, 2025 at 6:05 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> ReadMultiXactCounts() is also incorrectly named with your proposal to
> expose oldestMultiXactId in the information returned to the caller,
> where the key point is to make sure that the information retrieved is
> consistent across a single LWLock acquisition.  So perhaps this should
> be named GetMultiXactInformation() or something similar?

+1

>
> The top of ReadMultiXactCounts() (or whatever its new name) should
> also document the information returned across a single call.  It looks
> inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
> false.  What about oldestOffset itself?  Should it be returned for
> consistency with the counts and oldestMultiXactId?

+1

Some more comments on the patch
+ <literal>multixacts</literal> is the number of multixact IDs assigned,
+ <literal>members</literal> is the number of multixact member entries created,
+ and <literal>oldest_multixact</literal> is the oldest MultiXact ID
still in use.

Now that the name of the function is changed, we need the names to
indicate that they are counts e.g. num_mxids, num_members.

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+
+ <para>
+ <literal>SELECT * FROM pg_get_multixact_stats();</literal>
+<programlisting>
+ multixacts | members | oldest_multixact
+------------+-------------+------------------
+ 182371396 | 2826221174 | 754321
+</programlisting>

This file doesn't provide usage examples of other functions. This
function doesn't seem to be an exception.

I think we should mention that the statistics may get stale as soon as
it's fetched, even with REPEATABLE READ isolation level.

+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if either
+ the storage occupied by multixact members exceeds about 10GB or the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum

In case each member starts consuming more or less space than it does
today what would be the basis of triggering workaround? Space or
number of members? I think we should mention only that.

scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
+ have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ area can grow up to about 20GB or approximately 4 billion entries before
+ reaching wraparound.

Similar to above.

+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function provides a way
+ to monitor multixact allocation and usage patterns in real time. By exposing

This is the right place to elaborate the usage of this function with an example.

+ counts of multixacts, member entries, and the oldest multixact ID, it helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Monitor progress toward wraparound thresholds that trigger aggressive
+ autovacuum (approximately 2 billion members or 10GB storage)
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Verify whether autovacuum is effectively managing multixact cleanup
+ before reaching critical thresholds
+ </simpara>
+ </listitem>
+ </orderedlist>
+ See <xref linkend="functions-info-snapshot"/> for details.

I think the second point here repeats what's already mentioned
earlier. It will be good to elaborate each point with an example
instead of just narration.

+/*
+ * pg_get_multixact_stats
+ *
+ * SQL-callable function to retrieve MultiXact statistics.
+ *
+ * Returns a composite row containing:
+ * - total number of MultiXact IDs created since startup,
+ * - total number of MultiXact members created,

... since startup or the number of existing members?

+ * - the oldest existing MultiXact ID.
+ *
+ * This is primarily useful for monitoring MultiXact usage and ensuring
+ * appropriate wraparound protection.

The last two lines are not required, I think. One of its usage is
monitoring but users may find other usages.

+
+step commit1: COMMIT;
+step commit2: COMMIT;
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)

Vacuum may clean the multixact between commit2 and check, in which
case the result won't be stable.

--
Best Wishes,
Ashutosh Bapat



Hi Michael, Ashutosh,

Thanks a lot for the detailed reviews and feedback. Please find
attached v4 of the patchset.

Summary of changes in v4:
- Split into two patches as suggested:
  1. Expose and rename ReadMultiXactCounts() -> GetMultiXactInfo() in
multixact.h with clearer comments.
  2. Add pg_get_multixact_stats() as a SQL-callable function in a new
file (multixactfuncs.c), with docs and tests.
- Function now also returns oldestOffset for consistency.
- Field names updated to num_mxids, num_members, oldest_multixact,
oldest_offset.
- Documentation revised to describe thresholds only in terms of member
counts (disk size wording removed).
- Added a minimal example in maintenance.sgml where multixact
wraparound is already described.
- Isolation tests are rewritten so they no longer depend on exact
counts, but only on monotonic properties guaranteed while a multixact
is pinned.

Replies inline below:

On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> I really think that we should move the SQL function parts of multixact.c
> into their own new file, exposing ReadMultiXactCounts() in multixact.h...

Done. The SQL-callable code now lives in
  src/backend/utils/adt/multixactfuncs.c
and the accessor is declared in
  src/include/access/multixact.h.

> ReadMultiXactCounts() is also incorrectly named with your proposal to
> expose oldestMultiXactId in the information returned to the caller...
> So perhaps this should be named GetMultiXactInformation() or something
> similar?

Renamed to GetMultiXactInfo().

> The top of ReadMultiXactCounts() (or whatever its new name) should
> also document the information returned across a single call.

Added detailed comments about consistency under a single LWLock and the
meaning of each field.

> It looks inconsistent to return oldestMultiXactId if the
> oldestOffsetKnown is false. What about oldestOffset itself?

GetMultiXactInfo() now returns oldestOffset as well. If the oldest
offset isn’t currently known, the function returns false and clears
all outputs, so callers don’t see a partially valid struct.

---

On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Would it be better to do that math in the function and output the result?

That’s a cool idea, thanks for pointing it out. For now I have kept the
SQL function focused only on exposing the raw counts (num_mxids,
num_members, oldest IDs). My thought was that keeping the API lean makes
it easier to maintain across versions, while leaving any derived
calculations like approximate storage size to SQL or external tooling.
This way the function remains simple and future-proof, while still
giving users the building blocks to get the view they need.

I’m happy to revisit this if others feel it would be better for the
function to provide an approximate size directly — I wanted to start
with the simplest surface and gather feedback first.

> Now that the name of the function is changed, we need the names to
> indicate that they are counts e.g. num_mxids, num_members.

Adjusted. The SQL function returns: num_mxids, num_members,
oldest_multixact, oldest_offset.

> This file doesn't provide usage examples of other functions. This
> function doesn't seem to be an exception.

Earlier I thought it was fine to add an example since
pg_input_error_info() also has one, so in this version I placed the
example in maintenance.sgml, where we already discuss multixact
wraparound. That seemed like the most natural place for it. I agree with
your point about consistency, though, so I kept the style minimal and
aligned with the surrounding text.

> I think we should mention that the statistics may get stale as soon as
> it's fetched, even with REPEATABLE READ isolation level.

Added a note that values are a live snapshot and can change immediately.

> In case each member starts consuming more or less space than it does
> today what would be the basis of triggering wraparound? Space or
> number of members? I think we should mention only that.

I updated the docs to describe wraparound in terms of member counts only.
The earlier mention of disk size has been dropped, since the thresholds
are defined by counts.

> This is the right place to elaborate the usage of this function with an
> example.

Expanded with a short example, while keeping it consistent with nearby
entries.

> ... since startup or the number of existing members?

Clarified that the values reflect what’s *currently in use* (i.e.,
derived from next/oldest) and that NULLs are returned if the multixact
subsystem has not been initialized yet.

> The last two lines are not required, I think. One of its usage is
> monitoring but users may find other usages.

Dropped those lines.

> Vacuum may clean the multixact between commit2 and check, in which
> case the result won't be stable.

Right, the earlier version of the test assumed stable counts, which
could fail if autovacuum or background cleanup removed entries in
between steps. In v4 the isolation test no longer relies on exact
numbers. Instead it asserts only the monotonic properties that are
guaranteed while a multixact is pinned, and avoids assumptions once
locks are released. That makes the test robust against concurrent vacuum
activity.

---

Thanks again for the thoughtful reviews and guidance. Please let me know
if you see further adjustments needed.

Best regards,
Naga

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Michael Paquier
Дата:
On Sun, Aug 17, 2025 at 01:27:29AM -0500, Naga Appani wrote:
> On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> I really think that we should move the SQL function parts of multixact.c
>> into their own new file, exposing ReadMultiXactCounts() in multixact.h...
>
> Done. The SQL-callable code now lives in
>   src/backend/utils/adt/multixactfuncs.c
> and the accessor is declared in
>   src/include/access/multixact.h.

My point was a bit different: multixactfuncs.c should be created first
because we already have one SQL function in multixact.c that can be
moved inside it, with the declarations it requires added to
multixact.h.  I've extracted what you did, moved the existing
pg_get_multixact_members() inside the new file, and applied the
result.

>> ReadMultiXactCounts() is also incorrectly named with your proposal to
>> expose oldestMultiXactId in the information returned to the caller...
>> So perhaps this should be named GetMultiXactInformation() or something
>> similar?
>
> Renamed to GetMultiXactInfo().

+ * Returns information about current MultiXact state in a single atomic read:

This comment is incorrect.  This is not an atomic read, grabbing a
consistent state of the data across one single lock acquisition.

Except for this comment, this looks pretty much OK.  Ashutosh, any
comments?

I have not looked at the rest.
--
Michael

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Ashutosh Bapat
Дата:
On Sun, Aug 17, 2025 at 11:57 AM Naga Appani <nagnrik@gmail.com> wrote:
> On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:

>
> > In case each member starts consuming more or less space than it does
> > today what would be the basis of triggering wraparound? Space or
> > number of members? I think we should mention only that.
>
> I updated the docs to describe wraparound in terms of member counts only.
> The earlier mention of disk size has been dropped, since the thresholds
> are defined by counts.

The current document says
"Also, if the storage occupied by multixacts members exceeds about
10GB, aggressive vacuum scans will occur more often for all tables,
starting with those that have the oldest multixact-age." - do you mean
that it's wrong. Instead of checking 10GB threashold, is the code
checking the equivalent member count? If so, I think we need a
separate patch to correct the documentation first. Can you please
point me to the code? Documentation should reflect the code.

>
> That’s a cool idea, thanks for pointing it out. For now I have kept the
> SQL function focused only on exposing the raw counts (num_mxids,
> num_members, oldest IDs). My thought was that keeping the API lean makes
> it easier to maintain across versions, while leaving any derived
> calculations like approximate storage size to SQL or external tooling.
> This way the function remains simple and future-proof, while still
> giving users the building blocks to get the view they need.
>
> I’m happy to revisit this if others feel it would be better for the
> function to provide an approximate size directly — I wanted to start
> with the simplest surface and gather feedback first.

The constant multiplier which converts a count into the disk size is
in the server code. Duplicating it outside the server code, even in
documentation, would require maintenance. GetMultiXactInfo() may not
do the arithmetic but pg_get_multixact_stats() is lean enough to add a
couple computations.

If size is being used as a threshold, reporting count is useless
because user wouldn't know the relation easily. If count is used as a
threshold, reporting count makes sense.

--
Best Wishes,
Ashutosh Bapat



Hi Michael, Ashutosh,

Thanks a lot for taking the time to review this patch and share your thoughts.

Here’s a short summary of what has changed in v5:
- Added the new pg_get_multixact_stats() function in multixactfuncs.c.
- Fixed the misleading “atomic read” comment in the accessor.
- Clarified documentation: thresholds are described in terms of
counts, since that’s what the code uses.
- Added a members_bytes column in pg_get_multixact_stats() to give
users a rough size estimate (num_members * 5), while making it clear
this is layout-dependent.

Please see my in-line replies below.

---

On Mon, Aug 18, 2025 at 1:49 AM Michael Paquier <michael@paquier.xyz> wrote:
> My point was a bit different: multixactfuncs.c should be created first
> because we already have one SQL function in multixact.c that can be
> moved inside it, with the declarations it requires added to
> multixact.h.  I've extracted what you did, moved the existing
> pg_get_multixact_members() inside the new file, and applied the
> result.
>
Really appreciate your clarification and for making that change. I
misunderstood your earlier point.

> + * Returns information about current MultiXact state in a single atomic read:
>
> This comment is incorrect.  This is not an atomic read, grabbing a
> consistent state of the data across one single lock acquisition.
>
Fixed and adjusted wording.

---

On Mon, Aug 18, 2025 at 6:56 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> The current document says
> "Also, if the storage occupied by multixacts members exceeds about
> 10GB, aggressive vacuum scans will occur more often for all tables,
> starting with those that have the oldest multixact-age." - do you mean
> that it's wrong. Instead of checking 10GB threashold, is the code
> checking the equivalent member count? If so, I think we need a
> separate patch to correct the documentation first. Can you please
> point me to the code? Documentation should reflect the code.
>
The decision is made in MultiXactMemberFreezeThreshold() [0], and it
is entirely count-based:

if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
    return autovacuum_multixact_freeze_max_age;

fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
    (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);

MaxMultiXactOffset is defined in multixact.h [1]:

#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)

Thresholds are defined in multixact.c [2]
#define MULTIXACT_MEMBER_SAFE_THRESHOLD    (MaxMultiXactOffset / 2)
#define MULTIXACT_MEMBER_DANGER_THRESHOLD  \
        (MaxMultiXactOffset - MaxMultiXactOffset / 4)

These translate to:
- MaxMultiXactOffset: ~4.29 billion (2^32 - 1)
- MULTIXACT_MEMBER_SAFE_THRESHOLD: ~2.14 billion (2^31 - 1)
- MULTIXACT_MEMBER_DANGER_THRESHOLD: ~3.22 billion (3/4 * 2^32)

So the code path is count-driven.

Regarding docs:
For earlier versions (18 and before), the storage-size approximation
remains relevant because users don’t have direct access to member
count information. Since we don’t plan to backpatch (I assume so) this
new function, the documentation for older branches should continue to
rely on the storage-based approximation.

Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
docs can describe the thresholds in terms of counts directly. For
older branches, the storage approximation still provides users with a
practical way to reason about wraparound risk.


> The constant multiplier which converts a count into the disk size is
> in the server code. Duplicating it outside the server code, even in
> documentation, would require maintenance. GetMultiXactInfo() may not
> do the arithmetic but pg_get_multixact_stats() is lean enough to add a
> couple computations.
>
Thanks for suggesting this — it makes sense, especially for users
upgrading from earlier versions to 19 and higher. I’ve added a
members_bytes column to pg_get_multixact_stats(), computed as
num_members * 5. This respects the existing server-side logic while
also giving those users a familiar reference point, helping them
connect the older size-based guidance with the new count-based view.

---

References:
[0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2916
[1] https://github.com/postgres/postgres/blob/master/src/include/access/multixact.h#L31
[2] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L216-L218

Patch v5 is attached. Thanks again for the thoughtful reviews — I really
appreciate the guidance and look forward to further feedback.

Best regards,
Naga

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Michael Paquier
Дата:
On Mon, Aug 18, 2025 at 08:32:39PM -0500, Naga Appani wrote:
> Thanks a lot for taking the time to review this patch and share your thoughts.
>
> Here’s a short summary of what has changed in v5:
> - Added the new pg_get_multixact_stats() function in multixactfuncs.c.
> - Fixed the misleading “atomic read” comment in the accessor.
> - Clarified documentation: thresholds are described in terms of
> counts, since that’s what the code uses.
> - Added a members_bytes column in pg_get_multixact_stats() to give
> users a rough size estimate (num_members * 5), while making it clear
> this is layout-dependent.
>
> Please see my in-line replies below.

FWIW, I think that you should be a bit more careful before sending
updated patch sets.  You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.

So I have extracted this part from your latest patch, and applied it
independently of the SQL function business.  Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for.  In
short, it is now possible to provide an equivalent of the feature you
want outside of core.  Not saying that the patch cannot be useful, but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--
Michael

Вложения
On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz> wrote:
> FWIW, I think that you should be a bit more careful before sending
> updated patch sets.  You have missed an extra point I have raised
> upthread about the refactoring pieces: the switch from
> ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
> its own.
>
> So I have extracted this part from your latest patch, and applied it
> independently of the SQL function business.  Now we are in an
> advantageous position on HEAD: even if we do not conclude about the
> SQL function to show the mxact numbers and offsets, we have the
> function that gives an access to the data you are looking for.  In
> short, it is now possible to provide an equivalent of the feature you
> want outside of core.  Not saying that the patch cannot be useful, but
> such refactoring pieces open more possibilities, and offer a cleaner
> commit history with less churn in the main patches.
> --

Thanks for the review and separating the refactoring into its own commit.
Point taken on being more careful when sending updated patch sets.
I’ll make sure to keep
refactoring and SQL layer changes clearly separated going forward.

Attached is v6, rebased on top of HEAD. This version is limited to the
SQL function only.

Changes since v5:
- Removed the refactoring, as GetMultiXactInfo() is already committed.
- Documentation revised to describe thresholds in terms of raw counts.

Hopefully this makes the proposal easier to evaluate on its own merits.

Вложения
On 2025-08-20 13:27, Naga Appani wrote:

Thanks for working on this!

> On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz>
> wrote:
>> FWIW, I think that you should be a bit more careful before sending
>> updated patch sets.  You have missed an extra point I have raised
>> upthread about the refactoring pieces: the switch from
>> ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
>> its own.
>>
>> So I have extracted this part from your latest patch, and applied it
>> independently of the SQL function business.  Now we are in an
>> advantageous position on HEAD: even if we do not conclude about the
>> SQL function to show the mxact numbers and offsets, we have the
>> function that gives an access to the data you are looking for.  In
>> short, it is now possible to provide an equivalent of the feature you
>> want outside of core.  Not saying that the patch cannot be useful, but
>> such refactoring pieces open more possibilities, and offer a cleaner
>> commit history with less churn in the main patches.
>> --
>
> Thanks for the review and separating the refactoring into its own
> commit.
> Point taken on being more careful when sending updated patch sets.
> I’ll make sure to keep
> refactoring and SQL layer changes clearly separated going forward.
>
> Attached is v6, rebased on top of HEAD. This version is limited to the
> SQL function only.
>

diff --git a/doc/src/sgml/maintenance.sgml
b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..6f0e8d7c10a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,56 @@ HINT:  Execute a database-wide VACUUM in that
database.
      <para>
       As a safety device, an aggressive vacuum scan will
       occur for any table whose multixact-age is greater than <xref
-     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if the
-     storage occupied by multixacts members exceeds about 10GB,
aggressive vacuum
+     linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
number
+     of members created exceeds approximately 2^31 entries, aggressive
vacuum
       scans will occur more often for all tables, starting with those
that

Looking at commit ff20ccae9fdb, it seems that the documentation was
intentionally written in terms of gigabytes rather than the number:

> The threshold is two billion members, which was interpreted as 2GB
> in the documentation. Fix to reflect that each member takes up five
> bytes, which translates to about 10GB. This is not exact, because of
> page boundaries. While at it, mention the maximum size 20GB.

Anyway, I also think, as Ashutosh suggested, that if we want to fix this
documentation, it would be better to do so in a separate patch.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.



On 2025-08-22 09:28, torikoshia wrote:
> On 2025-08-20 13:27, Naga Appani wrote:
>
> Thanks for working on this!
>
>> On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz>
>> wrote:
>>> FWIW, I think that you should be a bit more careful before sending
>>> updated patch sets.  You have missed an extra point I have raised
>>> upthread about the refactoring pieces: the switch from
>>> ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
>>> its own.
>>>
>>> So I have extracted this part from your latest patch, and applied it
>>> independently of the SQL function business.  Now we are in an
>>> advantageous position on HEAD: even if we do not conclude about the
>>> SQL function to show the mxact numbers and offsets, we have the
>>> function that gives an access to the data you are looking for.  In
>>> short, it is now possible to provide an equivalent of the feature you
>>> want outside of core.  Not saying that the patch cannot be useful,
>>> but
>>> such refactoring pieces open more possibilities, and offer a cleaner
>>> commit history with less churn in the main patches.
>>> --
>>
>> Thanks for the review and separating the refactoring into its own
>> commit.
>> Point taken on being more careful when sending updated patch sets.
>> I’ll make sure to keep
>> refactoring and SQL layer changes clearly separated going forward.
>>
>> Attached is v6, rebased on top of HEAD. This version is limited to the
>> SQL function only.
>>
>
> diff --git a/doc/src/sgml/maintenance.sgml
> b/doc/src/sgml/maintenance.sgml
> index e7a9f58c015..6f0e8d7c10a 100644
> --- a/doc/src/sgml/maintenance.sgml
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -813,12 +813,56 @@ HINT:  Execute a database-wide VACUUM in that
> database.
>      <para>
>       As a safety device, an aggressive vacuum scan will
>       occur for any table whose multixact-age is greater than <xref
> -     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if
> the
> -     storage occupied by multixacts members exceeds about 10GB,
> aggressive vacuum
> +     linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
> number
> +     of members created exceeds approximately 2^31 entries, aggressive
> vacuum
>       scans will occur more often for all tables, starting with those
> that
>
> Looking at commit ff20ccae9fdb, it seems that the documentation was
> intentionally written in terms of gigabytes rather than the number:
>
>> The threshold is two billion members, which was interpreted as 2GB
>> in the documentation. Fix to reflect that each member takes up five
>> bytes, which translates to about 10GB. This is not exact, because of
>> page boundaries. While at it, mention the maximum size 20GB.
>
> Anyway, I also think, as Ashutosh suggested, that if we want to fix
> this documentation, it would be better to do so in a separate patch.

Ah, I've found why you choose to add this doc modification in this patch
in the thread, sorry for skipping over the part:
| For earlier versions (18 and before), the storage-size approximation
| remains relevant because users don’t have direct access to member
| count information. Since we don’t plan to backpatch (I assume so) this
| new function, the documentation for older branches should continue to
| rely on the storage-based approximation.

| Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
| docs can describe the thresholds in terms of counts directly.

Personally, I think it might be fine to keep the gigabyte-based
description, and perhaps we could show both the number of members and
gigabytes, since it'd be also helpful to have a sense of the scale.


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.



Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Ashutosh Bapat
Дата:
On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
> | docs can describe the thresholds in terms of counts directly.
>
> Personally, I think it might be fine to keep the gigabyte-based
> description, and perhaps we could show both the number of members and
> gigabytes, since it'd be also helpful to have a sense of the scale.
>

Those who have grown their own utilities to monitor the on-disk usage
will not be able to use the count based thresholds and might take some
time for them to starting using pg_get_multixact_stats(). It makes
sense to mention both the count and the corresponding disk usage
threshold. Something like "Also, if the number of multixact members
exceeds approximately 2^31 entries (occupying roughly more than 10GB
in storage) ... ". Users can choose which threshold they want to use.
Adding disk storage threshold in parenthesis indicates that the count
is more accurate and more useful.

Here's detailed review of the patch
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member
entries created,
+ <literal>members_bytes</literal> is the storage occupied by
<literal>num_members</literal>

I thought mentioning bytes, a unit, in column name members_bytes would
not be appropriate in case we start reporting it in a different unit
like kB in future. But we already have
pg_stat_replication_slots::spill_bytes with similar naming. So may be
it's okay. But I would prefer members_size or members_storage or some
such units-free name.

+ in <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still
in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.

I am not sure whether oldest_offset is worth exposing. It is an
implementation detail. Upthread, Michael suggested to expose oldest
offset from GetMultiXactInfo(), but I don't see him explicitly saying
that we should expose it through this function as well. Michael what
do you think?

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.

I still think that this is not needed. There is no reason to restrict
how users want to use this function. We usually don't do that unless
there is a hazard associated with it.

+ <para>
+ This is a live snapshot of shared counters; the numbers can change
between calls,
+ even within the same transaction.
+ </para></entry>

I have not seen the phrase "live snapshot" being used in the
documentation before. How about "The function reports the statistics
at the time of invoking the function. They may vary between calls even
within the same transaction."?

+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of members created exceeds approximately 2^31 entries, aggressive vacuum

a member means the transaction participating in a multixact. What you
intend to say is "if the number of multixacts member entries created
...", right?

+ <para>
+ The <function>pg_get_multixact_stats()</function> function, described in

unnecessary pair of commas.

+ This output shows a system with significant multixact activity:
about ~100 million
+ multixact IDs and ~773 million member entries have been created
since the oldest
+ surviving multixact (ID 39974368). By leveraging this information,
the function helps:
+ <orderedlist>
+ <listitem>
... snip ...
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>

I am unsure whether we should be mentioning use cases in such detail.
Users may find other ways to use those counts. I think the following
paragraph should be placed here.

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.

 But others may have different opinions.

Maybe you could further write in your example that an aggressive
autovacuum will be triggered in another 10 seconds (or so) if the
number of member entries continues to double every 5 seconds. Or some
practical "usage example" like that.


+ * Returns statistics about current MultiXact usage:
+ * - num_mxids: Number of MultiXact IDs in use
+ * - num_members: Total number of member entries
+ * - oldest_multixact: Oldest MultiXact ID still needed
+ * - oldest_offset: Oldest offset still in use

We don't need to mention each column here, it's evident from the
function body and also from the user facing documentation. Just the
first line is ok.

+ *
+ * Returns a row of NULLs if the MultiXact system is not yet initialized.

tuple or record instead of row.

In the earlier patch you were calling PG_RETURN_NULL(), which I
thought was better. It would get converted into a record of NULLs if
someone is to do SELECT * FROM pg_get_multixact_stats().

I don't think "the MultiXact system is not yet initialized" is the
right description of that condition. GetMultiXactInfo() prologue says
"
Returns false if unable to determine, the oldest offset being
unknown." MultiXactStatData has following comment for oldest offset.
/*
* Oldest multixact offset that is potentially referenced by a multixact
* referenced by a relation. We don't always know this value, so there's
* a flag here to indicate whether or not we currently do.
*/

And also
/* Have we completed multixact startup? */
bool finishedStartup;

I think we need to define this condition more accurately.

And include it in the documentation as well.

+ * Calculate approximate storage space:
+ * - Members are stored in groups of 4
+ * - Each group takes 20 bytes (5 bytes per member)
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;

Do we have some constant macros or sizeof(some structure) defined for
5 and 4? That way this computation will be self maintaining and self
documenting.

+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;

memset(nulls, false, sizeof(nulls)); is better and used everywhere.

In fact, instead of initializing it all to true first and then setting
all to false here, we should memset here and set it to true in else
block.

+++ b/src/test/isolation/specs/multixact_stats.spec

I have not an seen an isolation test being used for testing a stats
function. But I find it useful. Let's see what others think.

@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While
it is pinned
+# by two open transactions, we assert only invariants that background
VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation
may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.

What's a driver transaction?

+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns,
which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).

Probably this comment is not needed. But from the sequence of steps
executed, the data is collected when multixact is pinned (what does
that mean?) but the assertions are executed at the end when all the
transactions are committed. Am I correct?

+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}

You could use a single table with a primary key column to distinguish
snaps which can be used for joining the rows. Why use a temporary
table? Just setup and tear down the snap table as well?

+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
... snip ...
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]

This is getting too complex to follow. It produces pretty output but
the query is complex. Instead just let keys as the columns in the
query. Maybe you could print expanded output if that's possible in an
isolation test.

--
Best Wishes,
Ashutosh Bapat



Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Naga Appani
Дата:
Hi Atsushi and Ashutosh,

Thank you for reviewing the patch. Attached is v7, incorporating the
feedback. Please see my responses in-line below.

On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >
> > | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
> > | docs can describe the thresholds in terms of counts directly.
> >
> > Personally, I think it might be fine to keep the gigabyte-based
> > description, and perhaps we could show both the number of members and
> > gigabytes, since it'd be also helpful to have a sense of the scale.
> >
>
> Those who have grown their own utilities to monitor the on-disk usage
> will not be able to use the count based thresholds and might take some
> time for them to starting using pg_get_multixact_stats(). It makes
> sense to mention both the count and the corresponding disk usage
> threshold. Something like "Also, if the number of multixact members
> exceeds approximately 2^31 entries (occupying roughly more than 10GB
> in storage) ... ". Users can choose which threshold they want to use.
> Adding disk storage threshold in parenthesis indicates that the count
> is more accurate and more useful.
Updated docs to include both counts and approximate storage.


> I thought mentioning bytes, a unit, in column name members_bytes would
> not be appropriate in case we start reporting it in a different unit
> like kB in future. But we already have
> pg_stat_replication_slots::spill_bytes with similar naming. So may be
> it's okay. But I would prefer members_size or members_storage or some
> such units-free name.
Good point! Adjusted to a units-free name: members_size.

> + in <literal>pg_multixact/members</literal> directory,
> + <literal>oldest_multixact</literal> is the oldest multixact ID still
> in use, and
> + <literal>oldest_offset</literal> is the oldest member offset still in use.
>
> I am not sure whether oldest_offset is worth exposing. It is an
> implementation detail. Upthread, Michael suggested to expose oldest
> offset from GetMultiXactInfo(), but I don't see him explicitly saying
> that we should expose it through this function as well. Michael what
> do you think?
IMHO, exposing oldest_offset gives a full picture of multixact state.
It complements oldest_multixact,
and including it won’t hurt. That said, if consensus is against it,
I’m happy to drop it.

> + These values can be used to monitor multixact consumption and anticipate
> + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
> + for further details on multixact wraparound.
>
> I still think that this is not needed. There is no reason to restrict
> how users want to use this function. We usually don't do that unless
> there is a hazard associated with it.
Removed.

> + <para>
> + This is a live snapshot of shared counters; the numbers can change
> between calls,
> + even within the same transaction.
> + </para></entry>
>
> I have not seen the phrase "live snapshot" being used in the
> documentation before. How about "The function reports the statistics
> at the time of invoking the function. They may vary between calls even
> within the same transaction."?
Updated wording.

> + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
> + of members created exceeds approximately 2^31 entries, aggressive vacuum
>
> a member means the transaction participating in a multixact. What you
> intend to say is "if the number of multixacts member entries created
> ...", right?
Correct, updated.

> + <para>
> + The <function>pg_get_multixact_stats()</function> function, described in
>
> unnecessary pair of commas.
Fixed.

> + This output shows a system with significant multixact activity:
> about ~100 million
> + multixact IDs and ~773 million member entries have been created
> since the oldest
> + surviving multixact (ID 39974368). By leveraging this information,
> the function helps:
> + <orderedlist>
> + <listitem>
> ... snip ...
> + Detect potential performance impacts before they become critical.
> + For instance, high multixact usage from frequent row-level locking or
> + foreign key operations can lead to increased I/O and CPU overhead during
> + vacuum operations. Monitoring these stats helps tune autovacuum frequency
> + and transaction patterns.
> + </simpara>
> + </listitem>
> + </orderedlist>
>
> I am unsure whether we should be mentioning use cases in such detail.
> Users may find other ways to use those counts. I think the following
> paragraph should be placed here.
>
> + These values can be used to monitor multixact consumption and anticipate
> + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
> + for further details on multixact wraparound.
>
>  But others may have different opinions.
>
> Maybe you could further write in your example that an aggressive
> autovacuum will be triggered in another 10 seconds (or so) if the
> number of member entries continues to double every 5 seconds. Or some
> practical "usage example" like that.
Ack. I believe keeping the example with a short list is helpful for
users to navigate
and interpret the stats. If preferred, I can trim it to a brief
paragraph with just the query in
the next rev.

> + * Returns statistics about current MultiXact usage:
> + * - num_mxids: Number of MultiXact IDs in use
> + * - num_members: Total number of member entries
> + * - oldest_multixact: Oldest MultiXact ID still needed
> + * - oldest_offset: Oldest offset still in use
>
> We don't need to mention each column here, it's evident from the
> function body and also from the user facing documentation. Just the
> first line is ok.
Updated - kept only the high-level description.

> + *
> + * Returns a row of NULLs if the MultiXact system is not yet initialized.
>
> tuple or record instead of row.
>
> In the earlier patch you were calling PG_RETURN_NULL(), which I
> thought was better. It would get converted into a record of NULLs if
> someone is to do SELECT * FROM pg_get_multixact_stats().
>
> I don't think "the MultiXact system is not yet initialized" is the
> right description of that condition. GetMultiXactInfo() prologue says
> "
> Returns false if unable to determine, the oldest offset being
> unknown." MultiXactStatData has following comment for oldest offset.
> /*
> * Oldest multixact offset that is potentially referenced by a multixact
> * referenced by a relation. We don't always know this value, so there's
> * a flag here to indicate whether or not we currently do.
> */
Switched to PG_RETURN_NULL() and rephrased both code comment and docs.


> + * Calculate approximate storage space:
> + * - Members are stored in groups of 4
> + * - Each group takes 20 bytes (5 bytes per member)
> + * Note: This ignores small page overhead (12 bytes per 8KB)
> + */
> + membersBytes = (int64) members * 5;
>
> Do we have some constant macros or sizeof(some structure) defined for
> 5 and 4? That way this computation will be self maintaining and self
> documenting.
Those macros are already defined in multixact.c - for example,
MULTIXACT_MEMBERS_PER_MEMBERGROUP
and MULTIXACT_MEMBERGROUP_SIZE encode the 4-per-group and 20-byte
layout. They are local today,
and I’m not sure why they were never exposed. Rather than moving them
into a header and creating wider changes,
v7 retains the explicit 5-bytes/member estimate with an explanatory
comment to stay consistent with existing guidance.
If we feel these macros should be promoted to a header, I think that
would be best handled as a small, separate patch,
and I’d be happy to help with that.

> + nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;
>
> memset(nulls, false, sizeof(nulls)); is better and used everywhere.
>
> In fact, instead of initializing it all to true first and then setting
> all to false here, we should memset here and set it to true in else
> block.
Updated. v7 uses memset(false) and only sets true where needed.

> +++ b/src/test/isolation/specs/multixact_stats.spec
>
> I have not an seen an isolation test being used for testing a stats
> function. But I find it useful. Let's see what others think.
>
> @@ -0,0 +1,127 @@
> +# High-signal invariants for pg_get_multixact_stats()
> +# We create exactly one fresh MultiXact on a brand-new table. While
> it is pinned
> +# by two open transactions, we assert only invariants that background
> VACUUM/FREEZE
> +# cannot violate:
> +# • members increased by ≥ 1 when the second locker arrived,
> +# • num_mxids / num_members did not decrease vs earlier snapshots,
> +# • oldest_* never decreases.
> +# We make NO assertions after releasing locks (freezing/truncation
> may shrink deltas).
> +# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
> +# transaction to narrow the window for unrelated truncation between snapshots.
>
> What's a driver transaction?
A driver transaction is simply the controlling session that stays open
while snapshots are taken.

> +#
> +# Terminology (global counters):
> +# num_mxids, num_members : “in-use” deltas derived from global horizons
> +# oldest_multixact, offset : oldest horizons; they move forward, never backward
> +#
> +# All assertions execute while our multixact is pinned by open txns,
> which protects
> +# the truncation horizon (VACUUM can’t advance past our pinned multi).
>
> Probably this comment is not needed. But from the sequence of steps
> executed, the data is collected when multixact is pinned (what does
> that mean?) but the assertions are executed at the end when all the
> transactions are committed. Am I correct?
You are correct — the assertions are executed at the end, after the commits.
The key point is that all snapshots are taken while the multixact is
pinned by open transactions,
so the invariants hold despite the final check happening later.

> +step snap0 {
> + CREATE TEMP TABLE snap0 AS
> + SELECT num_mxids, num_members, oldest_multixact, oldest_offset
> + FROM pg_get_multixact_stats();
> +}
>
> You could use a single table with a primary key column to distinguish
> snaps which can be used for joining the rows. Why use a temporary
> table? Just setup and tear down the snap table as well?
I kept separate temp tables to keep each snapshot isolated and easy to
read in the spec.
A single table with a PK would work too, but I felt temp tables made
the sequence clearer.

> +
> +# Pretty, deterministic key/value output of boolean checks.
> +# Keys:
> ... snip ...
> + (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
> + (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
> + (s1.num_members >= COALESCE(s0.num_members, 0)),
> + (s2.num_members >= COALESCE(s1.num_members, 0))
> + ]
>
> This is getting too complex to follow. It produces pretty output but
> the query is complex. Instead just let keys as the columns in the
> query. Maybe you could print expanded output if that's possible in an
> isolation test.
>
I used the labeled key/value array to mimic \x-style readability while
keeping the output
deterministic for isolation’s text diffs. It clearly names each
invariant and avoids
formatter-dependent width/spacing.

Thanks again for the thoughtful reviews. I really appreciate the
guidance and will be glad to adjust
further if needed.

Best regards,
Naga

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Ashutosh Bapat
Дата:
On Thu, Sep 4, 2025 at 2:41 AM Naga Appani <nagnrik@gmail.com> wrote:
>
> On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> > >
> Updated docs to include both counts and approximate storage.
>

This one is remaining.
+ up to approximately 2^32 entries before reaching wraparound.

... 2^32 entries (occupying roughly 20GB in the
<literal>pg_multixact/members</literal> directory) before reaching
wraparound. ...

+ See <xref linkend="vacuum-for-multixact-wraparound"/> for further
details on multixact wraparound.

I don't think we need this reference here. Reference back from that
section is enough.

+ * Returns NULL if the oldest referenced offset is unknown, which
happens during
+ * system startup or when no MultiXact references exist in any relation.

If no MultiXact references exist, and GetMultiXactInfo() returns
false, MultiXactMemberFreezeThreshold() will assume the worst, which I
take as meaning that it will trigger aggressive autovacuum. No
MultiXact references existing is a common case which shouldn't be
assumed as the worst case. The comment I quoted means "the oldest
value of the offset referenced by any multi-xact referenced by a
relation *may not be always known". You seem to have interpreted "may
not be known" as "does not exist" That's not right. I would write this
as "Returns NULL if the oldest referenced offset is unknown which
happens during system startup".

Similarly I would rephrase the following docs as
+ <para>
+ The function returns <literal>NULL</literal> when multixact
statistics are unavailable.
+ For example, during startup before multixact initialization completes or when
+ the oldest member offset cannot be determined.

"The function returns <literal>NULL</literal> when multixact
statistics when the oldest multixact offset corresponding to a
multixact referenced by a relation is not known after starting the
system."

> >
> > @@ -0,0 +1,127 @@
> > +# High-signal invariants for pg_get_multixact_stats()

What does "High-signal" mean here? Is that term defined somewhere?
Using terms that most of the contributors are familiar with improves
readability. If a new term is required, it needs to be defined first.
But I doubt something here requires defining a new term.

> > What's a driver transaction?
> A driver transaction is simply the controlling session that stays open
> while snapshots are taken.

I still don't understand the purpose of this transaction.
pg_get_multixact_stats() isn't transactional so the driver transaction
isn't holding any "snapshot" of the stats. It's also not creating any
multixact and hence does not contribute to testing the output of
pg_get_multixact_stats. Whatever this session is doing, can be done
outside a transaction too. Which step in this session requires an
outer transaction?

Some more comments
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,

Is this the number of multixact IDs assigned till now (since whatever
time) or the number of multixact IDs currently in the system?

+ <literal>num_members</literal> is the number of multixact member
entries created,

Similarly this.

+ multixact allocation and usage patterns in real time. For example:

suggestion: ... real time, for example: ... Otherwise the sentence
started by "For example" is not a complete sentence.

+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,

Is this the number of multixact IDs assigned till now (since whatever
time) or the number of multixact IDs currently in the system?

+ <literal>num_members</literal> is the number of multixact member
entries created,

Similarly this.

+ multixact allocation and usage patterns in real time. For example:

suggestion: ... real time, for example: ... Otherwise the sentence
started by "For example" is not a complete sentence.

+
+ values[0] = Int32GetDatum(multixacts);

This should be UInt32GetDatum() multixacts is uint32.

+ values[1] = Int64GetDatum(members);

Similarly this since MultiXactOffset is uint32.

+ values[4] = Int64GetDatum(oldestOffset);

Similarly this since MultiXactOffset is uint32.

+# Get MultiXact state
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and
oldest values',

suggestion: get current multixact usage statistics.

+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames =>
'{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},

I like the way you have formatted the new entry, but other entries in
this file are not formatted this way. It would be good to format it
like other entries but if other reviewers prefer this way, we can go
with this too.

--
Best Wishes,
Ashutosh Bapat



Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Naga Appani
Дата:
Hi Ashutosh,

Thank you for continuing to review the patch. Attached is v8,
incorporating the feedback. Please see my responses inline below.

On Fri, Sep 5, 2025 at 6:27 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> This one is remaining.
> + up to approximately 2^32 entries before reaching wraparound.
>
> ... 2^32 entries (occupying roughly 20GB in the
> <literal>pg_multixact/members</literal> directory) before reaching
> wraparound. ...
Done.

> + See <xref linkend="vacuum-for-multixact-wraparound"/> for further
> details on multixact wraparound.
>
> I don't think we need this reference here. Reference back from that
> section is enough.
I kept the cross-reference for now since other multixact function docs
(such as pg_get_multixact_members()) already use this style, and it helps
readers who land directly on the function reference page. Please let me
know if you would prefer that I remove it.

> + * Returns NULL if the oldest referenced offset is unknown, which
> happens during
> + * system startup or when no MultiXact references exist in any relation.
>
> If no MultiXact references exist, and GetMultiXactInfo() returns
> false, MultiXactMemberFreezeThreshold() will assume the worst, which I
> take as meaning that it will trigger aggressive autovacuum. No
> MultiXact references existing is a common case which shouldn't be
> assumed as the worst case. The comment I quoted means "the oldest
> value of the offset referenced by any multi-xact referenced by a
> relation *may not be always known". You seem to have interpreted "may
> not be known" as "does not exist" That's not right. I would write this
> as "Returns NULL if the oldest referenced offset is unknown which
> happens during system startup".
>
> Similarly I would rephrase the following docs as
> + <para>
> + The function returns <literal>NULL</literal> when multixact
> statistics are unavailable.
> + For example, during startup before multixact initialization completes or when
> + the oldest member offset cannot be determined.
>
> "The function returns <literal>NULL</literal> when multixact
> statistics when the oldest multixact offset corresponding to a
> multixact referenced by a relation is not known after starting the
> system."
>
Updated.

> > >
> > > @@ -0,0 +1,127 @@
> > > +# High-signal invariants for pg_get_multixact_stats()
>
> What does "High-signal" mean here? Is that term defined somewhere?
> Using terms that most of the contributors are familiar with improves
> readability. If a new term is required, it needs to be defined first.
> But I doubt something here requires defining a new term.
Dropped that wording and simplified the isolation test.

> > > What's a driver transaction?
> > A driver transaction is simply the controlling session that stays open
> > while snapshots are taken.
>
> I still don't understand the purpose of this transaction.
> pg_get_multixact_stats() isn't transactional so the driver transaction
> isn't holding any "snapshot" of the stats. It's also not creating any
> multixact and hence does not contribute to testing the output of
> pg_get_multixact_stats. Whatever this session is doing, can be done
> outside a transaction too. Which step in this session requires an
> outer transaction?
Removed this mention; the test now only checks monotonicity without extra
transaction scaffolding.

> Some more comments
> + Returns statistics about current multixact usage:
> + <literal>num_mxids</literal> is the number of multixact IDs assigned,
>
> Is this the number of multixact IDs assigned till now (since whatever
> time) or the number of multixact IDs currently in the system?
>
> + <literal>num_members</literal> is the number of multixact member
> entries created,
Updated.

> + Returns statistics about current multixact usage:
> + <literal>num_mxids</literal> is the number of multixact IDs assigned,
>
> Is this the number of multixact IDs assigned till now (since whatever
> time) or the number of multixact IDs currently in the system?
>
> + <literal>num_members</literal> is the number of multixact member
> entries created,
Updated.


> + multixact allocation and usage patterns in real time. For example:
>
> suggestion: ... real time, for example: ... Otherwise the sentence
> started by "For example" is not a complete sentence.
Updated.

> + values[0] = Int32GetDatum(multixacts);
>
> This should be UInt32GetDatum() multixacts is uint32.
>
> + values[1] = Int64GetDatum(members);
>
> Similarly this since MultiXactOffset is uint32.
>
> + values[4] = Int64GetDatum(oldestOffset);
>
> Similarly this since MultiXactOffset is uint32.
Thanks for pointing this out. I had originally followed the existing
types but drifted, fixed now.

> +# Get MultiXact state
> +{
> + oid => '9001',
> + descr => 'get current multixact member and multixact ID counts and
> oldest values',
>
> suggestion: get current multixact usage statistics.
Updated

> + proname => 'pg_get_multixact_stats',
> + prorettype => 'record',
> + proargtypes => '',
> + proallargtypes => '{int4,int8,int8,xid,int8}',
> + proargmodes => '{o,o,o,o,o}',
> + proargnames =>
> '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
> + provolatile => 'v',
> + proparallel => 's',
> + prosrc => 'pg_get_multixact_stats'
> +},
>
> I like the way you have formatted the new entry, but other entries in
> this file are not formatted this way. It would be good to format it
> like other entries but if other reviewers prefer this way, we can go
> with this too.
I reformatted the pg_proc.dat entry to match the surrounding style.

Best regards,
Naga

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Ashutosh Bapat
Дата:
On Fri, Sep 12, 2025 at 9:03 AM Naga Appani <nagnrik@gmail.com> wrote:
>

+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs
assigned since startup,
+ <literal>num_members</literal> is the total number of multixact
member entries created since startup,

GetMultiXactInfo() returns following
*members = nextOffset - *oldestOffset;
*multixacts = nextMultiXactId - *oldestMultiXactId;
They seem to be the numbers that exist in the system at the time of
the call and not since the startup. Am I missing something?

+ up to approximately 2^32 entries(occupying roughly 20GB in the

space between s and (

+ proallargtypes => '{int4,int8,int8,xid,int8}',

I think the first parameter should also be int8 since uint32 won't fit
into int4.


>
> > + See <xref linkend="vacuum-for-multixact-wraparound"/> for further
> > details on multixact wraparound.
> >
> > I don't think we need this reference here. Reference back from that
> > section is enough.
> I kept the cross-reference for now since other multixact function docs
> (such as pg_get_multixact_members()) already use this style, and it helps
> readers who land directly on the function reference page. Please let me
> know if you would prefer that I remove it.

None of the write up there talks about multixact wraparound so the
reference seems arbitrary to me. I would remove it.

>
> > + * Returns NULL if the oldest referenced offset is unknown, which
> > happens during
> > + * system startup or when no MultiXact references exist in any relation.
> >
> > If no MultiXact references exist, and GetMultiXactInfo() returns
> > false, MultiXactMemberFreezeThreshold() will assume the worst, which I
> > take as meaning that it will trigger aggressive autovacuum. No
> > MultiXact references existing is a common case which shouldn't be
> > assumed as the worst case. The comment I quoted means "the oldest
> > value of the offset referenced by any multi-xact referenced by a
> > relation *may not be always known". You seem to have interpreted "may
> > not be known" as "does not exist" That's not right. I would write this
> > as "Returns NULL if the oldest referenced offset is unknown which
> > happens during system startup".
> >
> > Similarly I would rephrase the following docs as
> > + <para>
> > + The function returns <literal>NULL</literal> when multixact
> > statistics are unavailable.
> > + For example, during startup before multixact initialization completes or when
> > + the oldest member offset cannot be determined.
> >
> > "The function returns <literal>NULL</literal> when multixact
> > statistics when the oldest multixact offset corresponding to a
> > multixact referenced by a relation is not known after starting the
> > system."
> >
> Updated.

Thanks for updating the documentation. But the comment in prologue of
pg_get_multixact_stats is not completely correct as mentioned in my
previous reply. I would just say "Returns NULL if the oldest
referenced offset is unknown". Anybody who wants to know when can that
happen, may search relevant code by looking at GetMultiXactInfo().

I still find the comment at the start of the isolation test a bit
verbose. But I think it's best to leave that to a committer's
judgement. However, the word "locker" is unusual. You want to say the
session that locks a row (or something similar). We may leave exact
words to a committer's judgement.

I still find think that the specific usage scenarios described in the
documentation are not required. And the computation in
pg_get_multixact_stats() should use macros instead of bare numbers.
But we can leave that for a committer to decide. Once you address
above comments, we may mark the CF entry as RFC.

--
Best Wishes,
Ashutosh Bapat



Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От
Naga Appani
Дата:
Hi Ashutosh,

Thank you for continuing to review the patch. Attached is v9,
incorporating the feedback. Please see my responses inline below.

On Fri, Sep 12, 2025 at 5:34 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

> + Returns statistics about current multixact usage:
> + <literal>num_mxids</literal> is the total number of multixact IDs
> assigned since startup,
> + <literal>num_members</literal> is the total number of multixact
> member entries created since startup,
>
> GetMultiXactInfo() returns following
> *members = nextOffset - *oldestOffset;
> *multixacts = nextMultiXactId - *oldestMultiXactId;
> They seem to be the numbers that exist in the system at the time of
> the call and not since the startup. Am I missing something?
You are right, these counts reflect the numbers currently present in
the system, not cumulative totals since startup. I have reworded the
docs to say “currently present”.

> + up to approximately 2^32 entries(occupying roughly 20GB in the
>
> space between s and (
Fixed.

> + proallargtypes => '{int4,int8,int8,xid,int8}',
>
> I think the first parameter should also be int8 since uint32 won't fit
> into int4.
Updated.

> > > + See <xref linkend="vacuum-for-multixact-wraparound"/> for further
> > > details on multixact wraparound.
> > >
> > > I don't think we need this reference here. Reference back from that
> > > section is enough.
> > I kept the cross-reference for now since other multixact function docs
> > (such as pg_get_multixact_members()) already use this style, and it helps
> > readers who land directly on the function reference page. Please let me
> > know if you would prefer that I remove it.
>
> None of the write up there talks about multixact wraparound so the
> reference seems arbitrary to me. I would remove it.
Removed.


> Thanks for updating the documentation. But the comment in prologue of
> pg_get_multixact_stats is not completely correct as mentioned in my
> previous reply. I would just say "Returns NULL if the oldest
> referenced offset is unknown". Anybody who wants to know when can that
> happen, may search relevant code by looking at GetMultiXactInfo().
Simplified the prologue comment as suggested.

> I still find the comment at the start of the isolation test a bit
> verbose. But I think it's best to leave that to a committer's
> judgement. However, the word "locker" is unusual. You want to say the
> session that locks a row (or something similar). We may leave exact
> words to a committer's judgement.
Reworded to remove "locker" and simplified.

> I still find think that the specific usage scenarios described in the
> documentation are not required. And the computation in
> pg_get_multixact_stats() should use macros instead of bare numbers.
> But we can leave that for a committer to decide. Once you address
> above comments, we may mark the CF entry as RFC.
Sounds good!

With these updates in v9, I believe the patch is now in good shape to
be marked RFC. I’ll go ahead and update the CommitFest entry.

Thank you again for your thorough reviews and thoughtful guidance on
this patch — it has been very helpful.

Best regards,
Naga

Вложения