Обсуждение: [Proposal] Expose internal MultiXact member count function for efficient monitoring

Поиск

Список

Период

Сортировка

[Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

10 марта 2025 г., 18:43:54

Hi,

I would like to propose exposing an internal PostgreSQL function called ReadMultiXactCounts() to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.

Current Challenges: The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:

Filesystem scanning overhead: These functions recursively scan the pg_multixact directory, iterating over potentially thousands or millions of files, and retrieving file sizes using stat() calls, which introduces significant I/O overhead.
Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of stat() calls, especially on network-based filesystems like RDS/Aurora.
Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.

Proposed SolutionThe internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.

By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.

The performance comparison between the current and proposed approaches shows a significant improvement, with the proposed solution taking only a fraction of a millisecond to retrieve the MultiXact member count, compared to tens or hundreds of milliseconds for the current filesystem-based approach.

Following is the comparison of performance between calculating storage of MultiXact members directory and retrieving the count of members.

Implementation	Used size	MultiXact members (approx)	Time taken (ms)	Time factor (vs Baseline)
EC2 community (RDS version of pg_ls_multixactdir)	8642 MB	1.8 billion	96.879	1.00
Linux du command	8642 MB	1.8 billion	96	NA
Proposal (ReadMultiXactCounts)	N/A	1.99 billion	0.167	580 times faster

I believe exposing ReadMultiXactCounts() would be a valuable addition to the PostgreSQL ecosystem, providing users with a more reliable and efficient way to monitor MultiXact usage. Appreciate your feedback or discussion on this proposal.

Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.

Thank you!

Best regards,

Naga Appani

Postgres Database Engineer

Amazon Web Services

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

11 марта 2025 г., 08:15:28

On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:

Hi,

I would like to propose exposing an internal PostgreSQL function called ReadMultiXactCounts() to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.

Current Challenges: The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:
Filesystem scanning overhead: These functions recursively scan the pg_multixact directory, iterating over potentially thousands or millions of files, and retrieving file sizes using stat() calls, which introduces significant I/O overhead.
Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of stat() calls, especially on network-based filesystems like RDS/Aurora.
Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.
Proposed SolutionThe internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.
................
...............

My apologies for re-posting. This is my first time writing to the hackers list, and I accidentally used HTML formatting. Below is the original request in plain text:

**************************************************************************************************************************************************************
I would like to propose exposing an internal PostgreSQL function called ReadMultiXactCounts()[1] to allow for efficient monitoring of MultiXact member usage. This function provides an accurate, real-time view of MultiXact activity by directly retrieving the actual member count, rather than relying on storage-based calculations.

================
Current Challenges
================
The existing approach we are currently using to estimate MultiXact member usage has several drawbacks:
- Filesystem scanning overhead: These functions recursively scan the pg_multixact directory, iterating over potentially thousands or millions of files, and retrieving file sizes using stat() calls, which introduces significant I/O overhead.
- Potential performance bottleneck: On systems with high transaction throughput generating large numbers of MultiXact members, the filesystem-based approach scales poorly due to the latency of stat() calls, especially on network-based filesystems like RDS/Aurora.
- Not a real-time or memory-efficient solution: The current approach does not provide a direct, in-memory view of MultiXact activity.

=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.

By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.

=========================
Performance comparison
=========================
The performance comparison between the current and proposed approaches shows a significant improvement, with the proposed solution taking only a fraction of a millisecond to retrieve the MultiXact member count, compared to tens or hundreds of milliseconds for the current filesystem-based approach. And as more members are generated, the gap widens.

Following is the comparison of performance between calculating storage of MultiXact members directory and retrieving the count of members.

Implementation | Used size | MultiXact members
----------------------------------------------------+-------------+------------------
EC2 community (RDS version of pg_ls_multixactdir) | 8642 MB | 1.8 billion
Linux du command | 8642 MB | 1.8 billion
Proposal (ReadMultiXactCounts) | 8642 MB | 1.8 billion

============================================================================================
Sample runs
============================================================================================
Using "du -h"
--------------------
postgres=# \! time du -h /rdsdbdata/db/17.4/data/pg_multixact/members
13G /rdsdbdata/db/17.4/data/pg_multixact/members

real 0m0.285s <============================= time taken
user 0m0.050s <============================= time taken
sys 0m0.140s

Using RDS's pg_ls_multixactdir ():
------------------------------------------------------------
postgres=# SELECT
pg_size_pretty(coalesce(sum(size), 0)) AS members_size
FROM
pg_ls_multixactdir ()
WHERE
name LIKE 'pg_multixact/members%';
members_size
--------------
13 GB
(1 row)

Time: 226.533 ms <============================= time taken

Using proposed function:
----------------------------------------
postgres=# SELECT to_char(pg_get_multixact_members_count(), '999,999,999,999') AS members_count;
members_count
------------------
2,745,823,171
(1 row)

Time: 0.142 ms <============================= time taken

============================================================================================

I believe exposing ReadMultiXactCounts() would be a valuable addition to the PostgreSQL ecosystem, providing users with a more reliable and efficient way to monitor MultiXact usage. Appreciate your feedback or discussion on this proposal.

Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.

Thank you!

References:

[1] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2925-L2948

Thank you!

Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Kirill Reshke

Дата:

11 марта 2025 г., 12:48:01

On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
>
>
>
> On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
>>
>> Hi,
>>

Hi

> =================
> Proposal
> =================
> The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact
membersby reading live state from shared memory. This approach avoids the performance issues of the current
filesystem-basedestimation methods. 

This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.

> By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor
MultiXactmember usage. This could be particularly useful for integrating with tools like Amazon RDS Performance
Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. 
>
> Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.

Let's give it a try!



--
Best regards,
Kirill Reshke

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

10 июня 2025 г., 00:52:15

On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
> >
> >
> >
> > On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
> >>
> >> Hi,
> >>
>
> Hi
>
> > =================
> > Proposal
> > =================
> > The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of
MultiXactmembers by reading live state from shared memory. This approach avoids the performance issues of the current
filesystem-basedestimation methods. 
>
> This proposal looks sane. It is indeed helpful to keep an eye out for
> multixact usage in systems that are heavily loaded.
>
> > By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to
monitorMultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance
Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. 
> >
> > Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
>
> Let's give it a try!

Hi,

As a follow-up, I’m submitting a patch that introduces a SQL-callable
function to retrieve MultiXact usage metrics. Although the motivation
has been discussed earlier in this thread, I’m including a brief recap
below to provide context for the patch itself.

While wraparound due to MultiXacts (MXID) is less frequent than XID
wraparound, it can still lead to aggressive/wraparound vacuum behavior
or downtime in certain workloads — especially those involving foreign
keys, shared row locks, or long-lived transactions. Currently, users
have no SQL-level visibility into MultiXact member consumption, which
makes it hard to proactively respond before issues arise. The only
workaround today involves scanning the pg_multixact/members directory
on disk, current workaround uses stat() calls over potentially
millions of small segment files, adds I/O overhead, and is unsuitable
for periodic monitoring or integration into observability platforms.

Unlike the approach originally proposed or discussed in this thread,
this patch does not expose the internal ReadMultiXactCounts() function
directly. Instead, it wraps it internally (without changing its
visibility) to make the data available via a new SQL function.

This patch adds:

    pg_get_multixact_count()

It returns a composite of:
    - multixacts: number of MultiXact IDs that currently exist
    - members:    number of MultiXact member entries currently exist

Implementation
--------------
- Defined in multixact.c
- Calls ReadMultiXactCounts()
- Returns a composite record (multixacts, members)
- Includes documentation

Use cases
---------
This function enables users to:
- Monitor member usage to anticipate aggressive vacuum and avoid wraparound risk
- Track long-lived workloads that accumulate MultiXacts
- Power lightweight monitoring/diagnostics tools without scanning the filesystem
- Log and analyze MultiXact growth over time

Sample output
-------------
 multixacts |  members
------------+------------
 182371396  | 2826221174
(1 row)

Performance comparison
----------------------
While performance is not the primary motivation for this patch, it
becomes important in monitoring scenarios where frequent polling is
expected. The proposed function executes in sub-millisecond time and
avoids any filesystem I/O, making it well-suited for lightweight,
periodic monitoring.

Implementation                        | Used size | MultiXact members
| Time (ms) | Relative cost
-------------------------------------+-----------+-------------------+-----------+----------------
Community (pg_ls_multixactdir)       | 8642 MB   | 1.8 billion       |
96.879    | 1.00 (baseline)
Linux (du command)                   | 8642 MB   | 1.8 billion       |
96        | 1.00
Proposal (ReadMultiXactCounts-based) | N/A       | 1.99 billion      |
0.167     | ~580x faster

Documentation
-------------
- A new section is added to func.sgml to group multixact-related functions
- A reference to this new function is included in the "Multixacts and
Wraparound" subsection of maintenance.sgml

To keep related functions grouped together, we can consider moving
mxid_age() into the new section as well unless there are objections to
relocating it from the current section.

This patch aims to fill a long-standing observability gap.

Patch attached.

Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services

>
>
>
> --
> Best regards,
> Kirill Reshke

Вложения

v1-0001-Add-pg_get_multixact_count-function-and-related-d.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

25 июля 2025 г., 13:57:37

On Tue, Jun 10, 2025 at 7:50 PM Naga Appani <nagnrik@gmail.com> wrote:
>
> On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
> >
> > On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
> > >
> > >
> > >
> > > On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> >
> > Hi
> >
> > > =================
> > > Proposal
> > > =================
> > > The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of
MultiXactmembers by reading live state from shared memory. This approach avoids the performance issues of the current
filesystem-basedestimation methods. 
> >
> > This proposal looks sane. It is indeed helpful to keep an eye out for
> > multixact usage in systems that are heavily loaded.
> >
> > > By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to
monitorMultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance
Insightsand Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users. 
> > >
> > > Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
> >
> > Let's give it a try!
>
> Hi,
>
> As a follow-up, I’m submitting a patch that introduces a SQL-callable
> function to retrieve MultiXact usage metrics. Although the motivation
> has been discussed earlier in this thread, I’m including a brief recap
> below to provide context for the patch itself.
>
> While wraparound due to MultiXacts (MXID) is less frequent than XID
> wraparound, it can still lead to aggressive/wraparound vacuum behavior
> or downtime in certain workloads — especially those involving foreign
> keys, shared row locks, or long-lived transactions. Currently, users
> have no SQL-level visibility into MultiXact member consumption, which
> makes it hard to proactively respond before issues arise.

I see mxid_age() will just give mxid consumption but not members
consumption. So just that function is not enough.

>
> Sample output
> -------------
>  multixacts |  members
> ------------+------------
>  182371396  | 2826221174
> (1 row)
>
> Performance comparison
> ----------------------
> While performance is not the primary motivation for this patch, it
> becomes important in monitoring scenarios where frequent polling is
> expected. The proposed function executes in sub-millisecond time and
> avoids any filesystem I/O, making it well-suited for lightweight,
> periodic monitoring.
>
> Implementation                        | Used size | MultiXact members
> | Time (ms) | Relative cost
> -------------------------------------+-----------+-------------------+-----------+----------------
> Community (pg_ls_multixactdir)       | 8642 MB   | 1.8 billion       |
> 96.879    | 1.00 (baseline)
> Linux (du command)                   | 8642 MB   | 1.8 billion       |
> 96        | 1.00
> Proposal (ReadMultiXactCounts-based) | N/A       | 1.99 billion      |
> 0.167     | ~580x faster
>
> Documentation
> -------------
> - A new section is added to func.sgml to group multixact-related functions
> - A reference to this new function is included in the "Multixacts and
> Wraparound" subsection of maintenance.sgml
>
> To keep related functions grouped together, we can consider moving
> mxid_age() into the new section as well unless there are objections to
> relocating it from the current section.

In [1], we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.


+  <sect2 id="functions-info-multixact-information">
+   <title>MultiXact Information Functions</title>
+

+       <entry role="func_table_entry">
+        <para role="func_signature">
+         <indexterm><primary>pg_get_multixact_count</primary></indexterm>
+         <function>pg_get_multixact_count</function> ()
+         <returnvalue>record</returnvalue>
+        </para>
+        <para>
+         Returns a record with the fields
<structfield>multixacts</structfield> and
<structfield>members</structfield>:
+         <itemizedlist>
+          <listitem>
+           <para><structfield>multixacts</structfield>: Number of
MultiXacts assigned.
+            PostgreSQL initiates aggressive autovacuum when this
value grows beyond the threshold
+            defined by
<varname>autovacuum_multixact_freeze_max_age</varname>, which is based
on
+            the age of <literal>datminmxid</literal>. For more details, see
+            <ulink
url="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND">
+            Routine Vacuuming: Multixact Wraparound</ulink>.</para>
+          </listitem>
+          <listitem>
+           <para><structfield>members</structfield>: Number of
MultiXact member entries created.
+            These are stored in files under the
<filename>pg_multixact/members</filename> subdirectory.
+            Wraparound occurs after approximately 4.29 billion
entries (~20 GiB). PostgreSQL initiates
+            aggressive autovacuum when the number of members created
exceeds approximately 2.145 billion
+            or when storage consumption in
<filename>pg_multixact/members</filename> approaches 10 GiB.</para>
+          </listitem>
+         </itemizedlist>
+        </para>
+       </entry>

The description here doesn't follow the format of the other functions
in this section. We usually explain the inputs and outputs of the
function but not how to use the outputs. In this case, you might want
to just refer to Multixact Wraparound section under Routine Vacuuming
chapter rather than describing the autovacuum behaviour. You can do
that by inserting <xref linkend="vacuum-for-multixact-wraparound"/>
instead of a full URL. These links are appropriately resolved when
creating HTML to version specific links. The URL you have used will
always point to "Current" version.

+    <para>
+     The <function><link
linkend="functions-multixact-information">pg_get_multixact_count</link></function>
+     function provides a way to check how many multixacts and member
entries have been allocated. This can
+     be useful for identifying unusual multixact activity, monitoring
progress toward wraparound, anticipating
+     system-wide aggressive autovacuum as usage approaches critical
thresholds, or verifying whether autovacuum
+     is keeping up with demand.
+    </para>
+

This is the right place to go in details of how the function can be
used; not the function documentation itself. I am yet to make up
whether we need the whole description. I think the first line is
enough and goes well with the rest of the section.

+
+ if (!ReadMultiXactCounts(&multixacts, &members))
+ ereport(ERROR,
+ (errmsg("could not read multixact counts")));

Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible. In
this case for example, we could throw a warning instead or report NULL
values.

If ReadMultiXactCounts() returns false,
MultiXactMemberFreezeThreshold() returns 0, which will cause the
autovacuum to be more aggressive. I think it will be good to highlight
that in the function description since that's one of the objectives of
this function: to know when the autovacuum is going to be more
aggressive.

+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}

In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?

The patch needs tests.

[1] https://www.postgresql.org/message-id/aF8b_fp_9Va58vB9%40nathan

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

28 июля 2025 г., 07:22:04

On Fri, Jul 25, 2025 at 04:27:37PM +0530, Ashutosh Bapat wrote:
> In [1], we decided to document pg_get_multixact_member() in section
> "Transaction ID and Snapshot Information Functions". I think the
> discussion in the email thread applies to this function as well.

Yep, let's be consistent.

> Throwing an error causes the surrounding transaction to abort, so it
> should be avoided in a monitoring/reporting function if possible. In
> this case for example, we could throw a warning instead or report NULL
> values.

Most likely returning NULL is the best thing we can do, as a safe
fallback.

> The patch needs tests.

Indeed.

May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c?  The existing
pg_get_multixact_members() relies on GetMultiXactIdMembers(),
available in multixact.h.  The new function pg_get_multixact_count()
relies on ReadMultiXactCounts(), which would mean adding it in
multixact.h.  Even if we finish without an agreement about the SQL
function and the end, publishing ReadMultiXactCounts() would give an
access to the internals to external code.

+PG_FUNCTION_INFO_V1(pg_get_multixact_count);

There should be no need for that, pg_proc.dat handling the
declaration AFAIK.

FWIW, these functions are always kind of hard to use for the end-user
without proper documentation.  You may want to add an example of how
one can use it for monitoring in the docs.
--
Michael

Вложения

signature.asc

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

28 июля 2025 г., 09:00:08

On Mon, Jul 28, 2025 at 9:52 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> May I also suggest a split of the multixact SQL functions into a
> separate file, a src/backend/utils/adt/multixactfuncs.c?  The existing
> pg_get_multixact_members() relies on GetMultiXactIdMembers(),
> available in multixact.h.  The new function pg_get_multixact_count()
> relies on ReadMultiXactCounts(), which would mean adding it in
> multixact.h.  Even if we finish without an agreement about the SQL
> function and the end, publishing ReadMultiXactCounts() would give an
> access to the internals to external code.
>
> +PG_FUNCTION_INFO_V1(pg_get_multixact_count);
>
> There should be no need for that, pg_proc.dat handling the
> declaration AFAIK.
>
> FWIW, these functions are always kind of hard to use for the end-user
> without proper documentation.  You may want to add an example of how
> one can use it for monitoring in the docs.

+1.

Let's say if the user knows that the counts are so high that a
wraparound is imminent, but vacuuming isn't solving the problem, they
would like to know which transactions are holding it back.
pg_get_multixact_members() can be used to get the members of the
oldest multixact if it's reported and then the user can deal with
those transactions. However, the oldest multixact is not reported
anywhere, AFAIK. It's also part of MultiXactState, so can be extracted
via ReadMultiXactCounts(). We could report it through
pg_get_multixact_counts - after renaming it and ReadMultiXactCounts to
pg_get_multixact_stats() and ReadMultiXactStats() respectively. Or we
could write another function to do so. But it comes handy using query
like below
#select oldestmultixact,
pg_get_multixact_members(oldestmultixact::text::xid) from
pg_get_multixact_count();
 oldestmultixact | pg_get_multixact_members
------------------+--------------------------
                1 | (757,sh)
                1 | (768,sh)
(2 rows)

Here's a quick patch implementing the same. Please feel free to
incorporate and refine it in your patch if you like it.

--
Best Wishes,
Ashutosh Bapat

Вложения

oldest_multixact.patch.txt

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

04 августа 2025 г., 09:16:30

Hi Ashutosh, Michael,

Thanks for the detailed reviews. I have incorporated the feedback;
please find attached v2 and my responses inline below.

On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

> In [1], we decided to document pg_get_multixact_member() in section
> "Transaction ID and Snapshot Information Functions". I think the
> discussion in the email thread applies to this function as well.

Done -- the function is now documented under “Transaction ID and
Snapshot Information Functions” for consistency.

> The description here doesn't follow the format of the other functions
> in this section.

Updated the description in func.sgml to match the style of other
functions. Extended usage guidance is now in maintenance.sgml.

> Throwing an error causes the surrounding transaction to abort, so it
> should be avoided in a monitoring/reporting function if possible.

The function now returns NULL instead of throwing an error when counts
can’t be read.

> If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0...

Noted -- the docs now mention that the function can be used to
anticipate more aggressive autovacuum behavior in such cases.

> In PG14+, the transaction wraparound is triggered if the size of the
> directory exceeds 10GB. This function does not help monitoring that
> condition. So a user will need to use du or pg_ls_multixactdir()
> anyway, which defeats the purpose of this function being more
> efficient than those methods. Am I correct? Can we also report the
> size of the directory in this function?

Correct, that is the intent of the function. The members count
returned by this function already provides the necessary information
to determine the directory size, since each member entry has a fixed
size. The constants and formulas in [0] and discussed in [1] show that
each group stores four bytes of flags plus four TransactionIds (20
bytes total), yielding 409 groups per 8 KB page and a fixed
members‑to‑bytes ratio. This means ~2 billion members corresponds to
~10 GB (aggressive autovacuum threshold) and ~4 billion members
corresponds to ~20 GB (wraparound).

Since the function already provides the member count, including the
physical size in its output would duplicate information and add no
extra benefit.

> The patch needs tests.

Added an isolation test to cover initial state, MultiXact creation,
counts, and oldest MultiXact reporting.

On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

> Let's say if the user knows that the counts are so high that a
> wraparound is imminent, but vacuuming isn't solving the problem...
> Here's a quick patch implementing the same. Please feel free to
> incorporate and refine it in your patch if you like it.

Thank you for sharing the patch. I have incorporated it into this
version with minor adjustments, and it fits well with the overall
design of the function.

On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:

> Yep, let's be consistent.

Done -- placed in “Transaction ID and Snapshot Information Functions”
for consistency.

> Most likely returning NULL is the best thing we can do, as a safe fallback.

Implemented -- the function now returns NULL if counts can’t be read.

> The patch needs tests.

Isolation tests have been added as described above.

> May I also suggest a split of the multixact SQL functions into a
> separate file, a src/backend/utils/adt/multixactfuncs.c?

I agree that would be cleaner, but I’d prefer to keep the
implementation in multixact.c for now to maintain focus on this patch
and revisit the refactoring later.

> +PG_FUNCTION_INFO_V1(pg_get_multixact_count);

Removed -- now handled entirely by pg_proc.dat.

> ...You may want to add an example of how one can use it for monitoring in the docs.

I’ve added a usage example with sample output in the docs. If you had
a different kind of demo in mind (e.g., creating multixacts manually
and showing the output), please let me know.

References:
[0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156
[1] https://www.postgresql.org/message-id/flat/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg%40mail.gmail.com

Best regards,
Naga Appani

Вложения

v2-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

05 августа 2025 г., 00:51:30

On Mon, Aug 4, 2025 at 1:16 AM Naga Appani <nagnrik@gmail.com> wrote:
>
> Hi Ashutosh, Michael,
>
> Thanks for the detailed reviews. I have incorporated the feedback;
> please find attached v2 and my responses inline below.
>
> On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
>
> > In [1], we decided to document pg_get_multixact_member() in section
> > "Transaction ID and Snapshot Information Functions". I think the
> > discussion in the email thread applies to this function as well.
>
> Done -- the function is now documented under “Transaction ID and
> Snapshot Information Functions” for consistency.
>
> > The description here doesn't follow the format of the other functions
> > in this section.
>
> Updated the description in func.sgml to match the style of other
> functions. Extended usage guidance is now in maintenance.sgml.
>
> > Throwing an error causes the surrounding transaction to abort, so it
> > should be avoided in a monitoring/reporting function if possible.
>
> The function now returns NULL instead of throwing an error when counts
> can’t be read.
>
> > If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0...
>
> Noted -- the docs now mention that the function can be used to
> anticipate more aggressive autovacuum behavior in such cases.
>
> > In PG14+, the transaction wraparound is triggered if the size of the
> > directory exceeds 10GB. This function does not help monitoring that
> > condition. So a user will need to use du or pg_ls_multixactdir()
> > anyway, which defeats the purpose of this function being more
> > efficient than those methods. Am I correct? Can we also report the
> > size of the directory in this function?
>
> Correct, that is the intent of the function. The members count
> returned by this function already provides the necessary information
> to determine the directory size, since each member entry has a fixed
> size. The constants and formulas in [0] and discussed in [1] show that
> each group stores four bytes of flags plus four TransactionIds (20
> bytes total), yielding 409 groups per 8 KB page and a fixed
> members‑to‑bytes ratio. This means ~2 billion members corresponds to
> ~10 GB (aggressive autovacuum threshold) and ~4 billion members
> corresponds to ~20 GB (wraparound).
>
> Since the function already provides the member count, including the
> physical size in its output would duplicate information and add no
> extra benefit.
>
> > The patch needs tests.
>
> Added an isolation test to cover initial state, MultiXact creation,
> counts, and oldest MultiXact reporting.
>
> On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
>
> > Let's say if the user knows that the counts are so high that a
> > wraparound is imminent, but vacuuming isn't solving the problem...
> > Here's a quick patch implementing the same. Please feel free to
> > incorporate and refine it in your patch if you like it.
>
> Thank you for sharing the patch. I have incorporated it into this
> version with minor adjustments, and it fits well with the overall
> design of the function.
>
> On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> > Yep, let's be consistent.
>
> Done -- placed in “Transaction ID and Snapshot Information Functions”
> for consistency.
>
> > Most likely returning NULL is the best thing we can do, as a safe fallback.
>
> Implemented -- the function now returns NULL if counts can’t be read.
>
> > The patch needs tests.
>
> Isolation tests have been added as described above.
>
> > May I also suggest a split of the multixact SQL functions into a
> > separate file, a src/backend/utils/adt/multixactfuncs.c?
>
> I agree that would be cleaner, but I’d prefer to keep the
> implementation in multixact.c for now to maintain focus on this patch
> and revisit the refactoring later.
>
> > +PG_FUNCTION_INFO_V1(pg_get_multixact_count);
>
> Removed -- now handled entirely by pg_proc.dat.
>
> > ...You may want to add an example of how one can use it for monitoring in the docs.
>
> I’ve added a usage example with sample output in the docs. If you had
> a different kind of demo in mind (e.g., creating multixacts manually
> and showing the output), please let me know.
>
> References:
> [0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156
> [1] https://www.postgresql.org/message-id/flat/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg%40mail.gmail.com
>
> Best regards,
> Naga Appani

Following up on my v2 from yesterday — the recent commit [0] changed
the directory layout, which broke the patch (v2). This v3 updates the
code to work with the new structure and also fixes some formatting
issues I noticed while revisiting the changes.

The rest of the patch remains the same as v2, which incorporated
feedback from Ashutosh and Michael (see my previous email for
details).

Please find v3 attached.

References:
[0] https://github.com/postgres/postgres/commit/4e23c9ef65accde7eb3e56aa28d50ae5cf79b64b

Best regards,
Naga Appani

Вложения

v3-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

08 августа 2025 г., 03:35:52

On Mon, Aug 04, 2025 at 04:51:30PM -0500, Naga Appani wrote:
> The rest of the patch remains the same as v2, which incorporated
> feedback from Ashutosh and Michael (see my previous email for
> details).
>
> Please find v3 attached.

I am reading again what you have here, and I really think that we
should move the SQL function parts of multixact.c into their own new
file, exposing ReadMultiXactCounts() in multixact.h, because I also
suspect that this can become really useful for extensions that aim at
doing things similar to your proposal in terms of data monitoring for
autovacuum wraparound.  This means two refactoring patches:
- One to expose the new routine in multixact.h.
- One to move the existing SQL code to its new file.

ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller,
where the key point is to make sure that the information retrieved is
consistent across a single LWLock acquisition.  So perhaps this should
be named GetMultiXactInformation() or something similar?

The top of ReadMultiXactCounts() (or whatever its new name) should
also document the information returned across a single call.  It looks
inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
false.  What about oldestOffset itself?  Should it be returned for
consistency with the counts and oldestMultiXactId?
--
Michael

Вложения

signature.asc

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

08 августа 2025 г., 12:32:53

On Mon, Aug 4, 2025 at 11:46 AM Naga Appani <nagnrik@gmail.com> wrote:
> > In PG14+, the transaction wraparound is triggered if the size of the
> > directory exceeds 10GB. This function does not help monitoring that
> > condition. So a user will need to use du or pg_ls_multixactdir()
> > anyway, which defeats the purpose of this function being more
> > efficient than those methods. Am I correct? Can we also report the
> > size of the directory in this function?
>
> Correct, that is the intent of the function. The members count
> returned by this function already provides the necessary information
> to determine the directory size, since each member entry has a fixed
> size. The constants and formulas in [0] and discussed in [1] show that
> each group stores four bytes of flags plus four TransactionIds (20
> bytes total), yielding 409 groups per 8 KB page and a fixed
> members‑to‑bytes ratio. This means ~2 billion members corresponds to
> ~10 GB (aggressive autovacuum threshold) and ~4 billion members
> corresponds to ~20 GB (wraparound).

Would it be better to do that math in the function and output the
result? Users may not be able to read and understand the PostgreSQL
code or pgsql-hackers threads Or the constants may change across
versions. It will be more convenient for users if they get the output
from the function itself.

On Fri, Aug 8, 2025 at 6:05 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> ReadMultiXactCounts() is also incorrectly named with your proposal to
> expose oldestMultiXactId in the information returned to the caller,
> where the key point is to make sure that the information retrieved is
> consistent across a single LWLock acquisition.  So perhaps this should
> be named GetMultiXactInformation() or something similar?

+1

>
> The top of ReadMultiXactCounts() (or whatever its new name) should
> also document the information returned across a single call.  It looks
> inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
> false.  What about oldestOffset itself?  Should it be returned for
> consistency with the counts and oldestMultiXactId?

+1

Some more comments on the patch
+ <literal>multixacts</literal> is the number of multixact IDs assigned,
+ <literal>members</literal> is the number of multixact member entries created,
+ and <literal>oldest_multixact</literal> is the oldest MultiXact ID
still in use.

Now that the name of the function is changed, we need the names to
indicate that they are counts e.g. num_mxids, num_members.

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+
+ <para>
+ <literal>SELECT * FROM pg_get_multixact_stats();</literal>
+<programlisting>
+ multixacts | members | oldest_multixact
+------------+-------------+------------------
+ 182371396 | 2826221174 | 754321
+</programlisting>

This file doesn't provide usage examples of other functions. This
function doesn't seem to be an exception.

I think we should mention that the statistics may get stale as soon as
it's fetched, even with REPEATABLE READ isolation level.

+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if either
+ the storage occupied by multixact members exceeds about 10GB or the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum

In case each member starts consuming more or less space than it does
today what would be the basis of triggering workaround? Space or
number of members? I think we should mention only that.

scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
+ have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ area can grow up to about 20GB or approximately 4 billion entries before
+ reaching wraparound.

Similar to above.

+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function provides a way
+ to monitor multixact allocation and usage patterns in real time. By exposing

This is the right place to elaborate the usage of this function with an example.

+ counts of multixacts, member entries, and the oldest multixact ID, it helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Monitor progress toward wraparound thresholds that trigger aggressive
+ autovacuum (approximately 2 billion members or 10GB storage)
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Verify whether autovacuum is effectively managing multixact cleanup
+ before reaching critical thresholds
+ </simpara>
+ </listitem>
+ </orderedlist>
+ See <xref linkend="functions-info-snapshot"/> for details.

I think the second point here repeats what's already mentioned
earlier. It will be good to elaborate each point with an example
instead of just narration.

+/*
+ * pg_get_multixact_stats
+ *
+ * SQL-callable function to retrieve MultiXact statistics.
+ *
+ * Returns a composite row containing:
+ * - total number of MultiXact IDs created since startup,
+ * - total number of MultiXact members created,

... since startup or the number of existing members?

+ * - the oldest existing MultiXact ID.
+ *
+ * This is primarily useful for monitoring MultiXact usage and ensuring
+ * appropriate wraparound protection.

The last two lines are not required, I think. One of its usage is
monitoring but users may find other usages.

+
+step commit1: COMMIT;
+step commit2: COMMIT;
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)

Vacuum may clean the multixact between commit2 and check, in which
case the result won't be stable.

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

17 августа 2025 г., 09:27:29

Hi Michael, Ashutosh,

Thanks a lot for the detailed reviews and feedback. Please find
attached v4 of the patchset.

Summary of changes in v4:
- Split into two patches as suggested:
  1. Expose and rename ReadMultiXactCounts() -> GetMultiXactInfo() in
multixact.h with clearer comments.
  2. Add pg_get_multixact_stats() as a SQL-callable function in a new
file (multixactfuncs.c), with docs and tests.
- Function now also returns oldestOffset for consistency.
- Field names updated to num_mxids, num_members, oldest_multixact,
oldest_offset.
- Documentation revised to describe thresholds only in terms of member
counts (disk size wording removed).
- Added a minimal example in maintenance.sgml where multixact
wraparound is already described.
- Isolation tests are rewritten so they no longer depend on exact
counts, but only on monotonic properties guaranteed while a multixact
is pinned.

Replies inline below:

On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> I really think that we should move the SQL function parts of multixact.c
> into their own new file, exposing ReadMultiXactCounts() in multixact.h...

Done. The SQL-callable code now lives in
  src/backend/utils/adt/multixactfuncs.c
and the accessor is declared in
  src/include/access/multixact.h.

> ReadMultiXactCounts() is also incorrectly named with your proposal to
> expose oldestMultiXactId in the information returned to the caller...
> So perhaps this should be named GetMultiXactInformation() or something
> similar?

Renamed to GetMultiXactInfo().

> The top of ReadMultiXactCounts() (or whatever its new name) should
> also document the information returned across a single call.

Added detailed comments about consistency under a single LWLock and the
meaning of each field.

> It looks inconsistent to return oldestMultiXactId if the
> oldestOffsetKnown is false. What about oldestOffset itself?

GetMultiXactInfo() now returns oldestOffset as well. If the oldest
offset isn’t currently known, the function returns false and clears
all outputs, so callers don’t see a partially valid struct.

---

On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Would it be better to do that math in the function and output the result?

That’s a cool idea, thanks for pointing it out. For now I have kept the
SQL function focused only on exposing the raw counts (num_mxids,
num_members, oldest IDs). My thought was that keeping the API lean makes
it easier to maintain across versions, while leaving any derived
calculations like approximate storage size to SQL or external tooling.
This way the function remains simple and future-proof, while still
giving users the building blocks to get the view they need.

I’m happy to revisit this if others feel it would be better for the
function to provide an approximate size directly — I wanted to start
with the simplest surface and gather feedback first.

> Now that the name of the function is changed, we need the names to
> indicate that they are counts e.g. num_mxids, num_members.

Adjusted. The SQL function returns: num_mxids, num_members,
oldest_multixact, oldest_offset.

> This file doesn't provide usage examples of other functions. This
> function doesn't seem to be an exception.

Earlier I thought it was fine to add an example since
pg_input_error_info() also has one, so in this version I placed the
example in maintenance.sgml, where we already discuss multixact
wraparound. That seemed like the most natural place for it. I agree with
your point about consistency, though, so I kept the style minimal and
aligned with the surrounding text.

> I think we should mention that the statistics may get stale as soon as
> it's fetched, even with REPEATABLE READ isolation level.

Added a note that values are a live snapshot and can change immediately.

> In case each member starts consuming more or less space than it does
> today what would be the basis of triggering wraparound? Space or
> number of members? I think we should mention only that.

I updated the docs to describe wraparound in terms of member counts only.
The earlier mention of disk size has been dropped, since the thresholds
are defined by counts.

> This is the right place to elaborate the usage of this function with an
> example.

Expanded with a short example, while keeping it consistent with nearby
entries.

> ... since startup or the number of existing members?

Clarified that the values reflect what’s *currently in use* (i.e.,
derived from next/oldest) and that NULLs are returned if the multixact
subsystem has not been initialized yet.

> The last two lines are not required, I think. One of its usage is
> monitoring but users may find other usages.

Dropped those lines.

> Vacuum may clean the multixact between commit2 and check, in which
> case the result won't be stable.

Right, the earlier version of the test assumed stable counts, which
could fail if autovacuum or background cleanup removed entries in
between steps. In v4 the isolation test no longer relies on exact
numbers. Instead it asserts only the monotonic properties that are
guaranteed while a multixact is pinned, and avoids assumptions once
locks are released. That makes the test robust against concurrent vacuum
activity.

---

Thanks again for the thoughtful reviews and guidance. Please let me know
if you see further adjustments needed.

Best regards,
Naga

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

18 августа 2025 г., 09:49:41

On Sun, Aug 17, 2025 at 01:27:29AM -0500, Naga Appani wrote:
> On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> I really think that we should move the SQL function parts of multixact.c
>> into their own new file, exposing ReadMultiXactCounts() in multixact.h...
>
> Done. The SQL-callable code now lives in
>   src/backend/utils/adt/multixactfuncs.c
> and the accessor is declared in
>   src/include/access/multixact.h.

My point was a bit different: multixactfuncs.c should be created first
because we already have one SQL function in multixact.c that can be
moved inside it, with the declarations it requires added to
multixact.h.  I've extracted what you did, moved the existing
pg_get_multixact_members() inside the new file, and applied the
result.

>> ReadMultiXactCounts() is also incorrectly named with your proposal to
>> expose oldestMultiXactId in the information returned to the caller...
>> So perhaps this should be named GetMultiXactInformation() or something
>> similar?
>
> Renamed to GetMultiXactInfo().

+ * Returns information about current MultiXact state in a single atomic read:

This comment is incorrect.  This is not an atomic read, grabbing a
consistent state of the data across one single lock acquisition.

Except for this comment, this looks pretty much OK.  Ashutosh, any
comments?

I have not looked at the rest.
--
Michael

Вложения

signature.asc

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

18 августа 2025 г., 14:56:06

On Sun, Aug 17, 2025 at 11:57 AM Naga Appani <nagnrik@gmail.com> wrote:
> On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:

>
> > In case each member starts consuming more or less space than it does
> > today what would be the basis of triggering wraparound? Space or
> > number of members? I think we should mention only that.
>
> I updated the docs to describe wraparound in terms of member counts only.
> The earlier mention of disk size has been dropped, since the thresholds
> are defined by counts.

The current document says
"Also, if the storage occupied by multixacts members exceeds about
10GB, aggressive vacuum scans will occur more often for all tables,
starting with those that have the oldest multixact-age." - do you mean
that it's wrong. Instead of checking 10GB threashold, is the code
checking the equivalent member count? If so, I think we need a
separate patch to correct the documentation first. Can you please
point me to the code? Documentation should reflect the code.

>
> That’s a cool idea, thanks for pointing it out. For now I have kept the
> SQL function focused only on exposing the raw counts (num_mxids,
> num_members, oldest IDs). My thought was that keeping the API lean makes
> it easier to maintain across versions, while leaving any derived
> calculations like approximate storage size to SQL or external tooling.
> This way the function remains simple and future-proof, while still
> giving users the building blocks to get the view they need.
>
> I’m happy to revisit this if others feel it would be better for the
> function to provide an approximate size directly — I wanted to start
> with the simplest surface and gather feedback first.

The constant multiplier which converts a count into the disk size is
in the server code. Duplicating it outside the server code, even in
documentation, would require maintenance. GetMultiXactInfo() may not
do the arithmetic but pg_get_multixact_stats() is lean enough to add a
couple computations.

If size is being used as a threshold, reporting count is useless
because user wouldn't know the relation easily. If count is used as a
threshold, reporting count makes sense.

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

19 августа 2025 г., 04:32:39

Hi Michael, Ashutosh,

Thanks a lot for taking the time to review this patch and share your thoughts.

Here’s a short summary of what has changed in v5:
- Added the new pg_get_multixact_stats() function in multixactfuncs.c.
- Fixed the misleading “atomic read” comment in the accessor.
- Clarified documentation: thresholds are described in terms of
counts, since that’s what the code uses.
- Added a members_bytes column in pg_get_multixact_stats() to give
users a rough size estimate (num_members * 5), while making it clear
this is layout-dependent.

Please see my in-line replies below.

---

On Mon, Aug 18, 2025 at 1:49 AM Michael Paquier <michael@paquier.xyz> wrote:
> My point was a bit different: multixactfuncs.c should be created first
> because we already have one SQL function in multixact.c that can be
> moved inside it, with the declarations it requires added to
> multixact.h.  I've extracted what you did, moved the existing
> pg_get_multixact_members() inside the new file, and applied the
> result.
>
Really appreciate your clarification and for making that change. I
misunderstood your earlier point.

> + * Returns information about current MultiXact state in a single atomic read:
>
> This comment is incorrect.  This is not an atomic read, grabbing a
> consistent state of the data across one single lock acquisition.
>
Fixed and adjusted wording.

---

On Mon, Aug 18, 2025 at 6:56 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> The current document says
> "Also, if the storage occupied by multixacts members exceeds about
> 10GB, aggressive vacuum scans will occur more often for all tables,
> starting with those that have the oldest multixact-age." - do you mean
> that it's wrong. Instead of checking 10GB threashold, is the code
> checking the equivalent member count? If so, I think we need a
> separate patch to correct the documentation first. Can you please
> point me to the code? Documentation should reflect the code.
>
The decision is made in MultiXactMemberFreezeThreshold() [0], and it
is entirely count-based:

if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
    return autovacuum_multixact_freeze_max_age;

fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
    (MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);

MaxMultiXactOffset is defined in multixact.h [1]:

#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)

Thresholds are defined in multixact.c [2]
#define MULTIXACT_MEMBER_SAFE_THRESHOLD    (MaxMultiXactOffset / 2)
#define MULTIXACT_MEMBER_DANGER_THRESHOLD  \
        (MaxMultiXactOffset - MaxMultiXactOffset / 4)

These translate to:
- MaxMultiXactOffset: ~4.29 billion (2^32 - 1)
- MULTIXACT_MEMBER_SAFE_THRESHOLD: ~2.14 billion (2^31 - 1)
- MULTIXACT_MEMBER_DANGER_THRESHOLD: ~3.22 billion (3/4 * 2^32)

So the code path is count-driven.

Regarding docs:
For earlier versions (18 and before), the storage-size approximation
remains relevant because users don’t have direct access to member
count information. Since we don’t plan to backpatch (I assume so) this
new function, the documentation for older branches should continue to
rely on the storage-based approximation.

Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
docs can describe the thresholds in terms of counts directly. For
older branches, the storage approximation still provides users with a
practical way to reason about wraparound risk.


> The constant multiplier which converts a count into the disk size is
> in the server code. Duplicating it outside the server code, even in
> documentation, would require maintenance. GetMultiXactInfo() may not
> do the arithmetic but pg_get_multixact_stats() is lean enough to add a
> couple computations.
>
Thanks for suggesting this — it makes sense, especially for users
upgrading from earlier versions to 19 and higher. I’ve added a
members_bytes column to pg_get_multixact_stats(), computed as
num_members * 5. This respects the existing server-side logic while
also giving those users a familiar reference point, helping them
connect the older size-based guidance with the new count-based view.

---

References:
[0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2916
[1] https://github.com/postgres/postgres/blob/master/src/include/access/multixact.h#L31
[2] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L216-L218

Patch v5 is attached. Thanks again for the thoughtful reviews — I really
appreciate the guidance and look forward to further feedback.

Best regards,
Naga

Вложения

v5-0001-Add-pg_get_multixact_stats-function-for-monitorin.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

19 августа 2025 г., 09:32:19

On Mon, Aug 18, 2025 at 08:32:39PM -0500, Naga Appani wrote:
> Thanks a lot for taking the time to review this patch and share your thoughts.
>
> Here’s a short summary of what has changed in v5:
> - Added the new pg_get_multixact_stats() function in multixactfuncs.c.
> - Fixed the misleading “atomic read” comment in the accessor.
> - Clarified documentation: thresholds are described in terms of
> counts, since that’s what the code uses.
> - Added a members_bytes column in pg_get_multixact_stats() to give
> users a rough size estimate (num_members * 5), while making it clear
> this is layout-dependent.
>
> Please see my in-line replies below.

FWIW, I think that you should be a bit more careful before sending
updated patch sets.  You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.

So I have extracted this part from your latest patch, and applied it
independently of the SQL function business.  Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for.  In
short, it is now possible to provide an equivalent of the feature you
want outside of core.  Not saying that the patch cannot be useful, but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--
Michael

Вложения

signature.asc

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

20 августа 2025 г., 07:27:41

On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz> wrote:
> FWIW, I think that you should be a bit more careful before sending
> updated patch sets.  You have missed an extra point I have raised
> upthread about the refactoring pieces: the switch from
> ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
> its own.
>
> So I have extracted this part from your latest patch, and applied it
> independently of the SQL function business.  Now we are in an
> advantageous position on HEAD: even if we do not conclude about the
> SQL function to show the mxact numbers and offsets, we have the
> function that gives an access to the data you are looking for.  In
> short, it is now possible to provide an equivalent of the feature you
> want outside of core.  Not saying that the patch cannot be useful, but
> such refactoring pieces open more possibilities, and offer a cleaner
> commit history with less churn in the main patches.
> --

Thanks for the review and separating the refactoring into its own commit.
Point taken on being more careful when sending updated patch sets.
I’ll make sure to keep
refactoring and SQL layer changes clearly separated going forward.

Attached is v6, rebased on top of HEAD. This version is limited to the
SQL function only.

Changes since v5:
- Removed the refactoring, as GetMultiXactInfo() is already committed.
- Documentation revised to describe thresholds in terms of raw counts.

Hopefully this makes the proposal easier to evaluate on its own merits.

Вложения

v6-0001-Add-pg_get_multixact_stats-function-for-monitorin.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

torikoshia

Дата:

22 августа 2025 г., 03:28:49

On 2025-08-20 13:27, Naga Appani wrote:

Thanks for working on this!

> On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz>
> wrote:
>> FWIW, I think that you should be a bit more careful before sending
>> updated patch sets.  You have missed an extra point I have raised
>> upthread about the refactoring pieces: the switch from
>> ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
>> its own.
>>
>> So I have extracted this part from your latest patch, and applied it
>> independently of the SQL function business.  Now we are in an
>> advantageous position on HEAD: even if we do not conclude about the
>> SQL function to show the mxact numbers and offsets, we have the
>> function that gives an access to the data you are looking for.  In
>> short, it is now possible to provide an equivalent of the feature you
>> want outside of core.  Not saying that the patch cannot be useful, but
>> such refactoring pieces open more possibilities, and offer a cleaner
>> commit history with less churn in the main patches.
>> --
>
> Thanks for the review and separating the refactoring into its own
> commit.
> Point taken on being more careful when sending updated patch sets.
> I’ll make sure to keep
> refactoring and SQL layer changes clearly separated going forward.
>
> Attached is v6, rebased on top of HEAD. This version is limited to the
> SQL function only.
>

diff --git a/doc/src/sgml/maintenance.sgml
b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..6f0e8d7c10a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,56 @@ HINT:  Execute a database-wide VACUUM in that
database.
      <para>
       As a safety device, an aggressive vacuum scan will
       occur for any table whose multixact-age is greater than <xref
-     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if the
-     storage occupied by multixacts members exceeds about 10GB,
aggressive vacuum
+     linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
number
+     of members created exceeds approximately 2^31 entries, aggressive
vacuum
       scans will occur more often for all tables, starting with those
that

Looking at commit ff20ccae9fdb, it seems that the documentation was
intentionally written in terms of gigabytes rather than the number:

> The threshold is two billion members, which was interpreted as 2GB
> in the documentation. Fix to reflect that each member takes up five
> bytes, which translates to about 10GB. This is not exact, because of
> page boundaries. While at it, mention the maximum size 20GB.

Anyway, I also think, as Ashutosh suggested, that if we want to fix this
documentation, it would be better to do so in a separate patch.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

torikoshia

Дата:

22 августа 2025 г., 05:07:37

On 2025-08-22 09:28, torikoshia wrote:
> On 2025-08-20 13:27, Naga Appani wrote:
>
> Thanks for working on this!
>
>> On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz>
>> wrote:
>>> FWIW, I think that you should be a bit more careful before sending
>>> updated patch sets.  You have missed an extra point I have raised
>>> upthread about the refactoring pieces: the switch from
>>> ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
>>> its own.
>>>
>>> So I have extracted this part from your latest patch, and applied it
>>> independently of the SQL function business.  Now we are in an
>>> advantageous position on HEAD: even if we do not conclude about the
>>> SQL function to show the mxact numbers and offsets, we have the
>>> function that gives an access to the data you are looking for.  In
>>> short, it is now possible to provide an equivalent of the feature you
>>> want outside of core.  Not saying that the patch cannot be useful,
>>> but
>>> such refactoring pieces open more possibilities, and offer a cleaner
>>> commit history with less churn in the main patches.
>>> --
>>
>> Thanks for the review and separating the refactoring into its own
>> commit.
>> Point taken on being more careful when sending updated patch sets.
>> I’ll make sure to keep
>> refactoring and SQL layer changes clearly separated going forward.
>>
>> Attached is v6, rebased on top of HEAD. This version is limited to the
>> SQL function only.
>>
>
> diff --git a/doc/src/sgml/maintenance.sgml
> b/doc/src/sgml/maintenance.sgml
> index e7a9f58c015..6f0e8d7c10a 100644
> --- a/doc/src/sgml/maintenance.sgml
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -813,12 +813,56 @@ HINT:  Execute a database-wide VACUUM in that
> database.
>      <para>
>       As a safety device, an aggressive vacuum scan will
>       occur for any table whose multixact-age is greater than <xref
> -     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if
> the
> -     storage occupied by multixacts members exceeds about 10GB,
> aggressive vacuum
> +     linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
> number
> +     of members created exceeds approximately 2^31 entries, aggressive
> vacuum
>       scans will occur more often for all tables, starting with those
> that
>
> Looking at commit ff20ccae9fdb, it seems that the documentation was
> intentionally written in terms of gigabytes rather than the number:
>
>> The threshold is two billion members, which was interpreted as 2GB
>> in the documentation. Fix to reflect that each member takes up five
>> bytes, which translates to about 10GB. This is not exact, because of
>> page boundaries. While at it, mention the maximum size 20GB.
>
> Anyway, I also think, as Ashutosh suggested, that if we want to fix
> this documentation, it would be better to do so in a separate patch.

Ah, I've found why you choose to add this doc modification in this patch
in the thread, sorry for skipping over the part:
| For earlier versions (18 and before), the storage-size approximation
| remains relevant because users don’t have direct access to member
| count information. Since we don’t plan to backpatch (I assume so) this
| new function, the documentation for older branches should continue to
| rely on the storage-based approximation.

| Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
| docs can describe the thresholds in terms of counts directly.

Personally, I think it might be fine to keep the gigabyte-based
description, and perhaps we could show both the number of members and
gigabytes, since it'd be also helpful to have a sense of the scale.


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

22 августа 2025 г., 14:44:59

On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
> | docs can describe the thresholds in terms of counts directly.
>
> Personally, I think it might be fine to keep the gigabyte-based
> description, and perhaps we could show both the number of members and
> gigabytes, since it'd be also helpful to have a sense of the scale.
>

Those who have grown their own utilities to monitor the on-disk usage
will not be able to use the count based thresholds and might take some
time for them to starting using pg_get_multixact_stats(). It makes
sense to mention both the count and the corresponding disk usage
threshold. Something like "Also, if the number of multixact members
exceeds approximately 2^31 entries (occupying roughly more than 10GB
in storage) ... ". Users can choose which threshold they want to use.
Adding disk storage threshold in parenthesis indicates that the count
is more accurate and more useful.

Here's detailed review of the patch
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member
entries created,
+ <literal>members_bytes</literal> is the storage occupied by
<literal>num_members</literal>

I thought mentioning bytes, a unit, in column name members_bytes would
not be appropriate in case we start reporting it in a different unit
like kB in future. But we already have
pg_stat_replication_slots::spill_bytes with similar naming. So may be
it's okay. But I would prefer members_size or members_storage or some
such units-free name.

+ in <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still
in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.

I am not sure whether oldest_offset is worth exposing. It is an
implementation detail. Upthread, Michael suggested to expose oldest
offset from GetMultiXactInfo(), but I don't see him explicitly saying
that we should expose it through this function as well. Michael what
do you think?

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.

I still think that this is not needed. There is no reason to restrict
how users want to use this function. We usually don't do that unless
there is a hazard associated with it.

+ <para>
+ This is a live snapshot of shared counters; the numbers can change
between calls,
+ even within the same transaction.
+ </para></entry>

I have not seen the phrase "live snapshot" being used in the
documentation before. How about "The function reports the statistics
at the time of invoking the function. They may vary between calls even
within the same transaction."?

+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of members created exceeds approximately 2^31 entries, aggressive vacuum

a member means the transaction participating in a multixact. What you
intend to say is "if the number of multixacts member entries created
...", right?

+ <para>
+ The <function>pg_get_multixact_stats()</function> function, described in

unnecessary pair of commas.

+ This output shows a system with significant multixact activity:
about ~100 million
+ multixact IDs and ~773 million member entries have been created
since the oldest
+ surviving multixact (ID 39974368). By leveraging this information,
the function helps:
+ <orderedlist>
+ <listitem>
... snip ...
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>

I am unsure whether we should be mentioning use cases in such detail.
Users may find other ways to use those counts. I think the following
paragraph should be placed here.

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.

 But others may have different opinions.

Maybe you could further write in your example that an aggressive
autovacuum will be triggered in another 10 seconds (or so) if the
number of member entries continues to double every 5 seconds. Or some
practical "usage example" like that.

+ * Returns statistics about current MultiXact usage:
+ * - num_mxids: Number of MultiXact IDs in use
+ * - num_members: Total number of member entries
+ * - oldest_multixact: Oldest MultiXact ID still needed
+ * - oldest_offset: Oldest offset still in use

We don't need to mention each column here, it's evident from the
function body and also from the user facing documentation. Just the
first line is ok.

+ *
+ * Returns a row of NULLs if the MultiXact system is not yet initialized.

tuple or record instead of row.

In the earlier patch you were calling PG_RETURN_NULL(), which I
thought was better. It would get converted into a record of NULLs if
someone is to do SELECT * FROM pg_get_multixact_stats().

I don't think "the MultiXact system is not yet initialized" is the
right description of that condition. GetMultiXactInfo() prologue says
"
Returns false if unable to determine, the oldest offset being
unknown." MultiXactStatData has following comment for oldest offset.
/*
* Oldest multixact offset that is potentially referenced by a multixact
* referenced by a relation. We don't always know this value, so there's
* a flag here to indicate whether or not we currently do.
*/

And also
/* Have we completed multixact startup? */
bool finishedStartup;

I think we need to define this condition more accurately.

And include it in the documentation as well.

+ * Calculate approximate storage space:
+ * - Members are stored in groups of 4
+ * - Each group takes 20 bytes (5 bytes per member)
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;

Do we have some constant macros or sizeof(some structure) defined for
5 and 4? That way this computation will be self maintaining and self
documenting.

+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;

memset(nulls, false, sizeof(nulls)); is better and used everywhere.

In fact, instead of initializing it all to true first and then setting
all to false here, we should memset here and set it to true in else
block.

+++ b/src/test/isolation/specs/multixact_stats.spec

I have not an seen an isolation test being used for testing a stats
function. But I find it useful. Let's see what others think.

@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While
it is pinned
+# by two open transactions, we assert only invariants that background
VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation
may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.

What's a driver transaction?

+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns,
which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).

Probably this comment is not needed. But from the sequence of steps
executed, the data is collected when multixact is pinned (what does
that mean?) but the assertions are executed at the end when all the
transactions are committed. Am I correct?

+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}

You could use a single table with a primary key column to distinguish
snaps which can be used for joining the rows. Why use a temporary
table? Just setup and tear down the snap table as well?

+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
... snip ...
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]

This is getting too complex to follow. It produces pretty output but
the query is complex. Instead just let keys as the columns in the
query. Maybe you could print expanded output if that's possible in an
isolation test.

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

04 сентября 2025 г., 00:10:56

Hi Atsushi and Ashutosh,

Thank you for reviewing the patch. Attached is v7, incorporating the
feedback. Please see my responses in-line below.

On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >
> > | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
> > | docs can describe the thresholds in terms of counts directly.
> >
> > Personally, I think it might be fine to keep the gigabyte-based
> > description, and perhaps we could show both the number of members and
> > gigabytes, since it'd be also helpful to have a sense of the scale.
> >
>
> Those who have grown their own utilities to monitor the on-disk usage
> will not be able to use the count based thresholds and might take some
> time for them to starting using pg_get_multixact_stats(). It makes
> sense to mention both the count and the corresponding disk usage
> threshold. Something like "Also, if the number of multixact members
> exceeds approximately 2^31 entries (occupying roughly more than 10GB
> in storage) ... ". Users can choose which threshold they want to use.
> Adding disk storage threshold in parenthesis indicates that the count
> is more accurate and more useful.
Updated docs to include both counts and approximate storage.


> I thought mentioning bytes, a unit, in column name members_bytes would
> not be appropriate in case we start reporting it in a different unit
> like kB in future. But we already have
> pg_stat_replication_slots::spill_bytes with similar naming. So may be
> it's okay. But I would prefer members_size or members_storage or some
> such units-free name.
Good point! Adjusted to a units-free name: members_size.

> + in <literal>pg_multixact/members</literal> directory,
> + <literal>oldest_multixact</literal> is the oldest multixact ID still
> in use, and
> + <literal>oldest_offset</literal> is the oldest member offset still in use.
>
> I am not sure whether oldest_offset is worth exposing. It is an
> implementation detail. Upthread, Michael suggested to expose oldest
> offset from GetMultiXactInfo(), but I don't see him explicitly saying
> that we should expose it through this function as well. Michael what
> do you think?
IMHO, exposing oldest_offset gives a full picture of multixact state.
It complements oldest_multixact,
and including it won’t hurt. That said, if consensus is against it,
I’m happy to drop it.

> + These values can be used to monitor multixact consumption and anticipate
> + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
> + for further details on multixact wraparound.
>
> I still think that this is not needed. There is no reason to restrict
> how users want to use this function. We usually don't do that unless
> there is a hazard associated with it.
Removed.

> + <para>
> + This is a live snapshot of shared counters; the numbers can change
> between calls,
> + even within the same transaction.
> + </para></entry>
>
> I have not seen the phrase "live snapshot" being used in the
> documentation before. How about "The function reports the statistics
> at the time of invoking the function. They may vary between calls even
> within the same transaction."?
Updated wording.

> + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
> + of members created exceeds approximately 2^31 entries, aggressive vacuum
>
> a member means the transaction participating in a multixact. What you
> intend to say is "if the number of multixacts member entries created
> ...", right?
Correct, updated.

> + <para>
> + The <function>pg_get_multixact_stats()</function> function, described in
>
> unnecessary pair of commas.
Fixed.

> + This output shows a system with significant multixact activity:
> about ~100 million
> + multixact IDs and ~773 million member entries have been created
> since the oldest
> + surviving multixact (ID 39974368). By leveraging this information,
> the function helps:
> + <orderedlist>
> + <listitem>
> ... snip ...
> + Detect potential performance impacts before they become critical.
> + For instance, high multixact usage from frequent row-level locking or
> + foreign key operations can lead to increased I/O and CPU overhead during
> + vacuum operations. Monitoring these stats helps tune autovacuum frequency
> + and transaction patterns.
> + </simpara>
> + </listitem>
> + </orderedlist>
>
> I am unsure whether we should be mentioning use cases in such detail.
> Users may find other ways to use those counts. I think the following
> paragraph should be placed here.
>
> + These values can be used to monitor multixact consumption and anticipate
> + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
> + for further details on multixact wraparound.
>
>  But others may have different opinions.
>
> Maybe you could further write in your example that an aggressive
> autovacuum will be triggered in another 10 seconds (or so) if the
> number of member entries continues to double every 5 seconds. Or some
> practical "usage example" like that.
Ack. I believe keeping the example with a short list is helpful for
users to navigate
and interpret the stats. If preferred, I can trim it to a brief
paragraph with just the query in
the next rev.

> + * Returns statistics about current MultiXact usage:
> + * - num_mxids: Number of MultiXact IDs in use
> + * - num_members: Total number of member entries
> + * - oldest_multixact: Oldest MultiXact ID still needed
> + * - oldest_offset: Oldest offset still in use
>
> We don't need to mention each column here, it's evident from the
> function body and also from the user facing documentation. Just the
> first line is ok.
Updated - kept only the high-level description.

> + *
> + * Returns a row of NULLs if the MultiXact system is not yet initialized.
>
> tuple or record instead of row.
>
> In the earlier patch you were calling PG_RETURN_NULL(), which I
> thought was better. It would get converted into a record of NULLs if
> someone is to do SELECT * FROM pg_get_multixact_stats().
>
> I don't think "the MultiXact system is not yet initialized" is the
> right description of that condition. GetMultiXactInfo() prologue says
> "
> Returns false if unable to determine, the oldest offset being
> unknown." MultiXactStatData has following comment for oldest offset.
> /*
> * Oldest multixact offset that is potentially referenced by a multixact
> * referenced by a relation. We don't always know this value, so there's
> * a flag here to indicate whether or not we currently do.
> */
Switched to PG_RETURN_NULL() and rephrased both code comment and docs.


> + * Calculate approximate storage space:
> + * - Members are stored in groups of 4
> + * - Each group takes 20 bytes (5 bytes per member)
> + * Note: This ignores small page overhead (12 bytes per 8KB)
> + */
> + membersBytes = (int64) members * 5;
>
> Do we have some constant macros or sizeof(some structure) defined for
> 5 and 4? That way this computation will be self maintaining and self
> documenting.
Those macros are already defined in multixact.c - for example,
MULTIXACT_MEMBERS_PER_MEMBERGROUP
and MULTIXACT_MEMBERGROUP_SIZE encode the 4-per-group and 20-byte
layout. They are local today,
and I’m not sure why they were never exposed. Rather than moving them
into a header and creating wider changes,
v7 retains the explicit 5-bytes/member estimate with an explanatory
comment to stay consistent with existing guidance.
If we feel these macros should be promoted to a header, I think that
would be best handled as a small, separate patch,
and I’d be happy to help with that.

> + nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;
>
> memset(nulls, false, sizeof(nulls)); is better and used everywhere.
>
> In fact, instead of initializing it all to true first and then setting
> all to false here, we should memset here and set it to true in else
> block.
Updated. v7 uses memset(false) and only sets true where needed.

> +++ b/src/test/isolation/specs/multixact_stats.spec
>
> I have not an seen an isolation test being used for testing a stats
> function. But I find it useful. Let's see what others think.
>
> @@ -0,0 +1,127 @@
> +# High-signal invariants for pg_get_multixact_stats()
> +# We create exactly one fresh MultiXact on a brand-new table. While
> it is pinned
> +# by two open transactions, we assert only invariants that background
> VACUUM/FREEZE
> +# cannot violate:
> +# • members increased by ≥ 1 when the second locker arrived,
> +# • num_mxids / num_members did not decrease vs earlier snapshots,
> +# • oldest_* never decreases.
> +# We make NO assertions after releasing locks (freezing/truncation
> may shrink deltas).
> +# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
> +# transaction to narrow the window for unrelated truncation between snapshots.
>
> What's a driver transaction?
A driver transaction is simply the controlling session that stays open
while snapshots are taken.

> +#
> +# Terminology (global counters):
> +# num_mxids, num_members : “in-use” deltas derived from global horizons
> +# oldest_multixact, offset : oldest horizons; they move forward, never backward
> +#
> +# All assertions execute while our multixact is pinned by open txns,
> which protects
> +# the truncation horizon (VACUUM can’t advance past our pinned multi).
>
> Probably this comment is not needed. But from the sequence of steps
> executed, the data is collected when multixact is pinned (what does
> that mean?) but the assertions are executed at the end when all the
> transactions are committed. Am I correct?
You are correct — the assertions are executed at the end, after the commits.
The key point is that all snapshots are taken while the multixact is
pinned by open transactions,
so the invariants hold despite the final check happening later.

> +step snap0 {
> + CREATE TEMP TABLE snap0 AS
> + SELECT num_mxids, num_members, oldest_multixact, oldest_offset
> + FROM pg_get_multixact_stats();
> +}
>
> You could use a single table with a primary key column to distinguish
> snaps which can be used for joining the rows. Why use a temporary
> table? Just setup and tear down the snap table as well?
I kept separate temp tables to keep each snapshot isolated and easy to
read in the spec.
A single table with a PK would work too, but I felt temp tables made
the sequence clearer.

> +
> +# Pretty, deterministic key/value output of boolean checks.
> +# Keys:
> ... snip ...
> + (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
> + (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
> + (s1.num_members >= COALESCE(s0.num_members, 0)),
> + (s2.num_members >= COALESCE(s1.num_members, 0))
> + ]
>
> This is getting too complex to follow. It produces pretty output but
> the query is complex. Instead just let keys as the columns in the
> query. Maybe you could print expanded output if that's possible in an
> isolation test.
>
I used the labeled key/value array to mimic \x-style readability while
keeping the output
deterministic for isolation’s text diffs. It clearly names each
invariant and avoids
formatter-dependent width/spacing.

Thanks again for the thoughtful reviews. I really appreciate the
guidance and will be glad to adjust
further if needed.

Best regards,
Naga

Вложения

v7-0001-Add-pg_get_multixact_stats-function-for-monitorin.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

05 сентября 2025 г., 14:27:34

On Thu, Sep 4, 2025 at 2:41 AM Naga Appani <nagnrik@gmail.com> wrote:
>
> On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> > >
> Updated docs to include both counts and approximate storage.
>

This one is remaining.
+ up to approximately 2^32 entries before reaching wraparound.

... 2^32 entries (occupying roughly 20GB in the
<literal>pg_multixact/members</literal> directory) before reaching
wraparound. ...

+ See <xref linkend="vacuum-for-multixact-wraparound"/> for further
details on multixact wraparound.

I don't think we need this reference here. Reference back from that
section is enough.

+ * Returns NULL if the oldest referenced offset is unknown, which
happens during
+ * system startup or when no MultiXact references exist in any relation.

If no MultiXact references exist, and GetMultiXactInfo() returns
false, MultiXactMemberFreezeThreshold() will assume the worst, which I
take as meaning that it will trigger aggressive autovacuum. No
MultiXact references existing is a common case which shouldn't be
assumed as the worst case. The comment I quoted means "the oldest
value of the offset referenced by any multi-xact referenced by a
relation *may not be always known". You seem to have interpreted "may
not be known" as "does not exist" That's not right. I would write this
as "Returns NULL if the oldest referenced offset is unknown which
happens during system startup".

Similarly I would rephrase the following docs as
+ <para>
+ The function returns <literal>NULL</literal> when multixact
statistics are unavailable.
+ For example, during startup before multixact initialization completes or when
+ the oldest member offset cannot be determined.

"The function returns <literal>NULL</literal> when multixact
statistics when the oldest multixact offset corresponding to a
multixact referenced by a relation is not known after starting the
system."

> >
> > @@ -0,0 +1,127 @@
> > +# High-signal invariants for pg_get_multixact_stats()

What does "High-signal" mean here? Is that term defined somewhere?
Using terms that most of the contributors are familiar with improves
readability. If a new term is required, it needs to be defined first.
But I doubt something here requires defining a new term.

> > What's a driver transaction?
> A driver transaction is simply the controlling session that stays open
> while snapshots are taken.

I still don't understand the purpose of this transaction.
pg_get_multixact_stats() isn't transactional so the driver transaction
isn't holding any "snapshot" of the stats. It's also not creating any
multixact and hence does not contribute to testing the output of
pg_get_multixact_stats. Whatever this session is doing, can be done
outside a transaction too. Which step in this session requires an
outer transaction?

Some more comments
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,

Is this the number of multixact IDs assigned till now (since whatever
time) or the number of multixact IDs currently in the system?

+ <literal>num_members</literal> is the number of multixact member
entries created,

Similarly this.

+ multixact allocation and usage patterns in real time. For example:

suggestion: ... real time, for example: ... Otherwise the sentence
started by "For example" is not a complete sentence.

+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,

Is this the number of multixact IDs assigned till now (since whatever
time) or the number of multixact IDs currently in the system?

+ <literal>num_members</literal> is the number of multixact member
entries created,

Similarly this.

+ multixact allocation and usage patterns in real time. For example:

suggestion: ... real time, for example: ... Otherwise the sentence
started by "For example" is not a complete sentence.

+
+ values[0] = Int32GetDatum(multixacts);

This should be UInt32GetDatum() multixacts is uint32.

+ values[1] = Int64GetDatum(members);

Similarly this since MultiXactOffset is uint32.

+ values[4] = Int64GetDatum(oldestOffset);

Similarly this since MultiXactOffset is uint32.

+# Get MultiXact state
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and
oldest values',

suggestion: get current multixact usage statistics.

+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames =>
'{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},

I like the way you have formatted the new entry, but other entries in
this file are not formatted this way. It would be good to format it
like other entries but if other reviewers prefer this way, we can go
with this too.

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

12 сентября 2025 г., 06:32:59

Hi Ashutosh,

Thank you for continuing to review the patch. Attached is v8,
incorporating the feedback. Please see my responses inline below.

On Fri, Sep 5, 2025 at 6:27 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> This one is remaining.
> + up to approximately 2^32 entries before reaching wraparound.
>
> ... 2^32 entries (occupying roughly 20GB in the
> <literal>pg_multixact/members</literal> directory) before reaching
> wraparound. ...
Done.

> + See <xref linkend="vacuum-for-multixact-wraparound"/> for further
> details on multixact wraparound.
>
> I don't think we need this reference here. Reference back from that
> section is enough.
I kept the cross-reference for now since other multixact function docs
(such as pg_get_multixact_members()) already use this style, and it helps
readers who land directly on the function reference page. Please let me
know if you would prefer that I remove it.

> + * Returns NULL if the oldest referenced offset is unknown, which
> happens during
> + * system startup or when no MultiXact references exist in any relation.
>
> If no MultiXact references exist, and GetMultiXactInfo() returns
> false, MultiXactMemberFreezeThreshold() will assume the worst, which I
> take as meaning that it will trigger aggressive autovacuum. No
> MultiXact references existing is a common case which shouldn't be
> assumed as the worst case. The comment I quoted means "the oldest
> value of the offset referenced by any multi-xact referenced by a
> relation *may not be always known". You seem to have interpreted "may
> not be known" as "does not exist" That's not right. I would write this
> as "Returns NULL if the oldest referenced offset is unknown which
> happens during system startup".
>
> Similarly I would rephrase the following docs as
> + <para>
> + The function returns <literal>NULL</literal> when multixact
> statistics are unavailable.
> + For example, during startup before multixact initialization completes or when
> + the oldest member offset cannot be determined.
>
> "The function returns <literal>NULL</literal> when multixact
> statistics when the oldest multixact offset corresponding to a
> multixact referenced by a relation is not known after starting the
> system."
>
Updated.

> > >
> > > @@ -0,0 +1,127 @@
> > > +# High-signal invariants for pg_get_multixact_stats()
>
> What does "High-signal" mean here? Is that term defined somewhere?
> Using terms that most of the contributors are familiar with improves
> readability. If a new term is required, it needs to be defined first.
> But I doubt something here requires defining a new term.
Dropped that wording and simplified the isolation test.

> > > What's a driver transaction?
> > A driver transaction is simply the controlling session that stays open
> > while snapshots are taken.
>
> I still don't understand the purpose of this transaction.
> pg_get_multixact_stats() isn't transactional so the driver transaction
> isn't holding any "snapshot" of the stats. It's also not creating any
> multixact and hence does not contribute to testing the output of
> pg_get_multixact_stats. Whatever this session is doing, can be done
> outside a transaction too. Which step in this session requires an
> outer transaction?
Removed this mention; the test now only checks monotonicity without extra
transaction scaffolding.

> Some more comments
> + Returns statistics about current multixact usage:
> + <literal>num_mxids</literal> is the number of multixact IDs assigned,
>
> Is this the number of multixact IDs assigned till now (since whatever
> time) or the number of multixact IDs currently in the system?
>
> + <literal>num_members</literal> is the number of multixact member
> entries created,
Updated.

> + Returns statistics about current multixact usage:
> + <literal>num_mxids</literal> is the number of multixact IDs assigned,
>
> Is this the number of multixact IDs assigned till now (since whatever
> time) or the number of multixact IDs currently in the system?
>
> + <literal>num_members</literal> is the number of multixact member
> entries created,
Updated.


> + multixact allocation and usage patterns in real time. For example:
>
> suggestion: ... real time, for example: ... Otherwise the sentence
> started by "For example" is not a complete sentence.
Updated.

> + values[0] = Int32GetDatum(multixacts);
>
> This should be UInt32GetDatum() multixacts is uint32.
>
> + values[1] = Int64GetDatum(members);
>
> Similarly this since MultiXactOffset is uint32.
>
> + values[4] = Int64GetDatum(oldestOffset);
>
> Similarly this since MultiXactOffset is uint32.
Thanks for pointing this out. I had originally followed the existing
types but drifted, fixed now.

> +# Get MultiXact state
> +{
> + oid => '9001',
> + descr => 'get current multixact member and multixact ID counts and
> oldest values',
>
> suggestion: get current multixact usage statistics.
Updated

> + proname => 'pg_get_multixact_stats',
> + prorettype => 'record',
> + proargtypes => '',
> + proallargtypes => '{int4,int8,int8,xid,int8}',
> + proargmodes => '{o,o,o,o,o}',
> + proargnames =>
> '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
> + provolatile => 'v',
> + proparallel => 's',
> + prosrc => 'pg_get_multixact_stats'
> +},
>
> I like the way you have formatted the new entry, but other entries in
> this file are not formatted this way. It would be good to format it
> like other entries but if other reviewers prefer this way, we can go
> with this too.
I reformatted the pg_proc.dat entry to match the surrounding style.

Best regards,
Naga

Вложения

v8-0001-Add-pg_get_multixact_stats-function-for-monitorin.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

12 сентября 2025 г., 13:34:18

On Fri, Sep 12, 2025 at 9:03 AM Naga Appani <nagnrik@gmail.com> wrote:
>

+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs
assigned since startup,
+ <literal>num_members</literal> is the total number of multixact
member entries created since startup,

GetMultiXactInfo() returns following
*members = nextOffset - *oldestOffset;
*multixacts = nextMultiXactId - *oldestMultiXactId;
They seem to be the numbers that exist in the system at the time of
the call and not since the startup. Am I missing something?

+ up to approximately 2^32 entries(occupying roughly 20GB in the

space between s and (

+ proallargtypes => '{int4,int8,int8,xid,int8}',

I think the first parameter should also be int8 since uint32 won't fit
into int4.

>
> > + See <xref linkend="vacuum-for-multixact-wraparound"/> for further
> > details on multixact wraparound.
> >
> > I don't think we need this reference here. Reference back from that
> > section is enough.
> I kept the cross-reference for now since other multixact function docs
> (such as pg_get_multixact_members()) already use this style, and it helps
> readers who land directly on the function reference page. Please let me
> know if you would prefer that I remove it.

None of the write up there talks about multixact wraparound so the
reference seems arbitrary to me. I would remove it.

>
> > + * Returns NULL if the oldest referenced offset is unknown, which
> > happens during
> > + * system startup or when no MultiXact references exist in any relation.
> >
> > If no MultiXact references exist, and GetMultiXactInfo() returns
> > false, MultiXactMemberFreezeThreshold() will assume the worst, which I
> > take as meaning that it will trigger aggressive autovacuum. No
> > MultiXact references existing is a common case which shouldn't be
> > assumed as the worst case. The comment I quoted means "the oldest
> > value of the offset referenced by any multi-xact referenced by a
> > relation *may not be always known". You seem to have interpreted "may
> > not be known" as "does not exist" That's not right. I would write this
> > as "Returns NULL if the oldest referenced offset is unknown which
> > happens during system startup".
> >
> > Similarly I would rephrase the following docs as
> > + <para>
> > + The function returns <literal>NULL</literal> when multixact
> > statistics are unavailable.
> > + For example, during startup before multixact initialization completes or when
> > + the oldest member offset cannot be determined.
> >
> > "The function returns <literal>NULL</literal> when multixact
> > statistics when the oldest multixact offset corresponding to a
> > multixact referenced by a relation is not known after starting the
> > system."
> >
> Updated.

Thanks for updating the documentation. But the comment in prologue of
pg_get_multixact_stats is not completely correct as mentioned in my
previous reply. I would just say "Returns NULL if the oldest
referenced offset is unknown". Anybody who wants to know when can that
happen, may search relevant code by looking at GetMultiXactInfo().

I still find the comment at the start of the isolation test a bit
verbose. But I think it's best to leave that to a committer's
judgement. However, the word "locker" is unusual. You want to say the
session that locks a row (or something similar). We may leave exact
words to a committer's judgement.

I still find think that the specific usage scenarios described in the
documentation are not required. And the computation in
pg_get_multixact_stats() should use macros instead of bare numbers.
But we can leave that for a committer to decide. Once you address
above comments, we may mark the CF entry as RFC.

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

15 сентября 2025 г., 08:47:41

Hi Ashutosh,

Thank you for continuing to review the patch. Attached is v9,
incorporating the feedback. Please see my responses inline below.

On Fri, Sep 12, 2025 at 5:34 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

> + Returns statistics about current multixact usage:
> + <literal>num_mxids</literal> is the total number of multixact IDs
> assigned since startup,
> + <literal>num_members</literal> is the total number of multixact
> member entries created since startup,
>
> GetMultiXactInfo() returns following
> *members = nextOffset - *oldestOffset;
> *multixacts = nextMultiXactId - *oldestMultiXactId;
> They seem to be the numbers that exist in the system at the time of
> the call and not since the startup. Am I missing something?
You are right, these counts reflect the numbers currently present in
the system, not cumulative totals since startup. I have reworded the
docs to say “currently present”.

> + up to approximately 2^32 entries(occupying roughly 20GB in the
>
> space between s and (
Fixed.

> + proallargtypes => '{int4,int8,int8,xid,int8}',
>
> I think the first parameter should also be int8 since uint32 won't fit
> into int4.
Updated.

> > > + See <xref linkend="vacuum-for-multixact-wraparound"/> for further
> > > details on multixact wraparound.
> > >
> > > I don't think we need this reference here. Reference back from that
> > > section is enough.
> > I kept the cross-reference for now since other multixact function docs
> > (such as pg_get_multixact_members()) already use this style, and it helps
> > readers who land directly on the function reference page. Please let me
> > know if you would prefer that I remove it.
>
> None of the write up there talks about multixact wraparound so the
> reference seems arbitrary to me. I would remove it.
Removed.


> Thanks for updating the documentation. But the comment in prologue of
> pg_get_multixact_stats is not completely correct as mentioned in my
> previous reply. I would just say "Returns NULL if the oldest
> referenced offset is unknown". Anybody who wants to know when can that
> happen, may search relevant code by looking at GetMultiXactInfo().
Simplified the prologue comment as suggested.

> I still find the comment at the start of the isolation test a bit
> verbose. But I think it's best to leave that to a committer's
> judgement. However, the word "locker" is unusual. You want to say the
> session that locks a row (or something similar). We may leave exact
> words to a committer's judgement.
Reworded to remove "locker" and simplified.

> I still find think that the specific usage scenarios described in the
> documentation are not required. And the computation in
> pg_get_multixact_stats() should use macros instead of bare numbers.
> But we can leave that for a committer to decide. Once you address
> above comments, we may mark the CF entry as RFC.
Sounds good!

With these updates in v9, I believe the patch is now in good shape to
be marked RFC. I’ll go ahead and update the CommitFest entry.

Thank you again for your thorough reviews and thoughtful guidance on
this patch — it has been very helpful.

Best regards,
Naga

Вложения

v9-0001-Add-pg_get_multixact_stats-function-for-monitorin.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

torikoshia

Дата:

17 октября 2025 г., 03:28:28

On 2025-09-15 14:47, Naga Appani wrote:

> With these updates in v9, I believe the patch is now in good shape to
> be marked RFC. I’ll go ahead and update the CommitFest entry.

As shown in the commitfest app, v9 patch fails to build:

   multixactfuncs.c:129:28: error: call to undeclared function
'heap_form_tuple'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
   129 |                 return
HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
       |                                          ^
   multixactfuncs.c:129:28: note: did you mean 'brin_form_tuple'?
../../../../src/include/access/brin_tuple.h:96:19: note:
'brin_form_tuple' declared here
    96 | extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber
blkno,
       |                   ^
   multixactfuncs.c:129:28: error: incompatible integer to pointer
conversion passing 'int' to parameter of type 'const HeapTupleData *'
(aka 'const struct HeapTupleData *') [-Wint-conversion]
   129 |                 return
HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
       |
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ../../../../src/include/funcapi.h:230:40: note: passing argument to
parameter 'tuple' here
   230 | HeapTupleGetDatum(const HeapTupleData *tuple)
       |                                        ^
   2 errors generated.

Could you please update the patch to fix this?


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Xuneng Zhou

Дата:

17 октября 2025 г., 05:10:12

Hi,

On Fri, Oct 17, 2025 at 8:28 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2025-09-15 14:47, Naga Appani wrote:
>
> > With these updates in v9, I believe the patch is now in good shape to
> > be marked RFC. I’ll go ahead and update the CommitFest entry.
>
> As shown in the commitfest app, v9 patch fails to build:
>
>    multixactfuncs.c:129:28: error: call to undeclared function
> 'heap_form_tuple'; ISO C99 and later do not support implicit function
> declarations [-Wimplicit-function-declaration]
>    129 |                 return
> HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
>        |                                          ^
>    multixactfuncs.c:129:28: note: did you mean 'brin_form_tuple'?
> ../../../../src/include/access/brin_tuple.h:96:19: note:
> 'brin_form_tuple' declared here
>     96 | extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber
> blkno,
>        |                   ^
>    multixactfuncs.c:129:28: error: incompatible integer to pointer
> conversion passing 'int' to parameter of type 'const HeapTupleData *'
> (aka 'const struct HeapTupleData *') [-Wint-conversion]
>    129 |                 return
> HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
>        |
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    ../../../../src/include/funcapi.h:230:40: note: passing argument to
> parameter 'tuple' here
>    230 | HeapTupleGetDatum(const HeapTupleData *tuple)
>        |                                        ^
>    2 errors generated.
>
> Could you please update the patch to fix this?

Here’s the updated v10 patch, now including access/htup_details.h in
src/backend/utils/adt/multixactfuncs.c. I’m also interested in this
patch and plan to review it.

Best,
Xuneng

Вложения

v10-0001-Add-pg_get_multixact_stats-function-for-monitori.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Tomas Vondra

Дата:

18 октября 2025 г., 14:17:57

Thanks for working on this. I'm wondering if this is expected / could
help with monitoring for "space exhaustion" issues, which we currently
can't do easily, as it's not exposed anywhere.

This is in multixact.c at line ~1177, where we do this:

    if (MultiXactState->oldestOffsetKnown &&
        MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
                                 nextOffset, nmembers))
    {
        ereport(ERROR, ...
    }

But I'm not sure the current patch exposes enough information to
calculate how much space remains - calculating that we requires
offsetStopLimit and nextOffset.

The stopLimit could be calculated from oldest_offset, which the patch
returns. It's not quite trivial. It depends on BLCKSZ through
MULTIXACT_MEMBERS_PER_PAGE, and various other internal constants. It's
tempting to hardcode those into monitoring scripts, which then gets
broken in subtle ways with custom builds or if we change something
(which for multixacts we can).

And I don't think the patch exposes nextOffset, right? So AFAICS we
can't actually calculate the remaining space.

Could it either return nextOffset, or maybe actually calculate and
return the remaining space? And perhaps the "total" space, so that it's
possible to calculate what fraction of the space we already consumed.

I'm actually not entirely convinced we should be exposing the raw
internal information this patch aims to expose. Because a lot of that
feels like an internal implementation detail, and it's going to be hard
to interpret ....

Knowing num_mxids / num_members or members_size is nice, but how would
I judge how far the system is from hitting some threshold or hard limit?
Is there some maximum number of mxids/members that we could return? Or
something like that?

Similarly for oldest_multixact / oldest_offset. How useful is that
without knowing the "next" value for each of those?

Or am I missing something obvious?


regards

-- 
Tomas Vondra

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

05 ноября 2025 г., 04:13:09

Thank you for the feedback, Tomas! I agree with the goal you outlined,
providing a
user-friendly “how much space is left” signal would make monitoring far more
actionable.

On Sat, Oct 18, 2025 at 6:18 AM Tomas Vondra <tomas@vondra.me> wrote:
>
> Knowing num_mxids / num_members or members_size is nice, but how would
> I judge how far the system is from hitting some threshold or hard limit?
> Is there some maximum number of mxids/members that we could return? Or
> something like that?

Based on this, I experimented with calculating a num_remaining_members value to
estimate how close the system is to MultiXact member-space exhaustion. I tested
two approaches and validated their behavior through repeated exhaustion cycles.
The results are below.

At the same time, both you and Ashutosh pointed out that oldest_offset exposes
internal implementation details and is not particularly useful on its own, so I
removed oldest_offset in v11.

WHAT I TRIED in regards to space remaining
==========================================

Approach 1: (offsetStopLimit - nextOffset)
------------------------------------------
I exposed offsetStopLimit from GetMultiXactInfo() and computed:

    remainingMembers = offsetStopLimit - nextOffset;

Behavior at exhaustion:

    postgres=# SELECT num_mxids,num_members,remaining_members
               FROM pg_get_multixact_stats();
     num_mxids | num_members | remaining_members
    -----------+-------------+-------------------
     115409471 | 4294914940  |                 1
    (1 row)

After wraparound cleanup:

    postgres=# SELECT num_mxids,num_members,remaining_members
               FROM pg_get_multixact_stats();
     num_mxids | num_members | remaining_members
    -----------+-------------+-------------------
             0 |           0 |                 0
    (1 row)

The value stayed at 0 until roughly ~100k new members were allocated. My reading
is that nextOffset wraps to a small value, while offsetStopLimit remains large
(derived from the oldestOffset at the moment of truncation). Without using the
backend’s wrap-aware comparison logic (MultiXactOffsetWouldWrap()), plain
subtraction crosses the wrap boundary and becomes misleading.

Approach 2: (MaxMultiXactOffset - members)
------------------------------------------
I also tested:

    remainingMembers = MaxMultiXactOffset - members;

Across three exhaustion cycles:

1st attempt:

    postgres=# SELECT num_mxids,num_members,remaining_members
             FROM pg_get_multixact_stats();
     num_mxids | num_members | remaining_members
    -----------+-------------+-------------------
     125098473 | 4294914940  |             52355
    (1 row)

2nd attempt:

    postgres=# SELECT num_mxids,num_members,remaining_members
               FROM pg_get_multixact_stats();
     num_mxids | num_members | remaining_members
    -----------+-------------+-------------------
     116285530 | 4294905729  |             61566
    (1 row)

3rd attempt:

    postgres=# SELECT num_mxids,num_members,remaining_members
               FROM pg_get_multixact_stats();
     num_mxids | num_members | remaining_members
    -----------+-------------+-------------------
     111973488 | 4294862592  |            104703
    (1 row)

The system correctly rejected inserts in each cycle, but the computed
“remaining”
value increased between cycles. This seems to match the dynamic nature of
offsetStopLimit, which appears to be recomputed after truncation:
- based on the new oldestOffset
- aligned back to the start of its segment
- with one safety segment subtracted

Because the stop boundary shifts depending on segment boundaries, the plain
(Max − members) formula reflects alignment effects rather than actual remaining
capacity.

Understanding
============
Based on reading the relevant parts of multixact.c and observing the runtime
behavior, both approaches seem to run into limitations when trying to derive a
“remaining members” value outside the backend. I may be missing details, but the
behavior I observed suggests that a reliable computation might require
duplicating
several internal mechanisms, including:
- wrap-aware offset comparison
- SLRU page and segment alignment rules
- SetOffsetVacuumLimit’s segment recalculation

Without accounting for those, the derived numbers behaved inconsistently across
tests, sometimes staying at 0 until a large jump, and in other cases increasing
between exhaustion cycles. This seems broadly consistent with your concern that
simple arithmetic on these counters does not match how the backend determines
wraparound risk.

To be clear, this interpretation is based only on what I could infer from the
code and testing, and I may not be capturing the entire picture. But from what I
observed, a user-visible “remaining members” metric does not seem
straightforward
without exposing or replicating backend logic.

My thoughts
==========
Given all this, the cleanest approach appears to be not exposing a “remaining
members” counter directly.
PostgreSQL has historically avoided exposing remaining-capacity counters for
wraparound-limited resources such as:
- transaction IDs
- MultiXact IDs
- OIDs

Instead, PostgreSQL exposes current usage and relies on documented
thresholds for
monitoring. Following that established pattern avoids tying a SQL-visible
interface to backend internals that may evolve over time.

Self-monitoring based on documented limits
==========================================
Monitoring then follows the same pattern PostgreSQL already uses for XIDs and
other wraparound-limited values:
- track num_members growth over time
- warn when it exceeds roughly 2^31
- treat values approaching 2^32 as exhaustion-risk territory
- observe the growth rate to estimate when intervention may be needed

This keeps the interface simple, stable, and aligned with existing PostgreSQL
behavior.

Why oldest_offset was removed
=============================
Both you and Ashutosh pointed out that oldest_offset reflects internal SLRU
geometry and is not actionable without reproducing backend logic. Combined with
the behavior seen in the experiments above, it made sense not to expose this
field in the user-visible API. It is removed in v11.

Final shape of the function (v11)
=================================
The function now returns:
- num_mxids
- num_members
- members_size
- oldest_multixact

These fields are stable, directly interpretable, and do not depend on SLRU
internals or wrap-aware arithmetic.

On Thu, Oct 16, 2025 at 9:10 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> Here’s the updated v10 patch, now including access/htup_details.h in
> src/backend/utils/adt/multixactfuncs.c.

Thank you!

On Thu, Oct 16, 2025 at 7:28 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Could you please update the patch to fix this?

Thank you for raising it and bringing it to attention!

Attached is the v11.

Best regards,
Naga

Вложения

v11-0001-Add-pg_get_multixact_stats-function-for-monitori.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

10 ноября 2025 г., 09:13:33

On Wed, Nov 5, 2025 at 6:43 AM Naga Appani <nagnrik@gmail.com> wrote:
>
> Understanding
> ============
> Based on reading the relevant parts of multixact.c and observing the runtime
> behavior, both approaches seem to run into limitations when trying to derive a
> “remaining members” value outside the backend. I may be missing details, but the
> behavior I observed suggests that a reliable computation might require
> duplicating
> several internal mechanisms, including:
> - wrap-aware offset comparison
> - SLRU page and segment alignment rules
> - SetOffsetVacuumLimit’s segment recalculation
>
> Without accounting for those, the derived numbers behaved inconsistently across
> tests, sometimes staying at 0 until a large jump, and in other cases increasing
> between exhaustion cycles. This seems broadly consistent with your concern that
> simple arithmetic on these counters does not match how the backend determines
> wraparound risk.
>
> To be clear, this interpretation is based only on what I could infer from the
> code and testing, and I may not be capturing the entire picture. But from what I
> observed, a user-visible “remaining members” metric does not seem
> straightforward
> without exposing or replicating backend logic.

Right now MultiXactOffsetWouldWrap() assesses if the given distance is
higher than the permitted distance between start and boundary. I think
we could instead change it to report the permitted distance based on
start and boundary; use it to report remaining space (after
multiplying it with bytes per member) and also use it to assess
whether the required distance is within that boundary or whether we
need a warning. But ...
On Sat, Oct 18, 2025 at 4:48 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> Thanks for working on this. I'm wondering if this is expected / could
> help with monitoring for "space exhaustion" issues, which we currently
> can't do easily, as it's not exposed anywhere.
>
> This is in multixact.c at line ~1177, where we do this:
>
>     if (MultiXactState->oldestOffsetKnown &&
>         MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
>                                  nextOffset, nmembers))
>     {
>         ereport(ERROR, ...
>     }
>
> But I'm not sure the current patch exposes enough information to
> calculate how much space remains - calculating that we requires
> offsetStopLimit and nextOffset.

The function exposes the number of existing members and the amount of
space they consume (members_size). The documentation mentions space
related thresholds 10GB and 20GB. Isn't comparing members_size to
these thresholds enough to take appropriate action? If so, we could
report the difference between these respective thresholds and
members_size as a metric of space remaining before a given threshold
is triggered.

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

06 декабря 2025 г., 20:52:57

Hi Ashutosh,

Thanks for the review!

I agree - comparing the exposed members_size against the documented
thresholds is sufficient for monitoring purposes.

This aligns with the approach taken in v11: exposing the current usage in
a way consistent with other PostgreSQL counters (e.g., XIDs, OIDs), without
introducing user-visible remaining-capacity calculations whose behavior is
inconsistent and difficult to interpret externally. In the same spirit, I
removed oldest_offset: as we discussed, it is internal and does not
provide an actionable signal to users.

If this addresses the concerns raised so far, I would appreciate
consideration in moving v11 forward for commit.

On Mon, Nov 10, 2025 at 12:13 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Nov 5, 2025 at 6:43 AM Naga Appani <nagnrik@gmail.com> wrote:
> >
> > Understanding
> > ============
> > Based on reading the relevant parts of multixact.c and observing the runtime
> > behavior, both approaches seem to run into limitations when trying to derive a
> > “remaining members” value outside the backend. I may be missing details, but the
> > behavior I observed suggests that a reliable computation might require
> > duplicating
> > several internal mechanisms, including:
> > - wrap-aware offset comparison
> > - SLRU page and segment alignment rules
> > - SetOffsetVacuumLimit’s segment recalculation
> >
> > Without accounting for those, the derived numbers behaved inconsistently across
> > tests, sometimes staying at 0 until a large jump, and in other cases increasing
> > between exhaustion cycles. This seems broadly consistent with your concern that
> > simple arithmetic on these counters does not match how the backend determines
> > wraparound risk.
> >
> > To be clear, this interpretation is based only on what I could infer from the
> > code and testing, and I may not be capturing the entire picture. But from what I
> > observed, a user-visible “remaining members” metric does not seem
> > straightforward
> > without exposing or replicating backend logic.
>
> Right now MultiXactOffsetWouldWrap() assesses if the given distance is
> higher than the permitted distance between start and boundary. I think
> we could instead change it to report the permitted distance based on
> start and boundary; use it to report remaining space (after
> multiplying it with bytes per member) and also use it to assess
> whether the required distance is within that boundary or whether we
> need a warning. But ...
> On Sat, Oct 18, 2025 at 4:48 PM Tomas Vondra <tomas@vondra.me> wrote:
> >
> > Thanks for working on this. I'm wondering if this is expected / could
> > help with monitoring for "space exhaustion" issues, which we currently
> > can't do easily, as it's not exposed anywhere.
> >
> > This is in multixact.c at line ~1177, where we do this:
> >
> >     if (MultiXactState->oldestOffsetKnown &&
> >         MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
> >                                  nextOffset, nmembers))
> >     {
> >         ereport(ERROR, ...
> >     }
> >
> > But I'm not sure the current patch exposes enough information to
> > calculate how much space remains - calculating that we requires
> > offsetStopLimit and nextOffset.
>
> The function exposes the number of existing members and the amount of
> space they consume (members_size). The documentation mentions space
> related thresholds 10GB and 20GB. Isn't comparing members_size to
> these thresholds enough to take appropriate action? If so, we could
> report the difference between these respective thresholds and
> members_size as a metric of space remaining before a given threshold
> is triggered.

Best regards,
Naga

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

08 декабря 2025 г., 07:40:16

On Sat, Dec 6, 2025 at 11:23 PM Naga Appani <nagnrik@gmail.com> wrote:
>
> Hi Ashutosh,
>
> Thanks for the review!
>
> I agree - comparing the exposed members_size against the documented
> thresholds is sufficient for monitoring purposes.
>
> This aligns with the approach taken in v11: exposing the current usage in
> a way consistent with other PostgreSQL counters (e.g., XIDs, OIDs), without
> introducing user-visible remaining-capacity calculations whose behavior is
> inconsistent and difficult to interpret externally. In the same spirit, I
> removed oldest_offset: as we discussed, it is internal and does not
> provide an actionable signal to users.
>
> If this addresses the concerns raised so far, I would appreciate
> consideration in moving v11 forward for commit.

The patch at [1] changes the function used to fetch mxid related
information. With that we will get rid of awkwardness around
non-availability of the statistics. It's better to wait for those
changes to get committed before moving this forward.

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

13 декабря 2025 г., 22:34:47

Thank you, Ashutosh!

On Sun, Dec 7, 2025 at 10:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> The patch at [1] changes the function used to fetch mxid related
> information. With that we will get rid of awkwardness around
> non-availability of the statistics. It's better to wait for those
> changes to get committed before moving this forward.

Following the upstream change from Heikki's patch [0], I've updated
the patch (v12) to align with the new behavior.

Code changes:
- GetMultiXactInfo() now returns void, so the conditional checks and NULL
  handling have been removed.
- MultiXactOffset is now 64-bit; updated the code to use Int64GetDatum()
  for member counts.
- Switched to using MULTIXACT_MEMBERGROUP_SIZE and
  MULTIXACT_MEMBERS_PER_MEMBERGROUP from multixact_internal.h instead of
  hardcoded calculations.

Documentation changes:
- Removed the NULL-return discussion from func-info.sgml, as the
  statistics are now always available.
- Updated maintenance.sgml to clarify that exceeding the historical
  2^32 member limit no longer causes wraparound, but instead triggers
  more aggressive vacuum activity for disk space management.

I validated the behavior before and after cleanup.
The function correctly reports current usage (beyond the old limits) and
resets once multixacts are removed:

postgres=# SELECT num_mxids, num_members, pg_size_pretty(members_size)
AS members_size, oldest_multixact FROM pg_get_multixact_stats();
-[ RECORD 1 ]------------------
num_mxids        | 267969541
num_members      | 9469693355
members_size     | 44 GB
oldest_multixact | 2

postgres=# SELECT pg_terminate_backend(27222);
pg_terminate_backend
----------------------
t

postgres=# SELECT num_mxids, num_members, pg_size_pretty(members_size)
AS members_size, oldest_multixact FROM pg_get_multixact_stats();
-[ RECORD 1 ]------------------
num_mxids        | 0
num_members      | 0
members_size     | 0 bytes
oldest_multixact | 267969543

The updated patch is attached.

Ref:
[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bd8d9c9bdfa0c2168bb37edca6fa88168cacbbaa

Regards,
Naga

Вложения

v12-0001-Add-pg_get_multixact_stats-function-for-monitori.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

17 декабря 2025 г., 07:19:11

On Sat, Dec 13, 2025 at 01:34:47PM -0600, Naga Appani wrote:
> Documentation changes:
> - Removed the NULL-return discussion from func-info.sgml, as the
>   statistics are now always available.
> - Updated maintenance.sgml to clarify that exceeding the historical
>   2^32 member limit no longer causes wraparound, but instead triggers
>   more aggressive vacuum activity for disk space management.
>
> I validated the behavior before and after cleanup.
> The function correctly reports current usage (beyond the old limits) and
> resets once multixacts are removed:

+    /*
+     * Calculate storage space for members. Members are stored in groups,
+     * with each group containing MULTIXACT_MEMBERS_PER_MEMBERGROUP members
+     * and taking MULTIXACT_MEMBERGROUP_SIZE bytes.
+     */
+    membersBytes = (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
+                   MULTIXACT_MEMBERGROUP_SIZE;

This is the key point of the patch internal logic.  And there is one
thing that I am wondering here.  The amount of space taken by a number
of members depends on the other compiled constants from
multixact_internal.h.  Hence, rather than calculate the amount of
space taken by a set of members in some code hidden in the SQL
function, could it be better to put that directly as a macro or an
inline function in multixact_internal.h?
--
Michael

Вложения

signature.asc

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Ashutosh Bapat

Дата:

17 декабря 2025 г., 08:16:51

On Wed, Dec 17, 2025 at 9:49 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Dec 13, 2025 at 01:34:47PM -0600, Naga Appani wrote:
> > Documentation changes:
> > - Removed the NULL-return discussion from func-info.sgml, as the
> >   statistics are now always available.
> > - Updated maintenance.sgml to clarify that exceeding the historical
> >   2^32 member limit no longer causes wraparound, but instead triggers
> >   more aggressive vacuum activity for disk space management.
> >
> > I validated the behavior before and after cleanup.
> > The function correctly reports current usage (beyond the old limits) and
> > resets once multixacts are removed:
>
> +       /*
> +        * Calculate storage space for members. Members are stored in groups,
> +        * with each group containing MULTIXACT_MEMBERS_PER_MEMBERGROUP members
> +        * and taking MULTIXACT_MEMBERGROUP_SIZE bytes.
> +        */
> +       membersBytes = (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
> +                                  MULTIXACT_MEMBERGROUP_SIZE;
>
> This is the key point of the patch internal logic.  And there is one
> thing that I am wondering here.  The amount of space taken by a number
> of members depends on the other compiled constants from
> multixact_internal.h.  Hence, rather than calculate the amount of
> space taken by a set of members in some code hidden in the SQL
> function, could it be better to put that directly as a macro or an
> inline function in multixact_internal.h?

+1.

--
Best Wishes,
Ashutosh Bapat

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

25 декабря 2025 г., 03:09:14

Thanks for the suggestion and review, Michael and Ashutosh!

On Tue, Dec 16, 2025 at 11:17 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Dec 17, 2025 at 9:49 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Sat, Dec 13, 2025 at 01:34:47PM -0600, Naga Appani wrote:
> > >
> > > I validated the behavior before and after cleanup.
> > > The function correctly reports current usage (beyond the old limits) and
> > > resets once multixacts are removed:
> >
> > +       /*
> > +        * Calculate storage space for members. Members are stored in groups,
> > +        * with each group containing MULTIXACT_MEMBERS_PER_MEMBERGROUP members
> > +        * and taking MULTIXACT_MEMBERGROUP_SIZE bytes.
> > +        */
> > +       membersBytes = (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
> > +                                  MULTIXACT_MEMBERGROUP_SIZE;
> >
> > This is the key point of the patch internal logic.  And there is one
> > thing that I am wondering here.  The amount of space taken by a number
> > of members depends on the other compiled constants from
> > multixact_internal.h.  Hence, rather than calculate the amount of
> > space taken by a set of members in some code hidden in the SQL
> > function, could it be better to put that directly as a macro or an
> > inline function in multixact_internal.h?
>
> +1.
>
> --
> Best Wishes,
> Ashutosh Bapat

I’ve updated the patch as suggested.

The member storage size calculation has been refactored into a static
inline function, MultiXactMemberStorageSize(), in
src/include/access/multixact_internal.h.

Please find v13 attached.

Regards,
Naga

Вложения

v13-0001-Add-pg_get_multixact_stats-function-for-monitori.patch

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

25 декабря 2025 г., 03:45:33

On Wed, Dec 24, 2025 at 06:09:14PM -0600, Naga Appani wrote:
> I’ve updated the patch as suggested.
>
> The member storage size calculation has been refactored into a static
> inline function, MultiXactMemberStorageSize(), in
> src/include/access/multixact_internal.h.
>
> Please find v13 attached.

Seems basically sensible here for the structure, including the hints
and recommendations for the GUCs.

+     of multixact member entries created exceeds approximately 2^31 entries
[...]
+     This output shows a system with significant multixact activity: about ~312 million
+     multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.

The documentation could be improved more.  The power '^' and tilde
symbols are not used for references.  If any, I'd encourage using
wordings like "2 billion" entries for all these paragraphs across the
board.  For the tilde part, you would mean "at least" or "at most"
rather than the boundaries implied with the tilde (aka we should not
expect the reader the mental effort to translate and  understand what
these symbols mean, especially for non-native English speaker).

+        Detect potential performance impacts before they become critical.
+        For instance, high multixact usage from frequent row-level locking or
+        foreign key operations can lead to increased I/O and CPU overhead during
+        vacuum operations. Monitoring these stats helps tune autovacuum frequency
+        and transaction patterns.

Saying that, this paragraph does not seem that useful to me,
especially the last sentence which is evasive and can apply to
anything related to monitoring.

The second hint is more useful, but perhaps we should mention which
GUC(s) should be touched to make num_members go lower?  As a whole,
the orderedlist does not seem strongly necessary to me: the third
item is evasive, the first and second items describe problematic
patterns and what could cause them.  As a whole, for the docs
part, the new additions in the existing paragraph of maintenance.sgml
are OK for me.  The first part of the new paragraph added also
provides some direct information about how useful this new function is
to evaluate the amount of disk space used.  I'd like to think that we
should just complete it the two facts about num_mxids and num_members
you are listing, with two sentences appended at the end of the new
paragraph rather than a list of items.

If we don't completely agree about the "hint" part, we could split the
patch in two for now: let's add the function first, then discuss more
about what kind of tweaks and patterns we want to document as a set of
follow-up changes.  It does not change the fact that the function is
useful for disk-space monitoring purposes.  The patterns and hints are
a second different matter.
--
Michael

Вложения

signature.asc

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

25 декабря 2025 г., 04:30:37

On Thu, Dec 25, 2025 at 09:45:33AM +0900, Michael Paquier wrote:
> Seems basically sensible here for the structure, including the hints
> and recommendations for the GUCs.

+/* Calculate storage space in bytes for a given number of members */
+static inline int64
+MultiXactMemberStorageSize(MultiXactOffset members)
+{
+    return (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
+           MULTIXACT_MEMBERGROUP_SIZE;
+}

By the way, this bit also feels a bit confusing, and this comes down
to the fact that "members" is not an offset, isn't it?  This relates
to MultiXactMemberFreezeThreshold(), that considers the number of
members as an offset, but it is a number of members, a difference
between two offsets.

I am wondering if it would not be cleaner and less confusing to do
things slightly differently (sorry I did not pay much attention to
that previously):
- Change GetMultiXactInfo() to return two offsets, nextOffset and
oldestOffset.
- Use uint64 for members and recalculate the difference in
MultiXactMemberFreezeThreshold() and the function code.  Heikki has
just switched multixact offsets to be 64 bits, yippee.
- Redefine MultiXactMemberStorageSize() so as it does not take a
number of members in input, but as the amount of space taken between
two offsets.  At least that would be more consistent with all the
other inline functions of multixact.h that rely on MultiXactOffset
inputs.  Using a int64 is still OK I guess, there may be a case to
detect "negative" numbers and give a change to the users of the new
inline function to notice that they did a computation wrong, rather
than hiding a signedness problem.
--
Michael

Вложения

signature.asc

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

29 декабря 2025 г., 06:51:23

On Thu, Dec 25, 2025 at 10:30:37AM +0900, Michael Paquier wrote:
> I am wondering if it would not be cleaner and less confusing to do
> things slightly differently (sorry I did not pay much attention to
> that previously):
> - Change GetMultiXactInfo() to return two offsets, nextOffset and
> oldestOffset.
> - Use uint64 for members and recalculate the difference in
> MultiXactMemberFreezeThreshold() and the function code.  Heikki has
> just switched multixact offsets to be 64 bits, yippee.
> - Redefine MultiXactMemberStorageSize() so as it does not take a
> number of members in input, but as the amount of space taken between
> two offsets.  At least that would be more consistent with all the
> other inline functions of multixact.h that rely on MultiXactOffset
> inputs.  Using a int64 is still OK I guess, there may be a case to
> detect "negative" numbers and give a change to the users of the new
> inline function to notice that they did a computation wrong, rather
> than hiding a signedness problem.

So, here is what I have in mind, split into independent pieces:
- Remove the existing type confusion with GetMultiXactInfo(), due to
how things have always been done in MultiXactMemberFreezeThreshold().
- Add macro MultiXactOffsetStorageSize(), to calculate the amount of
space used between two offsets.
- The main patch, with adjustments in comments, the test (no
non-ASCII characters in that, please).  One thing that was really
surprising is that you did not consider ROLE_PG_READ_ALL_STATS.  We
expect all the stats information to be hidden if a role is not granted
access to them, and this function should be no exception especially as
it relates to disk space usage like database or tablespace size
functions.

Anyway, attached are all these updated pieces.  The doc edits are what
I have mentioned upthread, close to what you have suggested to me
offline.

Comments?
--
Michael

Вложения

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Naga Appani

Дата:

30 декабря 2025 г., 05:57:11

On Sun, Dec 28, 2025 at 9:51 PM Michael Paquier <michael@paquier.xyz> wrote:
> So, here is what I have in mind, split into independent pieces:
> - Remove the existing type confusion with GetMultiXactInfo(), due to
> how things have always been done in MultiXactMemberFreezeThreshold().
> - Add macro MultiXactOffsetStorageSize(), to calculate the amount of
> space used between two offsets.
> - The main patch, with adjustments in comments, the test (no
> non-ASCII characters in that, please).  One thing that was really
> surprising is that you did not consider ROLE_PG_READ_ALL_STATS.  We
> expect all the stats information to be hidden if a role is not granted
> access to them, and this function should be no exception especially as
> it relates to disk space usage like database or tablespace size
> functions.
>
> Anyway, attached are all these updated pieces.  The doc edits are what
> I have mentioned upthread, close to what you have suggested to me
> offline.
>
> Comments?
> --
> Michael

Thank you for patches, Michael! I've tested and everything works well:
- All patches apply cleanly
- Isolation test (multixact-stats) passes
- Function correctly reports stats under heavy load

Tested with significant multixact activity:
++++++++++++++++++++++++++++++++++++++++++++++
postgres=# \x
Expanded display is on.
postgres=# SELECT
    to_char(num_mxids::bigint, 'FM999,999,999,999') AS num_mxids,
    to_char(num_members::bigint, 'FM999,999,999,999') AS num_members,
    to_char(members_size::bigint, 'FM999,999,999,999') AS members_size_bytes,
    pg_size_pretty(members_size) AS members_size_pretty,
    to_char(oldest_multixact::text::bigint, 'FM999,999,999,999') AS
oldest_multixact
FROM pg_get_multixact_stats();
-[ RECORD 1 ]-------+------------------
num_mxids           | 235,095,556
num_members         | 14,435,701,862
members_size_bytes  | 72,178,509,300
members_size_pretty | 67 GB
oldest_multixact    | 2
++++++++++++++++++++++++++++++++++++++++++++++

After cleanup, the function properly resets:
++++++++++++++++++++++++++++++++++++++++++++++
-[ RECORD 1 ]-------+-------------
num_mxids           | 0
num_members         | 0
members_size_bytes  | 0
members_size_pretty | 0 bytes
oldest_multixact    | 235,095,558
++++++++++++++++++++++++++++++++++++++++++++++

The oldest_multixact correctly advances to reflect the cleanup.

Thanks for adding the pg_read_all_stats privilege check!

I think this is ready for RFC.

Best regards,
Naga

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

От

Michael Paquier

Дата:

30 декабря 2025 г., 10:44:03

On Mon, Dec 29, 2025 at 08:57:11PM -0600, Naga Appani wrote:
> The oldest_multixact correctly advances to reflect the cleanup.
>
> Thanks for adding the pg_read_all_stats privilege check!
>
> I think this is ready for RFC.

Thanks for looking.  I have done an extra round of brush-up, then
applied the set.  The buildfarm looks OK with it.
--
Michael

Вложения

signature.asc

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: [Proposal] Expose internal MultiXact member count function for efficient monitoring

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения