Discussion: enhance wraparound warnings

enhance wraparound warnings

From
Nathan Bossart
Date:
varsup.c has the following comment:

    /*
     * We'll start complaining loudly when we get within 40M transactions of
     * data loss.  This is kind of arbitrary, but if you let your gas gauge
     * get down to 2% of full, would you be looking for the next gas station?
     * We need to be fairly liberal about this number because there are lots
     * of scenarios where most transactions are done by automatic clients that
     * won't pay attention to warnings.  (No, we're not gonna make this
     * configurable.  If you know enough to configure it, you know enough to
     * not get in this kind of trouble in the first place.)
     */

I don't know about you, but I start getting antsy around a quarter tank.
In any case, I'm told that even 40M transactions aren't enough time to
react these days.  Attached are a few patches to enhance the wraparound
warnings.

* 0001 adds a "percent remaining" detail message to the existing WARNING.
The idea is that "1.86% of transaction IDs" is both easier to understand
and better indicates urgency than "39985967 transactions".

* 0002 bumps the warning limit from 40M to 100M to give folks some more
time to react.

* 0003 adds an early warning system for when fewer than 500M transactions
remain.  This system sends a LOG only to the server log every 1M
transactions.  The hope is that this gets someone's attention sooner
without flooding the application and server log.
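A minimal sketch of the arithmetic behind 0001's DETAIL line, assuming 32-bit XIDs and the `(MaxTransactionId / 2)` denominator (equal to PG_INT32_MAX). The typedef and macro mirror PostgreSQL's transam.h but are redefined here so the example stands alone; `xid_percent_remaining()` and `format_xid_detail()` are hypothetical helpers, not functions from the patch. It also puts 0002 in the same terms: the old 40M limit is about 1.86% of the window, and the proposed 100M limit is about 4.66%.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for PostgreSQL's transam.h definitions (32-bit XIDs assumed). */
typedef uint32_t TransactionId;
#define MaxTransactionId ((TransactionId) 0xFFFFFFFF)

/*
 * Percentage of the usable XID window (MaxTransactionId / 2, which equals
 * PG_INT32_MAX) that is still left before the wrap limit.
 */
static double
xid_percent_remaining(TransactionId remaining)
{
    assert(remaining <= MaxTransactionId / 2);
    return (double) remaining / (MaxTransactionId / 2) * 100;
}

/* Hypothetical helper: render a DETAIL line in the style 0001 proposes. */
static int
format_xid_detail(char *buf, size_t len, TransactionId remaining)
{
    return snprintf(buf, len,
                    "Approximately %.2f%% of transaction IDs are available for use.",
                    xid_percent_remaining(remaining));
}
```

With 39985967 remaining IDs (the figure from the existing documentation example), the DETAIL line reads "Approximately 1.86% of transaction IDs are available for use."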

Thoughts?

-- 
nathan

Attachments

Re: enhance wraparound warnings

From
Nathan Bossart
Date:

Re: enhance wraparound warnings

From
Chao Li
Date:
Hi Nathan,

I just reviewed the patch. My comments are mainly on 0001, plus a few nits on 0003. For 0002, the code change is quite
straightforward, but I am not sure whether the new value has been discussed.

> On Dec 12, 2025, at 04:28, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> rebased
>
> --
> nathan
>
<v2-0001-Add-percentage-of-transaction-IDs-that-are-availa.patch><v2-0002-Bump-transaction-ID-limit-to-warn-at-100M.patch><v2-0003-Perodically-emit-server-logs-when-fewer-than-500M.patch>

1 - 0001
```
+                                   (double) (multiWrapLimit - result) / PG_INT32_MAX * 100),
```

I don't feel good about using PG_INT32_MAX as the denominator, even though the value is correct.

Looking at how xidWrapLimit is calculated:
```
    /*
     * The place where we actually get into deep trouble is halfway around
     * from the oldest potentially-existing XID.  (This calculation is
     * probably off by one or two counts, because the special XIDs reduce the
     * size of the loop a little bit.  But we throw in plenty of slop below,
     * so it doesn't matter.)
     */
    xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
    if (xidWrapLimit < FirstNormalTransactionId)
        xidWrapLimit += FirstNormalTransactionId;
```

Here, "(MaxTransactionId >> 1)" has the same value as PG_INT32_MAX. But if XIDs are ever changed to 64 bits, that code
won't need to be updated, while the patched code will.

So, can we define a constant in transam.h like:
```
#define MaxTransactionId ((TransactionId) 0xFFFFFFFF)
#define WrapAroundWindow (MaxTransactionId>>1)
```

And use WrapAroundWindow in all places.
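The suggestion, sketched under the assumption that `WrapAroundWindow` is a new macro (it does not exist in PostgreSQL today): the wrap-limit calculation quoted above and the new percentage share one named denominator. `compute_xid_wrap_limit()` and `percent_remaining()` are illustrative names; `FirstNormalTransactionId` is 3, as in transam.h.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for transam.h; WrapAroundWindow is the proposed (hypothetical) macro. */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define MaxTransactionId ((TransactionId) 0xFFFFFFFF)
#define WrapAroundWindow (MaxTransactionId >> 1)

static TransactionId
compute_xid_wrap_limit(TransactionId oldest_datfrozenxid)
{
    /* Unsigned addition wraps modulo 2^32, like XID arithmetic. */
    TransactionId xidWrapLimit = oldest_datfrozenxid + WrapAroundWindow;

    /* If the sum wrapped into the special XIDs 0..2, push it past them. */
    if (xidWrapLimit < FirstNormalTransactionId)
        xidWrapLimit += FirstNormalTransactionId;
    return xidWrapLimit;
}

static double
percent_remaining(TransactionId xidWrapLimit, TransactionId nextXid)
{
    TransactionId remaining = (TransactionId) (xidWrapLimit - nextXid);

    /* nextXid should never be more than one window behind the limit. */
    assert(remaining <= WrapAroundWindow);
    /* Same denominator as the wrap-limit calculation above. */
    return (double) remaining / WrapAroundWindow * 100;
}
```

The change is purely about naming: `WrapAroundWindow` evaluates to 2147483647, the same value as PG_INT32_MAX and `(MaxTransactionId / 2)`.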

2 - 0001
```
+                         errdetail("Approximately %.2f%% of MultiXactIds are available for use.",
```

"%.2f%%" shows only 2 digits after the decimal point. xidWrapLimit is roughly 2B, so when the remaining count drops to 107374, it will show
"0.00%". IMO, when the remaining count is large, a percentage makes more sense, while an exact number is clearer when the
count is relatively small. So, can we show both the percentage and the exact number? Or show the exact number when the percentage
is 0.00%?
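A quick way to check the 107374 figure: `%.2f` prints 0.00 for any percentage below 0.005%, i.e. fewer than about 2147483647 × 0.00005 ≈ 107374.18 remaining IDs. The sketch below uses hypothetical helpers and assumes the 2^31 - 1 window; it binary-searches for the first remaining count that no longer renders as 0.00.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define XID_WINDOW 2147483647u  /* MaxTransactionId / 2, assuming 32-bit XIDs */

/* Returns 1 if "%.2f" renders the remaining percentage as exactly "0.00". */
static int
percent_prints_as_zero(uint32_t remaining)
{
    char buf[32];

    snprintf(buf, sizeof buf, "%.2f", (double) remaining / XID_WINDOW * 100);
    return buf[0] == '0' && buf[1] == '.' && buf[2] == '0' &&
        buf[3] == '0' && buf[4] == '\0';
}

/* Binary search for the smallest remaining count that does not print 0.00. */
static uint32_t
first_nonzero_remaining(void)
{
    uint32_t lo = 0;            /* invariant: lo prints as 0.00 */
    uint32_t hi = XID_WINDOW;   /* invariant: hi does not */

    assert(percent_prints_as_zero(lo) && !percent_prints_as_zero(hi));
    while (hi - lo > 1)
    {
        uint32_t mid = lo + (hi - lo) / 2;

        if (percent_prints_as_zero(mid))
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}
```

It returns 107375, so any count of 107374 or fewer is shown as "0.00%".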

3 - 0001
```
 <programlisting>
 WARNING:  database "mydb" must be vacuumed within 39985967 transactions
+DETAIL:  Approximately 1.86% of transactions IDs are available for use.
```

Typo: "transactions IDs" => "transaction IDs"

4 - 0003
```
Subject: [PATCH v2 3/3] Perodically emit server logs when fewer than 500M
```

Typo: Perodically => Periodically

5 - 0003
```
+    xidLogLimit = xidWrapLimit - 500000000;
```

Instead of hardcoding 500M, should we consider autovacuum_freeze_max_age? If a deployment sets
autovacuum_freeze_max_age > 500M, then vacuum would be triggered first, and this log could be somewhat unintuitive. But
if a vacuum cannot freeze any tuples, then this log will still make sense. I am not sure. Maybe it's not a real problem.
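A rough sketch of the check 0003 adds to the XID-assignment path, based only on the description in this thread: start logging once the next XID reaches `xidWrapLimit - 500000000`, and then log only when the XID is a multiple of 1M. The helper names, and applying the modulo to the XID itself, are assumptions; `xid_follows_or_equals()` is a simplified circular comparison that ignores PostgreSQL's special XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define EARLY_LOG_HEADROOM 500000000u   /* "fewer than 500M transactions remain" */
#define EARLY_LOG_INTERVAL 1000000u     /* "every 1M transactions" */

/* Simplified modulo-2^32 comparison: does a come at or after b? */
static bool
xid_follows_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) >= 0;
}

static TransactionId
compute_xid_log_limit(TransactionId xidWrapLimit)
{
    return xidWrapLimit - EARLY_LOG_HEADROOM;   /* wraps modulo 2^32 */
}

/* Would the hot path emit the periodic LOG for this newly assigned XID? */
static bool
should_emit_early_log(TransactionId xid, TransactionId xidLogLimit)
{
    /* Cheap modulo test first, so most XIDs skip the comparison entirely. */
    return xid % EARLY_LOG_INTERVAL == 0 &&
        xid_follows_or_equals(xid, xidLogLimit);
}
```

Even in this form, every XID assignment pays for one modulo by a constant and a branch, which is the hot-path cost weighed later in the thread.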

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: enhance wraparound warnings

From
Nathan Bossart
Date:
On Fri, Dec 12, 2025 at 10:59:53AM +0800, Chao Li wrote:
> I just reviewed the patch. My comments are mainly on 0001, plus a few nits
> on 0003. For 0002, the code change is quite straightforward, but I am not
> sure whether the new value has been discussed.

Thanks!

> Here, "(MaxTransactionId >> 1)" has the same value as PG_INT32_MAX. But
> if XIDs are ever changed to 64 bits, that code won't need to be updated,
> while the patched code will.
> 
> So, can we define a constant in transam.h like:
> ```
> #define MaxTransactionId ((TransactionId) 0xFFFFFFFF)
> #define WrapAroundWindow (MaxTransactionId>>1)
> ```
> 
> And use WrapAroundWindow in all places.

I think I'd rather just open-code the (MaxTransactionId / 2) here.  I'm not
too concerned about 64-bit transaction IDs (there's a lot more than this to
change for that), but it does seem like a good idea to be consistent with
nearby code.

> ```
> +                         errdetail("Approximately %.2f%% of MultiXactIds are available for use.",
> ```
> 
> "%.2f%%" shows only 2 digits after the decimal point. xidWrapLimit is roughly
> 2B, so when the remaining count drops to 107374, it will show "0.00%". IMO,
> when the remaining count is large, a percentage makes more sense, while an
> exact number is clearer when the count is relatively small. So, can we show
> both the percentage and the exact number? Or show the exact number when
> the percentage is 0.00%?

The errmsg part should already show the exact number of IDs remaining.

> ```
> +    xidLogLimit = xidWrapLimit - 500000000;
> ```
> 
> Instead of hardcoding 500M, should we consider
> autovacuum_freeze_max_age? If a deployment sets
> autovacuum_freeze_max_age > 500M, then vacuum would be triggered first,
> and this log could be somewhat unintuitive. But if a vacuum cannot freeze
> any tuples, then this log will still make sense. I am not sure. Maybe it's
> not a real problem.

IMHO we should still emit warnings about imminent wraparound even if
autovacuum_freeze_max_age is set to totally-inadvisable values.  I think
the behavior you are describing only happens if users set it to north of
1.6B.
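Working through the 1.6B figure: 0003's LOG would begin once fewer than 500M XIDs remain, i.e. at an XID age of roughly 2147483647 - 500000000 = 1647483647 (the wrap-limit calculation is off by a couple of counts, which does not matter here). Autovacuum forces an anti-wraparound vacuum once a table's age exceeds autovacuum_freeze_max_age, so the LOG can only arrive first when that setting is above about 1.65B; the parameter's maximum is 2 billion and its default is 200 million. The helpers below are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XID_WINDOW 2147483647u          /* MaxTransactionId / 2 */
#define EARLY_LOG_HEADROOM 500000000u   /* 0003's threshold */

/* Approximate XID age at which 0003's periodic LOG would start. */
static uint32_t
early_log_age(void)
{
    assert(EARLY_LOG_HEADROOM < XID_WINDOW);
    return XID_WINDOW - EARLY_LOG_HEADROOM;
}

/* True if the LOG could start before autovacuum's anti-wraparound trigger. */
static bool
log_precedes_autovacuum(uint32_t autovacuum_freeze_max_age)
{
    return autovacuum_freeze_max_age > early_log_age();
}
```

With the default of 200 million, or even 1.6 billion, autovacuum is triggered long before the LOG would appear; only settings near the 2 billion maximum reverse the order.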

-- 
nathan

Attachments

Re: enhance wraparound warnings

From
Shinya Kato
Date:
On Sat, Nov 15, 2025 at 2:05 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I don't know about you, but I start getting antsy around a quarter tank.
> In any case, I'm told that even 40M transactions aren't enough time to
> react these days.  Attached are a few patches to enhance the wraparound
> warnings.

Thank you for the patch!

> * 0001 adds a "percent remaining" detail message to the existing WARNING.
> The idea is that "1.86% of transaction IDs" is both easier to understand
> and better indicates urgency than "39985967 transactions".

I like this idea, and this is helpful information for DBAs. 0001 looks good to me.

> * 0002 bumps the warning limit from 40M to 100M to give folks some more
> time to react.

I don't have a strong opinion on whether 100M is the right value, but
I noticed a documentation issue in 0002.

<programlisting>
WARNING:  database "mydb" must be vacuumed within 39985967 transactions
DETAIL:  Approximately 1.86% of transaction IDs are available for use.
HINT:  To avoid XID assignment failures, execute a database-wide
VACUUM in that database.
</programlisting>

In maintenance.sgml, above "39985967" and "1.86%" should be updated.

> * 0003 adds an early warning system for when fewer than 500M transactions
> remain.  This system sends a LOG only to the server log every 1M
> transactions.  The hope is that this gets someone's attention sooner
> without flooding the application and server log.

I'm not sure 0003 is worth the added complexity. It adds a new field
to TransamVariablesData and a modulo check in GetNewTransactionId(),
which is a hot path. DBAs who need early warning can already monitor
age(datfrozenxid) with more flexible thresholds.


--
Best regards,
Shinya Kato
NTT OSS Center



Re: enhance wraparound warnings

From
Nathan Bossart
Date:
On Wed, Feb 18, 2026 at 04:16:16PM +0900, Shinya Kato wrote:
> On Sat, Nov 15, 2025 at 2:05 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> I don't know about you, but I start getting antsy around a quarter tank.
>> In any case, I'm told that even 40M transactions aren't enough time to
>> react these days.  Attached are a few patches to enhance the wraparound
>> warnings.
> 
> Thank you for the patch!

Thanks for reviewing.

> I don't have a strong opinion on whether 100M is the right value, but
> I noticed a documentation issue in 0002.
> 
> <programlisting>
> WARNING:  database "mydb" must be vacuumed within 39985967 transactions
> DETAIL:  Approximately 1.86% of transaction IDs are available for use.
> HINT:  To avoid XID assignment failures, execute a database-wide
> VACUUM in that database.
> </programlisting>
> 
> In maintenance.sgml, above "39985967" and "1.86%" should be updated.

Fixed.

> I'm not sure 0003 is worth the added complexity. It adds a new field
> to TransamVariablesData and a modulo check in GetNewTransactionId(),
> which is a hot path. DBAs who need early warning can already monitor
> age(datfrozenxid) with more flexible thresholds.

Yeah, looking at this one again, I'm less sure it's worth pursuing.  I've
removed it.

-- 
nathan

Attachments

Re: enhance wraparound warnings

From
Nathan Bossart
Date:
Barring additional feedback or objections, I'm planning to commit this in
the next week or two.

-- 
nathan



Re: enhance wraparound warnings

From
wenhui qiu
Date:
Hi Nathan Bossart,
> Barring additional feedback or objections, I'm planning to commit this in
> the next week or two.
Thank you for working on this. The patch LGTM, but I have a small request. There are many reasons why a table's age can't be reduced by freezing, and there is now a patch that reports the reason to users (https://commitfest.postgresql.org/patch/6188/). Would you be interested in reviewing it? I think we should not only tell users that XIDs are close to wraparound, but also report the root cause of why the table's age cannot be reduced, so they can clearly understand where the issue is.


Thanks

On Sat, Mar 7, 2026 at 6:15 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Barring additional feedback or objections, I'm planning to commit this in
> the next week or two.
>
> --
> nathan