Обсуждение: [PATCH] Speed up of vac_update_datfrozenxid.
Hi hackers,
While testing behavior of 'vacuum' functionality under high load
we've noticed that the exclusive lock with 'LOCKTAG_DATABASE_FROZEN_ID'
tag could be held for a prolonged time, thus causing contention with other sessions.
This lock is obtained by the 'vac_update_datfrozenxid' function at the end of vacuum
process to update value of 'datfrozenxid' in the 'pg_database' relation.
This function currently performs two operations under the
mentioned lock: first scanning the 'pg_class' to find minimal
'relfrozenxid' and 'relminmxid' and then update corresponding values
in the 'pg_database' relation. Update of the 'pg_database' table
includes locking the required tuple and then deciding whether row
need to be updated or not. While looking at the
'vac_update_datfrozenxid' content, we've noticed following comment:
/* * Fetch a copy of the tuple to scribble on. We could check the syscache * tuple first. If that concluded !dirty, we'd avoid waiting on * concurrent heap_update() and would avoid exclusive-locking the buffer. * For now, don't optimize that. */
The optimization described in this comment (i.e. scanning and
locking tuple only if we really intend to update it) seems to be a
valid one, so I've created a patch to test it (attached to this
message). It should be applicable on top of current 'master' branch.
The code now tries to obtain data from the SysCache first and checks
whether it's necessary to call 'systable_inplace_update_begin()' to
actually update xid values.
To test the effect of the patch, a synthetic test was written using bpftrace.
Bpftrace script measures the execution time of 'vac_update_datfrozenxid'.
The script is attached to the email.
The test scenario:
- Create an empty database cluster
- Set autovacuum_naptime = '1s' (so that 'vac_update_datfrozenxid' is called more often)
- Run the bpftrace script (it outputs the function execution time in ns to stdout)
It is not necessary to apply the load. The autovacuum worker will call
'vac_update_datfrozenxid' regardless.
As a result, execution time of 'vac_update_datfrozenxid()' with the patch is 30% faster
No patch Patch Speed up
Mean: 73058 ns 51444 ns 30%
Median: 71846 ns 48650 ns 33%
It is worth noting that due to the fact that before updating the 'datfrozenxid', a full 'pg_class' scan is performed, the performance improvement becomes less significant as the number of relations increases (more time will be spent on sequential scan of 'pg_class'). The previous measurement was performed on an empty cluster with no user relations. To test the patch in a more realistic case, I generated 1000 tables in the database and repeated the measurement. No patch Patch Speed up
Mean: 97113 ns 73839 ns 24% Median: 98328 ns 63541 ns 29% As we can see, the speed gain has decreased, but it is still significant.
What do you think about the patch?
Best regards,
Rustam Khamidullin
Вложения
On 15/07/2025 14:02, Rustam Khamidullin wrote: > Hi hackers, > > While testing behavior of 'vacuum' functionality under high load > we've noticed that the exclusive lock with 'LOCKTAG_DATABASE_FROZEN_ID' > tag could be held for a prolonged time, thus causing contention with > other sessions. > This lock is obtained by the 'vac_update_datfrozenxid' function at the > end of vacuum > process to update value of 'datfrozenxid' in the 'pg_database' relation. > This function currently performs two operations under the > mentioned lock: first scanning the 'pg_class' to find minimal > 'relfrozenxid' and 'relminmxid' and then update corresponding values > in the 'pg_database' relation. Update of the 'pg_database' table > includes locking the required tuple and then deciding whether row > need to be updated or not. While looking at the > 'vac_update_datfrozenxid' content, we've noticed following comment: > > /* > * Fetch a copy of the tuple to scribble on. We could check the syscache > * tuple first. If that concluded !dirty, we'd avoid waiting on > * concurrent heap_update() and would avoid exclusive-locking the buffer. > * For now, don't optimize that. > */ > > The optimization described in this comment (i.e. scanning and > locking tuple only if we really intend to update it) seems to be a > valid one, so I've created a patch to test it (attached to this > message). It should be applicable on top of current 'master' branch. > The code now tries to obtain data from the SysCache first and checks > whether it's necessary to call 'systable_inplace_update_begin()' to > actually update xid values. > > To test the effect of the patch, a synthetic test was written using > bpftrace. > Bpftrace script measures the execution time of 'vac_update_datfrozenxid'. > The script is attached to the email. > > The test scenario: > > 1. Create an empty database cluster > 2. Set autovacuum_naptime = '1s' (so that 'vac_update_datfrozenxid' is > called more often) > 3. Run the bpftrace script (it outputs the function execution time in > ns to stdout) > > It is not necessary to apply the load. The autovacuum worker will call > 'vac_update_datfrozenxid' regardless. > As a result, execution time of 'vac_update_datfrozenxid()' with the > patch is 30% faster > No patch Patch Speed up > Mean: 73058 ns 51444 ns 30% > Median: 71846 ns 48650 ns 33% > > It is worth noting that due to the fact thatbefore updating the > 'datfrozenxid',a full 'pg_class' scan is performed, the performance > improvementbecomes less significant as the number of > relationsincreases(more time will be spent on sequentialscan of > 'pg_class').The previous measurement was performedon an empty > clusterwith nouser relations.To test the patch in a more realistic case, > I generated 1000 tables in the database and repeated the measurement. No > patch Patch Speed up > Mean:97113 ns 73839 ns 24% Median:98328 ns 63541 ns 29% As we can see, > the speed gain has decreased, but it is still significant. How significant is that speedup in the grand scheme of things? What fraction is vac_update_datfrozenxid of the overall work that autovacuum does? Did this patch help with the lock contention that you mentioned at the top? - Heikki