Обсуждение: pgsql: Add VACUUM instrumentation for scanned pages, relfrozenxid.

Поиск
Список
Период
Сортировка

pgsql: Add VACUUM instrumentation for scanned pages, relfrozenxid.

От
Peter Geoghegan
Дата:
Add VACUUM instrumentation for scanned pages, relfrozenxid.

Report on scanned pages within VACUUM VERBOSE and autovacuum logging.
These are pages that were physically examined during the VACUUM
operation.  Note that this can include a small number of pages that were
marked all-visible in the visibility map by some earlier VACUUM
operation.  VACUUM won't skip all-visible pages that aren't part of a
range of all-visible pages that's at least 32 blocks in length (partly
to avoid missing out on opportunities to advance relfrozenxid during
non-aggressive VACUUMs).

Commit 44fa8488 simplified the definition of scanned pages.  It became
the complement of the pages (of those pages from rel_pages) that were
skipped using the visibility map.  And so scanned pages precisely
indicates how effective the visibility map was at saving work.  (Before
now we displayed the number of pages skipped via the visibility map when
happened to be frozen pages, but not when they were merely all-visible,
which was less useful to users.)

Rename the user-visible OldestXmin output field to "removal cutoff", and
show some supplementary information: how far behind the cutoff is
(number of XIDs behind) by the time the VACUUM operation finished.  This
will help users to figure out what's _not_ working in extreme cases
where VACUUM is fundamentally unable to remove dead tuples or freeze
older tuples (e.g., due to a leaked replication slot).  Also report when
relfrozenxid is advanced by VACUUM in output that immediately follows
"removal cutoff".  This structure is intended to highlight the
relationship between the new relfrozenxid value for the table, and the
VACUUM operation's removal cutoff.

Finally, add instrumentation of "missed dead tuples", and the number of
pages that had at least one such tuple.  These are fully DEAD (not just
RECENTLY_DEAD) tuples with storage that could not be pruned due to
failure to acquire a cleanup lock on a heap page.  This is a replacement
for the "skipped due to pin" instrumentation removed by commit 44fa8488.
It shows more details than before for pages where failing to get a
cleanup lock actually resulted in VACUUM missing out on useful work, but
usually shows nothing at all instead (the mere fact that we couldn't get
a cleanup lock is usually of no consequence whatsoever now).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/872770fd6ccf12596b9e26234df9a37cae083af2

Modified Files
--------------
src/backend/access/heap/vacuumlazy.c | 97 +++++++++++++++++++++++++-----------
src/backend/commands/analyze.c       |  3 ++
src/backend/commands/vacuum.c        |  9 ++++
src/include/commands/vacuum.h        |  2 +
4 files changed, 82 insertions(+), 29 deletions(-)


Re: pgsql: Add VACUUM instrumentation for scanned pages, relfrozenxid.

От
Peter Eisentraut
Дата:
On 12.02.22 01:49, Peter Geoghegan wrote:
> Add VACUUM instrumentation for scanned pages, relfrozenxid.

This patch introduced the log messages

     "new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"

     "new relminmxid: %u, which is %d MXIDs ahead of previous value\n"

According to the surrounding math in each case, this means that the new 
%u is larger by %d than the previous value.

The use of "ahead of" had confused me here, since I needed to think 
three times about which direction is meant.  I suppose it's technically 
correct, but perhaps there is a clearer term?




Re: pgsql: Add VACUUM instrumentation for scanned pages, relfrozenxid.

От
Peter Geoghegan
Дата:
On Wed, Sep 14, 2022 at 1:20 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> According to the surrounding math in each case, this means that the new
> %u is larger by %d than the previous value.
>
> The use of "ahead of" had confused me here, since I needed to think
> three times about which direction is meant.  I suppose it's technically
> correct, but perhaps there is a clearer term?

I don't know. There are several things to consider here. Or at least
several things that I've considered, that seem important. These are:

* We kinda need to "hide" the original relfrozenxid from the user
(just show the new one) by just showing a delta. After all, we might
needlessly confuse users if we ended up showing an original
relfrozenxid that has been advanced to a more recent value that is
ostensibly (though not actually) far older than the original.

Most users won't quite understand that XIDs use modulo-2^32
arithmetic. It isn't actually important that they understand the
details, so I really would rather not go there.

* The delta is very useful for comparing successive VACUUM operations
for the same table (usually a very large table) over time.

The dimension of time is very important -- how things change for the
table (this is true for everything, not just relfrozenxid stuff). The
XIDs themselves aren't terribly meaningful over time (they get
recycled), but the XID-age-wise deltas really can be compared to each
other easily and sensibly. The variance of the "advanced by this many
XIDs" delta over time might be very interesting.

I expect that this will become much more important in Postgres 16.
It's much easier for VACUUM to find a way to keep up over time if it
is given breathing room to freeze when it's easiest/cheapest. We'll
still have to insist on keeping relfrozenxid recent at some point, but
doing so prematurely seems very counterproductive to me.

* The delta thing also highlights the relationship between "removable
cutoff" and its own XID age at the end of the VACUUM, which is itself
a delta.

The maximum final relfrozenxid that can ever be set by VACUUM is also
"removable cutoff". ISTM that there is value in hinting at that
relationship through how we present the information. It's something
that advanced users have a good chance of picking up on if they see
lots of examples of.

-- 
Peter Geoghegan