Thread: Re: Eagerly scan all-visible pages to amortize aggressive vacuum


Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From:
Andres Freund
Date:
Hi,

On 2024-11-01 19:35:22 -0400, Melanie Plageman wrote:
> It implements a new "semi-aggressive" vacuum. Semi-aggressive vacuums
> eagerly scan some number of all-visible but not all-frozen pages in
> hopes of freezing them. All-visible pages that are eagerly scanned and
> set all-frozen in the visibility map are considered successful eager
> scans and those not frozen are considered failed eager scans.

I think it's worth mentioning that this would be useful to eventually be able
to mark pages as all-visible during on-access-pruning, to allow for
index-only-scans (and also faster seqscans). Today that'd have the danger of
increasing the "anti-wraparound vacuum debt" unconscionably, but with this
patch that'd be addressed.


> Because we want to amortize our eager scanning across a few vacuums,
> we cap the maximum number of successful eager scans to a percentage of
> the total number of all-visible but not all-frozen pages in the table
> (currently 20%).

One thing worth mentioning around here seems to be that we currently can't
partially-aggressively freeze tuples that are "too young", and how that
interacts with everything else.


> To demonstrate the results, I ran an append-only workload run for over
> 3 hours on master and with my patch applied. The patched version of
> Postgres amortized the work of freezing the all-visible but not
> all-frozen pages nicely. The first aggressive vacuum with the patch
> was 44 seconds and on master it was 1201 seconds.

Oops.


> patch
>   LOG:  automatic aggressive vacuum of table "history": index scans: 0
>   vacuum duration: 44 seconds (msecs: 44661).
>     pages: 0 removed, 27425085 remain, 1104095 scanned (4.03% of
> total), 709889 eagerly scanned
>     frozen: 316544 pages from table (1.15% of total) had 17409920
> tuples frozen. 316544 pages set all-frozen in the VM
>     I/O timings: read: 1160.599 ms, write: 2461.205 ms. approx time
> spent in vacuum delay: 16230 ms.
>     buffer usage: 1105630 hits, 1111898 reads, 646229 newly dirtied,
> 1750426 dirtied.
>     WAL usage: 1027099 records, 316566 full page images, 276209780 bytes.
>
> master
>   LOG:  automatic aggressive vacuum of table "history": index scans: 0
>   vacuum duration: 1201 seconds (msecs: 1201487).
>     pages: 0 removed, 27515348 remain, 15800948 scanned (57.43% of
> total), 15098257 eagerly scanned
>     frozen: 15096384 pages from table (54.87% of total) had 830247896
> tuples frozen. 15096384 pages set all-frozen in the VM
>     I/O timings: read: 246537.348 ms, write: 73000.498 ms. approx time
> spent in vacuum delay: 349166 ms.
>     buffer usage: 15798343 hits, 15813524 reads, 15097063 newly
> dirtied, 31274333 dirtied.
>     WAL usage: 30733564 records, 15097073 full page images, 11789257631 bytes.
>
> This is because, with the patch, the freezing work is being off-loaded
> to earlier vacuums.
>
> In the attached chart.png, you can see the vm_page_freezes climbing
> steadily with the patch, whereas on master, there are sudden spikes
> aligned with the aggressive vacuums. You can also see that the number
> of pages that are all-visible but not all-frozen grows steadily on
> master until the aggressive vacuum. This is vacuum's "backlog" of
> freezing work.

What's the reason for all-visible-but-not-all-frozen to increase to a higher
value initially than where it later settles?


> In this benchmark, the TPS is rate-limited, but using pgbench
> per-statement reports, I calculated that the P99 latency for this
> benchmark is 16 ms on master and 1 ms with the patch. Vacuuming pages
> sooner decreases vacuum reads and doing the freezing work spread over
> more vacuums improves P99 transaction latency.

Nice!


> Below is the comparative WAL volume, checkpointer and background
> writer writes, reads and writes done by all other backend types, time
> spent vacuuming in milliseconds, and p99 latency. Notice that overall
> vacuum IO time is substantially lower with the patch.
>
>    version     wal  cptr_bgwriter_w   other_rw  vac_io_time  p99_lat
>     patch   770 GB          5903264  235073744   513722         1
>     master  767 GB          5908523  216887764  1003654        16

Hm. It's not clear to me why other_rw is higher with the patch? After all,
given the workload, there's no chance of unnecessarily freezing tuples? Is
that just because at the end of the benchmark there's leftover work?



> From 69b517f34caf39ad814691d6412c68d54e852990 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 28 Oct 2024 10:53:37 -0400
> Subject: [PATCH v1 1/9] Rename LVRelState->frozen_pages
>
> Rename frozen_pages to tuple_freeze_pages in LVRelState, the struct used
> for tracking state during vacuuming of a heap relation. frozen_pages
> sounds like it includes every all-frozen page. That is a misnomer. It
> does not include pages with already frozen tuples. It also includes
> pages that are not actually all-frozen.

Hm. Is tuple_freeze_pages that much clearer? It could also just indicate pages
that already were frozen. How about "newly_frozen_pages"?


> Subject: [PATCH v1 2/9] Make visibilitymap_set() return previous state of

LGTM.



> From c317a272713bb833f7f2761a5be1924f8e1bdb4d Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Thu, 31 Oct 2024 18:19:18 -0400
> Subject: [PATCH v1 3/9] Count pages set all-visible and all-frozen in VM
>  during vacuum
>
> Vacuum already counts and logs pages with newly frozen tuples. Count and
> log pages set all-frozen in the VM too. This includes pages that are
> empty before or after vacuuming.
>
> While we are at it, count and log the number of pages vacuum set
> all-visible. Pages that are all-visible but not all-frozen are debt for
> future aggressive vacuums. The newly all-visible and all-frozen counts
> give us visibility into the rate at which this debt is being accrued and
> paid down.

Hm. Why is it interesting to log VM changes separately from the state changes
of the underlying pages? Wouldn't it e.g. be more interesting to log the number
of empty pages that vacuum [re-]discovered? I have a hard time seeing how a
user could take advantage of this.


> From 67781cc2511bb7d62ccc9461f1787272820abcc4 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 28 Oct 2024 11:07:50 -0400
> Subject: [PATCH v1 4/9] Replace uses of blkno local variable in
>  lazy_scan_heap()

Largely LGTM, but I'm not sure that it's worth having as a separate commit.


> @@ -1114,7 +1117,6 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
>              ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
>              vacrel->next_unskippable_vmbuffer = InvalidBuffer;
>          }
> -        *blkno = vacrel->rel_pages;
>          return false;
>      }

I'd set *blkno to InvalidBlockNumber or such; that'd make misuse more
apparent than having it set to some random older block.
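
I.e. (untested), instead of just dropping the assignment, something like this
in that spot:

        *blkno = InvalidBlockNumber;
        return false;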


> From 67b5565ad57d3b196695f85811dde2044ba79f3e Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 28 Oct 2024 11:14:24 -0400
> Subject: [PATCH v1 5/9] Move vacuum VM buffer release
>
> The VM buffer for the next unskippable block can be released after the
> main loop in lazy_scan_heap(). Doing so de-clutters
> heap_vac_scan_next_block() and opens up more refactoring options.

That's vague...


> From 8485dc400b3d4e9f895170af4f5fb1bb959b8495 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 28 Oct 2024 11:36:58 -0400
> Subject: [PATCH v1 6/9] Remove superfluous next_block local variable in vacuum
>  code
>
> Reduce the number of block related variables in lazy_scan_heap() and its
> helpers by removing the next_block local variable from
> heap_vac_scan_next_block().

I don't mind this change, but I also don't get how it's related to anything
else here or why it's really better than the status quo.


> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 4b1eadea1f2..52c9d49f2b1 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1112,19 +1112,17 @@ static bool
>  heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
>                           bool *all_visible_according_to_vm)
>  {
> -    BlockNumber next_block;
> -
>      /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
> -    next_block = vacrel->current_block + 1;
> +    vacrel->current_block++;

I realize this isn't introduced in this commit, but darn, that's ugly.


> From 78ad9e022b95e024ff5bfa96af78e9e44730c970 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 28 Oct 2024 11:42:10 -0400
> Subject: [PATCH v1 7/9] Make heap_vac_scan_next_block() return BlockNumber


> @@ -857,7 +857,8 @@ lazy_scan_heap(LVRelState *vacrel)
>      vacrel->next_unskippable_allvis = false;
>      vacrel->next_unskippable_vmbuffer = InvalidBuffer;
>
> -    while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
> +    while (BlockNumberIsValid(blkno = heap_vac_scan_next_block(vacrel,
> +                                                               &all_visible_according_to_vm)))

Personally I'd write this as

while (true)
{
    BlockNumber blkno;

    blkno = heap_vac_scan_next_block(vacrel, ...);

    if (!BlockNumberIsValid(blkno))
       break;

Mostly because it's good to use more minimal scopes when possible,
particularly when the scope previously was intentionally larger. But also
partially because I don't love variable assignments inside a macro call,
inside a while().



> From 818d1c3b068c6705611256cfc3eb1f10bdc0b684 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Fri, 1 Nov 2024 18:25:05 -0400
> Subject: [PATCH v1 8/9] WIP: Add more general summary to vacuumlazy.c
>
> Currently the summary at the top of vacuumlazy.c provides some specific
> details related to the new dead TID storage in 17. I plan to add a
> summary and maybe some sub-sections to contextualize it.

I like this idea. It's hard to understand vacuumlazy.c without already
understanding vacuumlazy.c, which isn't a good situation.


> ---
>  src/backend/access/heap/vacuumlazy.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 7ce69953ba0..15a04c6b10b 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -3,6 +3,17 @@
>   * vacuumlazy.c
>   *      Concurrent ("lazy") vacuuming.
>   *
> + * Heap relations are vacuumed in three main phases. In the first phase,
> + * vacuum scans relation pages, pruning and freezing tuples and saving dead
> + * tuples' TIDs in a TID store. If that TID store fills up or vacuum finishes
> + * scanning the relation, it progresses to the second phase: index vacuuming.
> + * After index vacuuming is complete, vacuum scans the blocks of the relation
> + * indicated by the TIDs in the TID store and reaps the dead tuples, freeing
> + * that space for future tuples. Finally, vacuum may truncate the relation if
> + * it has emptied pages at the end. XXX: this summary needs work.

Yea, at least we ought to mention that the phasing can be different when there
are no indexes and that the later phases can heuristically be omitted when
there aren't enough dead items.



> From f21f0bab1dbe675be4b4dddcb2eea486d8a69d36 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 28 Oct 2024 12:15:08 -0400
> Subject: [PATCH v1 9/9] Eagerly scan all-visible pages to amortize aggressive
>  vacuum
>
> Introduce semi-aggressive vacuums, which scan some of the all-visible
> but not all-frozen pages in the relation to amortize the cost of an
> aggressive vacuum.

I wonder if "aggressive" is really the right terminology going
forward... Somehow it doesn't seem particularly descriptive anymore if, in
many workloads, almost all vacuums are going to be aggressive-ish.


> Because the goal is to freeze these all-visible pages, all-visible pages
> that are eagerly scanned and set all-frozen in the visibility map are
> considered successful eager scans and those not frozen are considered
> failed eager scans.
>
> If too many eager scans fail in a row, eager scanning is temporarily
> suspended until a later portion of the relation. Because the goal is to
> amortize aggressive vacuums, we cap the number of successes as well.
> Once we reach the maximum number of blocks successfully eager scanned
> and frozen, the semi-aggressive vacuum is downgraded to an unaggressive
> vacuum.
> ---
>  src/backend/access/heap/vacuumlazy.c | 327 +++++++++++++++++++++++----
>  src/backend/commands/vacuum.c        |  20 +-
>  src/include/commands/vacuum.h        |  27 ++-
>  src/tools/pgindent/typedefs.list     |   1 +
>  4 files changed, 326 insertions(+), 49 deletions(-)
>
> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 15a04c6b10b..adabb5ff5f1 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -12,6 +12,40 @@
>   * that space for future tuples. Finally, vacuum may truncate the relation if
>   * it has emptied pages at the end. XXX: this summary needs work.
>   *
> + * Relation Scanning:
> + *
> + * Vacuum scans the heap relation, starting at the beginning and progressing
> + * to the end, skipping pages as permitted by their visibility status, vacuum
> + * options, and the aggressiveness level of the vacuum.
> + *
> + * When page skipping is enabled, unaggressive vacuums may skip scanning pages
> + * that are marked all-visible in the visibility map. We may choose not to
> + * skip pages if the range of skippable pages is below SKIP_PAGES_THRESHOLD.
> + *
> + * Semi-aggressive vacuums will scan skippable pages in an effort to freeze
> + * them and decrease the backlog of all-visible but not all-frozen pages that
> + * have to be processed to advance relfrozenxid and avoid transaction ID
> + * wraparound.
> + *
> + * We count it as a success when we are able to set an eagerly scanned page
> + * all-frozen in the VM and a failure when we are not able to set the page
> + * all-frozen.
> + *
> + * Because we want to amortize the overhead of freezing pages over multiple
> + * vacuums, we cap the number of successful eager scans to
> + * EAGER_SCAN_SUCCESS_RATE of the number of all-visible but not all-frozen
> + * pages at the beginning of the vacuum.
> + *
> + * On the assumption that different regions of the table are likely to contain
> + * similarly aged data, we use a localized failure cap instead of a global cap
> + * for the whole relation. The failure count is reset on each region of the
> + * table, comprised of RELSEG_SIZE blocks (or 1/4 of the table size for a
> + * small table). In each region, we tolerate MAX_SUCCESSIVE_EAGER_SCAN_FAILS
> + * before suspending eager scanning until the end of the region.

I'm a bit surprised to see such large regions. Why not something finer, in the
range of a few megabytes?  The FSM steers new tuples quite aggressively to the
start of the table, which means that in many workloads there will be old and
new data interspersed at the start of the table. Using RELSEG_SIZE sized
regions for semi-aggressive vacuuming will mean that we'll often not do any
semi-aggressive processing beyond the start of the relation, as we'll reach
the failure rate quickly.

I also find it layering-wise a bit weird to use RELSEG_SIZE; imo that's
really just an md.c concept.


> +/*
> + * Semi-aggressive vacuums eagerly scan some all-visible but not all-frozen
> + * pages. Since our goal is to freeze these pages, an eager scan that fails to
> + * set the page all-frozen in the VM is considered to have "failed".
> + *
> + * On the assumption that different regions of the table tend to have
> + * similarly aged data, once we fail to freeze MAX_SUCCESSIVE_EAGER_SCAN_FAILS
> + * blocks, we suspend eager scanning until vacuum has progressed to another
> + * region of the table with potentially older data.
> + */
> +#define MAX_SUCCESSIVE_EAGER_SCAN_FAILS 1024

Can this really be a constant, given that the semi-aggressive regions are
shrunk for small tables?


> +/*
> + * An eager scan of a page that is set all-frozen in the VM is considered
> + * "successful". To spread out eager scanning across multiple semi-aggressive
> + * vacuums, we limit the number of successful eager scans (as well as the
> + * number of failures). The maximum number of successful eager scans is
> + * calculated as a ratio of the all-visible but not all-frozen pages at the
> + * beginning of the vacuum.
> + */
> +#define EAGER_SCAN_SUCCESS_RATE 0.2
>  typedef struct LVRelState

There imo should be newlines between the define and LVRelState.

>  {
>      /* Target heap relation and its indexes */
> @@ -153,8 +208,22 @@ typedef struct LVRelState
>      BufferAccessStrategy bstrategy;
>      ParallelVacuumState *pvs;
>
> -    /* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
> -    bool        aggressive;
> +    /*
> +     * Whether or not this is an aggressive, semi-aggressive, or unaggressive
> +     * VACUUM. A fully aggressive vacuum must set relfrozenxid >= FreezeLimit
> +     * and therefore must scan every unfrozen tuple. A semi-aggressive vacuum
> +     * will scan a certain number of all-visible pages until it is downgraded
> +     * to an unaggressive vacuum.
> +     */
> +    VacAggressive aggressive;

Hm. A few comments:

- why is VacAggressive defined in vacuum.h? Isn't this fairly tightly coupled
  to heapam?
- Kinda feels like the type should be named VacAggressiveness or such?


> +    /*
> +     * A semi-aggressive vacuum that has failed to freeze too many eagerly
> +     * scanned blocks in a row suspends eager scanning. unaggressive_to is the
> +     * block number of the first block eligible for resumed eager scanning.
> +     */
> +    BlockNumber unaggressive_to;

What's it set to otherwise? What is it set to in aggressive vacuums?


> @@ -227,6 +296,26 @@ typedef struct LVRelState
>      BlockNumber next_unskippable_block; /* next unskippable block */
>      bool        next_unskippable_allvis;    /* its visibility status */
>      Buffer        next_unskippable_vmbuffer;    /* buffer containing its VM bit */
> +
> +    /*
> +     * Count of skippable blocks eagerly scanned as part of a semi-aggressive
> +     * vacuum (for logging only).
> +     */
> +    BlockNumber eager_scanned;

I'd add _pages or such.


> +    /*
> +     * The number of eagerly scanned blocks a semi-aggressive vacuum failed to
> +     * freeze (due to age) in the current eager scan region. It is reset each
> +     * time we hit MAX_SUCCESSIVE_EAGER_SCAN_FAILS.
> +     */
> +    BlockNumber eager_scanned_failed_frozen;
> +
> +    /*
> +     * The remaining number of blocks a semi-aggressive vacuum will consider
> +     * eager scanning. This is initialized to EAGER_SCAN_SUCCESS_RATE of the
> +     * total number of all-visible but not all-frozen pages.
> +     */
> +    BlockNumber remaining_eager_scan_successes;

I think it might look better if you just bundled these into a struct like

      struct
      {
        BlockNumber scanned;
        BlockNumber failed_frozen;
        BlockNumber remaining_successes;
      } eager_pages;


> @@ -471,24 +569,49 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
>           * Force aggressive mode, and disable skipping blocks using the
>           * visibility map (even those set all-frozen)
>           */
> -        vacrel->aggressive = true;
> +        vacrel->aggressive = VAC_AGGRESSIVE;
>          skipwithvm = false;
>      }
>
>      vacrel->skipwithvm = skipwithvm;
> +    vacrel->eager_scanned = 0;
> +    vacrel->eager_scanned_failed_frozen = 0;
> +
> +    /*
> +     * Even if we successfully freeze them, we want to cap the number of
> +     * eagerly scanned blocks so that we spread out the overhead across
> +     * multiple vacuums. remaining_eager_scan_successes is only used by
> +     * semi-aggressive vacuums.
> +     */

Somehow the "them" feels like a dangling pointer that's initialized too late ;)


> +    visibilitymap_count(rel, &orig_rel_allvisible, &orig_rel_allfrozen);
> +    vacrel->remaining_eager_scan_successes =
> +        (BlockNumber) (EAGER_SCAN_SUCCESS_RATE * (orig_rel_allvisible - orig_rel_allfrozen));
>
>      if (verbose)
>      {
> -        if (vacrel->aggressive)
> -            ereport(INFO,
> -                    (errmsg("aggressively vacuuming \"%s.%s.%s\"",
> -                            vacrel->dbname, vacrel->relnamespace,
> -                            vacrel->relname)));
> -        else
> -            ereport(INFO,
> -                    (errmsg("vacuuming \"%s.%s.%s\"",
> -                            vacrel->dbname, vacrel->relnamespace,
> -                            vacrel->relname)));
> +        switch (vacrel->aggressive)
> +        {
> +            case VAC_UNAGGRESSIVE:
> +                ereport(INFO,
> +                        (errmsg("vacuuming \"%s.%s.%s\"",
> +                                vacrel->dbname, vacrel->relnamespace,
> +                                vacrel->relname)));
> +                break;
> +
> +            case VAC_AGGRESSIVE:
> +                ereport(INFO,
> +                        (errmsg("aggressively vacuuming \"%s.%s.%s\"",
> +                                vacrel->dbname, vacrel->relnamespace,
> +                                vacrel->relname)));
> +                break;
> +
> +            case VAC_SEMIAGGRESSIVE:
> +                ereport(INFO,
> +                        (errmsg("semiaggressively vacuuming \"%s.%s.%s\"",
> +                                vacrel->dbname, vacrel->relnamespace,
> +                                vacrel->relname)));
> +                break;
> +        }

Wonder if we should have a function that returns the aggressiveness of a
vacuum as an already translated string. There are other places where we emit
the aggressiveness as part of a message, and it's pretty silly to duplicate
most of the message.
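
Untested sketch, with a made-up name:

    static const char *
    vac_aggressiveness_desc(VacAggressive aggressive)
    {
        switch (aggressive)
        {
            case VAC_UNAGGRESSIVE:
                return _("vacuuming");
            case VAC_AGGRESSIVE:
                return _("aggressively vacuuming");
            case VAC_SEMIAGGRESSIVE:
                return _("semiaggressively vacuuming");
        }

        pg_unreachable();
    }

That'd let the switch above collapse into a single
ereport(INFO, (errmsg("%s \"%s.%s.%s\"", ...))).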


> @@ -545,11 +668,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
>       * Non-aggressive VACUUMs may advance them by any amount, or not at all.
>       */
>      Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
> -           TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
> +           TransactionIdPrecedesOrEquals(vacrel->aggressive == VAC_AGGRESSIVE ?
> +                                         vacrel->cutoffs.FreezeLimit :
>                                           vacrel->cutoffs.relfrozenxid,
>                                           vacrel->NewRelfrozenXid));
>      Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
> -           MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
> +           MultiXactIdPrecedesOrEquals(vacrel->aggressive == VAC_AGGRESSIVE ?
> +                                       vacrel->cutoffs.MultiXactCutoff :
>                                         vacrel->cutoffs.relminmxid,
>                                         vacrel->NewRelminMxid));
>      if (vacrel->skippedallvis)

These are starting to feel somewhat complicated. Wonder if it'd be easier to
read if they were written as normal ifs.
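
E.g. (untested, showing just the xid half):

    if (vacrel->aggressive == VAC_AGGRESSIVE)
        Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
               TransactionIdPrecedesOrEquals(vacrel->cutoffs.FreezeLimit,
                                             vacrel->NewRelfrozenXid));
    else
        Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
               TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
                                             vacrel->NewRelfrozenXid));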


> +/*
> + * Helper to decrement a block number to 0 without wrapping around.
> + */
> +static void
> +decrement_blkno(BlockNumber *block)
> +{
> +    if ((*block) > 0)
> +        (*block)--;
> +}

Huh.


> @@ -956,11 +1094,23 @@ lazy_scan_heap(LVRelState *vacrel)
>          if (!got_cleanup_lock)
>              LockBuffer(buf, BUFFER_LOCK_SHARE);
>
> +        page_freezes = vacrel->vm_page_freezes;
> +
>          /* Check for new or empty pages before lazy_scan_[no]prune call */
>          if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
>                                     vmbuffer))
>          {
>              /* Processed as new/empty page (lock and pin released) */
> +
> +            /* count an eagerly scanned page as a failure or a success */
> +            if (was_eager_scanned)
> +            {
> +                if (vacrel->vm_page_freezes > page_freezes)
> +                    decrement_blkno(&vacrel->remaining_eager_scan_successes);
> +                else
> +                    vacrel->eager_scanned_failed_frozen++;
> +            }
> +
>              continue;
>          }

Maybe I'm confused, but ISTM that remaining_eager_scan_successes shouldn't
actually be a BlockNumber, given that it doesn't actually indicate a specific
block...

I don't understand why we would sometimes want to treat empty pages as a
failure? They can't fail to be frozen, no?

I'm not sure it makes sense to count them as progress towards the success
limit either - afaict we just rediscovered free space in the table. That's imo
separate from semi-aggressive freezing.


Storing page_freezes as a copy of vacrel->vm_page_freezes and then checking if
it increased feels like a somewhat ugly way of tracking whether freezing
happened. Is there no more direct way?

Why is decrement_blkno() needed? How can we ever get into negative territory?
Shouldn't eager scanning have been disabled when
remaining_eager_scan_successes reaches zero and thus prevent
remaining_eager_scan_successes from ever going below zero?
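
E.g. the callee could report whether it froze the page via an out parameter
(untested, hypothetical signature):

    bool        vm_page_frozen = false;

    /* hypothetical: lazy_scan_new_or_empty() sets vm_page_frozen itself */
    if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
                               vmbuffer, &vm_page_frozen))
    {
        /* count an eagerly scanned page as a failure or a success */
        if (was_eager_scanned)
        {
            if (vm_page_frozen)
                vacrel->remaining_eager_scan_successes--;
            else
                vacrel->eager_scanned_failed_frozen++;
        }
        continue;
    }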



> @@ -1144,7 +1310,65 @@ heap_vac_scan_next_block(LVRelState *vacrel,
>           */
>          bool        skipsallvis;
>
> -        find_next_unskippable_block(vacrel, &skipsallvis);
> +        /*
> +         * Figure out if we should disable eager scan going forward or
> +         * downgrade to an unaggressive vacuum altogether.
> +         */
> +        if (vacrel->aggressive == VAC_SEMIAGGRESSIVE)
> +        {
> +            /*
> +             * If we hit our success limit, there is no need to eagerly scan
> +             * any additional pages. Downgrade the vacuum to unaggressive.
> +             */
> +            if (vacrel->remaining_eager_scan_successes == 0)
> +                vacrel->aggressive = VAC_UNAGGRESSIVE;
> +
> +            /*
> +             * If we hit the max number of failed eager scans for this region
> +             * of the table, figure out where the next eager scan region
> +             * should start. Eager scanning is effectively disabled until we
> +             * scan a block in that new region.
> +             */
> +            else if (vacrel->eager_scanned_failed_frozen >=
> +                     MAX_SUCCESSIVE_EAGER_SCAN_FAILS)
> +            {
> +                BlockNumber region_size,
> +                            offset;
> +

Why are we doing this logic here, rather than after incrementing
eager_scanned_failed_frozen? Seems that'd substantially limit the number of
times we need to run through this logic?


> +                /*
> +                 * On the assumption that different regions of the table are
> +                 * likely to have similarly aged data, we will retry eager
> +                 * scanning again later. For a small table, we'll retry eager
> +                 * scanning every quarter of the table. For a larger table,
> +                 * we'll consider eager scanning again after processing
> +                 * another region's worth of data.
> +                 *
> +                 * We consider the region to start from the first failure, so
> +                 * calculate the block to restart eager scanning from there.
> +                 */
> +                region_size = Min(RELSEG_SIZE, (vacrel->rel_pages / 4));
> +
> +                offset = vacrel->eager_scanned_failed_frozen % region_size;
> +
> +                Assert(vacrel->eager_scanned > 0);
> +
> +                vacrel->unaggressive_to = vacrel->current_block + (region_size - offset);
> +                vacrel->eager_scanned_failed_frozen = 0;
> +            }
> +        }

I'd put the logic for this in a separate function; it feels complicated and
independent enough to be worth not having it in heap_vac_scan_next_block().
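
Untested sketch of such a function, mirroring the quoted logic (name made up):

    static void
    update_eager_scan_state(LVRelState *vacrel)
    {
        if (vacrel->aggressive != VAC_SEMIAGGRESSIVE)
            return;

        /* hit the success cap: downgrade to an unaggressive vacuum */
        if (vacrel->remaining_eager_scan_successes == 0)
            vacrel->aggressive = VAC_UNAGGRESSIVE;

        /* too many failures in this region: suspend until the next one */
        else if (vacrel->eager_scanned_failed_frozen >=
                 MAX_SUCCESSIVE_EAGER_SCAN_FAILS)
        {
            BlockNumber region_size = Min(RELSEG_SIZE, vacrel->rel_pages / 4);
            BlockNumber offset = vacrel->eager_scanned_failed_frozen % region_size;

            vacrel->unaggressive_to = vacrel->current_block + (region_size - offset);
            vacrel->eager_scanned_failed_frozen = 0;
        }
    }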


> @@ -1199,6 +1423,11 @@ heap_vac_scan_next_block(LVRelState *vacrel,
>   * The next unskippable block and its visibility information is updated in
>   * vacrel.
>   *
> + * consider_eager_scan indicates whether or not we should consider scanning
> + * all-visible but not all-frozen blocks. was_eager_scanned is set to true if
> + * we decided to eager scan a block. In this case, next_unskippable_block is
> + * set to that block number.
> + *
>   * Note: our opinion of which blocks can be skipped can go stale immediately.
>   * It's okay if caller "misses" a page whose all-visible or all-frozen marking
>   * was concurrently cleared, though.  All that matters is that caller scan all
> @@ -1208,7 +1437,11 @@ heap_vac_scan_next_block(LVRelState *vacrel,
>   * to skip such a range is actually made, making everything safe.)
>   */
>  static void
> -find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
> +find_next_unskippable_block(
> +                            LVRelState *vacrel,
> +                            bool consider_eager_scan,
> +                            bool *was_eager_scanned,
> +                            bool *skipsallvis)

I don't think we ever have function definitions without a parameter on the
line with the function name.

Hm - isn't consider_eager_scan potentially outdated after
find_next_unskippable_block() has iterated over a bunch of blocks? If
consider_eager_scan is false, this could very well go into the next
"semi-aggressive region", where consider_eager_scan should have been
re-enabled, no?


Greetings,

Andres Freund



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From:
Robert Treat
Date:
On Fri, Dec 13, 2024 at 5:53 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Thu, Nov 7, 2024 at 10:42 AM Andres Freund <andres@anarazel.de> wrote:
> > Hi,
>
> Thanks for the review!
> Attached v2 should address your feedback and also fixes a few bugs with v1.
>
> I've still yet to run very long-running benchmarks. I did start running more
> varied benchmark scenarios -- but all still under two hours. So far, the
> behavior is as expected.
>
> > On 2024-11-01 19:35:22 -0400, Melanie Plageman wrote:
> > > Because we want to amortize our eager scanning across a few vacuums,
> > > we cap the maximum number of successful eager scans to a percentage of
> > > the total number of all-visible but not all-frozen pages in the table
> > > (currently 20%).
> >
> > One thing worth mentioning around here seems to be that we currently can't
> > partially-aggressively freeze tuples that are "too young", and how that
> > interacts with everything else.
>
> I'm not sure I know what you mean. Are you talking about how we don't freeze
> tuples that are visible to everyone but younger than the freeze limit?
>

FWIW that was my interpretation of his statement, though I had a
clarifying question around this topic myself, which is: from a user
perspective, when would we expect to see these eager vacuums? ISTM we
would be doing 'normal vacuums' prior to vacuum_freeze_min_age, and
'aggressive vacuums' after (autovacuum_freeze_max_age -
vacuum_freeze_min_age), so the expectation is that 'eager vacuums'
would fall into the ~50 million transaction window between those two
points (assuming defaults, which admittedly I don't use). Does that
sound right?

> > > In the attached chart.png, you can see the vm_page_freezes climbing
> > > steadily with the patch, whereas on master, there are sudden spikes
> > > aligned with the aggressive vacuums. You can also see that the number
> > > of pages that are all-visible but not all-frozen grows steadily on
> > > master until the aggressive vacuum. This is vacuum's "backlog" of
> > > freezing work.
> >
> > What's the reason for all-visible-but-not-all-frozen to increase to a higher
> > value initially than where it later settles?
>
> My guess is that it has to do with shorter, more frequent vacuums at the
> beginning of the benchmark when the relation is smaller (and we haven't
> exceeded shared buffers or memory yet). They are setting pages all-visible, but
> we haven't used up enough xids yet to qualify for an eager vacuum.
>
> The peak of AVnAF pages aligns with the start of the first eager vacuum. We
> don't do any eager scanning until we are sure there is some data requiring
> freeze (see this criteria):
>
>     if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
>         TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
>                                       vacrel->cutoffs.FreezeLimit))
>
> Once we have used up enough xids to qualify for the first eager vacuum, the
> number of AVnAF pages starts to go down.
>
> It would follow from this theory that we would see a build-up like this after
> each relfrozenxid advancement (so after the next aggressive vacuum).
>
> But I think we don't see this because the vacuums are longer by the time
> aggressive vacuums have started, so we end up using up enough XIDs between
> vacuums to qualify for eager vacuums on vacuums after the aggressive vacuum.
>
> That is just my theory though.
>

I like your theory but it's a little too counterintuitive for me :-)

I would expect we'd see a change in the vacuum time & rate after the
first aggressive scan, which incidentally your graph does show for
master, but it looks a bit too smooth on your original patchset. I
guess there could be a sweet spot where the rate of changes fit
perfectly with regards to the line between lazy / eager vacuums, but
hard to imagine you were that lucky.

> > > Below is the comparative WAL volume, checkpointer and background
> > > writer writes, reads and writes done by all other backend types, time
> > > spent vacuuming in milliseconds, and p99 latency. Notice that overall
> > > vacuum IO time is substantially lower with the patch.
> > >
> > >    version     wal  cptr_bgwriter_w   other_rw  vac_io_time  p99_lat
> > >     patch   770 GB          5903264  235073744   513722         1
> > >     master  767 GB          5908523  216887764  1003654        16
> >
> > Hm. It's not clear to me why other_rw is higher with the patch? After all,
> > given the workload, there's no chance of unnecessarily freezing tuples? Is
> > that just because at the end of the benchmark there's leftover work?
>
> So other_rw is mostly client backend and autovacuum reads and writes. It is
> higher with the patch because there are actually more vacuum reads and writes
> with the patch than on master. However, the autovacuum worker read and write
> time is much lower. Those blocks are more often in shared buffers, I would
> guess.
>
<snip>
> From e36b4fac345be44954410c4f0e61467dc0f49a72 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Thu, 12 Dec 2024 16:44:37 -0500
> Subject: [PATCH v2 10/10] Eagerly scan all-visible pages to amortize
> aggressive vacuum
>
> @@ -27,11 +27,37 @@
>  * to the end, skipping pages as permitted by their visibility status, vacuum
>  * options, and the eagerness level of the vacuum.
>  *
> + * There are three vacuum eagerness levels: normal vacuum, eager vacuum, and
> + * aggressive vacuum.
> + *
>  * When page skipping is enabled, non-aggressive vacuums may skip scanning
> - * pages that are marked all-visible in the visibility map. We may choose not
> + * pages that are marked all-visible in the visibility map. It may choose not
>  * to skip pages if the range of skippable pages is below
>  * SKIP_PAGES_THRESHOLD.
>  *

I find the above confusing since page skipping is the regular activity
but referred to in the negative, and because you use the term
"non-aggressive vacuums" which in prior releases only mapped to
"normal" vacuums, but now would map to both "normal" and "eager"
vacuums, and it isn't clear that is desired (in my head anyway). Does
the following still convey what you meant (and hopefully work better
with the paragraphs that follow)?

When page skipping is not disabled, a normal vacuum may skip scanning
pages that are marked all-visible in the visibility map if the range
of skippable pages is below SKIP_PAGES_THRESHOLD.

> + * Eager vacuums will scan skippable pages in an effort to freeze them and
> + * decrease the backlog of all-visible but not all-frozen pages that have to
> + * be processed to advance relfrozenxid and avoid transaction ID wraparound.
> + *

> @@ -170,6 +197,51 @@ typedef enum
>       VACUUM_ERRCB_PHASE_TRUNCATE,
> } VacErrPhase;
>
> +/*
> + * Eager vacuums scan some all-visible but not all-frozen pages. Since our
> + * goal is to freeze these pages, an eager scan that fails to set the page
> + * all-frozen in the VM is considered to have "failed".
> + *
> + * On the assumption that different regions of the table tend to have
> + * similarly aged data, once we fail to freeze EAGER_SCAN_MAX_FAILS_PER_REGION
> + * blocks in a region of size EAGER_SCAN_REGION_SIZE, we suspend eager
> + * scanning until vacuum has progressed to another region of the table with
> + * potentially older data.
> + */
> +#define EAGER_SCAN_REGION_SIZE 4096
> +#define EAGER_SCAN_MAX_FAILS_PER_REGION 128
> +
> +/*
> + * An eager scan of a page that is set all-frozen in the VM is considered
> + * "successful". To spread out eager scanning across multiple eager vacuums,
> + * we limit the number of successful eager page scans. The maximum number of
> + * successful eager page scans is calculated as a ratio of the all-visible but
> + * not all-frozen pages at the beginning of the vacuum.
> + */
> +#define EAGER_SCAN_SUCCESS_RATE 0.2
> +
> +/*
> + * The eagerness level of a vacuum determines how many all-visible but
> + * not all-frozen pages it eagerly scans.
> + *
> + * A normal vacuum (eagerness VAC_NORMAL) scans no all-visible pages (with the
> + * exception of those scanned due to SKIP_PAGES_THRESHOLD).
> + *
> + * An eager vacuum (eagerness VAC_EAGER) scans a number of pages up to a limit
> + * based on whether or not it is succeeding or failing. An eager vacuum is
> + * downgraded to a normal vacuum when it hits its success quota. An aggressive
> + * vacuum cannot be downgraded. No eagerness level is ever upgraded.
> + *

At the risk of being overly nit-picky... eager vacuums scan their
subset of all-visible pages "up to a limit" based solely on the
success ratio. In the case of (excessive) failures, there is no limit
to the number of pages scanned, only a pause in the pages scanned
until the next region.

> + * An aggressive vacuum (eagerness EAGER_FULL) must scan all all-visible but
> + * not all-frozen pages.
> + */

I think the above should be VAC_AGGRESSIVE vs EAGER_FULL, no?

> +typedef enum VacEagerness
> +{
> +     VAC_NORMAL,
> +     VAC_EAGER,
> +     VAC_AGGRESSIVE,
> +} VacEagerness;
> +

> @@ -516,25 +772,20 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
>                * Force aggressive mode, and disable skipping blocks using the
>                * visibility map (even those set all-frozen)
>                */
> -             vacrel->aggressive = true;
> +             aggressive = true;
>               skipwithvm = false;
>       }
>
>       vacrel->skipwithvm = skipwithvm;
>
> +     heap_vacuum_set_up_eagerness(vacrel, aggressive);
> +
>       if (verbose)
> -     {
> -             if (vacrel->aggressive)
> -                     ereport(INFO,
> -                                     (errmsg("aggressively vacuuming \"%s.%s.%s\"",
> -                                                     vacrel->dbname, vacrel->relnamespace,
> -                                                     vacrel->relname)));
> -             else
> -                     ereport(INFO,
> -                                     (errmsg("vacuuming \"%s.%s.%s\"",
> -                                                     vacrel->dbname, vacrel->relnamespace,
> -                                                     vacrel->relname)));
> -     }
> +             ereport(INFO,
> +                             (errmsg("%s of \"%s.%s.%s\"",
> +                                             vac_eagerness_description(vacrel->eagerness),
> +                                             vacrel->dbname, vacrel->relnamespace,
> +                                             vacrel->relname)));
>
>       /*
>        * Allocate dead_items memory using dead_items_alloc.  This handles

One thing I am wondering about is that since we actually modify
vacrel->eagerness during the "success downgrade" cycle, a single
vacuum run could potentially produce messages with both eager vacuum
and normal vacuum language. I don't think that'd be a problem in the
above spot, but wondering if it might be elsewhere (maybe in
pg_stat_activity?).


Robert Treat
https://xzilla.net



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From:
Nazir Bilal Yavuz
Date:
Hi,

Thank you for working on this!

On Sat, 14 Dec 2024 at 01:53, Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 10:42 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
>
> Thanks for the review!
> Attached v2 should address your feedback and also fixes a few bugs with v1.

Here are couple of code comments:

=== [PATCH v2 07/10] ===

It took me a while to understand that heap_vac_scan_next_block() loops
until rel_pages. What do you think about adding
Assert(vacrel->current_block == rel_pages) before the
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
rel_pages) call, and adding a comment that the main loop should process
blocks up to rel_pages?
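
I.e. something like:

    /* the main loop must have processed all blocks up to rel_pages */
    Assert(vacrel->current_block == rel_pages);
    pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
                                 rel_pages);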

=== [PATCH v2 09/10] ===

+ * If there are no indexes or index scanning is disabled, phase II may be
+ * skipped. If phase I identified very few dead index entries, vacuum may skip
+ * phases II and III. Index vacuuming deletes the dead index entries from the
+ * TID store.

phase III is not mentioned in the previous comments. It could be
better to first explain what phase III is before mentioning it.

+ * After index vacuuming is complete, vacuum scans the blocks of the relation
+ * indicated by the TIDs in the TID store and reaps the dead tuples, freeing
+ * that space for future tuples. Finally, vacuum may truncate the relation if
+ * it has emptied pages at the end.
+ *
+ * After finishing all phases of work, vacuum updates relation statistics in
+ * pg_class and the cumulative statistics subsystem.

There is no clear definition of phase III here either. I cannot
understand what phase III is and which parts vacuum may skip.

=== [PATCH v2 10/10] ===

+        /*
+         * The number of eagerly scanned blocks an eager vacuum failed to
+         * freeze (due to age) in the current eager scan region. Eager vacuums
+         * reset it to EAGER_SCAN_MAX_FAILS_PER_REGION each time they enter a
+         * new region of the relation. Aggressive vacuums also decrement this
+         * coutner but it is initialized to # blocks in the relation.
+         */

s/coutner/counter

 /* non-export function prototypes */
+
 static void lazy_scan_heap(LVRelState *vacrel);

Extra blank line.

+    if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
+        TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
+                                      vacrel->cutoffs.FreezeLimit))
+        oldest_unfrozen_requires_freeze = true;
+
+    if (MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
+        MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
+                                    vacrel->cutoffs.MultiXactCutoff))
+        oldest_unfrozen_requires_freeze = true;

You may want to short-circuit the second if condition with
!oldest_unfrozen_requires_freeze but it may increase the complexity,
up to you.
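
I.e. (untested):

    if (!oldest_unfrozen_requires_freeze &&
        MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
        MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
                                    vacrel->cutoffs.MultiXactCutoff))
        oldest_unfrozen_requires_freeze = true;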

+    vacrel->eager_pages.remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION *
+        (1 - vacrel->next_eager_scan_region_start / EAGER_SCAN_REGION_SIZE);

This will always return EAGER_SCAN_MAX_FAILS_PER_REGION or 0 because
of integer division.
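
One way to keep it integer-only (untested) is to multiply before dividing:

    vacrel->eager_pages.remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION -
        (EAGER_SCAN_MAX_FAILS_PER_REGION *
         vacrel->next_eager_scan_region_start) / EAGER_SCAN_REGION_SIZE;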

+    if (aggressive)
+    {
+        vacrel->eagerness = VAC_AGGRESSIVE;
+
+        /*
+         * An aggressive vacuum must scan every all-visible page to safely
+         * advance the relfrozenxid and/or relminmxid. As such, there is no
+         * cap to the number of allowed successes or failures.
+         */
+        vacrel->eager_pages.remaining_fails = vacrel->rel_pages + 1;
+        vacrel->eager_pages.remaining_successes = vacrel->rel_pages + 1;
+        return;
+    }
...
...
+        if (was_eager_scanned)
+        {
+            if (vm_page_frozen)
+            {
+                Assert(vacrel->eager_pages.remaining_successes > 0);
+                vacrel->eager_pages.remaining_successes--;
+
+                if (vacrel->eager_pages.remaining_successes == 0)
+                {
+                    Assert(vacrel->eagerness == VAC_EAGER);

My initial thought was that since *was_eager_scanned* is true,
Assert(vacrel->eagerness == VAC_EAGER) should be under the top if
condition (I assumed that was_eager_scanned is only true for eager
vacuums, not for aggressive vacuums too), but I see your point here.
Since you set vacrel->eager_pages.remaining_successes to
vacrel->rel_pages + 1, vacrel->eager_pages.remaining_successes cannot
reach 0 even if all pages are processed successfully. I think a
comment is needed in both places to explain why
vacrel->eager_pages.[remaining_fails | remaining_successes] is set to
vacrel->rel_pages + 1, and why vacrel->eagerness should be VAC_EAGER
when was_eager_scanned is true and
vacrel->eager_pages.remaining_successes is 0.

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From:
Melanie Plageman
Date:
Thanks for the review!

On Tue, Dec 17, 2024 at 10:57 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Here are couple of code comments:
>
> === [PATCH v2 07/10] ===
>
> It took me a while to understand that heap_vac_scan_next_block() loops
> until rel_pages. What do you think about adding
> Assert(vacrel->current_block == rel_pages) before the
> pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
> rel_pages) call, and adding a comment that the main loop should process
> blocks up to rel_pages?

I added these to the v3 I sent in [1]

> === [PATCH v2 09/10] ===
>
> + * If there are no indexes or index scanning is disabled, phase II may be
> + * skipped. If phase I identified very few dead index entries, vacuum may skip
> + * phases II and III. Index vacuuming deletes the dead index entries from the
> + * TID store.
>
> phase III is not mentioned in the previous comments. It could be
> better to first explain what phase III is before mentioning it.
>
> + * After index vacuuming is complete, vacuum scans the blocks of the relation
> + * indicated by the TIDs in the TID store and reaps the dead tuples, freeing
> + * that space for future tuples. Finally, vacuum may truncate the relation if
> + * it has emptied pages at the end.
> + *
> + * After finishing all phases of work, vacuum updates relation statistics in
> + * pg_class and the cumulative statistics subsystem.
>
> There is no clear definition of phase III here either. I cannot
> understand what phase III is and which parts vacuum may skip.

I've attempted to address this in v3.

> === [PATCH v2 10/10] ===
>
> +        /*
> +         * The number of eagerly scanned blocks an eager vacuum failed to
> +         * freeze (due to age) in the current eager scan region. Eager vacuums
> +         * reset it to EAGER_SCAN_MAX_FAILS_PER_REGION each time they enter a
> +         * new region of the relation. Aggressive vacuums also decrement this
> +         * coutner but it is initialized to # blocks in the relation.
> +         */
>
> s/coutner/counter

Fixed

>  /* non-export function prototypes */
> +
>  static void lazy_scan_heap(LVRelState *vacrel);
>
> Extra blank line.

Fixed

> +    if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
> +        TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
> +                                      vacrel->cutoffs.FreezeLimit))
> +        oldest_unfrozen_requires_freeze = true;
> +
> +    if (MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
> +        MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
> +                                    vacrel->cutoffs.MultiXactCutoff))
> +        oldest_unfrozen_requires_freeze = true;
>
> You may want to short-circuit the second if condition with
> !oldest_unfrozen_requires_freeze but it may increase the complexity,
> up to you.

Good point. Done.

> +    vacrel->eager_pages.remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION *
> +        (1 - vacrel->next_eager_scan_region_start / EAGER_SCAN_REGION_SIZE);
>
> This will always return EAGER_SCAN_MAX_FAILS_PER_REGION or 0 because
> of integer division.

Ah, wow, thanks so much for catching that!

> +    if (aggressive)
> +    {
> +        vacrel->eagerness = VAC_AGGRESSIVE;
> +
> +        /*
> +         * An aggressive vacuum must scan every all-visible page to safely
> +         * advance the relfrozenxid and/or relminmxid. As such, there is no
> +         * cap to the number of allowed successes or failures.
> +         */
> +        vacrel->eager_pages.remaining_fails = vacrel->rel_pages + 1;
> +        vacrel->eager_pages.remaining_successes = vacrel->rel_pages + 1;
> +        return;
> +    }
> ...
> ...
> +        if (was_eager_scanned)
> +        {
> +            if (vm_page_frozen)
> +            {
> +                Assert(vacrel->eager_pages.remaining_successes > 0);
> +                vacrel->eager_pages.remaining_successes--;
> +
> +                if (vacrel->eager_pages.remaining_successes == 0)
> +                {
> +                    Assert(vacrel->eagerness == VAC_EAGER);
>
> My initial thought was that since *was_eager_scanned* is true,
> Assert(vacrel->eagerness == VAC_EAGER) should be under the top if
> condition (I assumed that was_eager_scanned is only true for eager
> vacuums, not for aggressive vacuums too), but I see your point here.
> Since you set vacrel->eager_pages.remaining_successes to
> vacrel->rel_pages + 1, vacrel->eager_pages.remaining_successes cannot
> reach 0 even if all pages are processed successfully. I think a
> comment is needed in both places to explain why
> vacrel->eager_pages.[remaining_fails | remaining_successes] is set to
> vacrel->rel_pages + 1, and why vacrel->eagerness should be VAC_EAGER
> when was_eager_scanned is true and
> vacrel->eager_pages.remaining_successes is 0.

Makes sense. I've attempted to clarify as you suggest in v3.

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_Zni_idCUyKTBteRM-G5X1qiB9mf75rZGtHpt%2Bnk1z4Gg%40mail.gmail.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From:
Robert Haas
Date:
On Tue, Dec 17, 2024 at 5:54 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Makes sense. I've attempted to clarify as you suggest in v3.

I would just commit 0001. There's nothing to be gained by waiting around.

I don't care about 0002 much. It doesn't seem particularly better or
worse. I would suggest that if you want to do this, maybe don't split
up this:

    vacrel->blkno = InvalidBlockNumber;
    if (BufferIsValid(vmbuffer))
        ReleaseBuffer(vmbuffer);

You could put the moved code before or after that instead of in the
middle of it. Unless there's some reason it makes more sense in the
middle.

I don't care about 0003 much, either. It looks correct to me, but
whether it's better is subjective.

I dislike 0004 as presented. Reducing the scope of blkno is a false
economy; the function doesn't do much of anything interesting after
the main loop. And why do we want to report rel_pages to the progress
reporting machinery instead of blkno? I'd rather that the code report
where it actually ended up (blkno) rather than reporting where it
thinks it must have ended up (rel_pages).

I agree that the assert that 0005 replaces is confusing, but replacing
8 lines of code with 37 is not an improvement in my book.

I like 0006. The phase I-II-III terminology doesn't appear anywhere in
the code (at least, not to my knowledge) but we speak of it that way
so frequently in email and in conversation that mentioning it
someplace that a future developer might find seems advisable. I think
it would be worth mentioning in the second paragraph of the comment
that we may resume phase I after completing phase III if we entered
phase II due to lack of memory. I'm not sure what the right way to
phrase this is -- in a sense, this possibility means that these aren't
really phases at all, however much we may speak of them that way. But
as I say, we talk about them that way all the time, so it's good that
this is finally adding comments to match.

Regarding 0007:

- Why do we measure failure as an integer total but success as a
percentage? Maybe the thought here is that failures are counted per
region but successes are counted globally across the relation, but
then I wonder if that's what we want, and if so, whether we need some
comments to explain why we want that rather than doing both per
region.

- I do not like the nested struct definition for eager_pages. Either
define the struct separately, give it a name, and then refer to that
name here, or just define the elements individually, and maybe give
them a common prefix or something. I don't think what you have here is
a style we normally use. As you can see, pgindent is not particularly
fond of it. It seems particularly weird given that you didn't even put
all the stuff related to eagerness inside of it.

- It's a little weird that you mostly treat eager vacuums as an
intermediate position between aggressive and normal, but then we
decide whether we're eager in a different place than we decide whether
we're aggressive.

- On a related note, won't most vacuums be VAC_EAGER rather than
VAC_NORMAL, thus making VAC_NORMAL a misnomer? I wonder if it's better
to merge eager and normal together, and just treat the cases where we
judge eager scanning not worthwhile as a mild special case of an
otherwise-normal vacuum. It's important that a user can tell what
happened from the log message, but it doesn't seem absolutely
necessary for the start-of-vacuum message to make that clear. It could
just be that %u all-visible scanned => 0 all-visible scanned means we
didn't end up being at all eager.

- Right after the comment "Produce distinct output for the corner case
all the same, just in case," you've changed it so that there's no
longer an if-statement. While you haven't really falsified the comment
in any deep sense, some rewording is probably in order. Also, the
first sentence of this comment overtly contradicts itself without
really explaining anything useful. That's a preexisting problem for
which you're not responsible, but maybe it makes sense to fix it while
you're adjusting this comment.

- Perhaps it's unfair of me, but I think I would have hoped for an
acknowledgement in this commit message, considering that I believe I
was the one who suggested breaking the relation into logical regions,
trying to freeze a percentage of the all-visible-but-not-all-frozen
pages, and capping both successes and failures. Starting the region at
a random offset wasn't my idea, and the specific thresholds you've
chosen were not the ones I suggested, and limiting successes globally
rather than per-region was not what I think I had in mind, and I don't
mean to take away from everything you've done to move this forward,
but unless I am misunderstanding the situation, this particular patch
(0007) is basically an implementation of an algorithm that, as far as
I know, I was the first to propose.

- Which of course also means that I tend to like the idea, but also
that I'm biased. Still, what is the reasonable alternative to this
patch? I have a hard time believing that it's "do nothing". As far as
I know, pretty much everyone agrees that the large burst of work that
tends to occur when an aggressive vacuum kicks off is extremely
problematic, particularly but not only the first time it kicks off on
a table or group of tables that may have accumulated many
all-visible-but-not-all-frozen pages. This patch might conceivably err
in moving that work too aggressively to earlier vacuums, thus making
those vacuums too expensive or wasting work if the pages end up being
modified again; or it might conceivably err in moving work
insufficiently aggressively to earlier vacuums, leaving too much
remaining work when the aggressive vacuum finally happens. In fact, I
would be surprised if it doesn't have those problems in some
situations. But it could have moderately severe cases of those
problems and still be quite a bit better than what we have now
overall.

- So, I think we should avoid fine-tuning this and try to understand if
there's anything wrong with the big picture. Can we imagine a user who
is systematically unhappy with this change? Like, not a user who got
hosed once because of some bad luck, but someone who is constantly and
predictably getting hosed? They'd need to be accumulating lots of
all-visible-not-all-frozen pages in relatively large tables on a
regular basis, but then I guess they'd need to either drop the tables
before the aggressive vacuum happened, or they'd need to render the
pages not-all-visible again before the aggressive vacuum would have
happened. I'm not entirely sure how possible that is. My best guess is
that it's possible if the timing of your autovacuum runs is
particularly poor -- you just need to load some data, vacuum it early
enough that the XIDs are still young so it doesn't get frozen, then
have the eager vacuum hit it, and then update it. That doesn't seem
impossible, but I'm not sure if it's possible to make it happen often
enough and severely enough to really cause a problem. And I'm not sure
we're going to find that out before this is committed, so that brings
me to:

- If this patch is going to go into the source tree in PostgreSQL 18,
it should do that soon. This is a bad patch to be committing in late
March. I propose that it should be shipped by end of January or wait a
year. Even if it goes into the tree by the end of January, there is a
chance we'll find problems that we can't fix quickly and have to
revert the whole thing. But that becomes much more likely if it is
committed close to feature freeze. There is a certain kind of patch -
and I think it includes this patch - where you just can't ever really
know what the problems people are going to hit IRL are until it's
committed and people try things and hit problems, perhaps not even as
part of an intent to test this patch, but just testing in general. If
we don't find those problems with enough time remaining to react to
them in a thoughtful way, that's where full reverts start to happen.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Treat
Date:
On Tue, Dec 17, 2024 at 5:51 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> Thanks for taking a look!
>
> I've rebased and attached an updated v3 which also addresses review feedback.
>
> On Sun, Dec 15, 2024 at 1:05 AM Robert Treat <rob@xzilla.net> wrote:
> > On Fri, Dec 13, 2024 at 5:53 PM Melanie Plageman
> > <melanieplageman@gmail.com> wrote:
<snip>
>
> So, all vacuums are eager after the first eager vacuum (in this
> workload).

I have been thinking about this statement, and about a sense of
awkwardness I felt over how this implementation came together, which I
think Robert captured well with his statement: "It's a little weird
that you mostly treat eager vacuums as an intermediate position
between aggressive and normal, but then we decide whether we're eager
in a different place than we decide whether we're aggressive."

It feels to me like eager vacuums are not so much a distinct thing,
but that, like how all vacuums do index cleanup unless told not to,
all vacuums are optimistic that they will convert avp to afp, only we
bail out quickly if the table is below vacuum_freeze_min_age and we
throw the limits out if it is above autovacuum_freeze_max_age,
similar to how we manage the vacuum cost limit.
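
In rough code terms, that spectrum might look like this sketch (the
GUC names are the real ones; the enum and helper are hypothetical):

    #include <stdint.h>

    typedef enum { EAGER_OFF, EAGER_CAPPED, EAGER_UNLIMITED } EagerMode;

    static EagerMode
    choose_eager_mode(int64_t table_xid_age,
                      int64_t vacuum_freeze_min_age,
                      int64_t autovacuum_freeze_max_age)
    {
        if (table_xid_age < vacuum_freeze_min_age)
            return EAGER_OFF;       /* too young: bail out quickly */
        if (table_xid_age > autovacuum_freeze_max_age)
            return EAGER_UNLIMITED; /* wraparound risk: limits thrown out */
        return EAGER_CAPPED;        /* optimistic, budgeted avp-to-afp work */
    }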

> > > + * The eagerness level of a vacuum determines how many all-visible but
> > > + * not all-frozen pages it eagerly scans.
> > > + *
> > > + * A normal vacuum (eagerness VAC_NORMAL) scans no all-visible pages (with the
> > > + * exception of those scanned due to SKIP_PAGES_THRESHOLD).
> > > + *
> > > + * An eager vacuum (eagerness VAC_EAGER) scans a number of pages up to a limit
> > > + * based on whether or not it is succeeding or failing. An eager vacuum is
> > > + * downgraded to a normal vacuum when it hits its success quota. An aggressive
> > > + * vacuum cannot be downgraded. No eagerness level is ever upgraded.
> > > + *
> >
> > At the risk of being overly nit-picky... eager vacuums scan their
> > subset of all-visible pages "up to a limit" based solely on the
> > success ratio. In the case of (excessive) failures, there is no limit
> > to the number of pages scanned, only a pause in the pages scanned
> > until the next region.
>
> Yes, this is a good point. I've changed it to specifically indicate
> the limit is based on successes. However, I feel like I should either
> not mention the success limit here or mention the regional limiting
> based on failures -- otherwise it is confusing. Can you think of a
> wording that would be good for this comment?
>

I don't feel like I came up with anything concise :-)

I think the gist is something like "a given vacuum will scan a number
of pages up to either a limit based on successes, after which it is
downgraded to a normal vacuum, or a subset of pages based on failures
across different regions within the heap." but even that feels like we
are relying on people already understanding what is going on rather
than defining/explaining it.

> > > + * An aggressive vacuum (eagerness EAGER_FULL) must scan all all-visible but
> > > + * not all-frozen pages.
> > > + */
> >
> > I think the above should be VAC_AGGRESSIVE vs EAGER_FULL, no?
>
> Yep, thanks! Fixed.
>
> > >       vacrel->skipwithvm = skipwithvm;
> > >
> > > +     heap_vacuum_set_up_eagerness(vacrel, aggressive);
> > > +
> > >       if (verbose)
> > > -     {
> > > -             if (vacrel->aggressive)
> > > -                     ereport(INFO,
> > > -                                     (errmsg("aggressively vacuuming \"%s.%s.%s\"",
> > > -                                                     vacrel->dbname, vacrel->relnamespace,
> > > -                                                     vacrel->relname)));
> > > -             else
> > > -                     ereport(INFO,
> > > -                                     (errmsg("vacuuming \"%s.%s.%s\"",
> > > -                                                     vacrel->dbname, vacrel->relnamespace,
> > > -                                                     vacrel->relname)));
> > > -     }
> > > +             ereport(INFO,
> > > +                             (errmsg("%s of \"%s.%s.%s\"",
> > > +                                             vac_eagerness_description(vacrel->eagerness),
> > > +                                             vacrel->dbname, vacrel->relnamespace,
> > > +                                             vacrel->relname)));
> > >
> > >       /*
> > >        * Allocate dead_items memory using dead_items_alloc.  This handles
> >
> > One thing I am wondering about is that since we actually modify
> > vacrel->eagerness during the "success downgrade" cycle, a single
> > vacuum run could potentially produce messages with both eager vacuum
> > and normal vacuum language. I don't think that'd be a problem in the
> > above spot, but wondering if it might be elsewhere (maybe in
> > pg_stat_activity?).
>
> Great point. It's actually already an issue in the vacuum log output.
> In the new patch, I save the eagerness level at the beginning and use
> it in the logging output. Any future users of the eagerness level will
> have to figure out if they care about the original eagerness level or
> the current one.
>

In a general sense, you want to know if your vacuums are converting
avp to afp since it can explain I/O usage. In this version of the
patch, we know it is turned on based on VacEagerness, but it'd be nice
to know if it got turned off; i.e., if we end up in
if (vacrel->eager_pages.remaining_successes == 0), maybe drop a note
in the logs.

In a world where all vacuums are eager vacuums, I think you still want
the latter messaging, but I think the former would end up being noted
based on the outcome of criteria in vacuum_get_cutoffs() (primarily if
we are over the freeze limit).

Robert Treat
https://xzilla.net



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Sat, Dec 21, 2024 at 10:28 AM Robert Treat <rob@xzilla.net> wrote:
>
> On Tue, Dec 17, 2024 at 5:51 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> It feels to me like eager vacuums are not so much a distinct thing,
> but that, like how all vacuums do index cleanup unless told not to,
> all vacuums are optimistic that they will convert avp to afp, only we
> bail out quickly if the table is below vacuum_freeze_min_age and we
> throw the limits out if it is above autovacuum_freeze_max_age,
> similar to how we manage the vacuum cost limit.

I like this framing/way of thinking about it. v4 in [1] eliminates the
concept of eager vacuums. I still use the aggressive vacuum and
non-aggressive vacuum framing. But, if we add a reloption/GUC to allow
configuring failures per region (proposed in [1]), that means more
user-facing docs, and I think this way of framing it as a spectrum of
all-visible page scanning based on relfrozenxid's relationship to each
of these GUCs might be a clear way to explain eager scanning to users.

> In a general sense, you want to know if your vacuums are converting
> avp to afp since it can explain I/O usage. In this version of the
> patch, we know it is turned on based on VacEagerness, but it'd be nice
> to know if it got turned off; i.e., if we end up in
> if (vacrel->eager_pages.remaining_successes == 0), maybe drop a note
> in the logs.

I've added a logging message like this.

> In a world where all vacuums are eager vacuums, I think you still want
> the latter messaging, but I think the former would end up being noted
> based on the outcome of criteria in vacuum_get_cutoffs() (primarily if
> we are over the freeze limit).

Hmm. Now that I've eliminated the concept of eager vacuums, users are
not informed whether or not eager scanning is enabled at the beginning
of vacuum. Only when eager scanning is disabled during vacuum or at
the end when they see the number of pages eagerly scanned would they
know whether or not this eager scanning happened.

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_byJgf3wB8ukv6caXROReS1SRZsVPb7RWP%2B8qPtdDGykw%40mail.gmail.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Andres Freund
Date:
Hi,

On 2025-01-07 15:46:26 -0500, Melanie Plageman wrote:
> For table storage options, those related to vacuum but not autovacuum
> are in the main StdRdOptions struct. Of those, some are overridden by
> VACUUM command parameters which are parsed out into the VacuumParams
> struct. Though the members of VacuumParams are initialized in
> ExecVacuum(), the storage parameter overrides are determined in
> vacuum_rel() and the final value goes in the VacuumParams struct which
> is passed all the way through to heap_vacuum_rel().
> 
> Because VacuumParams is what ultimately gets passed down to the
> table-AM specific vacuum implementation, autovacuum also initializes
> its own instance of VacuumParams in the autovac_table struct in
> table_recheck_autovac() (even though no VACUUM command parameters can
> affect autovacuum). These are overridden in vacuum_rel() as well.
> 
> Ultimately vacuum_eager_scan_max_fails is a bit different from the
> existing members of VacuumParams and StdRdOptions. It is a GUC and a
> table storage option but not a SQL command parameter -- and both the
> GUC and the table storage parameter affect both vacuum and autovacuum.
> And it doesn't need to be initialized in different ways for autovacuum
> and vacuum. In the end, I decided to follow the existing conventions
> as closely as I could.

I think that's fine. The abstractions in this area aren't exactly perfect, and
I don't think this makes it worse in any meaningful way. It's not really
different from having other heap-specific params like freeze_min_age in
VacuumParams.

Greetings,

Andres Freund



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Alena Rybakina
Date:

Hi!

On 14.01.2025 00:46, Melanie Plageman wrote:
On Thu, Jan 9, 2025 at 1:24 PM Andres Freund <andres@anarazel.de> wrote:
On 2025-01-07 15:46:26 -0500, Melanie Plageman wrote:
For table storage options, those related to vacuum but not autovacuum
are in the main StdRdOptions struct. Of those, some are overridden by
VACUUM command parameters which are parsed out into the VacuumParams
struct. Though the members of VacuumParams are initialized in
ExecVacuum(), the storage parameter overrides are determined in
vacuum_rel() and the final value goes in the VacuumParams struct which
is passed all the way through to heap_vacuum_rel().

Because VacuumParams is what ultimately gets passed down to the
table-AM specific vacuum implementation, autovacuum also initializes
its own instance of VacuumParams in the autovac_table struct in
table_recheck_autovac() (even though no VACUUM command parameters can
affect autovacuum). These are overridden in vacuum_rel() as well.

Ultimately vacuum_eager_scan_max_fails is a bit different from the
existing members of VacuumParams and StdRdOptions. It is a GUC and a
table storage option but not a SQL command parameter -- and both the
GUC and the table storage parameter affect both vacuum and autovacuum.
And it doesn't need to be initialized in different ways for autovacuum
and vacuum. In the end, I decided to follow the existing conventions
as closely as I could.
I think that's fine. The abstractions in this area aren't exactly perfect, and
I don't think this makes it worse in any meaningful way. It's not really
different from having other heap-specific params like freeze_min_age in
VacuumParams.
Got it. I've left it as is, then.

Attached v6 is rebased over recent changes in the vacuum-related docs.
I've also updated the "Routine Vacuuming" section of the docs to
mention eager scanning.

I'm planning to commit 0001 (which updates the code comment at the top
of vacuumlazy.c to explain heap vacuuming) -- barring any objections.


Thank you for working on this patch; without this explanation it is
difficult to understand what is happening, to put it mildly.

While working on vacuum statistics (I'll add a link to that thread
just in case [0]), I discovered some more important points that I
think could be mentioned.

The first is that vacuum will not remove dead tuples still referenced
in indexes when it was unable to take a cleanup lock on the heap page.
You can see the increments of missed_dead_tuples and
vacrel->missed_dead_pages in the lazy_scan_noprune function. That is,
these tuples are definitely dead from vacuum's perspective, but it
simply could not clean them.

Secondly, I think it is worth mentioning the case where vacuum
urgently starts cleaning the heap relation because of a wraparound
threat. At that point, it skips the index processing phase and heap
truncation.

Thirdly, the FreeSpaceMap is updated each time index and table cleanup
completes (after the lazy_vacuum function) and after the heap pruning
stage (the lazy_scan_prune function). Maybe you should add that.

I think it is also possible to add information about parallel vacuum:
workers are launched to clean the indexes, but some indexes are
considered unsafe for processing by a parallel worker and can be
processed only by the leader (see the
parallel_vacuum_index_is_parallel_safe function). These are indexes
that do not support parallel bulk-deletion or parallel cleanup.

I noticed another interesting point, though I don't know whether it
needs to be written about: it was not obvious to me that the buffer
and WAL statistics for indexes processed by workers are accumulated
separately (in pvs->buffer_usage and pvs->wal_usage).

[0] https://commitfest.postgresql.org/51/5012/

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Marcos Pegoraro
Date:
On 14.01.2025 22:51, Melanie Plageman wrote:

There is a typo on committed vacuumlazy.c file
 * If the TID store fills up in phase I, vacuum suspends phase I, proceeds to
 * phases II and II, cleaning up the dead tuples referenced in the current TID
 * store. This empties the TID store resumes phase I.

Probably you meant phases II and III

regards
Marcos

Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Wed, Jan 15, 2025 at 2:36 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>>
>> On 14.01.2025 22:51, Melanie Plageman wrote:
>
>
> There is a typo on committed vacuumlazy.c file
>  * If the TID store fills up in phase I, vacuum suspends phase I, proceeds to
>  * phases II and II, cleaning up the dead tuples referenced in the current TID
>  * store. This empties the TID store resumes phase I.
>
> Probably you meant phases II and III

Oops. Thanks for catching this. I'll fix that and the following
sentence in the main patch proposed in this thread to avoid extra
noise.

- Melanie



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Wed, Jan 15, 2025 at 12:08 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
>
> This is interesting, but I think it might belong as commentary in
> vacuumparallel.c instead.
>
> I added some description about it, I hope it is fine. I attached vacuum_description.diff

Alena, thanks again for your review. I pushed the part that we worked
on so far. I didn't end up adding in your proposed addition to
vacuumparallel.c. Honestly, I am not as familiar with the parallel
vacuuming code, so I wasn't sure what information rises to the level
of importance of being in the comment at the top of the file.

- Melanie



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Alena Rybakina
Date:
On 15.01.2025 23:09, Melanie Plageman wrote:
On Wed, Jan 15, 2025 at 12:08 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
This is interesting, but I think it might belong as commentary in
vacuumparallel.c instead.

I added some description about it, I hope it is fine. I attached vacuum_description.diff
Alena, thanks again for your review. I pushed the part that we worked
on so far. I didn't end up adding in your proposed addition to
vacuumparallel.c. Honestly, I am not as familiar with the parallel
vacuuming code, so I wasn't sure what information rises to the level
of importance of being in the comment at the top of the file.

Yes, I see. ok)
-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Treat
Date:
On Wed, Jan 15, 2025 at 3:56 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Wed, Jan 15, 2025 at 3:03 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > On Wed, Jan 15, 2025 at 2:36 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
> > >>
> > > There is a typo on committed vacuumlazy.c file
> > >  * If the TID store fills up in phase I, vacuum suspends phase I, proceeds to
> > >  * phases II and II, cleaning up the dead tuples referenced in the current TID
> > >  * store. This empties the TID store resumes phase I.
> > >
> > > Probably you meant phases II and III
> >
> > Oops. Thanks for catching this. I'll fix that and the following
> > sentence in the main patch proposed in this thread to avoid extra
> > noise.
>
> Attached v8 is rebased and has various comment cleanups.
>

Hey Melanie, took a walk through this version, some minor thoughts below.

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9128,9 +9128,10 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         <command>VACUUM</command> may scan and fail to set all-frozen in the
         visibility map before disabling eager scanning until the next region
         (currently 4096 blocks) of the relation. A value of 0 disables eager
-        scanning altogether. The default is 128. This parameter can be set in
-        postgresql.conf or on the server command line but is overridden for
-        individual tables by changing the
+        scanning altogether. The default is 128. This parameter can only be set
+        in the <filename>postgresql.conf</filename> file or on the server
+        command line; but the setting can be overridden for individual tables
+        by changing the
         <link linkend="reloption-vacuum-eager-scan-max-fails">corresponding
table storage parameter</link>.
+        For more information see <xref linkend="vacuum-for-wraparound"/>.
        </para>
       </listitem>

The above aligns with the boilerplate text we usually use for these
types of settings, though typically we just link to the table
parameter section. I left the direct link you included, since it seems
like a nice addition (I'm contemplating doing that for all such
params, but that should be done separately).

As a complete aside, I think we should re-order these sections to have
freezing first, then cost delay, then autovac stuff, since I feel like
most people learn about vacuum first, then build on that with
autovacuum, not to mention the general priority of managing
wraparound. Granted, that's not directly germane to eager scanning.


--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -638,9 +638,9 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
    </tip>

    <para>
-    <command>VACUUM</command> mostly scans pages that have been modified since
-    the last vacuum. Some all-visible but not all-frozen pages are eagerly
-    scanned to try and freeze them. But the
+    <command>VACUUM</command> typically scans pages that have been modified since
+    the last vacuum. While some all-visible but not all-frozen pages are eagerly
+    scanned to try and freeze them, the
     <structfield>relfrozenxid</structfield> can only be advanced when every
     page of the table that might contain unfrozen XIDs is scanned.  This

above is an attempt to make this wording less awkward.


wrt this portion of src/backend/access/heap/vacuumlazy.c
+ * pages at the beginning of the vacuum. Once the success cap has been hit,
+ * eager scanning is permanently disabled.
+ *
Maybe this is obvious enough to the reader, but should we change this
to something like "eager scanning is disabled for the remainder of
this vacuum" or similar? I guess I'm trying to make clear that it
isn't disabled "permanently" or until an aggressive vacuum run
completes or a vacuum freeze or some other scenario; we can/will eager
scan again essentially immediately if we just run vacuum again. (It
seems obvious in the actual code, just less so in the context of the
introductory wording)


And one last bit of overthinking... in src/backend/access/heap/vacuumlazy.c
+   if (vacrel->rel_pages < VACUUM_EAGER_SCAN_REGION_SIZE)
+       return;
It's easy to agree that anything less than the region size doesn't
make sense to eager scan, but I wondered about the "region_size +1"
scenario; essentially cases where we are not very much larger in total
than a single region, where it also feels like there isn't much gain
from eager scanning. Perhaps we should wait until 2x region size, in
which case we'd at least start in a scenario where the bucketing is
more equal?
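
In code, that would be something like (a sketch using the names quoted
above):

    /* hypothetical: require at least two full regions' worth of blocks */
    if (vacrel->rel_pages < 2 * VACUUM_EAGER_SCAN_REGION_SIZE)
        return;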


Robert Treat
https://xzilla.net



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Circling back to benchmarking, I've been running the most adversarial
> benchmarks I could devise and can share a bit of what I've found.
>
> I created a "hot tail" benchmark where 16 clients insert some data and
> then update some data  older than what they just inserted but still
> towards the end of the relation. The adversarial part is that I bulk
> delete all the data older than X hours where X hours is always after
> the data is eligible to be frozen but before it would be aggressively
> vacuumed.
>
> That means that there are a bunch of pages that will never be frozen
> on master but are frozen with the patch -- wasting vacuum resources. I
> tuned vacuum_freeze_min_age and vacuum_freeze_table_age and picked the
> DELETE window to specifically have this behavior.
>
> With my patch, I do see a 15-20% increase in the total time spent
> vacuuming over the course of the multi-hour benchmark. (I only see a
> 1% increase in the total WAL volume, though.)

How much time is that in absolute terms? If the benchmark runs for 3
hours and during that time we have 1 autovacuum worker active for 30
minutes out of those 3 hours, that is different than if we have 5
autovacuum workers active nearly all the time. Or, maybe a clearer way
to put it, what percentage of the total work was the VACUUM work? If
the total work was $100 and the VACUUM work accounted for $80 of that,
then a 15-20% increase is pretty significant; if the total work was
$100 and the VACUUM work accounted for $5 of that, then a 15-20%
increase matters a lot less.

But tentatively I'm inclined to say this is fine. Some of the work
that VACUUM is doing is probably work that otherwise would have needed
to happen in the foreground. For instance, the speedup in DELETEs that
you observed might be not only because the pages are cached but
perhaps also because DELETE doesn't need to do any non-trivial
visibility checks. Also, users don't have to anti-optimize their
configuration settings for their workload as you did when constructing
the adversarial case.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Thanks! Attached v9 incorporates all your suggested changes.

I'm not exactly sure what to do about it, but I feel like the
documentation of vacuum_eager_scan_max_fails is going to be
incomprehensible to someone who doesn't already have a deep knowledge
of how this patch works.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Thu, Jan 23, 2025 at 1:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > Thanks! Attached v9 incorporates all your suggested changes.
>
> I'm not exactly sure what to do about it, but I feel like the
> documentation of vacuum_eager_scan_max_fails is going to be
> incomprehensible to someone who doesn't already have a deep knowledge
> of how this patch works.

[ hit "Send" WAY too early ]

There are a lot of magic numbers here: 4096 blocks per region, 128 fails
per region, some other number of successes per region. I don't know
why this particular one is configurable and none of the others are.
I'm not saying it's the wrong decision, but it's not clear to me what
the reasoning is.

Also, the units of this parameter are pretty terrible. Like, it could
be measured in some unit that involves bytes or kilobytes or
megabytes, or, maybe better, it could be measured as a percentage.
Instead of either of those things, it's an integer that only makes
sense in reference to another (hard-coded, magical) integer. If you
made this a percentage, then you wouldn't necessarily even have to
tell people that the magic internal number is 4096. Or at least that
knowledge would be a lot less critical.
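
For reference, the arithmetic behind the current default (my
back-of-the-envelope numbers, not taken from the patch):

    128 failures / 4096 blocks per region = 0.03125

so a fraction-valued GUC defaulting to roughly 0.03 would express the
same per-region tolerance without the user ever needing to know about
the 4096.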

+    <command>VACUUM</command> typically scans pages that have been
+    modified since the last vacuum. While some all-visible but not
+    all-frozen pages are eagerly scanned to try and freeze them, the
+    <structfield>relfrozenxid</structfield> can only be advanced when
+    every page of the table that might contain unfrozen XIDs is scanned.

The phrase "While some all-visible but not all-frozen pages are
eagerly scanned to try and freeze them" seems odd here, because what
follows is true either way. This is like saying "When it's Tuesday, I
get hungry if I don't eat any food."

+ /*
+ * A normal vacuum that has failed to freeze too many eagerly scanned
+ * blocks in a row suspends eager scanning. next_eager_scan_region_start
+ * is the block number of the first block eligible for resumed eager
+ * scanning.
+ *
+ * When eager scanning is permanently disabled, either initially
+ * (including for aggressive vacuum) or due to hitting the success limit,
+ * this is set to InvalidBlockNumber.
+ */

This and the other comments for the new LVRelState members are quite
long, but I really like them. Arguably we ought to have more of this
kind of thing.

+ /*
+ * We only want to enable eager scanning if we are likely to be able to
+ * freeze some of the pages in the relation. We can freeze tuples older
+ * than the visibility horizon calculated at the beginning of vacuum, but
+ * we are only guaranteed to freeze them if at least one tuple on the page
+ * precedes the freeze limit or multixact cutoff (calculated from
+ * vacuum_[multixact_]freeze_min_age). So, if the oldest unfrozen xid
+ * (relfrozenxid/relminmxid) does not precede the freeze cutoff, we aren't
+ * likely to freeze many tuples.
+ */

I think this could be clearer. And I wonder if "We can freeze tuples
older" is actually a typo for "We can freeze tuples newer", because
otherwise I can't make sense of the rest of the sentence. Right now it
seems to boil down to something like: We can freeze tuples older...but
we are only guaranteed to freeze them if they are older.

Maybe what we want to say here is something like: We only want to
enable eager scanning if we are likely to be able to freeze some of
the pages in the relation. If FreezeLimit has not advanced past
relfrozenxid, or if MultiXactCutoff has not advanced passed
relminmxid, then there's a good chance we won't be able to freeze
anything more than we already have. Success would not be impossible,
because there may be pages we didn't try to freeze previously, and
it's also possible that some XID greater than FreezeLimit has ended,
allowing for freezing. But as a heuristic, we wait until the
FreezeLimit advances, so that we won't repeatedly attempt eager
freezing while the same long-running transaction blocks progress every
time.
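
In code, that heuristic might reduce to something like this sketch
(the cutoff field names and the helper are assumptions, not the
patch's code):

    /* Only worthwhile if a freeze cutoff has moved past the oldest
     * unfrozen XID/MXID remaining in the table. */
    if (TransactionIdPrecedes(vacrel->cutoffs.relfrozenxid,
                              vacrel->cutoffs.FreezeLimit) ||
        MultiXactIdPrecedes(vacrel->cutoffs.relminmxid,
                            vacrel->cutoffs.MultiXactCutoff))
        enable_eager_scanning(vacrel);  /* hypothetical helper */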

+ ereport(INFO,
+ (errmsg("Vacuum successfully froze %u eager scanned blocks of \"%s.%s.%s\". Now disabling eager scanning.",

I predict that if Tom sees this, you will get complaints about both
the wording of the message, which pretty clearly does not conform to
style guidelines for a primary error message, and also about the use
of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling
eager scanning after freezing %u eagerly scanned blocks".

+ /*
+ * Aggressive vacuums cannot skip all-visible pages that are not also
+ * all-frozen. Normal vacuums with eager scanning enabled only skip
+ * such pages if they have hit the failure limit for the current eager
+ * scan region.
+ */
+ if (vacrel->aggressive ||
+ vacrel->eager_scan_remaining_fails > 0)
+ {
+ if (!vacrel->aggressive)
+ next_unskippable_eager_scanned = true;
+ break;
  }

I think this would be clearer as two separate if statements. if
(aggressive) break; if (vacrel->eager_scan_remaining_fails) {
next_unskippable_eager_scanned = true; break; }
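
Spelled out, that restructuring would look roughly like this (a sketch
against the fields quoted above):

    /* Aggressive vacuums never skip all-visible, not-all-frozen pages. */
    if (vacrel->aggressive)
        break;

    /*
     * Normal vacuums eagerly scan such pages only while the failure
     * budget for the current eager scan region remains.
     */
    if (vacrel->eager_scan_remaining_fails > 0)
    {
        next_unskippable_eager_scanned = true;
        break;
    }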

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Thu, Jan 23, 2025 at 12:16 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > Circling back to benchmarking, I've been running the most adversarial
> > benchmarks I could devise and can share a bit of what I've found.
> >
> > I created a "hot tail" benchmark where 16 clients insert some data and
> > then update some data  older than what they just inserted but still
> > towards the end of the relation. The adversarial part is that I bulk
> > delete all the data older than X hours where X hours is always after
> > the data is eligible to be frozen but before it would be aggressively
> > vacuumed.
> >
> > That means that there are a bunch of pages that will never be frozen
> > on master but are frozen with the patch -- wasting vacuum resources. I
> > tuned vacuum_freeze_min_age and vacuum_freeze_table_age and picked the
> > DELETE window to specifically have this behavior.
> >
> > With my patch, I do see a 15-20% increase in the total time spent
> > vacuuming over the course of the multi-hour benchmark. (I only see a
> > 1% increase in the total WAL volume, though.)
>
> How much time is that in absolute terms? If the benchmark runs for 3
> hours and during that time we have 1 autovacuum worker active for 30
> minutes out of those 3 hours, that is different than if we have 5
> autovacuum workers active nearly all the time. Or, maybe a clearer way
> to put it, what percentage of the total work was the VACUUM work? If
> the total work was $100 and the VACUUM work accounted for $80 of that,
> then a 15-20% increase is pretty significant; if the total work was
> $100 and the VACUUM work accounted for $5 of that, then a 15-20%
> increase matters a lot less.

So, in this case, there is only one table in question, so 1 autovacuum
worker (and up to 2 maintenance parallel workers for index vacuuming).
The duration I provided is just the absolute duration from start of
vacuum to finish -- not considering the amount of time each parallel
worker may have been working (also it includes time spent delaying).
The benchmark ran for 2.8 hours. I configured vacuum to run
frequently. In this case, master spent 47% of the total time vacuuming
and the patch spent 56%.

There was a fair amount of run-to-run variance which is down to vacuum
and checkpoint timing. Andres suggested off-list that I rerun the
benchmarks with checksums and FPIs disabled to remove some of this
variation. That, of course, won't give accurate total time numbers but
it should make the proportional increase more stable. I'll share some
results once I've done this.

> But tentatively I'm inclined to say this is fine.

Inherent in frontloading work is wasting it if it turns out the work
is unneeded. Unneeded work comes from one of two sources: 1) we failed
to freeze the page, or 2) we succeeded in freezing the page but then
the page is unfrozen before the next aggressive vacuum. Avoiding 1 would
require knowledge about the distribution of page ages throughout the
relation that we decided was too expensive to get and store. Avoiding
2 would require prescience about the future of the workload. We found
building and storing a model to make predictions like that too
complicated, error-prone, and expensive.

I think your algorithm is the best we can do if we won't do either 1
or 2. If this general kind of algorithm is the best we can do, then
the only levers we have are changing the caps for success and failure
or changing when we try to eager scan. I suspect the latter won't make
much of a difference (i.e. some ways will work better for some
workloads and worse for others). For the former, we have choices about
what we make configurable (e.g. success, failure, region size). As for
choosing the defaults, I did some experimentation, and it went pretty
much as expected -- raising the failure cap or success cap results in
more unneeded work and lowering it results in less. There are some
workloads where there is a perfect point which results in the least
unneeded work for the most benefit -- but that point is totally
different for different workloads and configurations of Postgres.

> Some of the work
> that VACUUM is doing is probably work that otherwise would have needed
> to happen in the foreground. For instance, the speedup in DELETEs that
> you observed might be not only because the pages are cached but
> perhaps also because DELETE doesn't need to do any non-trivial
> visibility checks. Also, users don't have to anti-optimize their
> configuration settings for their workload as you did when constructing
> the adversarial case.

Yes, like I hope users won't tune their vacuum_freeze_min_age very low
if they are going to delete all data older than an hour. The
benchmarks were mostly to look for unexpected interactions -- of which
the DELETE performance was one. But other things, like a massive
increase in WAL volume did not happen.

- Melanie



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Fri, Jan 24, 2025 at 9:15 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> So, in this case, there is only one table in question, so 1 autovacuum
> worker (and up to 2 maintenance parallel workers for index vacuuming).
> The duration I provided is just the absolute duration from start of
> vacuum to finish -- not considering the amount of time each parallel
> worker may have been working (also it includes time spent delaying).
> The benchmark ran for 2.8 hours. I configured vacuum to run
> frequently. In this case, master spent 47% of the total time vacuuming
> and the patch spent 56%.

Definitely not insignificant, but I think it's OK for a worst case.
Autovacuum is a background process, so it's not like a 20% regression
on query performance.

> Inherent in frontloading work is wasting it if it turns out the work
> is unneeded. Unneeded work comes from one of two sources: 1) we failed
> to freeze the page, or 2) we succeeded in freezing the page but then
> the page is unfrozen before the next aggressive vacuum. Avoiding 1 would
> require knowledge about the distribution of page ages throughout the
> relation that we decided was too expensive to get and store. Avoiding
> 2 would require prescience about the future of the workload. We found
> building and storing a model to make predictions like that too
> complicated, error-prone, and expensive.

Well, the algorithm has guards against doing too much of (1). I think
that's really important. One of the really bad things about the AV
algorithm in general is that it will happily keep retrying VACUUM on
tables where there's no chance of removing any more tuples because no
relevant transactions have ended since the last time we vacuumed. But
this patch stops trying to do the thing that it does if we see that it
isn't working out -- and the thresholds are pretty tight. We could
make them even tighter, but it's already the case, I think, that after
a pretty modest amount of not freezing things, we stop trying to
freeze things.

As far as (2) goes, I'm open to the idea that this can be further
improved in the future, but I believe it will be really hard to do
better than looking at the time since last modification. The model you
proposed needed a fairly large amount of new statistics, and it still
couldn't handle something as simple as "half of the pages are modified
after X amount of time, and the other half are modified after X+Y
amount of time". I think what we'd really want to be able to make good
predictions is to look at a Fourier transform of the
inter-modification times -- but that would require even more detailed
data than you were gathering, and that was already pushing the limits
of what was realistic.

In short, I feel like this algorithm is more vulnerable to (2) than
(1), but I agree that we can't do much better right now. It will be
interesting to see what users think of the results, assuming this does
go forward. My intuition is that large amounts of VACUUM work that
happen when an aggressive VACUUM is triggered are so much more painful
for users than an ordinary non-aggressive VACUUM being a bit more
expensive that this should be a win overall even if the total effort
expended is a bit greater than now -- but my intuition is sometimes
wrong. I don't think we're going to find out without something being
committed, though.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Fri, Jan 24, 2025 at 3:02 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> This thought exercise made me realize something is wrong with my
> current patch, though. If you set the failure tolerance
> (vacuum_eager_scan_max_fails) to 0 right now, it disables eager
> scanning altogether. That might be unexpected. You would probably
> expect setting that to 0 to still allow eager scanning if it is only
> succeeding. That made me think that perhaps I should have a -1 value
> that disables eager scanning altogether and 0 just sets the failure
> tolerance to 0.

I would not expect that. I would expect setting a value of N to mean
"try until we have N failures". "Try until you have 0 failures" is
logically equivalent to "don't try".
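
That is, if the budget check looks anything like this sketch (field
name borrowed from earlier in the thread), a setting of 0 leaves the
budget exhausted from the start and the first attempt is never made:

    /* with vacuum_eager_scan_max_fails = 0, this is true immediately */
    if (vacrel->eager_scan_remaining_fails == 0)
        skip_eager_scan = true;     /* hypothetical flag */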

> I think you're right. I would go with a percentage. I don't see many
> other GUCs that are percents. What would you call it? Perhaps
> vacuum_eager_scan_fail_threshold? The % of the blocks in the table
> vacuum may scan and fail to freeze.
>
> There is an issue I see with making it a percentage. The current
> default vacuum_eager_scan_max_fails is 128 out of 4096. That means you
> are allowed to scan about 3% of the blocks in the table even if you
> fail to freeze every one. I don't think there are very many reasonable
> values above 256, personally. So, it might be weird to add a
> percentage GUC value with only very low acceptable values. Part of
> this is that we are not making the success cap configurable, so that
> means that you might have lots of extra I/O if you are both failing
> and succeeding. Someone configuring this GUC might think this is
> controlling the amount of extra I/O they are incurring.

Why would it be unreasonable to set this value to 25% or even 100%? I
grant that it doesn't sound like the most prudent possible value, but
if I'm willing to fritter away my VACUUM resources to have the best
possible chance of eagerly freezing stuff, isn't that up to me? I
think setting this value to 100% would be WAY less damaging than
work_mem='1TB' or autovacuum_naptime='7d', both of which are allowed.
In fact, setting this to 100% could theoretically have no negative
consequences at all, if it so happens that no freeze failures occur.
Couldn't it even be a win, if freeze failures are common but
minimizing the impact of aggressive vacuum is of overwhelming
importance?

Looking at ConfigureNamesReal[], some existing words we use to talk
about real-valued GUCs include: cost, fraction, factor, bias,
multiplier, target, rate. Of those, I like "fraction" and "rate" best
here, because they imply a range of 0-1 rather than anything broader.
vacuum_max_eager_freeze_failure_rate? Kinda long...
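
For concreteness, a guc_tables.c entry for a rate-style setting might
look something like this sketch (the group, description, and default
are placeholders, not the patch's):

    {
        {"vacuum_max_eager_freeze_failure_rate", PGC_USERSET, AUTOVACUUM,
            gettext_noop("Maximum fraction of a table's blocks that eager "
                         "scanning may fail to freeze before being disabled."),
            NULL
        },
        &vacuum_max_eager_freeze_failure_rate,
        0.03, 0.0, 1.0,
        NULL, NULL, NULL
    },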

> The while was meant to contrast the sentence before it about typically
> only scanning pages that have been modified since the previous vacuum.
> But, I see how that didn't work out for me. How about this:
>
> """
> <command>VACUUM</command> typically scans pages that have been
> modified since the last vacuum. However, some all-visible but not
> all-frozen pages are eagerly scanned to try and freeze them. Note that
> the <structfield>relfrozenxid</structfield> can only be advanced when
> every page of the table that might contain unfrozen XIDs is scanned.
> """

"to try and freeze them" seems a little awkward to me. The rest seems fine.

"in the hope that we can freeze them"?

That still doesn't seem great to me. Maybe it's worth expending more
text here, not sure.

> I rewrote the comment using some of your input and trying to clarify
> my initial point about opportunistic vs guaranteed freezing.
>
> """
> If FreezeLimit has not advanced past the relfrozenxid or
> MultiXactCutoff has not advanced past relminmxid, we are unlikely to
> be able to freeze anything new.
> Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact
> are technically freezable, but we won't freeze them unless some other
> criteria for opportunistic freezing is met.

It's not super clear to me what "some other criteria" means here, but
maybe it's fine.

> It is also possible than a transaction newer than the FreezeLimit has

than->that

> ended, rendering additional tuples freezable.
> As a heuristic, however, it makes sense to wait until the FreezeLimit
> or MultiXactCutoff has advanced before eager scanning.
> """
>
> > + ereport(INFO,
> > + (errmsg("Vacuum successfully froze %u eager scanned blocks of \"%s.%s.%s\". Now disabling eager scanning.",
> >
> > I predict that if Tom sees this, you will get complaints about both
> > the wording of the message, which pretty clearly does not conform to
> > style guidelines for a primary error message, and also about the use
> > of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling
> > eager scanning after freezing %u eagerly scanned blocks".
>
> I've used your wording. Just for future reference, are the style
> guidelines I was violating the capitalization and punctuation? Are
> these documented somewhere? Also, what is a primary error message?
> INFO level? Or ones that use ereport and are translated? I looked at
> other messages and saw that they don't capitalize the first word or
> use punctuation, so I assume that those were problems.

Yes. Primary error messages, i.e. errmsg(), are not capitalized and
punctuated, unlike errdetail() and errhint() messages, which are.

See https://www.postgresql.org/docs/current/error-style-guide.html

INFO level is used for VERY few things. I can't tell you off the top
of my head when it's appropriate, but I think the answer is "almost
never".

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Treat
Date:
On Fri, Jan 24, 2025 at 3:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 24, 2025 at 3:02 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > > + ereport(INFO,
> > > + (errmsg("Vacuum successfully froze %u eager scanned blocks of \"%s.%s.%s\". Now disabling eager scanning.",
> > >
> > > I predict that if Tom sees this, you will get complaints about both
> > > the wording of the message, which pretty clearly does not conform to
> > > style guidelines for a primary error message, and also about the use
> > > of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling
> > > eager scanning after freezing %u eagerly scanned blocks".
> >
> > I've used your wording. Just for future reference, are the style
> > guidelines I was violating the capitalization and punctuation? Are
> > these documented somewhere? Also, what is a primary error message?
> > INFO level? Or ones that use ereport and are translated? I looked at
> > other messages and saw that they don't capitalize the first word or
> > use punctuation, so I assume that those were problems.
>
> Yes. Primary error messages, i.e. errmsg(), are not capitalized and
> punctuated, unlike errdetail() and errhint() messages, which are.
>
> See https://www.postgresql.org/docs/current/error-style-guide.html
>
> INFO level is used for VERY few things. I can't tell you off the top
> of my head when it's appropriate, but I think the answer is "almost
> never".
>

Maybe, but one of the areas that INFO is used for is providing
additional details in VACUUM VERBOSE output, and this seems like it
would be pretty useful information to have if you are trying to
discern changes in i/o rate during a vacuum, or trying to tune the
failure rate setting, or several other related fields (including
automated capture by tools like pganalyze), so I believe INFO is the
right choice for this.


Robert Treat
https://xzilla.net



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Treat
Date:
On Fri, Jan 24, 2025 at 3:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 24, 2025 at 3:02 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > This thought exercise made me realize something is wrong with my
> > current patch, though. If you set the failure tolerance
> > (vacuum_eager_scan_max_fails) to 0 right now, it disables eager
> > scanning altogether. That might be unexpected. You would probably
> > expect setting that to 0 to still allow eager scanning if it is only
> > succeeding. That made me think that perhaps I should have a -1 value
> > that disables eager scanning altogether and 0 just sets the failure
> > tolerance to 0.
>
> I would not expect that. I would expect setting a value of N to mean
> "try until we have N failures". "Try until you have 0 failures" is
> logically equivalent to "don't try".
>

Hmm... isn't "don't try" really more logically equivalent to "stop
after zero failures", with "try until zero failures" more of the
inverse... more like "never stop never stopping"?

> > I think you're right. I would go with a percentage. I don't see many
> > other GUCs that are percents. What would you call it? Perhaps
> > vacuum_eager_scan_fail_threshold? The % of the blocks in the table
> > vacuum may scan and fail to freeze.
> >
> > There is an issue I see with making it a percentage. The current
> > default vacuum_eager_scan_max_fails is 128 out of 4096. That means you
> > are allowed to scan about 3% of the blocks in the table even if you
> > fail to freeze every one. I don't think there are very many reasonable
> > values above 256, personally. So, it might be weird to add a
> > percentage GUC value with only very low acceptable values. Part of
> > this is that we are not making the success cap configurable, so that
> > means that you might have lots of extra I/O if you are both failing
> > and succeeding. Someone configuring this GUC might think this is
> > controlling the amount of extra I/O they are incurring.
>
> Why would it be unreasonable to set this value to 25% or even 100%? I
> grant that it doesn't sound like the most prudent possible value, but
> if I'm willing to fritter away my VACUUM resources to have the best
> possible chance of eagerly freezing stuff, isn't that up to me? I
> think setting this value to 100% would be WAY less damaging than
> work_mem='1TB' or autovacuum_naptime='7d', both of which are allowed.
> In fact, setting this to 100% could theoretically have no negative
> consequences at all, if it so happens that no freeze failures occur.
> Couldn't it even be a win, if freeze failures are common but
> minimizing the impact of aggressive vacuum is of overwhelming
> importance?
>

Yeah, I don't see much reason to be concerned about this being a foot
gun; I don't think there is a way to configure it to be more
disruptive than the folks I have seen running vacuum freeze in cron
jobs :-)

And I do think there is potential upside; we've mostly talked about
this in cases that use settings close to the defaults, but I think
there is a significant number of people who use numbers quite
different from the defaults (specifically, setting
autovacuum_freeze_max_age / vacuum_freeze_table_age considerably
higher), where a higher failure rate is probably worth it, especially as vacuum
time increases. (It feels very analogous to checkpoint smoothing when
put this way)

> > The while was meant to contrast the sentence before it about typically
> > only scanning pages that have been modified since the previous vacuum.
> > But, I see how that didn't work out for me. How about this:
> >
> > """
> > <command>VACUUM</command> typically scans pages that have been
> > modified since the last vacuum. However, some all-visible but not
> > all-frozen pages are eagerly scanned to try and freeze them. Note that
> > the <structfield>relfrozenxid</structfield> can only be advanced when
> > every page of the table that might contain unfrozen XIDs is scanned.
> > """
>
> "to try and freeze them" seems a little awkward to me. The rest seems fine.
>
> "in the hope that we can freeze them"?
>
> That still doesn't seem great to me. Maybe it's worth expending more
> text here, not sure.
>

Yeah, this feels like a net negative. As I see it, we're trying to
connect three not-obviously related ideas for users who are trying to
understand how this system works, especially with regard to
relfrozenxid advancement (based on the section of the docs we are
in), loosely:
1. you've probably heard vacuum mainly scans modified pages for cleanup
2. you might notice it also scans non-modified pages, because
sometimes it wants to freeze stuff
3. you might think, if we are freezing stuff, why doesn't relfrozenxid advance?

So that middle bit is trying to act as glue that pulls this all
together. I thought the previous version was closer, with Haas's
feedback I might go with something more like this:

    <para>
-    <command>VACUUM</command> normally only scans pages that have been modified
-    since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
-    advanced when every page of the table
-    that might contain unfrozen XIDs is scanned.  This happens when
+    While <command>VACUUM</command> typically scans pages that have been
+    modified since the last vacuum, it may also eagerly scan some all-visible but not
+    all-frozen pages in an attempt to freeze them, but the
+    <structfield>relfrozenxid</structfield> will only be advanced when
+    every page of the table that might contain unfrozen XIDs is scanned.
+    This happens when

> > I rewrote the comment using some of your input and trying to clarify
> > my initial point about opportunistic vs guaranteed freezing.
> >
> > """
> > If FreezeLimit has not advanced past the relfrozenxid or
> > MultiXactCutoff has not advanced past relminmxid, we are unlikely to
> > be able to freeze anything new.
> > Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact
> > are technically freezable, but we won't freeze them unless some other
> > criteria for opportunistic freezing is met.
>
> It's not super clear to me what "some other criteria" means here, but
> maybe it's fine.
>
> > It is also possible that a transaction newer than the FreezeLimit has
> > ended, rendering additional tuples freezable.
> > As a heuristic, however, it makes sense to wait until the FreezeLimit
> > or MultiXactCutoff has advanced before eager scanning.
> > """
> >

Yeah, I also think the "some other criteria" part seems poor, although
tbh this all feels like a net negative to the previous wording or
Haas's suggested change, like you're describing what the code below it
does rather than what the intentions of the code are. For example, I
like the "freeze horizon" language vs explicit FreezeLimit since the
code could change even if the goal doesn't. But in lieu of going back
to that wording (modulo your one-line explanation in your email), if I
were to attempt to split the difference:

We only want to enable eager scanning if we are likely to be able to
freeze some of the pages in the relation, which is unlikely if
FreezeLimit has not advanced past relfrozenxid or if MultiXactCutoff
has not advanced past relminmxid. Granted, there may be pages we
didn't try to freeze before, or some previously blocking XID greater
than FreezeLimit may have now ended (allowing for freezing), but as a
heuristic we wait until the FreezeLimit advances to increase our
chances of successful freezing.


Robert Treat
https://xzilla.net



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Sun, Jan 26, 2025 at 3:11 PM Robert Treat <rob@xzilla.net> wrote:
> Hmm... isn't "don't try" really more logically equivalent to "stop
> after zero failures", with "try until zero failures" more of the
> inverse... more like "never stop never stopping"?

No. Or at least, I don't think that's how English works.

> So that middle bit is trying to act as glue that pulls this all
> together. I thought the previous version was closer, with Haas's
> feedback I might go with something more like this:
>
>     <para>
> -    <command>VACUUM</command> normally only scans pages that have been modified
> -    since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
> -    advanced when every page of the table
> -    that might contain unfrozen XIDs is scanned.  This happens when
> +    While <command>VACUUM</command> typically scans pages that have been
> +    modified since the last vacuum, it may also eagerly scan some all-visible but not
> +    all-frozen pages in an attempt to freeze them, but the
> +    all-frozen pages in an attempt to freeze them, but the
> +    <structfield>relfrozenxid</structfield> will only be advanced when
> +    every page of the table that might contain unfrozen XIDs is scanned.
> +    This happens when

I like that.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Masahiko Sawada
Date:
On Mon, Jan 27, 2025 at 9:45 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Jan 24, 2025 at 3:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Fri, Jan 24, 2025 at 3:02 PM Melanie Plageman
> > <melanieplageman@gmail.com> wrote:
> > > I think you're right. I would go with a percentage. I don't see many
> > > other GUCs that are percents. What would you call it? Perhaps
> > > vacuum_eager_scan_fail_threshold? The % of the blocks in the table
> > > vacuum may scan and fail to freeze.
> > >
> > > There is an issue I see with making it a percentage. The current
> > > default vacuum_eager_scan_max_fails is 128 out of 4096. That means you
> > > are allowed to scan about 3% of the blocks in the table even if you
> > > fail to freeze every one. I don't think there are very many reasonable
> > > values above 256, personally. So, it might be weird to add a
> > > percentage GUC value with only very low acceptable values. Part of
> > > this is that we are not making the success cap configurable, so that
> > > means that you might have lots of extra I/O if you are both failing
> > > and succeeding. Someone configuring this GUC might think this is
> > > controlling the amount of extra I/O they are incurring.
> >
> > Why would it be unreasonable to set this value to 25% or even 100%? I
> > grant that it doesn't sound like the most prudent possible value, but
> > if I'm willing to fritter away my VACUUM resources to have the best
> > possible chance of eagerly freezing stuff, isn't that up to me? I
> > think setting this value to 100% would be WAY less damaging than
> > work_mem='1TB' or autovacuum_naptime='7d', both of which are allowed.
> > In fact, setting this to 100% could theoretically have no negative
> > consequences at all, if it so happens that no freeze failures occur.
> > Couldn't it even be a win, if freeze failures are common but
> > minimizing the impact of aggressive vacuum is of overwhelming
> > importance?
> >
> > Looking at ConfigureNamesReal[], some existing words we use to talk
> > about real-valued GUCs include: cost, fraction, factor, bias,
> > multiplier, target, rate. Of those, I like "fraction" and "rate" best
> > here, because they imply a range of 0-1 rather than anything broader.
> > vacuum_max_eager_freeze_failure_rate? Kinda long...
>
> attached v11 uses a fraction with this name. It follows the
> conventions and I find it descriptive.
>
> Changing the configuration to a fraction does mean that quite a few of
> the comments are different in this version. I think the description of
> the guc in
> config.sgml could use some review in particular. I tried to make it
> clear that the percentage is not the maximum number of blocks that
> could be eager scanned, because you may also eagerly scan blocks and
> succeed.
>
> Other note: I noticed AutoVacOpts that are floating point numbers (like
> vacuum_cost_delay) are float8s but their associated GUCs (like
> autovacuum_vacuum_cost_delay) are doubles. These are going to be equivalent,
> but I wanted to note the inconsistency in case I was missing something.
>
> > > """
> > > <command>VACUUM</command> typically scans pages that have been
> > > modified since the last vacuum. However, some all-visible but not
> > > all-frozen pages are eagerly scanned to try and freeze them. Note that
> > > the <structfield>relfrozenxid</structfield> can only be advanced when
> > > every page of the table that might contain unfrozen XIDs is scanned.
> > > """
> >
> > "to try and freeze them" seems a little awkward to me. The rest seems fine.
> >
> > "in the hope that we can freeze them"?
> >
> > That still doesn't seem great to me. Maybe it's worth expending more
> > text here, not sure.
>
> Given this and Treat's input from a different mail:
>
> > Yeah, this feels like a net negative. As I see it, we're trying to
> > connect three not-obviously related ideas for users who are trying to
> > understand how this system works especially with regards to
> > relfrozenxid advanced (based on the section of the docs we are in),
> > loosely
> > 1. you've probably heard vacuum mainly scans modified pages for cleanup
> > 2. you might notice it also scans non-modified pages, because
> > sometimes it wants to freeze stuff
> > 3. you might think, if we are freezing stuff, why doesn't relfrozenxid advance?
> >
> > So that middle bit is trying to act as glue that pulls this all
> > together. I thought the previous version was closer, with Haas's
> > feedback I might go with something more like this:
>
> I've mostly used his suggested wording in attached v11.
>
> > > I rewrote the comment using some of your input and trying to clarify
> > > my initial point about opportunistic vs guaranteed freezing.
> > >
> > > """
> > > If FreezeLimit has not advanced past the relfrozenxid or
> > > MultiXactCutoff has not advanced past relminmxid, we are unlikely to
> > > be able to freeze anything new.
> > > Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact
> > > are technically freezable, but we won't freeze them unless some other
> > > criteria for opportunistic freezing is met.
> >
> > It's not super clear to me what "some other criteria" means here, but
> > maybe it's fine.
>
> Given this thought and Treat's feedback below
>
> > Yeah, I also think the "some other criteria" part seems poor, although
> > tbh this all feels like a net negative compared to the previous wording
> > or Haas's suggested change, like you're describing what the code below
> > it does rather than what the intentions of the code are. For example, I
> > like the "freeze horizon" language vs explicit FreezeLimit since the
> > code could change even if the goal doesn't. But in lieu of going back
> > to that wording (modulo your 1 line explanation in your email), if I
> > were to attempt to split the difference:
> >
> > We only want to enable eager scanning if we are likely to be able to
> > freeze some of the pages in the relation, which is unlikely if
> > FreezeLimit has not advanced past relfrozenxid or if MultiXactCutoff
> > has not advanced past relminmxid. Granted, there may be pages we didn't
> > try to freeze before, or some previously blocking XID greater than
> > FreezeLimit may have now ended (allowing for freezing), but as a
> > heuristic we wait until the FreezeLimit advances to increase our
> > chances of successful freezing.
>
> I've updated the proposed text to mostly be in line with what he suggested.
>
> > > ended, rendering additional tuples freezable.
> > > As a heuristic, however, it makes sense to wait until the FreezeLimit
> > > or MultiXactCutoff has advanced before eager scanning.
> > > """
> > >
> > > > + ereport(INFO,
> > > > + (errmsg("Vacuum successfully froze %u eager scanned blocks of
> > > > \"%s.%s.%s\". Now disabling eager scanning.",
> > > >
> > > > I predict that if Tom sees this, you will get complaints about both
> > > > the wording of the message, which pretty clearly does not conform to
> > > > style guidelines for a primary error message, and also about the use
> > > > of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling
> > > > eager scanning after freezing %u eagerly scanned blocks".
> > >
> > > I've used your wording. Just for future reference, are the style
> > > guidelines I was violating the capitalization and punctuation? Are
> > > these documented somewhere? Also, what is a primary error message?
> > > INFO level? Or ones that use ereport and are translated? I looked at
> > > other messages and saw that they don't capitalize the first word or
> > > use punctuation, so I assume that those were problems.
> >
> > Yes. Primary error messages, i.e. errmsg(), are not capitalized and
> > punctuated, unlike errdetail() and errhint() messages, which are.
> >
> > See https://www.postgresql.org/docs/current/error-style-guide.html
> >
> > INFO level is used for VERY few things. I can't tell you off the top
> > of my head when it's appropriate, but I think the answer is "almost
> > never".
>
> I have changed it to DEBUG2. I started it as INFO because the other vacuum
> messages about whether or not the vacuum is an aggressive vacuum were at INFO
> level. I don't know what a user might prefer. Treat said this downthread:
>
> > Maybe, but one of the areas that INFO is used for is providing
> > additional details in VACUUM VERBOSE output, and this seems like it
> > would be pretty useful information to have if you are trying to
> > discern changes in i/o rate during a vacuum, or trying to tune the
> > failure rate setting, or several other related fields (including
> > automated capture by tools like pganalyze), so I believe INFO is the
> > right choice for this.
>
> I'm happy to go either way. I don't want users mad about verbosity, but if they
> think it's helpful, then that seems good.
>

Thank you for updating the patch. I was reviewing the v10 patch and
had some comments. I believe these comments are still valid for v11,
but please ignore them if outdated.

+       if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
+               TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
+                                             vacrel->cutoffs.FreezeLimit))
+               oldest_unfrozen_before_cutoff = true;
+
+       if (!oldest_unfrozen_before_cutoff &&
+               MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
+               MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
+                                           vacrel->cutoffs.MultiXactCutoff))
+               oldest_unfrozen_before_cutoff = true;

Given that our freeze check strictly checks if an xid is older than
the cutoff (exclusive bound), I think we should check if the
relfrozenxid and relminmxid strictly precede the cutoff values.
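
Concretely, the strict variant would just swap in the exclusive-bound
helpers, something like this (a sketch against the quoted code, not the
final patch):

    if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
        TransactionIdPrecedes(vacrel->cutoffs.relfrozenxid,
                              vacrel->cutoffs.FreezeLimit))
        oldest_unfrozen_before_cutoff = true;

    if (!oldest_unfrozen_before_cutoff &&
        MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
        MultiXactIdPrecedes(vacrel->cutoffs.relminmxid,
                            vacrel->cutoffs.MultiXactCutoff))
        oldest_unfrozen_before_cutoff = true;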

---
        if (*was_eager_scanned)
            vacrel->eager_scanned_pages++;

How about incrementing this counter near the place where we increment
scanned_pages (i.e., at the beginning of the loop in
heap_vac_scan_next_block())? It would make it easy to understand the
difference between eager_scanned_pages and scanned_pages.

---
+ * No vm_page_frozen output parameter (like what is passed to
+ * lazy_scan_prune()) is passed here because empty pages are always frozen and
+ * thus could never be eagerly scanned.

The last part "thus could never be eagerly scanned" confused me a bit;
IIUC we count all pages that are scanned because of this new eager
scan feature as "eagerly scanned pages". We increment
eager_scanned_pages counter even if the page is either new or empty.
This fact seems to contradict the sentence "empty pages could never be
eagerly scanned".

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Treat
Date:
On Mon, Jan 27, 2025 at 12:45 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Jan 24, 2025 at 3:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Fri, Jan 24, 2025 at 3:02 PM Melanie Plageman
> > <melanieplageman@gmail.com> wrote:
> attached v11 uses a fraction with this name. It follows the
> conventions and I find it descriptive.
>
> Changing the configuration to a fraction does mean that quite a few of
> the comments are different in this version. I think the description of
> the guc in
> config.sgml could use some review in particular. I tried to make it
> clear that the percentage is not the maximum number of blocks that
> could be eager scanned, because you may also eagerly scan blocks and
> succeed.
>

Specifies the maximum fraction of pages that <command>VACUUM</command>
may scan and <emphasis>fail</emphasis> to set all-frozen in the
visibility map before disabling eager scanning. A value of 0 disables
eager scanning altogether. The default is 0.03 (3%).

Note that when eager scanning is enabled, successful page freeze
attempts will not count against this limit, although they are
internally capped at 20% of the all-visible but not all-frozen pages
within the relation, in an effort to amortize overhead for future
aggressive vacuums.

This parameter can only be set in the
<filename>postgresql.conf</filename> file or on the server command
line; but the setting can be overridden for individual tables by
changing the <link
linkend="reloption-vacuum-max-eager-freeze-failure-rate">corresponding
table storage parameter</link>. For more information, see <xref
linkend="vacuum-for-wraparound"/>.
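
To make the 20% success cap above concrete, here is a rough worked
example (MAX_EAGER_FREEZE_SUCCESS_RATE is the constant from the patch;
the table size is invented):

    #define MAX_EAGER_FREEZE_SUCCESS_RATE 0.2

    /* Hypothetical table with a backlog of one million all-visible but
     * not all-frozen pages at the start of this vacuum. */
    BlockNumber allvisible_not_allfrozen = 1000000;
    BlockNumber success_cap =
        (BlockNumber) (MAX_EAGER_FREEZE_SUCCESS_RATE * allvisible_not_allfrozen);

    /* success_cap == 200000: at most a fifth of the backlog can be
     * eagerly frozen in one vacuum, so the freezing work is spread over
     * roughly five normal vacuums rather than one aggressive one. */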


> Other note: I noticed AutoVacOpts that are floating point numbers (like
> vacuum_cost_delay) are float8s but their associated GUCs (like
> autovacuum_vacuum_cost_delay) are doubles. These are going to be equivalent,
> but I wanted to note the inconsistency in case I was missing something.
>

::thinking emoji::

I think what you have is right, though it would certainly be better to
have a more qualified opinion.
On a not entirely related item, I find it interesting that you set
your max value to 1.0, but vacuum_scale_factor (and similar) is set to
100.00.
I think you have it right, though: what does it mean to set a table's
scale factor to 10,000%?

> > >
> > > > + ereport(INFO,
> > > > + (errmsg("Vacuum successfully froze %u eager scanned blocks of
> > > > \"%s.%s.%s\". Now disabling eager scanning.",
> > > >
> > > > I predict that if Tom sees this, you will get complaints about both
> > > > the wording of the message, which pretty clearly does not conform to
> > > > style guidelines for a primary error message, and also about the use
> > > > of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling
> > > > eager scanning after freezing %u eagerly scanned blocks".
> > >
> > > I've used your wording. Just for future reference, are the style
> > > guidelines I was violating the capitalization and punctuation? Are
> > > these documented somewhere? Also, what is a primary error message?
> > > INFO level? Or ones that use ereport and are translated? I looked at
> > > other messages and saw that they don't capitalize the first word or
> > > use punctuation, so I assume that those were problems.
> >
> > Yes. Primary error messages, i.e. errmsg(), are not capitalized and
> > punctuated, unlike errdetail() and errhint() messages, which are.
> >
> > See https://www.postgresql.org/docs/current/error-style-guide.html
> >
> > INFO level is used for VERY few things. I can't tell you off the top
> > of my head when it's appropriate, but I think the answer is "almost
> > never".
>
> I have changed it to DEBUG2. I started it as INFO because the other vacuum
> messages about whether or not the vacuum is an aggressive vacuum were at INFO
> level. I don't know what a user might prefer. Treat said this downthread:
>
> > Maybe, but one of the areas that INFO is used for is providing
> > additional details in VACUUM VERBOSE output, and this seems like it
> > would be pretty useful information to have if you are trying to
> > discern changes in i/o rate during a vacuum, or trying to tune the
> > failure rate setting, or several other related fields (including
> > automated capture by tools like pganalyze), so I believe INFO is the
> > right choice for this.
>
> I'm happy to go either way. I don't want users mad about verbosity, but if they
> think it's helpful, then that seems good.
>

If there is a configurable, people will want to tune it, and DEBUG
level messages aren't a usable solution.

Robert Treat
https://xzilla.net



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Masahiko Sawada
Date:
On Mon, Jan 27, 2025 at 1:22 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Mon, Jan 27, 2025 at 12:52 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Thank you for updating the patch. I was reviewing the v10 patch and
> > had some comments. I believe these comments are still valid for v11,
> > but please ignore them if outdated.
>
> Thanks so much for the review!
>
> > +       if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
> > +               TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
> > +                                             vacrel->cutoffs.FreezeLimit))
> > +               oldest_unfrozen_before_cutoff = true;
> > +
> > +       if (!oldest_unfrozen_before_cutoff &&
> > +               MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
> > +               MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
> > +                                           vacrel->cutoffs.MultiXactCutoff))
> > +               oldest_unfrozen_before_cutoff = true;
> >
> > Given that our freeze check strictly checks if an xid is older than
> > the cutoff (exclusive bound), I think we should check if the
> > relfrozenxid and relminmxid strictly precede the cutoff values.
>
> Makes sense. I've changed that.
>
> > ---
> >         if (*was_eager_scanned)
> >             vacrel->eager_scanned_pages++;
> >
> > How about incrementing this counter near the place where we increment
> > scanned_pages (i.e., at the beginning of the loop in
> > heap_vac_scan_next_block())? It would make it easy to understand the
> > difference between eager_scanned_pages and scanned_pages.
>
> Right. That makes sense. I've changed that in the attached v12.
>
> > ---
> > + * No vm_page_frozen output parameter (like what is passed to
> > + * lazy_scan_prune()) is passed here because empty pages are always frozen and
> > + * thus could never be eagerly scanned.
> >
> > The last part "thus could never be eagerly scanned" confused me a bit;
> > IIUC we count all pages that are scanned because of this new eager
> > scan feature as "eagerly scanned pages". We increment
> > eager_scanned_pages counter even if the page is either new or empty.
> > This fact seems to contradict the sentence "empty pages could never be
> > eagerly scanned".
>
> Ah, so what I mean by this is that the first time an empty page is
> vacuumed, it is set all-visible and all-frozen in the VM. We only
> eagerly scan pages that are _only_ all-visible (not also all-frozen).
> So, no empty pages will ever be eligible for eager scanning. I've
> updated the comment, because I agree it was confusing. See what you
> think now.

Thank you for updating the patch! These updates look good to me.

BTW I realized that we need to update tab-complete.c too to support
the tab-completion for vacuum_max_eager_freeze_failure_rate.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Tue, Jan 28, 2025 at 4:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
> Thank you for updating the patch! These updates look good to me.

Thanks for taking another look!

> BTW I realized that we need to update tab-complete.c too to support
> the tab-completion for vacuum_max_eager_freeze_failure_rate.

I'm pretty sure tab-complete.c is auto-generated now. Tab completion
for vacuum_max_eager_freeze_failure_rate worked for me without doing
anything extra.

- Melanie



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Fri, Jan 24, 2025 at 11:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jan 24, 2025 at 9:15 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > So, in this case, there is only one table in question, so 1 autovacuum
> > worker (and up to 2 maintenance parallel workers for index vacuuming).
> > The duration I provided is just the absolute duration from start of
> > vacuum to finish -- not considering the amount of time each parallel
> > worker may have been working (also it includes time spent delaying).
> > The benchmark ran for 2.8 hours. I configured vacuum to run
> > frequently. In this case, master spent 47% of the total time vacuuming
> > and the patch spent 56%.
>
> Definitely not insignificant, but I think it's OK for a worst case.
> Autovacuum is a background process, so it's not like a 20% regression
> on query performance.

So, I've done a few runs with FPIs turned off to reduce run variance
caused by vacuum and checkpoint timing.
Of course this means that the amount of IO done by vacuum is very
different from a benchmark run with realistic settings.

I reran two of my simulations:

1)
- hot tail
    32 clients inserting 20 rows then updating 1 row
    duration: 3 hours

There is a small increase in total time spent vacuuming (< 10%). But
it is spread out. The first aggressive vacuum of the table is 20
seconds with the patch and 9 minutes on master. And this is not an
append-only workload -- the tail of the table (up to 200,000 rows old)
is being updated (and potentially unfrozen). So, this feels like a
win.

The insert/update P99 latency is lower (better) with the patch around
the time of the aggressive vacuum.

2)
- hot tail with delete (worst-case)
    32 clients inserting 20 rows then updating 1 row and 1
rate-limited client deleting all data before it can be aggressively
vacuumed
    duration: 3 hours

There is a 10-15% increase in total time spent vacuuming with the
patch (30-40% of total benchmark runtime is spent vacuuming).

I ran the benchmark for 4 hours as well, and for that duration I
started to see a larger increase in vacuum IO time with the patch.
However, the 4 hour run had only one aggressive vacuum (around the 2.5
hour mark), so the numbers are hard to compare because the patch is
meant to do some of the work of the next aggressive vacuum in advance.

The insert/update P99 latency is the same or lower (better) with the patch.

Next I plan to run the hottail delete benchmark with default settings
(including FPIs) with master and with the patch for about 24 hours
each. I'm hoping the long duration will smooth out some of the run
variance even with FPIs.

- Melanie



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Masahiko Sawada
Date:
On Wed, Jan 29, 2025 at 6:08 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Tue, Jan 28, 2025 at 4:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > Thank you for updating the patch! These updates look good to me.
>
> Thanks for taking another look!
>
> > BTW I realized that we need to update tab-complete.c too to support
> > the tab-completion for vacuum_max_eager_freeze_failure_rate.
>
> I'm pretty sure tab-complete.c is auto-generated now. Tab completion
> for vacuum_max_eager_freeze_failure_rate worked for me without doing
> anything extra.

I forgot to mention: it's about the tab completion for the
vacuum_max_eager_freeze_failure_rate storage parameter. We still need
to update the tab-complete.in.c file (I mentioned the wrong file
previously) for that.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Andres Freund
Date:
Hi,

On 2025-01-29 14:12:52 -0500, Melanie Plageman wrote:
> From 71f32189aad510b73d221fb0478ffd916e5e5dde Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 27 Jan 2025 12:23:00 -0500
> Subject: [PATCH v14] Eagerly scan all-visible pages to amortize aggressive
>  vacuum
> 
> Amortize the cost of an aggressive vacuum by eagerly scanning some
> all-visible but not all-frozen pages during normal vacuums.

I think it'd be good to explain the problem this is trying to address a bit
more (i.e. the goal is to avoid the situation in which a lot of work is
deferred until an aggressive vacuum, which is then very expensive).


> Because the goal is to freeze these all-visible pages, all-visible pages
> that are eagerly scanned and set all-frozen in the visibility map are
> counted as successful eager freezes and those not frozen are considered
> failed eager freezes.

I don't really understand this sentence. "Because the goal is to freeze these
all-visible pages" doesn't really relate to the rest of the sentence.



> +     <varlistentry id="guc-vacuum-max-eager-freeze-failure-rate" xreflabel="vacuum_max_eager_freeze_failure_rate">
> +      <term><varname>vacuum_max_eager_freeze_failure_rate</varname> (<type>floating point</type>)
> +      <indexterm>
> +       <primary><varname>vacuum_max_eager_freeze_failure_rate</varname> configuration parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        Specifies the maximum fraction of pages that
> +        <command>VACUUM</command> may scan and <emphasis>fail</emphasis> to set
> +        all-frozen in the visibility map before disabling eager scanning. A
> +        value of <literal>0</literal> disables eager scanning altogether. The
> +        default is <literal>0.03</literal> (3%).
> +       </para>

Fraction of what?


> +       <para>
> +        Note that when eager scanning is enabled, successful page freezes
> +        do not count against this limit, although they are internally
> +        capped at 20% of the all-visible but not all-frozen pages in the
> +        relation. Capping successful page freezes helps amortize the
> +        overhead across multiple normal vacuums.
> +       </para>

What does it mean that they are not counted, but are capped?


> +   <para>
> +    If a table is building up a backlog of all-visible but not all-frozen
> +    pages, a normal vacuum may choose to scan skippable pages in an effort to
> +    freeze them. Doing so decreases the number of pages the next aggressive
> +    vacuum must scan. These are referred to as <firstterm>eagerly
> +    scanned</firstterm> pages. Eager scanning can be tuned to attempt
> +    to freeze more all-visible pages by increasing
> +    <xref linkend="guc-vacuum-max-eager-freeze-failure-rate"/>. Even if eager
> +    scanning has kept the number of all-visible but not all-frozen pages to a
> +    minimum, most tables still require periodic aggressive vacuuming.
> +   </para>

Maybe mention that, even when it is still necessary, the aggressive
vacuum will often be cheaper with eager freezing than without?


> + * Normal vacuums count all-visible pages eagerly scanned as a success when
> + * they are able to set them all-frozen in the VM and as a failure when they
> + * are not able to set them all-frozen.

Maybe some more punctuation would make this more readable? Or a slight
rephrasing?


> + * Because we want to amortize the overhead of freezing pages over multiple
> + * vacuums, normal vacuums cap the number of successful eager freezes to
> + * MAX_EAGER_FREEZE_SUCCESS_RATE of the number of all-visible but not
> + * all-frozen pages at the beginning of the vacuum. Once the success cap has
> + * been hit, eager scanning is disabled for the remainder of the vacuum of the
> + * relation.

It also caps the maximum "downside" of freezing eagerly, right? Seems worth
mentioning.



> +    /*
> +     * Now calculate the eager scan start block. Start at a random spot
> +     * somewhere within the first eager scan region. This avoids eager
> +     * scanning and failing to freeze the exact same blocks each vacuum of the
> +     * relation.
> +     */

If I understand correctly, we're not really choosing a spot inside the first
eager scan region, we determine the bounds of the first region?



> @@ -930,16 +1188,21 @@ lazy_scan_heap(LVRelState *vacrel)
>      vacrel->current_block = InvalidBlockNumber;
>      vacrel->next_unskippable_block = InvalidBlockNumber;
>      vacrel->next_unskippable_allvis = false;
> +    vacrel->next_unskippable_eager_scanned = false;
>      vacrel->next_unskippable_vmbuffer = InvalidBuffer;
>  
> -    while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
> +    while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm,
> +                                    &was_eager_scanned))

Pedantic^3: Is past tense really appropriate? We *will* be scanning that page
in the body of the loop, right?


> @@ -1064,7 +1327,45 @@ lazy_scan_heap(LVRelState *vacrel)
>          if (got_cleanup_lock)
>              lazy_scan_prune(vacrel, buf, blkno, page,
>                              vmbuffer, all_visible_according_to_vm,
> -                            &has_lpdead_items);
> +                            &has_lpdead_items, &vm_page_frozen);
> +
> +        /*
> +         * Count an eagerly scanned page as a failure or a success.
> +         */
> +        if (was_eager_scanned)

Hm - how should pages that we couldn't get a lock on be counted?  I think
right now they'll be counted as a failure, but that doesn't seem quite right.


> diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> index 0ab921a169b..32a1b8c46a1 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -2826,6 +2826,12 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
>          tab->at_params.is_wraparound = wraparound;
>          tab->at_params.log_min_duration = log_min_duration;
>          tab->at_params.toast_parent = InvalidOid;
> +
> +        /*
> +         * Later we check reloptions for vacuum_max_eager_freeze_failure_rate
> +         * override
> +         */
> +        tab->at_params.max_eager_freeze_failure_rate = vacuum_max_eager_freeze_failure_rate;
>          tab->at_storage_param_vac_cost_limit = avopts ?
>              avopts->vacuum_cost_limit : 0;
>          tab->at_storage_param_vac_cost_delay = avopts ?

I'd mention where that is, so that a reader of that comment doesn't have to
search around...


I think this is pretty close!

Greetings,

Andres Freund



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Mon, Feb 3, 2025 at 9:09 PM Andres Freund <andres@anarazel.de> wrote:
> > +     <varlistentry id="guc-vacuum-max-eager-freeze-failure-rate" xreflabel="vacuum_max_eager_freeze_failure_rate">
> > +      <term><varname>vacuum_max_eager_freeze_failure_rate</varname> (<type>floating point</type>)
> > +      <indexterm>
> > +       <primary><varname>vacuum_max_eager_freeze_failure_rate</varname> configuration parameter</primary>
> > +      </indexterm>
> > +      </term>
> > +      <listitem>
> > +       <para>
> > +        Specifies the maximum fraction of pages that
> > +        <command>VACUUM</command> may scan and <emphasis>fail</emphasis> to set
> > +        all-frozen in the visibility map before disabling eager scanning. A
> > +        value of <literal>0</literal> disables eager scanning altogether. The
> > +        default is <literal>0.03</literal> (3%).
> > +       </para>
>
> Fraction of what?

"pages", according to the text.

I'm not really sure what you think is wrong with this. There may be an
even better way to explain this but this doesn't seem bad to me.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Treat
Date:
On Tue, Feb 4, 2025 at 12:44 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Mon, Feb 3, 2025 at 9:09 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2025-01-29 14:12:52 -0500, Melanie Plageman wrote:
> > > From 71f32189aad510b73d221fb0478ffd916e5e5dde Mon Sep 17 00:00:00 2001
> > > From: Melanie Plageman <melanieplageman@gmail.com>
>
> > > @@ -1064,7 +1327,45 @@ lazy_scan_heap(LVRelState *vacrel)
> > >               if (got_cleanup_lock)
> > >                       lazy_scan_prune(vacrel, buf, blkno, page,
> > >                                                       vmbuffer, all_visible_according_to_vm,
> > > -                                                     &has_lpdead_items);
> > > +                                                     &has_lpdead_items, &vm_page_frozen);
> > > +
> > > +             /*
> > > +              * Count an eagerly scanned page as a failure or a success.
> > > +              */
> > > +             if (was_eager_scanned)
> >
> > Hm - how should pages be counted that we couldn't get a lock on?  I think
> > right now they'll be counted as a failure, but that doesn't seem quite right.
>
> Yea, I thought that counting them as failures made sense because we
> did fail to freeze them. However, now that you mention it, we didn't
> fail to freeze them because of age, so maybe we don't want to count
> them as failures. I don't expect us to have a bunch of contended
> all-visible pages, so I think the question is about what makes it more
> clear in the code. What do you think? Should I reset was_eager_scanned
> to false if we don't get the cleanup lock?
>

I feel like if we are making the trade-off in resources to attempt
eager scanning, and we weren't making progress for whatever reason
(and in the lock failure cases, wouldn't some of those be things that
would prevent us from freezing?) then it would generally be ok to bias
towards bailing sooner rather than later.


Robert Treat
https://xzilla.net



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Tue, Feb 4, 2025 at 2:57 PM Robert Treat <rob@xzilla.net> wrote:
> > Yea, I thought that counting them as failures made sense because we
> > did fail to freeze them. However, now that you mention it, we didn't
> > fail to freeze them because of age, so maybe we don't want to count
> > them as failures. I don't expect us to have a bunch of contended
> > all-visible pages, so I think the question is about what makes it more
> > clear in the code. What do you think? Should I reset was_eager_scanned
> > to false if we don't get the cleanup lock?
>
> I feel like if we are making the trade-off in resources to attempt
> eager scanning, and we weren't making progress for whatever reason
> (and in the lock failure cases, wouldn't some of those be things that
> would prevent us from freezing?) then it would generally be ok to bias
> towards bailing sooner rather than later.

Failures to acquire cleanup locks are, hopefully, rare, so it may not
matter that much. Having said that, if we skip a page because we can't
acquire a cleanup lock on it, I think that means that it was already
present in shared_buffers, which means that we didn't have to do an
I/O to get it. Since I think the point of the failure cap is mostly to
limit wasted I/O, I would lean toward NOT counting such cases as
failures.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Tue, Feb 4, 2025 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Feb 4, 2025 at 2:57 PM Robert Treat <rob@xzilla.net> wrote:
> > > Yea, I thought that counting them as failures made sense because we
> > > did fail to freeze them. However, now that you mention it, we didn't
> > > fail to freeze them because of age, so maybe we don't want to count
> > > them as failures. I don't expect us to have a bunch of contended
> > > all-visible pages, so I think the question is about what makes it more
> > > clear in the code. What do you think? Should I reset was_eager_scanned
> > > to false if we don't get the cleanup lock?
> >
> > I feel like if we are making the trade-off in resources to attempt
> > eager scanning, and we weren't making progress for whatever reason
> > (and in the lock failure cases, wouldn't some of those be things that
> > would prevent us from freezing?) then it would generally be ok to bias
> > towards bailing sooner rather than later.
>
> Failures to acquire cleanup locks are, hopefully, rare, so it may not
> matter that much. Having said that, if we skip a page because we can't
> acquire a cleanup lock on it, I think that means that it was already
> present in shared_buffers, which means that we didn't have to do an
> I/O to get it. Since I think the point of the failure cap is mostly to
> limit wasted I/O, I would lean toward NOT counting such cases as
> failures.

I think I misspoke when I said we are unlikely to have contended
all-visible pages. I suppose it is trivial to concoct a scenario where
there are many pinned all-visible pages.

Initially I agreed with you that we shouldn't count eagerly scanned
pages we failed to freeze because we didn't get the cleanup lock as
failures. If the page is pinned in shared buffers, it is necessarily
not going to cost us a read -- which is the main overhead of failed
eager freezes.

However, if we don't count an eagerly scanned page as a failure when
we don't get the cleanup lock because we won't have incurred a read,
then why would we count any eagerly scanned page in shared buffers as
a failure? In the case that we actually tried freezing the page and
failed because it was too new, that is giving us information about
whether or not we should keep trying to freeze. So, it is not just
about doing the read but also about what the failure indicates about
the data.

Interestingly, we call heap_tuple_should_freeze() in
lazy_scan_noprune(), so we actually could tell whether or not there
are some tuples on the page old enough to trigger freezing if we had
gotten the cleanup lock. One option would be to add a new output
parameter to lazy_scan_noprune() that indicates whether or not there
were tuples with xids older than the FreezeLimit, and only if there
were not, count it as a failed eager scan.
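
Roughly, the caller side could look something like this (a sketch only;
the would_have_frozen parameter is hypothetical and the field names are
approximate):

    bool would_have_frozen = false;

    if (!got_cleanup_lock)
        lazy_scan_noprune(vacrel, buf, blkno, page,
                          &has_lpdead_items,
                          &would_have_frozen); /* hypothetical new output */

    /*
     * A page we merely failed to lock tells us nothing about tuple ages,
     * so only count it as a failed eager scan when nothing on it would
     * have triggered freezing anyway.
     */
    if (was_eager_scanned && !got_cleanup_lock && !would_have_frozen)
        vacrel->eager_scan_remaining_fails--;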

- Melanie



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Tue, Feb 4, 2025 at 5:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I think I misspoke when I said we are unlikely to have contended
> all-visible pages. I suppose it is trivial to concoct a scenario where
> there are many pinned all-visible pages.

It's hard to keep heap pages pinned for a really long time, so there
aren't likely to be a lot of them at once. Maybe async I/O will change
that somewhat, but the word in this sentence that I would emphasize is
"concoct". I don't think it's normal for there to be lots of pinned
pages just hanging around.

(It's a bit easier to have index pages that stay pinned for a long
time because an index scan can -- or at least could, last I checked --
hold onto a pin while the cursor was open even if we're not currently
executing the query. But for a heap page that doesn't happen, AFAIK.)

> However, if we don't count an eagerly scanned page as a failure when
> we don't get the cleanup lock because we won't have incurred a read,
> then why would we count any eagerly scanned page in shared buffers as
> a failure? In the case that we actually tried freezing the page and
> failed because it was too new, that is giving us information about
> whether or not we should keep trying to freeze. So, it is not just
> about doing the read but also about what the failure indicates about
> the data.

That's a good point, but failure to get a cleanup lock isn't a judgement
either way on whether the XIDs in the page are old or new.

> Interestingly, we call heap_tuple_should_freeze() in
> lazy_scan_noprune(), so we actually could tell whether or not there
> are some tuples on the page old enough to trigger freezing if we had
> gotten the cleanup lock. One option would be to add a new output
> parameter to lazy_scan_noprune() that indicates whether or not there
> were tuples with xids older than the FreezeLimit, and only if there
> were not, count it as a failed eager scan.

Yeah, that could be done, and it doesn't sound like a bad idea.
However, I'm also unsure whether we need to add additional logic for
this case. My intuition is that it just won't happen very often. I'm
not quite confident that I'm correct about that, but if I had to
guess, my guess would be "rare scenario".

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Robert Haas
Date:
On Wed, Feb 5, 2025 at 12:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I started getting worried thinking about this. If you have a cursor
> for select * from a table and fetch forward far enough, couldn't
> vacuum fail to get cleanup locks on a whole range of blocks?

I don't think so. A given scan holds at most 1 heap pin at a time, so
I don't see how a cursor doing a FETCH FORWARD would make anything
happen that wouldn't otherwise. I do think it's theoretically possible
that a running query and a running VACUUM could be going at exactly
the same speed so that the VACUUM always tries to pin every page at
the exact same moment the SELECT has it pinned, but in practice I
think it is very unlikely that the timing will work out like that.
Like win-the-lottery kind of unlikely.

> Yea, I guess that means the freeze limit caps wasted read I/O -- but
> you could end up doing no read I/O and still hitting the freeze fail
> limit because it is trying to detect when data is too new to be worth
> eager scanning. But that's probably the behavior we want.

The too-new case is very scary, I think. It's already a problem that
when there are long-running transactions in play, autovacuum commands
a VACUUM which finds nothing it can usefully clean up. It would be
very valuable for someone to figure out a way to prevent that, but the
relevance to this patch is that we need to try hard to avoid making
that problem substantially worse than it is already. Even if the whole
database is memory-resident, the too-new case allows for wasting a
bunch of effort that we would rather not waste; the cap here tries to
make sure that isn't any worse than it needs to be.

> > > Interestingly, we call heap_tuple_should_freeze() in
> > > lazy_scan_noprune(), so we actually could tell whether or not there
> > > are some tuples on the page old enough to trigger freezing if we had
> > > gotten the cleanup lock. One option would be to add a new output
> > > parameter to lazy_scan_noprune() that indicates whether or not there
> > > were tuples with xids older than the FreezeLimit, and only if there
> > > were not, count it as a failed eager scan.
> >
> > Yeah, that could be done, and it doesn't sound like a bad idea.
> > However, I'm also unsure whether we need to add additional logic for
> > this case. My intuition is that it just won't happen very often. I'm
> > not quite confident that I'm correct about that, but if I had to
> > guess, my guess would be "rare scenario".
>
> I've not done this in the attached v16. I have added a comment about it.
> I think not doing it is a judgment call and not a bug, right?

As of this writing, I do not believe that this is an essential part of
this patch, but it is always possible that with the benefit of
hindsight it will seem otherwise. That's just life, though. If I knew
in advance which of my decisions would later seem like mistakes, I
would make very few mistakes.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Andres Freund
Date:
Hi,

On 2025-02-04 12:44:22 -0500, Melanie Plageman wrote:
> On Mon, Feb 3, 2025 at 9:09 PM Andres Freund <andres@anarazel.de> wrote:
> > > +     /*
> > > +      * Now calculate the eager scan start block. Start at a random spot
> > > +      * somewhere within the first eager scan region. This avoids eager
> > > +      * scanning and failing to freeze the exact same blocks each vacuum of the
> > > +      * relation.
> > > +      */
> >
> > If I understand correctly, we're not really choosing a spot inside the first
> > eager scan region, we determine the bounds of the first region?
> 
> I'm not sure I understand how those are different, but I updated the
> comment a bit. Maybe you can elaborate what you mean?

Let's assume that we use regions of 512 pages. Without randomness we'd do:

[0 .. 512) [512 .. 1024) ...


IMO, if we were to choose a spot inside the first region, we'd do:

[random(0, 512) .. 512) [512 .. 1024)


If we choose the bounds of the first region we'd do:

[0, random(0, 512)) [$prior_bound .. $prior_bound + 512)

or something like that.
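
FWIW, the second variant can be had with something as simple as this
(a sketch, assuming the pg_prng API and the patch's region constant):

    /* End the first region at a random block within [0, 512); every
     * later region is a full EAGER_SCAN_REGION_SIZE blocks. */
    BlockNumber first_region_end =
        pg_prng_uint32(&pg_global_prng_state) % EAGER_SCAN_REGION_SIZE;

    /* Regions become [0, first_region_end),
     * [first_region_end, first_region_end + EAGER_SCAN_REGION_SIZE), ...
     * so successive vacuums don't keep retrying the exact same blocks. */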

Greetings,

Andres Freund


PS: planning to respond to cleanup lock stuff downthread, after a bit of
exercise or tomorrow morning



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Andres Freund
Date:
Hi,

On 2025-02-10 14:30:15 -0500, Robert Haas wrote:
> On Wed, Feb 5, 2025 at 12:23 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > I started getting worried thinking about this. If you have a cursor
> > for select * from a table and fetch forward far enough, couldn't
> > vacuum fail to get cleanup locks on a whole range of blocks?
> 
> I don't think so. A given scan holds at most 1 heap pin at a time, so
> I don't see how a cursor doing a FETCH FORWARD would make anything
> happen that wouldn't otherwise. I do think it's theoretically possible
> that a running query and a running VACUUM could be going at exactly
> the same speed so that the VACUUM always tries to pin every page at
> the exact same moment the SELECT has it pinned, but in practice I
> think it is very unlikely that the timing will work out like that.
> Like win-the-lottery kind of unlikely.

I think the most common way to get a range of pages pinned is to have a bunch
of backends doing index nested loop joins to a fairly narrow value range in a
table. You can sometimes see (via pg_buffercache) that there are some pages
that are rather heavily pinned, but just about no other page in the table is.


> > Yea, I guess that means the freeze limit caps wasted read I/O -- but
> > you could end up doing no read I/O and still hitting the freeze fail
> > limit because it is trying to detect when data is too new to be worth
> > eager scanning. But that's probably the behavior we want.
> 
> The too-new case is very scary, I think. It's already a problem that
> when there are long-running transactions in play, autovacuum commands
> a VACUUM which finds nothing it can usefully clean up. It would be
> very valuable for someone to figure out a way to prevent that, but the
> relevance to this patch is that we need to try hard to avoid making
> that problem substantially worse than it is already. Even if the whole
> database is memory-resident, the too-new case allows for wasting a
> bunch of effort that we would rather not waste; the cap here tries to
> make sure that isn't any worse than it needs to be.

IME the most common way this issue arises is anti-wraparound vacuums that
can't actually clean up anything, due to some old transaction. We'll just
fire those off over and over again, without taking into account that we
still can't make progress.

That scenario doesn't apply the same way to this patch, because by the time
you get to an anti-wrap vacuum, this patch doesn't do anything.

Of course it's possible that the dead tuple trigger causes repeated vacuums
that each can't clean up anything, but that's much less common IME.


> > > > Interestingly, we call heap_tuple_should_freeze() in
> > > > lazy_scan_noprune(), so we actually could tell whether or not there
> > > > are some tuples on the page old enough to trigger freezing if we had
> > > > gotten the cleanup lock. One option would be to add a new output
> > > > parameter to lazy_scan_noprune() that indicates whether or not there
> > > > were tuples with xids older than the FreezeLimit, and only if there
> > > > were not, count it as a failed eager scan.
> > >
> > > Yeah, that could be done, and it doesn't sound like a bad idea.
> > > However, I'm also unsure whether we need to add additional logic for
> > > this case. My intuition is that it just won't happen very often. I'm
> > > not quite confident that I'm correct about that, but if I had to
> > > guess, my guess would be "rare scenario".
> >
> > I've not done this in the attached v16. I have added a comment about it.
> > I think not doing it is a judgment call and not a bug, right?
> 
> As of this writing, I do not believe that this is an essential part of
> this patch, but it is always possible that with the benefit of
> hindsight it will seem otherwise. That's just life, though. If I knew
> in advance which of my decisions would later seem like mistakes, I
> would make very few mistakes.

I agree that we can disregard this for now.

Greetings,

Andres Freund



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Andres Freund
Date:
On 2025-02-05 12:23:29 -0500, Melanie Plageman wrote:
> Attached v16 implements the logic to not count pages we failed to
> freeze because of cleanup lock contention as eager freeze failures.

That looks good to me.



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Tue, Feb 11, 2025 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2025-02-05 12:23:29 -0500, Melanie Plageman wrote:
> > Attached v16 implements the logic to not count pages we failed to
> > freeze because of cleanup lock contention as eager freeze failures.
>
> That looks good to me.

Cool. Committed and marked as such in the cf app.

- Melanie



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Pavel Luzanov
Date:
Hello,

I have a question about the documentation.

From the description of vacuum_max_eager_freeze_failure_rate [1]:

"Specifies the maximum number of pages (as a fraction of total pages in
the relation) that VACUUM may scan and fail to set all-frozen in the
visibility map before disabling eager scanning."

It seems that after reaching vacuum_max_eager_freeze_failure_rate,
eager scanning of this table will be stopped. But in the source code
[2]:

  * cap. The failure count is reset for each region of the table -- comprised
  * of EAGER_SCAN_REGION_SIZE blocks. In each region, we tolerate
  * vacuum_max_eager_freeze_failure_rate of EAGER_SCAN_REGION_SIZE failures
  * before suspending eager scanning until the end of the region.

From this description, the vacuum_max_eager_freeze_failure_rate limit
applies to a region of EAGER_SCAN_REGION_SIZE pages, not to the
whole table.

Which one is correct? Maybe I'm missing something?
Do we need any clarification in the documentation?

1. https://www.postgresql.org/docs/devel/runtime-config-vacuum.html#GUC-VACUUM-MAX-EAGER-FREEZE-FAILURE-RATE
2. https://github.com/postgres/postgres/blob/3357471cf9f5e470dfed0c7919bcf31c7efaf2b9/src/backend/access/heap/vacuumlazy.c#L82-L85

-- 
Pavel Luzanov
Postgres Professional: https://postgrespro.com




Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Melanie Plageman
Date:
On Thu, Jul 31, 2025 at 7:05 AM Pavel Luzanov <p.luzanov@postgrespro.ru> wrote:
>
> I have a question about the documentation.
>
> From the description of vacuum_max_eager_freeze_failure_rate [1]:
>
> "Specifies the maximum number of pages (as a fraction of total pages in
> the relation) that VACUUM may scan and fail to set all-frozen in the
> visibility map before disabling eager scanning."
>
> It seems that after reaching the vacuum_max_eager_freeze_failure_rate,
> the eager scanning of this table will be stopped. But in the source code
> [2]:
>
>   * cap. The failure count is reset for each region of the table -- comprised
>   * of EAGER_SCAN_REGION_SIZE blocks. In each region, we tolerate
>   * vacuum_max_eager_freeze_failure_rate of EAGER_SCAN_REGION_SIZE failures
>   * before suspending eager scanning until the end of the region.
>
> From this description, the vacuum_max_eager_freeze_failure_rate limit
> applies to a region of EAGER_SCAN_REGION_SIZE pages, not to the
> whole table.

Right, I looked at the fact that it is temporarily disabled per region
and then re-enabled as an implementation detail that would be more
confusing to document. If you eager scan and fail to freeze 3% of each
region, you'll have eager scanned and failed to freeze 3% of the
blocks in the table -- so the total cap is the same. You don't see the
eager scan counts until the end of vacuuming, so from the user's
perspective it's the same.
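
Sketching the arithmetic (EAGER_SCAN_REGION_SIZE is 4096 blocks in the
patch; the rest is illustrative):

    #define EAGER_SCAN_REGION_SIZE 4096

    /* With the default vacuum_max_eager_freeze_failure_rate of 0.03,
     * each region tolerates roughly 122 failed eager freezes before
     * eager scanning pauses until the next region begins. */
    int fails_per_region = (int) (0.03 * EAGER_SCAN_REGION_SIZE); /* 122 */

    /* Failing at most 3% of every region is, in total, the same as
     * failing at most 3% of the table, which is why the docs describe a
     * single table-wide fraction. */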

Documenting it means exposing more about the algorithm than seems
useful or actionable to the user.

Is there something you would do differently from a user perspective if
you knew that it was being temporarily disabled and reenabled for
specific regions of the table?

- Melanie



Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From
Pavel Luzanov
Date:
On 31.07.2025 16:31, Melanie Plageman wrote:
> If you eager scan and fail to freeze 3% of each region, you'll have
> eager scanned and failed to freeze 3% of the blocks in the table -- so
> the total cap is the same.

Oh my God! What a stupid mistake I made. I multiplied the number of regions by 3% :-)

> Documenting it means exposing more about the algorithm than seems
> useful or actionable to the user.

Of course, you are right.
Thank you very much for the clarifications.
-- 
Pavel Luzanov
Postgres Professional: https://postgrespro.com