Обсуждение: Re: Trigger more frequent autovacuums of heavy insert tables
I really appreciate all the work to make vacuum better. Anything that helps our problem of autovacuum not scaling well for large tables is a win.
+ /*
+ * Every block marked all-frozen in the VM must also be marked
+ * all-visible.
+ */
+ if (new_rel_allfrozen > new_rel_allvisible)
+ new_rel_allfrozen = new_rel_allvisible;
+
relallfrozen = classForm->relallfrozen;
relallvisible = classForm->relallvisible;
if (relallvisible > relpages)
relallvisible = relpages;
if (relallfrozen > relallvisible)
relallfrozen = relallvisible;
pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);
I'm not overly familiar with this part of the code base, but here are some questions/ideas:
+ * Every block marked all-frozen in the VM must also be marked
+ * all-visible.
+ */
+ if (new_rel_allfrozen > new_rel_allvisible)
+ new_rel_allfrozen = new_rel_allvisible;
+
Maybe tweak either the comment, or the code, as I read that comment as meaning:
if (new_rel_allfrozen > new_rel_allvisible)
new_ral_allvisible = new_rel_allfrozen;
+ /*
+ * If we are modifying relallvisible manually, it is not clear
+ * what relallfrozen value would make sense. Therefore, set it to
+ * -1, or unknown. It will be updated the next time these fields
+ * are updated.
+ */
+ replaces[ncols] = Anum_pg_class_relallfrozen;
+ values[ncols] = Int32GetDatum(-1);
+ * If we are modifying relallvisible manually, it is not clear
+ * what relallfrozen value would make sense. Therefore, set it to
+ * -1, or unknown. It will be updated the next time these fields
+ * are updated.
+ */
+ replaces[ncols] = Anum_pg_class_relallfrozen;
+ values[ncols] = Int32GetDatum(-1);
Do we need some extra checks later on when we are actually using this to prevent negative numbers in the calculations? It's only going to make pcnt_unfrozen something like 1.0001 but still might want to skip that.
In autovacuum.c, seems we could simplify some of the logic there to this?:
if (relpages > 0 && reltuples > 0) {
relallfrozen = classForm->relallfrozen;
relallvisible = classForm->relallvisible;
if (relallvisible > relpages)
relallvisible = relpages;
if (relallfrozen > relallvisible)
relallfrozen = relallvisible;
pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);
}
vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples * pcnt_unfrozen;
vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples * pcnt_unfrozen;
Again, I'm not clear under what circumstances will relallvisible > relpages?
Cheers,
Greg
Hi Melanie Plageman
Thank you for working on this ,Actually, there were two patches aimed at optimizing vacuum-triggered processes, and one of them reached a consensus and has been committed:https://commitfest.postgresql.org/52/5046/ , https://commitfest.postgresql.org/51/5395/, Maybe referring to the already committed patch and setting a maximum value for vacuum_max_ins_threshold would be more acceptable.
Thanks
On Thu, Feb 6, 2025 at 6:08 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
On Thu, Jan 16, 2025 at 5:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Thu, Jan 16, 2025 at 4:43 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2024 at 11:14 AM Melanie Plageman
> > <melanieplageman@gmail.com> wrote:
> > >
> > > I've done something similar to this in attached v2.
> >
> > This needed a rebase. See attached v4.
>
> Whoops -- docs didn't build. Attached v5.
Outside of the positive performance impact of vacuuming pages before
they go cold (detailed in my first email [1]), there is also a
substantial positive effect with this patch for large tables with
substantial cold regions: fewer anti-wraparound vacuums and more
frequent normal/aggressive vacuums
With the default vacuum settings, you often see an append-only table
devolve to _only_ anti-wraparound vacuums after the first aggressive
vacuum. I ran an insert-only workload for an hour (with 32 clients and
synchronous commit off to maximize the amount of data inserted) with
the default vacuum settings. On master, after the first aggressive
vacuum, we do only anti-wraparound vacuums (and only two of these are
triggered). With the patch, after the first aggressive vacuum, 10 more
vacuums are triggered -- none of which are anti-wraparound vacuums.
I attached a chart comparing the autovacuums triggered on master vs
with the patch.
Besides the performance benefit of spreading the freezing work over
more normal vacuums (thereby disrupting foreground workloads less),
anti-wraparound vacuums are not auto canceled by DDL -- making them
more of a nuisance to users.
[1] https://www.postgresql.org/message-id/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com
Hi
> We could add autovacuum_vacuum_insert_max_threshold, but with an> insert-only workload, we can expect that the cold data is being
> frozen. By calculating the threshold based on unfrozen data, we are
> effectively capping the threshold for inserted data without adding
> another guc. If any of that data is being unfrozen via updates or
> deletes, then the autovacuum_vacuum_max_threshold would apply.
> Perhaps I'm missing a case where calculating the insert threshold on
> unfrozen data would not act as a cap, in which case I could get on
> board with a guc.
Actually ,I like your solution. Even I think this formula could use that pcnt_unfrozen parameter
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples * pcnt_unfrozen;
Thanks
On Thu, Feb 6, 2025 at 11:42 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
Attached v6 is rebased over 306dc520b9dfd60
On Wed, Feb 5, 2025 at 8:54 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
>
> Hi Melanie Plageman
> Thank you for working on this ,Actually, there were two patches aimed at optimizing vacuum-triggered processes, and one of them reached a consensus and has been committed:https://commitfest.postgresql.org/52/5046/ , https://commitfest.postgresql.org/51/5395/, Maybe referring to the already committed patch and setting a maximum value for vacuum_max_ins_threshold would be more acceptable.
We could add autovacuum_vacuum_insert_max_threshold, but with an
insert-only workload, we can expect that the cold data is being
frozen. By calculating the threshold based on unfrozen data, we are
effectively capping the threshold for inserted data without adding
another guc. If any of that data is being unfrozen via updates or
deletes, then the autovacuum_vacuum_max_threshold would apply.
Perhaps I'm missing a case where calculating the insert threshold on
unfrozen data would not act as a cap, in which case I could get on
board with a guc.
- Melanie