Re: Should vacuum process config file reload more often

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Should vacuum process config file reload more often
Дата
Msg-id CAAKRu_YmQg_i=0DBfPzy1MMc2GjfuJFo=0rFv4fCzRQStLhjvw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Should vacuum process config file reload more often  (Daniel Gustafsson <daniel@yesql.se>)
Ответы Re: Should vacuum process config file reload more often  (Melanie Plageman <melanieplageman@gmail.com>)
Список pgsql-hackers
v17 attached does not yet fix the logging problem or variable naming
problem.

I have not changed where AutoVacuumUpdateCostLimit() is called either.

This is effectively just a round of cleanup. I hope I have managed to
address all other code review feedback so far, though some may have
slipped through the cracks.

On Wed, Apr 5, 2023 at 2:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 5, 2023 at 11:29 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
> + /*
> + * Balance and update limit values for autovacuum workers. We must
> + * always do this in case the autovacuum launcher or another
> + * autovacuum worker has recalculated the number of workers across
> + * which we must balance the limit. This is done by the launcher when
> + * launching a new worker and by workers before vacuuming each table.
> + */
>
> I don't quite understand what's going on here. A big reason that I'm
> worried about this whole issue in the first place is that sometimes
> there's a vacuum going on a giant table and you can't get it to go
> fast. You want it to absorb new settings, and to do so quickly. I
> realize that this is about the number of workers, not the actual cost
> limit, so that makes what I'm about to say less important. But ... is
> this often enough? Like, the time before we move onto the next table
> could be super long. The time before a new worker is launched should
> be ~autovacuum_naptime/autovacuum_max_workers or ~20s with default
> settings, so that's not horrible, but I'm kind of struggling to
> understand the rationale for this particular choice. Maybe it's fine.

I've at least updated this comment to be more correct/less misleading.

>
> +               if (autovacuum_vac_cost_limit > 0)
> +                       VacuumCostLimit = autovacuum_vac_cost_limit;
> +               else
> +                       VacuumCostLimit = vacuum_cost_limit;
> +
> +               /* Only balance limit if no cost-related storage
> parameters specified */
> +               if (pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance))
> +                       return;
> +               Assert(VacuumCostLimit > 0);
> +
> +               nworkers_for_balance = pg_atomic_read_u32(
> +
> &AutoVacuumShmem->av_nworkersForBalance);
> +
> +               /* There is at least 1 autovac worker (this worker). */
> +               if (nworkers_for_balance <= 0)
> +                       elog(ERROR, "nworkers_for_balance must be > 0");
> +
> +               VacuumCostLimit = Max(VacuumCostLimit /
> nworkers_for_balance, 1);
>
> I think it would be better stylistically to use a temporary variable
> here and only assign the final value to VacuumCostLimit.

I tried that and thought it adding confusing clutter. If it is a code
cleanliness issue, I am willing to change it, though.

On Wed, Apr 5, 2023 at 3:04 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 5 Apr 2023, at 20:55, Robert Haas <robertmhaas@gmail.com> wrote:
>
> > Again, I don't think this is something we should try to
> > address right now under time pressure, but in the future, I think we
> > should consider ripping this behavior out.
>
> I would not be opposed to that, but I wholeheartedly agree that it's not the
> job of this patch (or any patch at this point in the cycle).
>
> > +               if (autovacuum_vac_cost_limit > 0)
> > +                       VacuumCostLimit = autovacuum_vac_cost_limit;
> > +               else
> > +                       VacuumCostLimit = vacuum_cost_limit;
> > +
> > +               /* Only balance limit if no cost-related storage
> > parameters specified */
> > +               if (pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance))
> > +                       return;
> > +               Assert(VacuumCostLimit > 0);
> > +
> > +               nworkers_for_balance = pg_atomic_read_u32(
> > +
> > &AutoVacuumShmem->av_nworkersForBalance);
> > +
> > +               /* There is at least 1 autovac worker (this worker). */
> > +               if (nworkers_for_balance <= 0)
> > +                       elog(ERROR, "nworkers_for_balance must be > 0");
> > +
> > +               VacuumCostLimit = Max(VacuumCostLimit /
> > nworkers_for_balance, 1);
> >
> > I think it would be better stylistically to use a temporary variable
> > here and only assign the final value to VacuumCostLimit.
>
> I can agree with that.  Another supertiny nitpick on the above is to not end a
> single-line comment with a period.

I have fixed this.

On Thu, Apr 6, 2023 at 2:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Apr 6, 2023 at 12:29 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > Thanks all for the reviews.
> >
> > v16 attached. I put it together rather quickly, so there might be a few
> > spurious whitespaces or similar. There is one rather annoying pgindent
> > outlier that I have to figure out what to do about as well.
> >
> > The remaining functional TODOs that I know of are:
> >
> > - Resolve what to do about names of GUC and vacuum variables for cost
> >   limit and cost delay (since it may affect extensions)
> >
> > - Figure out what to do about the logging message which accesses dboid
> >   and tableoid (lock/no lock, where to put it, etc)
> >
> > - I see several places in docs which reference the balancing algorithm
> >   for autovac workers. I did not read them in great detail, but we may
> >   want to review them to see if any require updates.
> >
> > - Consider whether or not the initial two commits should just be
> >   squashed with the third commit
> >
> > - Anything else reviewers are still unhappy with
> >
> >
> > On Wed, Apr 5, 2023 at 1:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Apr 5, 2023 at 5:05 AM Melanie Plageman
> > > <melanieplageman@gmail.com> wrote:
> > > >
> > > > On Tue, Apr 4, 2023 at 4:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > ---
> > > > > -                if (worker->wi_proc != NULL)
> > > > > -                        elog(DEBUG2, "autovac_balance_cost(pid=%d
> > > > > db=%u, rel=%u, dobalance=%s cost_limit=%d, cost_limit_base=%d,
> > > > > cost_delay=%g)",
> > > > > -                                 worker->wi_proc->pid,
> > > > > worker->wi_dboid, worker->wi_tableoid,
> > > > > -                                 worker->wi_dobalance ? "yes" : "no",
> > > > > -                                 worker->wi_cost_limit,
> > > > > worker->wi_cost_limit_base,
> > > > > -                                 worker->wi_cost_delay);
> > > > >
> > > > > I think it's better to keep this kind of log in some form for
> > > > > debugging. For example, we can show these values of autovacuum workers
> > > > > in VacuumUpdateCosts().
> > > >
> > > > I added a message to do_autovacuum() after calling VacuumUpdateCosts()
> > > > in the loop vacuuming each table. That means it will happen once per
> > > > table. It's not ideal that I had to move the call to VacuumUpdateCosts()
> > > > behind the shared lock in that loop so that we could access the pid and
> > > > such in the logging message after updating the cost and delay, but it is
> > > > probably okay. Though noone is going to be changing those at this
> > > > point, it still seemed better to access them under the lock.
> > > >
> > > > This does mean we won't log anything when we do change the values of
> > > > VacuumCostDelay and VacuumCostLimit while vacuuming a table. Is it worth
> > > > adding some code to do that in VacuumUpdateCosts() (only when the value
> > > > has changed not on every call to VacuumUpdateCosts())? Or perhaps we
> > > > could add it in the config reload branch that is already in
> > > > vacuum_delay_point()?
> > >
> > > Previously, we used to show the pid in the log since a worker/launcher
> > > set other workers' delay costs. But now that the worker sets its delay
> > > costs, we don't need to show the pid in the log. Also, I think it's
> > > useful for debugging and investigating the system if we log it when
> > > changing the values. The log I imagined to add was like:
> > >
> > > @@ -1801,6 +1801,13 @@ VacuumUpdateCosts(void)
> > >             VacuumCostDelay = vacuum_cost_delay;
> > >
> > >         AutoVacuumUpdateLimit();
> > > +
> > > +       elog(DEBUG2, "autovacuum update costs (db=%u, rel=%u,
> > > dobalance=%s, cost_limit=%d, cost_delay=%g active=%s failsafe=%s)",
> > > +            MyWorkerInfo->wi_dboid, MyWorkerInfo->wi_tableoid,
> > > +            pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance)
> > > ? "no" : "yes",
> > > +            VacuumCostLimit, VacuumCostDelay,
> > > +            VacuumCostDelay > 0 ? "yes" : "no",
> > > +            VacuumFailsafeActive ? "yes" : "no");
> > >     }
> > >     else
> > >     {
> >
> > Makes sense. I've updated the log message to roughly what you suggested.
> > I also realized I think it does make sense to call it in
> > VacuumUpdateCosts() -- only for autovacuum workers of course. I've done
> > this. I haven't taken the lock though and can't decide if I must since
> > they access dboid and tableoid -- those are not going to change at this
> > point, but I still don't know if I can access them lock-free...
> > Perhaps there is a way to condition it on the log level?
> >
> > If I have to take a lock, then I don't know if we should put these in
> > VacuumUpdateCosts()...
>
> I think we don't need to acquire a lock there as both values are
> updated only by workers reporting this message.

I dunno. I just don't feel that comfortable saying, oh it's okay to
access these without a lock probably. I propose we do one of the
following:

- Take a shared lock inside VacuumUpdateCosts() (it is not called on every
    call to vacuum_delay_point()) before reading from these variables.

    Pros:
        - We will log whenever there is a change to these parameters
    Cons:
        - This adds overhead in the common case when log level is < DEBUG2.
            Is there a way to check the log level before taking the lock?
        - Acquiring the lock inside the function is inconsistent with the
            pattern that some of the other autovacuum functions requiring
            locks use (they assume you have a lock if needed inside of the
            function). But, we could assert that the lock is not already held.
        - If we later decide we don't like this choice and want to move the
            logging elsewhere, it will necessarily log less frequently which
            seems like a harder change to make than logging more frequently.

- Move this logging into the loop through relations in do_autovacuum()
    and the config reload code and take the shared lock before doing the
    logging.

    Pros:
        - Seems safe and not expensive
        - Covers most of the times we would want the logging
    Cons:
        - duplicates logging in two places

> Some minor comments on 0003:
>
> +/*
> + * autovac_recalculate_workers_for_balance
> + *             Recalculate the number of workers to consider, given
> cost-related
> + *             storage parameters and the current number of active workers.
> + *
> + * Caller must hold the AutovacuumLock in at least shared mode to access
> + * worker->wi_proc.
> + */
>
> Does it make sense to add Assert(LWLockHeldByMe(AutovacuumLock)) at
> the beginning of this function?

I've added this. It is called infrequently enough to be okay, I think.


>                  /* rebalance in case the default cost parameters changed */
> -                LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
> -                autovac_balance_cost();
> +                LWLockAcquire(AutovacuumLock, LW_SHARED);
> +                autovac_recalculate_workers_for_balance();
>                  LWLockRelease(AutovacuumLock);
>
> Do we really need to have the autovacuum launcher recalculate
> av_nworkersForBalance after reloading the config file? Since the cost
> parameters are not used inautovac_recalculate_workers_for_balance()
> the comment also needs to be updated.

Yep, almost certainly don't need this. I've removed this call to
autovac_recalculate_workers_for_balance().

> +                /*
> +                 * Balance and update limit values for autovacuum
> workers. We must
> +                 * always do this in case the autovacuum launcher or another
> +                 * autovacuum worker has recalculated the number of
> workers across
> +                 * which we must balance the limit. This is done by
> the launcher when
> +                 * launching a new worker and by workers before
> vacuuming each table.
> +                 */
> +                AutoVacuumUpdateCostLimit();
>
> I think the last sentence is not correct. IIUC recalculation of
> av_nworkersForBalance is done by the launcher after a worker finished
> and by workers before vacuuming each table.

Yes, you are right. However, I think the comment was generally
misleading and I have reworded it.

> It's not a problem of this patch, but IIUC since we don't reset
> wi_dobalance after vacuuming each table we use the last value of
> wi_dobalance for performing autovacuum items. At end of the loop for
> tables in do_autovacuum() we have the following code that explains why
> we don't reset wi_dobalance:
>
>         /*
>          * Remove my info from shared memory.  We could, but intentionally
>          * don't, unset wi_dobalance on the assumption that we are more likely
>          * than not to vacuum a table with no cost-related storage parameters
>          * next, so we don't want to give up our share of I/O for a very short
>          * interval and thereby thrash the global balance.
>          */
>         LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
>         MyWorkerInfo->wi_tableoid = InvalidOid;
>         MyWorkerInfo->wi_sharedrel = false;
>         LWLockRelease(AutovacuumScheduleLock);
>
> Assuming we agree with that, probably we need to reset it to true
> after vacuuming all tables?

Ah, great point. I have done this.

On Thu, Apr 6, 2023 at 8:29 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 6 Apr 2023, at 08:39, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Also I agree with
> > where to put the log but I think the log message should start with
> > lower cases:
> >
> > +                elog(DEBUG2,
> > +                         "Autovacuum VacuumUpdateCosts(db=%u, rel=%u,
>
> In principle I agree, but in this case Autovacuum is a name and should IMO in
> userfacing messages start with capital A.

I've left this unchanged while I agonize over what to do with the
placement of the log message in general. But I am happy to keep it
uppercase.

> > +/*
> > + * autovac_recalculate_workers_for_balance
> > + *             Recalculate the number of workers to consider, given
> > cost-related
> > + *             storage parameters and the current number of active workers.
> > + *
> > + * Caller must hold the AutovacuumLock in at least shared mode to access
> > + * worker->wi_proc.
> > + */
> >
> > Does it make sense to add Assert(LWLockHeldByMe(AutovacuumLock)) at
> > the beginning of this function?
>
> That's probably not a bad idea.

Done.

> > ---
> >                 /* rebalance in case the default cost parameters changed */
> > -                LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
> > -                autovac_balance_cost();
> > +                LWLockAcquire(AutovacuumLock, LW_SHARED);
> > +                autovac_recalculate_workers_for_balance();
> >                 LWLockRelease(AutovacuumLock);
> >
> > Do we really need to have the autovacuum launcher recalculate
> > av_nworkersForBalance after reloading the config file? Since the cost
> > parameters are not used inautovac_recalculate_workers_for_balance()
> > the comment also needs to be updated.
>
> If I understand this comment right; there was a discussion upthread that simply
> doing it in both launcher and worker simplifies the code with little overhead.
> A comment can reflect that choice though.

Yes, but now that this function no longer deals with the cost limit and
delay values itself, we can remove it.

- Melanie

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: Should vacuum process config file reload more often
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Rethinking the implementation of ts_headline()