Re: New GUC autovacuum_max_threshold ?

Поиск

Список

Период

Сортировка

От	Frédéric Yhuel
Тема	Re: New GUC autovacuum_max_threshold ?
Дата	26 апреля 2024 г. 08:10:20
Msg-id	ba3a146c-1bdb-45f4-a7fa-02713783a811@dalibo.com обсуждение исходный текст
Ответ на	Re: New GUC autovacuum_max_threshold ? (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения


Le 25/04/2024 à 22:21, Robert Haas a écrit :
> The analyze case, I feel, is really murky.
> autovacuum_analyze_scale_factor stands for the proposition that as the
> table becomes larger, analyze doesn't need to be done as often. If
> what you're concerned about is the frequency estimates, that's true:
> an injection of a million new rows can shift frequencies dramatically
> in a small table, but the effect is blunted in a large one. But a lot
> of the cases I've seen have involved the histogram boundaries. If
> you're inserting data into a table in increasing order, every new
> million rows shifts the boundary of the last histogram bucket by the
> same amount. You either need those rows included in the histogram to
> get good query plans, or you don't. If you do, the frequency with
> which you need to analyze does not change as the table grows. If you
> don't, then it probably does. But the answer doesn't really depend on
> how big the table is already, but on your workload. So it's unclear to
> me that the proposed parameter is the right idea here at all. It's
> also unclear to me that the existing system is the right idea. 🙂

This is very interesting. And what about ndistinct? I believe it could 
be problematic, too, in some (admittedly rare or pathological) cases.

For example, suppose that the actual number of distinct values grows 
from 1000 to 200000 after a batch of insertions, for a particular 
column. OK, in such a case, the default analyze sampling isn't large 
enough to compute a ndistinct close enough to reality anyway. But 
without any analyze at all, it can lead to very bad planning - think of 
a Nested Loop with a parallel seq scan for the outer table instead of a 
simple efficient index scan, because the index scan of the inner table 
is overestimated (each index scan cost and number or rows returned).

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: New GUC autovacuum_max_threshold ?