Re: New GUC autovacuum_max_threshold ?
From | Frédéric Yhuel |
---|---|
Subject | Re: New GUC autovacuum_max_threshold ? |
Date | |
Msg-id | ba3a146c-1bdb-45f4-a7fa-02713783a811@dalibo.com |
In reply to | Re: New GUC autovacuum_max_threshold ? (Robert Haas <robertmhaas@gmail.com>) |
Lists | pgsql-hackers |
On 25/04/2024 at 22:21, Robert Haas wrote:

> The analyze case, I feel, is really murky.
> autovacuum_analyze_scale_factor stands for the proposition that as the
> table becomes larger, analyze doesn't need to be done as often. If
> what you're concerned about is the frequency estimates, that's true:
> an injection of a million new rows can shift frequencies dramatically
> in a small table, but the effect is blunted in a large one. But a lot
> of the cases I've seen have involved the histogram boundaries. If
> you're inserting data into a table in increasing order, every new
> million rows shifts the boundary of the last histogram bucket by the
> same amount. You either need those rows included in the histogram to
> get good query plans, or you don't. If you do, the frequency with
> which you need to analyze does not change as the table grows. If you
> don't, then it probably does. But the answer doesn't really depend on
> how big the table is already, but on your workload. So it's unclear to
> me that the proposed parameter is the right idea here at all. It's
> also unclear to me that the existing system is the right idea.

🙂 This is very interesting.

And what about ndistinct? I believe it could be problematic, too, in some (admittedly rare or pathological) cases. For example, suppose that the actual number of distinct values in a particular column grows from 1000 to 200000 after a batch of insertions. Granted, in such a case the default analyze sample isn't large enough to compute an ndistinct close to reality anyway. But with no analyze at all, it can lead to very bad planning: think of a Nested Loop with a parallel seq scan for the outer table instead of a simple, efficient index scan, because the index scan of the inner table is overestimated (both the cost of each index scan and the number of rows returned).
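To make the overestimation concrete, here is a small sketch (not PostgreSQL source code) of the back-of-the-envelope arithmetic: for an equality lookup on a column with no dominant common values, the planner's row estimate is roughly reltuples / ndistinct. The table size and distinct counts below are illustrative, taken from the 1000 → 200000 example above; the row count of 1,000,000 is an assumption.

```python
# Hypothetical illustration: how a stale ndistinct inflates the planner's
# row estimate for an inner-side index scan. The 1/ndistinct selectivity
# is a simplification of PostgreSQL's equality-selectivity estimate when
# no most-common-values list applies.

def estimated_rows_per_value(reltuples: int, ndistinct: int) -> float:
    """Rough planner-style estimate for an equality lookup:
    reltuples * (1 / ndistinct)."""
    return reltuples / ndistinct

reltuples = 1_000_000        # assumed inner-table size after the batch insert
stale_ndistinct = 1_000      # what the stats still say (no ANALYZE yet)
actual_ndistinct = 200_000   # reality after the insertions

stale_estimate = estimated_rows_per_value(reltuples, stale_ndistinct)
actual_rows = estimated_rows_per_value(reltuples, actual_ndistinct)

# The planner believes each inner index scan returns ~1000 rows instead
# of ~5, a 200x overestimate -- enough to tip it toward a parallel seq
# scan plan instead of an index-driven nested loop.
print(stale_estimate)              # 1000.0
print(actual_rows)                 # 5.0
print(stale_estimate / actual_rows)  # 200.0
```

The point of the sketch is only that the error scales linearly with the ratio of stale to actual ndistinct, independently of table size.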