Re: We probably need autovacuum_max_wraparound_workers
From: Tom Lane
Subject: Re: We probably need autovacuum_max_wraparound_workers
Date:
Msg-id: 6551.1340927335@sss.pgh.pa.us
In reply to: Re: We probably need autovacuum_max_wraparound_workers (Josh Berkus <josh@agliodbs.com>)
Responses: Re: We probably need autovacuum_max_wraparound_workers
List: pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:
> So there are two parts to this problem, each of which needs a different
> solution:

> 1. Databases can inadvertently get to the state where many tables need
> wraparound vacuuming at exactly the same time, especially if they have
> many "cold" data partition tables.

I'm not especially sold on your theory that there's some behavior that
forces such convergence, but it's certainly plausible that there was,
say, a schema alteration applied to all of those partitions at about the
same time.  In any case, as Robert has been saying, it seems like it
would be smart to try to get autovacuum to spread out the anti-wraparound
work a bit better when it's faced with a lot of tables with similar
relfrozenxid values.

> 2. When we do hit wraparound thresholds for multiple tables, autovacuum
> has no hesitation about doing autovacuum_max_workers worth of wraparound
> vacuum simultaneously, even when that exceeds the I/O capacity of the
> system.

I continue to maintain that this problem is unrelated to wraparound as
such, and that thinking it is is a great way to design a bad solution.
There are any number of reasons why autovacuum might need to run
max_workers at once.  What we need to look at is making sure that they
don't run the system into the ground when that happens.

Since your users weren't complaining about performance with one or two
autovac workers running (were they?), we can assume that the cost-delay
settings were such as to not create a problem in that scenario.  So it
seems to me that it's down to autovac_balance_cost().  Either there's a
plain-vanilla bug in there, or seek costs are breaking the assumption
that it's okay to give N workers each 1/Nth of the single-worker I/O
capacity.

As far as bugs are concerned, I wonder if the premise of the calculation

 * The idea here is that we ration out I/O equally.  The amount of I/O
 * that a worker can consume is determined by cost_limit/cost_delay, so we
 * try to equalize those ratios rather than the raw limit settings.

might be wrong in itself?  The ratio idea seems plausible but ...

			regards, tom lane
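A minimal sketch of the rationing premise Tom questions above, under
simplified assumptions: the real autovac_balance_cost() in autovacuum.c
walks the shared worker array rather than taking a worker count, and the
variable names and settings below are illustrative, not the actual
implementation.

    /*
     * Sketch only: shows the "equalize cost_limit/cost_delay ratios" idea
     * with hypothetical local variables in place of the shared worker
     * state the real autovac_balance_cost() operates on.
     */
    #include <stdio.h>

    int
    main(void)
    {
        int     av_cost_limit = 200;   /* autovacuum_vacuum_cost_limit */
        double  av_cost_delay = 20.0;  /* autovacuum_vacuum_cost_delay (ms) */
        int     nworkers = 3;          /* workers currently running */

        /* total I/O budget, expressed as a limit-to-delay ratio */
        double  total_ratio = av_cost_limit / av_cost_delay;

        /* hand each worker an equal share of that ratio */
        double  per_worker_ratio = total_ratio / nworkers;

        /* with the delay unchanged, the share becomes a smaller cost_limit */
        int     per_worker_limit = (int) (per_worker_ratio * av_cost_delay);

        printf("each of %d workers: cost_limit=%d, cost_delay=%.0fms\n",
               nworkers, per_worker_limit, av_cost_delay);
        return 0;
    }

With these example settings, three concurrent workers each end up with a
cost_limit of 66 at the same delay, i.e. roughly one third of the
single-worker I/O budget. That is exactly the 1/Nth assumption that heavy
seek costs could invalidate.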