Re: autovacuum default parameters
From: Gregory Stark
Subject: Re: autovacuum default parameters
Date:
Msg-id: 877ioqqlev.fsf@oxford.xeocode.com
In reply to: autovacuum default parameters (was Re: 8.2 is 30% better in pgbench than 8.3)  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses: Re: autovacuum default parameters
List: pgsql-hackers
"Alvaro Herrera" <alvherre@commandprompt.com> writes: > We didn't, but while I agree with the idea, I think 5% is too low. I > don't want autovacuum to get excessively aggressive. Is 10% not enough? Well let me flip it around. Would you think a default fillfactor of 10% would be helpful or overkill? I think it would nearly always be overkill and waste heap space and therefore cache hit rate and i/o bandwidth. I get my 5% intuition from the TPCC stock table which has about 20 tuples per page. That means a fillfactor or vacuum at 5% both translate into trying to maintain a margin of one tuple's worth of space per page. Enough for an update to happen without migrating to a new page. That's actually a fairly wide table though. A narrower table could easily have 50-100 tuple per page which would require only 1-2% of dead space overhead. <idle speculation> Perhaps the two parameters should be tied together and we should make the autovacuum parameter: max(1%, min(10%, fillfactor(table))) and make the default fill factor 5%. Hm. We have the width of the table in the stats don't we? We could actually calculate the "1 tuple's worth of space" percentage automatically on a per-table basis. Or for that matter instead of calculating it as a percentage of the whole table, just compare the number of updates/deletes with the number of pages in the table. </speculation> > How about the analyze scale factor, should we keep the current 10%? I > have less of a problem with reducing it further since analyze is cheaper > than vacuum. My "try to maintain one tuple's worth of space" model doesn't answer this question at all. It depends entirely on whether the ddl is changing the data distribution. Perhaps this should be 1/max(stats_target) for the table. So the default would be 10% but if you raise the stats_target for a column to 100 it would go down to 1% or so. The idea being that if you have ten buckets then updating 1/10th of the rows stands an even chance of doubling or halving the size of your bucket. Except there's no math behind that intuition at all and I rather doubt it makes much sense. Actually I feel like there should be a factor of 2 or more in there as well. If you modify 1/10th of the rows and you have 10 buckets then we should be analyzing *before* the distribution has a chance to be modified beyond recognition. Perhaps I shouldn't have closed the <speculation> tag so early :) The problem if we try to calculate reasonable defaults like this is it makes it unclear how to expose any knob for the user to adjust it if they need to. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com