Re: benchmarking the query planner
| From | Tom Lane |
|---|---|
| Subject | Re: benchmarking the query planner |
| Date | |
| Msg-id | 15086.1229046610@sss.pgh.pa.us |
| In response to | Re: benchmarking the query planner ("Nathan Boley" <npboley@gmail.com>) |
| Responses | Re: benchmarking the query planner |
| List | pgsql-hackers |
"Nathan Boley" <npboley@gmail.com> writes:
> Isn't a selectivity estimate of x = v as ( the number of values in v's
> histogram bucket ) / ( number of distinct values in v's histogram
> bucket ) pretty rational? That's currently what we do for non-MCV
> values, except that we look at ndistinct over the whole table instead
> of individual histogram buckets.

But the histogram buckets are (meant to be) equal-population, so it
should come out the same either way. The removal of MCVs from the
population will knock any possible variance in ndistinct down to the
point where I seriously doubt that this could offer a win. An even
bigger problem is that this requires estimation of ndistinct among
fractions of the population, which will be proportionally less accurate
than the overall estimate. Accurate ndistinct estimation is *hard*.

> now, if there are 100 histogram buckets then any values that occupy
> more than 1% of the table will be MCVs regardless - why force a value
> to be an MCV if it only occupies 0.1% of the table?

Didn't you just contradict yourself? The cutoff would be 1%, not 0.1%.
In any case there's already a heuristic to cut off the MCV list at some
shorter length (ie, more than 1% in this example) if it seems not
worthwhile to keep the last entries. See lines 2132ff (in CVS HEAD) in
analyze.c.

			regards, tom lane
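Tom's point that equal-population buckets make the two formulas come out the same can be checked numerically. The sketch below is an illustrative simulation, not PostgreSQL's actual code; the column, bucket count, and all variable names are invented for the example. It compares the whole-table estimate (1 / ndistinct) against the proposed per-bucket estimate (bucket's fraction of rows) / (distinct values in that bucket) for a uniformly distributed non-MCV column.

```python
import random

# Hypothetical illustration only: synthetic column of 10,000 rows drawn
# uniformly from 500 distinct values (no MCVs to strip out).  Sorting
# mimics the ordered sample from which histogram buckets are built.
random.seed(1)
rows = sorted(random.randrange(500) for _ in range(10_000))

n = len(rows)
ndistinct = len(set(rows))

# Current approach: spread the population evenly over all distinct values.
whole_table_sel = 1.0 / ndistinct

# Proposed approach: 100 equal-population buckets; estimate using only
# the distinct count inside the bucket containing the probed value v.
num_buckets = 100
bucket_size = n // num_buckets
v = 250                                   # some non-MCV value to probe
b = rows.index(v) // bucket_size          # index of v's bucket
bucket = rows[b * bucket_size:(b + 1) * bucket_size]
per_bucket_sel = (len(bucket) / n) / len(set(bucket))

print(f"whole-table estimate: {whole_table_sel:.5f}")
print(f"per-bucket estimate:  {per_bucket_sel:.5f}")
```

Because every bucket holds the same fraction of rows (1/num_buckets), the per-bucket formula reduces to (1/num_buckets) / ndistinct_in_bucket, and the per-bucket distinct counts sum back to roughly the whole-table ndistinct, so the two estimates agree whenever distinct values are spread evenly across buckets, which is Tom's "come out the same either way".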