Re: Understanding histograms
От | Tom Lane |
---|---|
Тема | Re: Understanding histograms |
Дата | |
Msg-id | 18628.1209566591@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Understanding histograms ("Len Shapiro" <lenshap@gmail.com>) |
Ответы |
Re: Understanding histograms
|
Список | pgsql-performance |
"Len Shapiro" <lenshap@gmail.com> writes: > I asked about n_distinct, whose documentation reads in part "The > negated form is used when ANALYZE believes that the number of distinct > values is likely to increase as the table grows". and I asked about > why ANALYZE believes that the number of distinct values is likely to > increase. I'm unclear why you quoted to me the documentation on > stadistinct. n_distinct is just a view of stadistinct. I assumed you'd poked around in the code enough to know that ... >>> The "rows=2" estimate makes sense when const = 1 or 5, but it makes no >>> sense to me for other values of const not in the MVC list. >> >> I'm not sure what estimate you'd expect instead? > Instead I would expect an estimate of "rows=0" for values of const > that are not in the MCV list and not in the histogram. Surely that's not very sane? The MCV list plus histogram generally don't include every value in the table. IIRC the estimate for values not present in the MCV list is (1 - sum(MCV frequencies)) divided by (n_distinct - number of MCV entries), which amounts to assuming that all values not present in the MCV list occur equally often. The weak spot of course is that the n_distinct estimate may be pretty inaccurate. > Where in the source is the code that manipulates the histogram? commands/analyze.c builds it, and most of the estimation with it happens in utils/adt/selfuncs.c. regards, tom lane
В списке pgsql-performance по дате отправления: