Re: On Distributions In 7.2.1
From | Tom Lane |
---|---|
Subject | Re: On Distributions In 7.2.1 |
Date | |
Msg-id | 7233.1020348710@sss.pgh.pa.us |
In response to | Re: On Distributions In 7.2.1 (Mark kirkwood <markir@slingshot.co.nz>) |
Responses | Tracking down Database growth; Re: On Distributions In 7.2.1 |
List | pgsql-general |
Mark kirkwood <markir@slingshot.co.nz> writes:
> However Tom's observation is still valid (in spite of my math) - all the
> frequencies are overestimated, rather than the expected "some bigger,
> some smaller" sort of thing.

No, that makes sense.  The values that get into the most-common-values
list are only going to be ones that are significantly more common (in
the sample) than the estimated average frequency.  So if the thing makes
a good estimate of the average frequency, you'll only see upside
outliers in the MCV list.

The relevant logic is in analyze.c:

    /*
     * Decide how many values are worth storing as most-common values.
     * If we are able to generate a complete MCV list (all the values
     * in the sample will fit, and we think these are all the ones in
     * the table), then do so.  Otherwise, store only those values
     * that are significantly more common than the (estimated)
     * average.  We set the threshold rather arbitrarily at 25% more
     * than average, with at least 2 instances in the sample.  Also,
     * we won't suppress values that have a frequency of at least 1/K
     * where K is the intended number of histogram bins; such values
     * might otherwise cause us to emit duplicate histogram bin
     * boundaries.
     */

            regards, tom lane
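
For readers who want to see the selection rule in action, here is a rough sketch of it in Python. It is only a reading of the comment quoted above, not the code in analyze.c: the function name `choose_mcvs`, its parameters, and the `complete` flag (standing in for "we think these are all the ones in the table") are all made up for illustration.

```python
from collections import Counter


def choose_mcvs(sample, est_ndistinct, max_mcv, num_bins, complete=False):
    """Return (value, sample_frequency) pairs worth keeping as MCVs.

    Hypothetical sketch of the rule described in the comment above;
    names and signature are illustrative assumptions.
    """
    n = len(sample)
    # (value, count) pairs, most frequent first
    counts = Counter(sample).most_common()

    if complete and len(counts) <= max_mcv:
        # Complete list: every sampled value fits, keep them all.
        kept = counts
    else:
        # Expected number of sample occurrences of a typical value.
        avg_count = n / est_ndistinct
        # Threshold: 25% above average, and at least 2 instances.
        min_count = max(avg_count * 1.25, 2)
        # Never suppress a value whose frequency is at least 1/K.
        min_count = min(min_count, n / num_bins)
        kept = [(v, c) for v, c in counts[:max_mcv] if c >= min_count]

    return [(v, c / n) for v, c in kept]


# 100 rows: one very common value, one fairly common, nine minor ones.
sample = ["a"] * 40 + ["b"] * 15 + [f"v{i}" for i in range(9) for _ in range(5)]
mcvs = choose_mcvs(sample, est_ndistinct=11, max_mcv=10, num_bins=5)
print(mcvs)
```

With an estimated 11 distinct values the average frequency is 1/11, so the cutoff is about 11.4 occurrences in the sample. Only "a" and "b" clear it, and both sit above the average, which is the "upside outliers only" effect described in the message.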