Re: On Distributions In 7.2.1
From: Mark Kirkwood
Subject: Re: On Distributions In 7.2.1
Date:
Msg-id: 1020487765.1259.23.camel@spikey.slithery.org
In reply to: Re: On Distributions In 7.2.1 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: On Distributions In 7.2.1
List: pgsql-general
On Fri, 2002-05-03 at 02:11, Tom Lane wrote:
> Mark Kirkwood <markir@slingshot.co.nz> writes:
> > However Tom's observation is still valid (in spite of my math) - all the
> > frequencies are overestimated, rather than the expected "some bigger,
> > some smaller" sort of thing.
>
> No, that makes sense.  The values that get into the most-common-values
> list are only going to be ones that are significantly more common (in
> the sample) than the estimated average frequency.  So if the thing makes
> a good estimate of the average frequency, you'll only see upside
> outliers in the MCV list.  The relevant logic is in analyze.c:
>
> (snippage)

Oh, I see... I am thinking that a larger sample may reduce the likelihood of sample values *not* being included in my MCV list.

A quick ALTER .... SET STATISTICS 500 + ANALYZE results in frequencies of around 0.0004 for each MCV - closer to the actual 0.00033. So the frequency estimation algorithm is behaving well in this case, and appears to be converging to the correct results (or at least to a sensible neighbourhood of them...).

So for this type of data (uniformly distributed keys) one can ANALYZE with confidence...

I notice that for a reasonably large number of keys + big table combination, using 100 quantiles gives much more accurate frequencies - I wonder if it's worth a mention in the docs, to the effect of: "ANALYZE with the default (10 or so) is OK for most cases, but for big tables consider using ALTER ... SET STATISTICS 100 for commonly JOINed or WHEREed columns."

(Next... the logarithmic guy...)

regards

Mark
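For anyone following along, here is a minimal sketch of the kind of commands discussed above (the table name t1 and column id are purely illustrative, not taken from the actual test):

    -- Raise the per-column statistics target above the default of 10
    ALTER TABLE t1 ALTER COLUMN id SET STATISTICS 100;

    -- Re-sample the table so the planner statistics reflect the new target
    ANALYZE t1;

    -- Inspect the most-common-values list and its estimated frequencies
    SELECT most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 't1' AND attname = 'id';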