Re: On Distributions In 7.2.1
От | Mark kirkwood |
---|---|
Тема | Re: On Distributions In 7.2.1 |
Дата | |
Msg-id | 1020332199.3822.16.camel@spikey.slithery.org обсуждение исходный текст |
Ответ на | Re: On Distributions In 7.2.1 (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: On Distributions In 7.2.1
|
Список | pgsql-general |
err - some of my puzzlement is explained by this (silly) error : there are 3000 distinct values, each with 1000 occurrences (*not* 3000) so the frequency I am "chasing" is 0.00033... ( not 0.001) so for 10 (quantiles) we see ~ 0.0016 (too big) for 30 seeing ~ 0.001 (too big but closer) for 100 seeing ~ 0.00066 (still too big but closer) so actually its "creeping" closer (rather than oscillating), which seems a much healthier situation. However Tom's observation is still valid (in spite of my math) - all the frequencies are overestimated, rather than the expected "some bigger, some smaller" sort of thing. I will do some more ANALYZE runs and see what happens... (then the log distribution could be fun) regards Mark >On Thu, 2002-05-02 at 17:00, Tom Lane wrote: > Mark kirkwood <markir@slingshot.co.nz> writes: > > There is slightly odd behaviour with the frequencies decreasing with > > increasing number of quantiles (same as 7.2 .. same code here ?). > > That does seem curious. With the inevitable sampling error, you'd > expect that some values would be sampled at a bit more than their > true frequency, and others at a bit less. The oversampled ones would > be the ones to get into the MCV list. But what you've got here is > that even the most-commonly-sampled value showed up at a bit less > than its true frequency. Is this repeatable if you do ANALYZE over > and over? Maybe it was just a statistical fluke. > > > I am wondering if this is caused by my example not having any "real" most > > common values (they are all as common as each other). > > I am going to fiddle with my data generation script, skew the > > distribution and see what effect that has. > > Someone else reported some results that made it look like a logarithmic > frequency distribution was a difficult case for the stats gatherer: > http://archives.postgresql.org/pgsql-general/2002-03/msg01300.php > So please be sure to try that. > > regards, tom lane >
В списке pgsql-general по дате отправления: