Re: On Distributions In 7.2.1

From	Mark kirkwood
Subject	Re: On Distributions In 7.2.1
Date	2 May 2002, 05:41:14
Msg-id	1020332199.3822.16.camel@spikey.slithery.org
In reply to	Re: On Distributions In 7.2.1 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: On Distributions In 7.2.1
List	pgsql-general

Err, some of my puzzlement is explained by this (silly) error:

There are 3000 distinct values, each with 1000 occurrences (*not* 3000),
so the frequency I am "chasing" is 1/3000 = 0.00033... (not 0.001).

So for 10 quantiles we see ~0.0016 (too big),
for 30 we see ~0.001 (too big, but closer),
for 100 we see ~0.00066 (still too big, but closer).

So actually it's "creeping" closer (rather than oscillating), which seems
a much healthier situation.

However Tom's observation is still valid (in spite of my math) - all the
frequencies are overestimated, rather than the expected "some bigger,
some smaller" sort of thing.
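
One way to see why every listed frequency can come out too high is a small
simulation. The sketch below is only an illustration under stated
assumptions, not a model of what ANALYZE actually does: it assumes the
sample holds about 300 rows per unit of statistics target (an assumption,
not checked against the 7.2.1 source), it treats the "quantiles" setting
as that target, and it draws values uniformly with replacement as a
stand-in for sampling rows from the 3,000,000-row table. The function name
is made up for the example.

```python
import random
from collections import Counter

N_DISTINCT = 3000        # distinct values, all equally common (from the post)
ROWS_PER_TARGET = 300    # ASSUMPTION: sampled rows per unit of statistics target


def top_sample_frequencies(stats_target, top_k, rng):
    """Return the top_k sample frequencies (highest first) from a uniform draw."""
    sample_size = ROWS_PER_TARGET * stats_target
    # Drawing values with replacement approximates sampling rows from a
    # table that is much larger than the sample.
    counts = Counter(rng.randrange(N_DISTINCT) for _ in range(sample_size))
    return [count / sample_size for _, count in counts.most_common(top_k)]


rng = random.Random(2002)  # fixed seed so reruns are comparable
print(f"true frequency: {1 / N_DISTINCT:.5f}")
for target in (10, 30, 100):
    freqs = top_sample_frequencies(target, top_k=10, rng=rng)
    print(f"target {target:3d}: top-10 sample frequencies "
          f"{min(freqs):.5f} .. {max(freqs):.5f}")
```

Because only the values with the highest sample counts are kept, the kept
frequencies are the lucky ones by construction, so under these assumptions
they all sit above 1/3000 and drift toward it as the sample grows.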

I will do some more ANALYZE runs and see what happens...

(then the log distribution could be fun)

regards

Mark


On Thu, 2002-05-02 at 17:00, Tom Lane wrote:
> Mark kirkwood <markir@slingshot.co.nz> writes:
> > There is slightly odd behaviour with the frequencies decreasing with
> > increasing number of quantiles (same as 7.2 .. same code here ?).
>
> That does seem curious.  With the inevitable sampling error, you'd
> expect that some values would be sampled at a bit more than their
> true frequency, and others at a bit less.  The oversampled ones would
> be the ones to get into the MCV list.  But what you've got here is
> that even the most-commonly-sampled value showed up at a bit less
> than its true frequency.  Is this repeatable if you do ANALYZE over
> and over?  Maybe it was just a statistical fluke.
>
> > I am wondering if this is caused by my example not having any "real" most
> > common values (they are all as common as each other).
> > I am going to fiddle with my data generation script, skew the
> > distribution and see what effect that has.
>
> Someone else reported some results that made it look like a logarithmic
> frequency distribution was a difficult case for the stats gatherer:
>     http://archives.postgresql.org/pgsql-general/2002-03/msg01300.php
> So please be sure to try that.
>
>             regards, tom lane
>
