Re: On Distributions In 7.2.1

From	Tom Lane
Subject	Re: On Distributions In 7.2.1
Date	2 May 2002, 10:37:37
Msg-id	7233.1020348710@sss.pgh.pa.us
In reply to	Re: On Distributions In 7.2.1 (Mark kirkwood <markir@slingshot.co.nz>)
Responses	Tracking down Database growth Re: On Distributions In 7.2.1
List	pgsql-general

Mark kirkwood <markir@slingshot.co.nz> writes:
> However Tom's observation is still valid (in spite of my math) - all the
> frequencies are overestimated, rather than the expected "some bigger,
> some smaller" sort of thing.

No, that makes sense.  The values that get into the most-common-values
list are only going to be ones that are significantly more common (in
the sample) than the estimated average frequency.  So if the thing makes
a good estimate of the average frequency, you'll only see upside
outliers in the MCV list.  The relevant logic is in analyze.c:

        /*
         * Decide how many values are worth storing as most-common values.
         * If we are able to generate a complete MCV list (all the values
         * in the sample will fit, and we think these are all the ones in
         * the table), then do so.    Otherwise, store only those values
         * that are significantly more common than the (estimated)
         * average. We set the threshold rather arbitrarily at 25% more
         * than average, with at least 2 instances in the sample.  Also,
         * we won't suppress values that have a frequency of at least 1/K
         * where K is the intended number of histogram bins; such values
         * might otherwise cause us to emit duplicate histogram bin
         * boundaries.
         */

            regards, tom lane
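
Editorial illustration (not part of the original message): the rule described in the quoted comment can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the analyze.c code. The function names `mcv_min_count` and `mcv_list_length` and their parameters are hypothetical, and the "complete MCV list" case from the first sentence of the comment is left out.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not taken from analyze.c): the minimum number of
 * occurrences in the sample that a value needs in order to be stored as
 * a most-common value, following the rule in the quoted comment.
 */
double
mcv_min_count(double sample_rows, double est_distinct, int num_bins)
{
    /* Estimated occurrences in the sample of an average value. */
    double avgcount = sample_rows / est_distinct;

    /* Threshold: 25% more than average, with at least 2 instances. */
    double mincount = avgcount * 1.25;

    /* A value with frequency 1/K (K = histogram bins) is never suppressed. */
    double maxmincount = sample_rows / (double) num_bins;

    if (mincount < 2.0)
        mincount = 2.0;
    if (mincount > maxmincount)
        mincount = maxmincount;
    return mincount;
}

/*
 * Hypothetical helper: given per-value sample counts sorted in descending
 * order, return how many leading entries reach the threshold.
 */
size_t
mcv_list_length(const int *counts, size_t ncounts, double mincount)
{
    size_t i;

    for (i = 0; i < ncounts; i++)
    {
        if ((double) counts[i] < mincount)
            break;              /* every later count is smaller still */
    }
    return i;
}
```

For example, with 3000 sampled rows, an estimated 100 distinct values and 10 histogram bins, the threshold is 37.5 occurrences, so sample counts of 120, 60, 40, 37 and 30 would keep only the first three values. Every value that survives sits at least 25% above the estimated average count of 30, which is why the stored frequencies all look like upside outliers.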
