Re: On Distributions In 7.2.1

From: Tom Lane
Subject: Re: On Distributions In 7.2.1
Date:
Msg-id: 7233.1020348710@sss.pgh.pa.us
In reply to: Re: On Distributions In 7.2.1  (Mark kirkwood <markir@slingshot.co.nz>)
Responses: Tracking down Database growth
Re: On Distributions In 7.2.1
List: pgsql-general
Mark kirkwood <markir@slingshot.co.nz> writes:
> However Tom's observation is still valid (in spite of my math) - all the
> frequencies are overestimated, rather than the expected "some bigger,
> some smaller" sort of thing.

No, that makes sense.  The values that get into the most-common-values
list are only going to be ones that are significantly more common (in
the sample) than the estimated average frequency.  So if the thing makes
a good estimate of the average frequency, you'll only see upside
outliers in the MCV list.  The relevant logic is in analyze.c:

        /*
         * Decide how many values are worth storing as most-common values.
         * If we are able to generate a complete MCV list (all the values
         * in the sample will fit, and we think these are all the ones in
         * the table), then do so.    Otherwise, store only those values
         * that are significantly more common than the (estimated)
         * average. We set the threshold rather arbitrarily at 25% more
         * than average, with at least 2 instances in the sample.  Also,
         * we won't suppress values that have a frequency of at least 1/K
         * where K is the intended number of histogram bins; such values
         * might otherwise cause us to emit duplicate histogram bin
         * boundaries.
         */
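
To make the threshold rule in that comment concrete, here is a minimal sketch of how such a cutoff could be computed. It is not the analyze.c source: the function name choose_num_mcv, its parameters, and the order in which the two limits are applied are assumptions made for illustration, and the "complete MCV list" case mentioned in the comment is left out.

```c
#include <assert.h>

/*
 * Illustrative sketch only; the name and signature are hypothetical
 * and this is not the code from analyze.c.
 *
 * counts[]       occurrences of each tracked value in the sample,
 *                sorted in descending order
 * ncounts        number of entries in counts[]
 * max_mcv        maximum number of MCV slots available
 * sample_rows    number of rows in the sample
 * ndistinct_est  estimated number of distinct values in the column
 * num_bins       intended number of histogram bins (K)
 *
 * Returns how many leading entries of counts[] qualify as MCVs.
 */
int
choose_num_mcv(const int *counts, int ncounts, int max_mcv,
               int sample_rows, double ndistinct_est, int num_bins)
{
    double avgcount, mincount, maxmincount;
    int    n = 0;

    assert(ndistinct_est > 0 && num_bins > 0);

    /* expected occurrences in the sample of a "typical" value */
    avgcount = (double) sample_rows / ndistinct_est;

    /* threshold: 25% above average, but at least 2 instances */
    mincount = avgcount * 1.25;
    if (mincount < 2)
        mincount = 2;

    /* never suppress values whose sample frequency is at least 1/K */
    maxmincount = (double) sample_rows / (double) num_bins;
    if (mincount > maxmincount)
        mincount = maxmincount;

    /* counts[] is sorted descending, so stop at the first miss */
    while (n < ncounts && n < max_mcv && counts[n] >= mincount)
        n++;

    return n;
}
```

With a 1000-row sample and an estimated 100 distinct values, the average count is 10 and the threshold is 12.5, so only values seen 13 or more times are kept. Every value that passes sits above the estimated average, which is why the stored frequencies all look like upside outliers.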

            regards, tom lane
