Re: Understanding histograms

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: Understanding histograms
Дата	30 апреля 2008 г. 11:43:30
Msg-id	18628.1209566591@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Re: Understanding histograms ("Len Shapiro" <lenshap@gmail.com>)
Ответы	Re: Understanding histograms
Список	pgsql-performance

Дерево обсуждения

"Len Shapiro" <lenshap@gmail.com> writes:
> I asked about n_distinct, whose documentation reads in part "The
> negated form is used when ANALYZE believes that the number of distinct
> values is likely to increase as the table grows".  and I asked about
> why ANALYZE believes that the number of distinct values is likely to
> increase.  I'm unclear why you quoted to me the documentation on
> stadistinct.

n_distinct is just a view of stadistinct.  I assumed you'd poked around
in the code enough to know that ...

>>> The "rows=2" estimate makes sense when const = 1 or 5, but it makes no
>>> sense to me for other values of const not in the MVC list.
>>
>> I'm not sure what estimate you'd expect instead?

> Instead I would expect an estimate of "rows=0" for values of const
> that are not in the MCV list and not in the histogram.

Surely that's not very sane?  The MCV list plus histogram generally
don't include every value in the table.  IIRC the estimate for values
not present in the MCV list is (1 - sum(MCV frequencies)) divided by
(n_distinct - number of MCV entries), which amounts to assuming that
all values not present in the MCV list occur equally often.  The weak
spot of course is that the n_distinct estimate may be pretty inaccurate.

> Where in the source is the code that manipulates the histogram?

commands/analyze.c builds it, and most of the estimation with it
happens in utils/adt/selfuncs.c.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Understanding histograms