Re: multivariate statistics v14

From: Tatsuo Ishii
Subject: Re: multivariate statistics v14
Msg-id: 20160316.112907.1269707811749756579.t-ishii@sraoss.co.jp
In reply to: Re: multivariate statistics v14  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
> Instead of simply multiplying the ndistinct estimate with selectivity,
> we instead use the formula for the expected number of distinct values
> observed in 'k' rows when there are 'd' distinct values in the bin
> 
>     d * (1 - ((d - 1) / d)^k)
> 
> This is 'with replacements' which seems appropriate for the use, and it
> mostly assumes uniform distribution of the distinct values. So if the
> distribution is not uniform (e.g. there are very frequent groups) this
> may be less accurate than the current algorithm in some cases, giving
> over-estimates. But that's probably better than OOM.
> ---
>  src/backend/utils/adt/selfuncs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
> index f8d39aa..6eceedf 100644
> --- a/src/backend/utils/adt/selfuncs.c
> +++ b/src/backend/utils/adt/selfuncs.c
> @@ -3466,7 +3466,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
>              /*
>               * Multiply by restriction selectivity.
>               */
> -            reldistinct *= rel->rows / rel->tuples;
> +            reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct,rel->rows));

Why did you move away from the "*=" style? I see no reason to change it.
        reldistinct *= 1 - powl((reldistinct - 1) / reldistinct, rel->rows);

This looks better to me because it is shorter and cleaner, and it
computes the same value.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


