Re: [HACKERS] AGG_HASHED cost estimate

Поиск

Список

Период

Сортировка

От	Ashutosh Bapat
Тема	Re: [HACKERS] AGG_HASHED cost estimate
Дата	20 апреля 2017 г. 13:46:18
Msg-id	CAFjFpReY-RbPsRyH7s8ApG6=Cv1KrGvK4j_FMFXJg4dbGu1wXQ@mail.gmail.com обсуждение исходный текст
Ответ на	[HACKERS] AGG_HASHED cost estimate (Jeff Janes <jeff.janes@gmail.com>)
Ответы	Re: [HACKERS] AGG_HASHED cost estimate Re: [HACKERS] AGG_HASHED cost estimate
Список	pgsql-hackers

Дерево обсуждения

On Thu, Apr 20, 2017 at 7:47 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> In cost_size.c, there is this comment block:
>
> +        * Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly
> the
> +        * same total CPU cost, but AGG_SORTED has lower startup cost.  If
> the
> +        * input path is already sorted appropriately, AGG_SORTED should be
> +        * preferred (since it has no risk of memory overflow).  This will
> happen
> +        * as long as the computed total costs are indeed exactly equal ---
> but
> +        * if there's roundoff error we might do the wrong thing.  So be
> sure
> +        * that the computations below form the same intermediate values in
> the
> +        * same order.
>
> But, why should they have the same cost in the first place?  A sorted
> aggregation just has to do an equality comparison on each adjacent pair,
> which is costed as (cpu_operator_cost * numGroupCols) * input_tuples.   A
> hash aggregation has to do a hashcode computation for each input, which
> apparently is also costed at (cpu_operator_cost * numGroupCols) *
> input_tuples.

I suspect we don't cost this.

> But it also has to do the equality comparison between the
> input tuple and any aggregation it is about to be aggregated into, to make
> sure the hashcode is not a false collision.  That should be another
> (cpu_operator_cost * numGroupCols) * (input_tuples - numGroups), shouldn't
> it?

but cost this without numGroups.
   /*    * The transCost.per_tuple component of aggcosts should be charged once    * per input tuple, corresponding to
thecosts of evaluating the aggregate    * transfns and their input expressions (with any startup cost of course    *
chargedbut once).  The finalCost component is charged once per output    * tuple, corresponding to the costs of
evaluatingthe finalfns.    *    * If we are grouping, we charge an additional cpu_operator_cost per    * grouping
columnper input tuple for grouping comparisons.    *

The reason may be that hashing isn't as costly as a comparison. I
don't how true is that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] AGG_HASHED cost estimate