Re: [HACKERS] AGG_HASHED cost estimate
От | Ashutosh Bapat |
---|---|
Тема | Re: [HACKERS] AGG_HASHED cost estimate |
Дата | |
Msg-id | CAFjFpReY-RbPsRyH7s8ApG6=Cv1KrGvK4j_FMFXJg4dbGu1wXQ@mail.gmail.com обсуждение исходный текст |
Ответ на | [HACKERS] AGG_HASHED cost estimate (Jeff Janes <jeff.janes@gmail.com>) |
Ответы |
Re: [HACKERS] AGG_HASHED cost estimate
Re: [HACKERS] AGG_HASHED cost estimate |
Список | pgsql-hackers |
On Thu, Apr 20, 2017 at 7:47 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > > In cost_size.c, there is this comment block: > > + * Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly > the > + * same total CPU cost, but AGG_SORTED has lower startup cost. If > the > + * input path is already sorted appropriately, AGG_SORTED should be > + * preferred (since it has no risk of memory overflow). This will > happen > + * as long as the computed total costs are indeed exactly equal --- > but > + * if there's roundoff error we might do the wrong thing. So be > sure > + * that the computations below form the same intermediate values in > the > + * same order. > > But, why should they have the same cost in the first place? A sorted > aggregation just has to do an equality comparison on each adjacent pair, > which is costed as (cpu_operator_cost * numGroupCols) * input_tuples. A > hash aggregation has to do a hashcode computation for each input, which > apparently is also costed at (cpu_operator_cost * numGroupCols) * > input_tuples. I suspect we don't cost this. > But it also has to do the equality comparison between the > input tuple and any aggregation it is about to be aggregated into, to make > sure the hashcode is not a false collision. That should be another > (cpu_operator_cost * numGroupCols) * (input_tuples - numGroups), shouldn't > it? but cost this without numGroups. /* * The transCost.per_tuple component of aggcosts should be charged once * per input tuple, corresponding to thecosts of evaluating the aggregate * transfns and their input expressions (with any startup cost of course * chargedbut once). The finalCost component is charged once per output * tuple, corresponding to the costs of evaluatingthe finalfns. * * If we are grouping, we charge an additional cpu_operator_cost per * grouping columnper input tuple for grouping comparisons. * The reason may be that hashing isn't as costly as a comparison. I don't how true is that. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
В списке pgsql-hackers по дате отправления: