Re: Parallel Aggregate

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: Parallel Aggregate
Дата	18 марта 2016 г. 07:25:06
Msg-id	CAA4eK1K5DTkqsNQsmeLBKQMpL6OcM10R1KbgzUy9ERffCAfEvQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Parallel Aggregate (David Rowley <david.rowley@2ndquadrant.com>)
Ответы	Re: Parallel Aggregate
Список	pgsql-hackers

Дерево обсуждения

On Thu, Mar 17, 2016 at 6:41 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 18 March 2016 at 01:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> > <david.rowley@2ndquadrant.com> wrote:
>
> Updated patch is attached. Thanks for the re-review.
>

Few more comments:

+ if (parse->groupClause)

+ path = (Path *) create_sort_path(root,

+ grouped_rel,

+ path,

+ root->group_pathkeys,

+ -1.0);

For final path, why do you want to sort just for group by case?

+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,

+ NULL, &total_groups);

+ if (parse->groupClause)

+ path = (Path *) create_sort_path(root,

+ grouped_rel,

+ path,

+ root->group_pathkeys,

+ -1.0);

+ if (parse->hasAggs)

+ add_path(grouped_rel, (Path *)

+ create_agg_path(root,

+ grouped_rel,

+ path,

+ target,

+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,

+ parse->groupClause,

+ (List *) parse->havingQual,

+ &agg_costs,

+ partial_grouped_rel->rows,

+ true,

+ true));

+ else

+ add_path(grouped_rel, (Path *)

+ create_group_path(root,

+ grouped_rel,

+ path,

+ target,

+ parse->groupClause,

+ (List *) parse->havingQual,

+ total_groups));

In above part of patch, it seems you are using number of groups differenetly; for create_group_path() and create_gather_path(), you have used total_groups whereas for create_agg_path() partial_grouped_rel->rows is used, is there a reason for the same?

+ if (grouped_rel->partial_pathlist)

+ {

+ Path *path = (Path *)

linitial(grouped_rel->partial_pathlist);

+ double total_groups;

+ total_groups =

path->rows * path->parallel_degree;

+ path = (Path *) create_gather_path(root, partial_grouped_rel,

path,

+ NULL, &total_groups);

A. Won't passing partial_grouped_rel lead to incomplete information required by create_gather_path() w.r.t the case of parameterized path info?

B. You have mentioned that passing grouped_rel will make gather path contain the information of final path target, but what is the problem with that? I mean to ask why Gather node is required to contain partial path target information instead of final path target.

C. Can we consider passing pathtarget to create_gather_path() as that seems to save us from inventing new UpperRelationKind? If you are worried about adding the new parameter (pathtarget) to create_gather_path(), then I think we are already passing it in many other path generation functions, so why not for gather path generation as well?

4A.

Overall function create_grouping_paths() looks better than previous, but I think still it is difficult to read. I think it can be improved by generating partial aggregate paths separately as we do for nestloop join refer function consider_parallel_nestloop

4B.

Rather than directly using create_gather_path(), can't we use generate_gather_paths as for all places where we generate gather node, generate_gather_paths() is used.

+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)

{

+ foreach(lc, final_target->exprs)

+ {

+ Expr *expr = (Expr *) lfirst(lc);

i++;

+ if (parse->groupClause)

+ {

+ Index sgref = final_target-

>sortgrouprefs[i];

+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause)

!= NULL)

+ {

+ /*

* It's a grouping column, so add it to the input target as-is.

+ */

add_column_to_pathtarget(input_target, expr, sgref);

+ continue;

}

+ }

+ /*

+ * Non-grouping column, so just remember the expression for later

* call to pull_var_clause.

+ */

+ non_group_cols = lappend(non_group_cols, expr);

}

Do we want to achieve something different in the above foreach loop than the similar loop in make_group_input_target(), if not then why are they not exactly same?

+ /* XXX this causes some redundant cost calculation ... */

+ input_target = set_pathtarget_cost_width(root,

input_target);

+ return input_target;

Can't we use return set_pathtarget_cost_width() directly rather than fetching it in input_target and then returning input_target?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel Aggregate