Re: [HACKERS] WIP: Aggregation push-down

From	Antonin Houska
Subject	Re: [HACKERS] WIP: Aggregation push-down
Date	12 July 2019, 08:41:51
Msg-id	13971.1562920911@spoje.net
In response to	Re: [HACKERS] WIP: Aggregation push-down (Richard Guo <riguo@pivotal.io>)
Responses	Re: [HACKERS] WIP: Aggregation push-down
List	pgsql-hackers

Richard Guo <riguo@pivotal.io> wrote:

> I didn't fully follow the whole thread and mainly looked into the latest
> patch set. So what are the considerations for abandoning the aggmultifn
> concept?

Originally the function was there to support joins where both relations are
grouped. My doubts about the usefulness of such a join started at [1]. (See the
thread referenced from [2].)

> In my opinion, aggmultifn would enable us to do a lot more
> types of transformation. For example, consider the query below:
> 
> select sum(foo.c) from foo join bar on foo.b = bar.b group by foo.a, bar.a;
> 
> With the latest patch, the plan looks like:
> 
> Finalize HashAggregate    <------ sum(psum)
>    Group Key: foo.a, bar.a
>    ->  Hash Join
>          Hash Cond: (bar.b = foo.b)
>          ->  Seq Scan on bar
>          ->  Hash
>                ->  Partial HashAggregate    <------ sum(foo.c) as psum
>                      Group Key: foo.a, foo.b
>                      ->  Seq Scan on foo
> 
> 
> If we have aggmultifn, we can perform the query this way:
> 
> Finalize HashAggregate    <------ sum(foo.c)*cnt
>    Group Key: foo.a, bar.a
>    ->  Hash Join
>          Hash Cond: (foo.b = bar.b)
>          ->  Seq Scan on foo
>          ->  Hash
>                ->  Partial HashAggregate    <------ count(*) as cnt
>                      Group Key: bar.a, bar.b
>                      ->  Seq Scan on bar

Perhaps, but that would require changes to nodeAgg.c, which I want to avoid
for now.
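
For illustration only, the count-based plan quoted above can be emulated in plain SQL, with no executor support. The sketch below is not part of the patch: it uses Python's sqlite3 module, and the rows in foo and bar are made up (only the table and column names come from the quoted query). Note that it multiplies foo.c by cnt for each joined row rather than once per output group, because cnt differs between bar.b values that fall into the same (foo.a, bar.a) group.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the
# query quoted above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (a INT, b INT, c INT);
    CREATE TABLE bar (a INT, b INT);
    INSERT INTO foo VALUES (1, 10, 5), (1, 10, 7), (2, 20, 3), (2, 10, 4);
    INSERT INTO bar VALUES (1, 10), (1, 10), (2, 10), (2, 20), (2, 20);
""")

# The query as written: aggregate after the join.
original = conn.execute("""
    SELECT foo.a, bar.a, sum(foo.c)
    FROM foo JOIN bar ON foo.b = bar.b
    GROUP BY foo.a, bar.a
    ORDER BY 1, 2
""").fetchall()

# Emulated push-down: count bar rows per (a, b) first, then weight each
# foo.c by that count while aggregating the smaller join result.
pushed_to_bar = conn.execute("""
    SELECT foo.a, g.a, sum(foo.c * g.cnt)
    FROM foo
    JOIN (SELECT a, b, count(*) AS cnt FROM bar GROUP BY a, b) AS g
      ON foo.b = g.b
    GROUP BY foo.a, g.a
    ORDER BY 1, 2
""").fetchall()

print(original == pushed_to_bar)
```

Doing the multiplication inside the aggregate itself, instead of in the query text, is the part that would need the nodeAgg.c changes.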

> And this way:
> 
> Finalize HashAggregate    <------ sum(psum)*cnt
>    Group Key: foo.a, bar.a
>    ->  Hash Join
>          Hash Cond: (foo.b = bar.b)
>                ->  Partial HashAggregate    <------ sum(foo.c) as psum
>                      Group Key: foo.a, foo.b
>                      ->  Seq Scan on foo
>          ->  Hash
>                ->  Partial HashAggregate    <------ count(*) as cnt
>                      Group Key: bar.a, bar.b
>                      ->  Seq Scan on bar

This looks like my idea presented in [1], for which there seems to be no
common use case.
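
As a rough sketch of what that plan computes when both sides are grouped, the same emulation can be written with two pre-aggregated subqueries. Again this only assumes Python's sqlite3 module and invented sample rows; it says nothing about how the patch would implement it.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the quoted query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (a INT, b INT, c INT);
    CREATE TABLE bar (a INT, b INT);
    INSERT INTO foo VALUES (1, 10, 5), (1, 10, 7), (2, 20, 3), (2, 10, 4);
    INSERT INTO bar VALUES (1, 10), (1, 10), (2, 10), (2, 20), (2, 20);
""")

# Reference result: join first, aggregate afterwards.
after_join = conn.execute("""
    SELECT foo.a, bar.a, sum(foo.c)
    FROM foo JOIN bar ON foo.b = bar.b
    GROUP BY foo.a, bar.a
    ORDER BY 1, 2
""").fetchall()

# Both inputs grouped before the join: partial sums from foo are weighted
# by the row counts from bar, then combined per (foo.a, bar.a).
both_grouped = conn.execute("""
    SELECT f.a, g.a, sum(f.psum * g.cnt)
    FROM (SELECT a, b, sum(c) AS psum FROM foo GROUP BY a, b) AS f
    JOIN (SELECT a, b, count(*) AS cnt FROM bar GROUP BY a, b) AS g
      ON f.b = g.b
    GROUP BY f.a, g.a
    ORDER BY 1, 2
""").fetchall()

print(after_join == both_grouped)
```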

> My another question is in function add_grouped_path(), when creating
> sorted aggregation path on top of subpath. If the subpath is not sorted,
> then the sorted aggregation path would not be generated. Why not in this
> case we create a sort path on top of subpath first and then create group
> aggregation path on top of the sort path?

I see no reason not to do it. (If you want to try, just go ahead.) However, the
current patch version brings only the basic functionality, and I'm not going to
add new functionality (such as parallel aggregation, partitioned tables or
postgres_fdw) until the open design questions are resolved. Otherwise there's
a risk that the additional work will be wasted due to a major rework of the core
functionality.
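
To illustrate why the sorted aggregation needs the sort underneath it, here is a toy sketch in plain Python. It is not planner code: group_aggregate and the sample rows are made up for this example. It only shows that a streaming, group-by-adjacent-key aggregate gives correct groups when its input is ordered by the grouping key.

```python
from itertools import groupby
from operator import itemgetter


def group_aggregate(rows):
    """Streaming sum per key; one output row per run of equal adjacent keys."""
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(rows, key=itemgetter(0))
    ]


# Hypothetical (key, value) rows arriving in no particular order.
unsorted_rows = [(2, 3), (1, 5), (2, 4), (1, 7)]

# Without a sort, each key is split into several partial groups.
without_sort = group_aggregate(unsorted_rows)

# With an explicit sort on the grouping key, each key forms a single group.
with_sort = group_aggregate(sorted(unsorted_rows, key=itemgetter(0)))

print(without_sort)
print(with_sort)
```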

> Core dump when running one query in agg_pushdown.sql

Thanks for the report! Fixed in the new version.

> This is really a cool feature. Thank you for working on this.

I appreciate hearing that :-) Let's see which PostgreSQL release will adopt it.

[1] https://www.postgresql.org/message-id/cc823e89-3fbc-f94e-b9d4-9c713b044b5d%402ndquadrant.com

[2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
