Re: [HACKERS] WIP: Aggregation push-down
От | Antonin Houska |
---|---|
Тема | Re: [HACKERS] WIP: Aggregation push-down |
Дата | |
Msg-id | 13971.1562920911@spoje.net обсуждение исходный текст |
Ответ на | Re: [HACKERS] WIP: Aggregation push-down (Richard Guo <riguo@pivotal.io>) |
Ответы |
Re: [HACKERS] WIP: Aggregation push-down
|
Список | pgsql-hackers |
Richard Guo <riguo@pivotal.io> wrote: > I didn't fully follow the whole thread and mainly looked into the > latest > patch set. So what are the considerations for abandoning the > aggmultifn > concept? Originally the function was there to support join where both relations are grouped. My doubts about usefulness of such join started at [1]. (See the thread referenced from [2].) > In my opinion, aggmultifn would enable us to do a lot more > types of transformation. For example, consider the query below: > > select sum(foo.c) from foo join bar on foo.b = bar.b group by foo.a, > bar.a; > > With the latest patch, the plan looks like: > > Finalize HashAggregate <------ sum(psum) > Group Key: foo.a, bar.a > -> Hash Join > Hash Cond: (bar.b = foo.b) > -> Seq Scan on bar > -> Hash > -> Partial HashAggregate <------ sum(foo.c) as > psum > Group Key: foo.a, foo.b > -> Seq Scan on foo > > > If we have aggmultifn, we can perform the query this way: > > Finalize HashAggregate <------ sum(foo.c)*cnt > Group Key: foo.a, bar.a > -> Hash Join > Hash Cond: (foo.b = bar.b) > -> Seq Scan on foo > -> Hash > -> Partial HashAggregate <------ count(*) as cnt > Group Key: bar.a, bar.b > -> Seq Scan on bar Perhaps, but it that would require changes to nodeAgg.c, which I want to avoid for now. > And this way: > > Finalize HashAggregate <------ sum(psum)*cnt > Group Key: foo.a, bar.a > -> Hash Join > Hash Cond: (foo.b = bar.b) > -> Partial HashAggregate <------ sum(foo.c) as > psum > Group Key: foo.a, foo.b > -> Seq Scan on foo > -> Hash > -> Partial HashAggregate <------ count(*) as cnt > Group Key: bar.a, bar.b > -> Seq Scan on bar This looks like my idea presented in [1], for which there seems to be no common use case. > My another question is in function add_grouped_path(), when creating > sorted aggregation path on top of subpath. If the subpath is not > sorted, > then the sorted aggregation path would not be generated. Why not in > this > case we create a sort path on top of subpath first and then create > group > aggregation path on top of the sort path? I see no reason not to do it. (If you want to try, just go ahead.) However the current patch version brings only the basic functionality and I'm not going to add new functionality (such as parallel aggregation, partitioned tables or postgres_fdw) until the open design questions are resolved. Otherwise there's a risk that the additional work will be wasted due to major rework of the core functionality. > Core dump when running one query in agg_pushdown.sql Thanks for the report! Fixed in the new version. > This is really a cool feature. Thank you for working on this. I appreciate to hear that :-) Let's see which postgres release will adopt it. [1] https://www.postgresql.org/message-id/cc823e89-3fbc-f94e-b9d4-9c713b044b5d%402ndquadrant.com [2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost -- Antonin Houska Web: https://www.cybertec-postgresql.com
Вложения
В списке pgsql-hackers по дате отправления: