Re: POC: GROUP BY optimization
От | Andrei Lepikhov |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | 50c1c1dc-057c-421b-a1f4-62622c7afe61@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On 4/12/24 06:44, Tom Lane wrote: > * It seems like root->processed_groupClause no longer has much to do > with the grouping behavior, which is scary given how much code still > believes that it does. I suspect there are bugs lurking there, and 1. Analysing the code, processed_groupClause is used in: - adjust_foreign_grouping_path_cost - to decide, do we need to add cost of sort to the foreign grouping. - just for replacing relids during SJE process. - create_groupingsets_plan(), preprocess_grouping_sets, remove_useless_groupby_columns - we don't apply this feature in the case of grouping sets. - get_number_of_groups - estimation shouldn't depend on the order of clauses. - create_grouping_paths - to analyse hashability of grouping clause - make_group_input_target, make_partial_grouping_target - doesn't depend on the order of clauses planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in the case of hash grouping. So, the only case we can impact current behavior lies in the postgres_fdw. But here we don't really know which plan will be chosen during planning on foreign node and stay the same behavior is legal for me. > am not comforted by the fact that the patch changed exactly nothing > in the pathnodes.h documentation of that field. This comment looks > pretty silly now too: > > /* Preprocess regular GROUP BY clause, if any */ > root->processed_groupClause = list_copy(parse->groupClause); > > What "preprocessing" is going on there? This comment was adequate > before, when the code line invoked preprocess_groupclause which had > a bunch of commentary; but now it has to stand on its own and it's > not doing a great job of that. Thanks for picking this place. I fixed it. More interesting here is that we move decisions on the order of group-by clauses to the stage, where we already have underlying subtree and incoming path keys. In principle, we could return the preprocessing routine and arrange GROUP-BY with the ORDER-BY clause as it was before. But looking into the code, I found only one reason to do this: adjust_group_pathkeys_for_groupagg. Curious about how much profit we get here, I think we can discover it later with no hurry. A good outcome here will be a test case that can show the importance of arranging GROUP-BY and ORDER-BY at an early stage. -- regards, Andrei Lepikhov Postgres Professional
Вложения
В списке pgsql-hackers по дате отправления: