Re: BUG #17479: "plan should not reference subplan's variable" when calling `grouping` on result of subquery
От | Tom Lane |
---|---|
Тема | Re: BUG #17479: "plan should not reference subplan's variable" when calling `grouping` on result of subquery |
Дата | |
Msg-id | 218738.1652310685@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: BUG #17479: "plan should not reference subplan's variable" when calling `grouping` on result of subquery (Richard Guo <guofenglinux@gmail.com>) |
Ответы |
Re: BUG #17479: "plan should not reference subplan's variable" when calling `grouping` on result of subquery
|
Список | pgsql-bugs |
Richard Guo <guofenglinux@gmail.com> writes: >> Actually the two SubLink expressions are totally the same. But we did >> not check that and proceeded to expand them to two SubPlans. Right. It'd be good to improve on that situation somehow, but I doubt that that line of thought will lead to anything backpatchable. > When we generate PathTarget for initial input to grouping nodes in > make_group_input_target(), for non-grouping columns we would pull out > all the Vars they mention with the help of pull_var_clause(), and add > them to the input target. But this ignores the Vars included inside > GroupingFunc node, even with flag PVC_RECURSE_AGGREGATES. > If we change that and include the Vars inside GroupingFunc node, we > would be able to fix up the GROUPINGFUNC target entry correctly by > walking into the GroupingFunc->args and matching the Vars there. At first I didn't like this solution at all, but on reflection I think it's okay. The existing comment there is wrong because it neglects the fact that whatever's inside the GROUPING construct has to be amenable to later processing by setrefs.c and possibly EXPLAIN. So we cannot have Vars there that aren't output by the lower plan nodes, which is the optimization the comment is incorrectly thinking can work. This code has accidentally worked so far because whatever's inside GROUPING must duplicate a GROUP BY clause elsewhere, which'll result in the same Vars getting propagated up the tree as far as the plan node responsible for grouping. But once you phrase it that way, the fragility become obvious: what if the GROUPING construct is placed above whatever node does the grouping? I think it might be possible to break it today, even without using SubPlans; and if you can't, it's only because the planner is leaving money on the table by not aggressively pruning the output tlists for upper plan nodes. So I now agree that the thing to do is make this logic work the same for GroupingFunc as it does for Aggref (and incidentally make the comment about that less wishy-washy). The apparent savings of not pulling the contained references is illusory, because we'll end up with exactly the same plan tree --- or else a broken plan tree. We may have more to do here, though, because with this patch I get explain verbose SELECT grouping(res.cnt) FROM Card CROSS JOIN LATERAL (SELECT (SELECT Card.id) AS cnt ) AS res GROUP BY res.cnt; HashAggregate (cost=67.38..71.88 rows=200 width=8) Output: GROUPING((SubPlan 1)), ((SubPlan 2)) Group Key: (SubPlan 2) -> Seq Scan on public.card (cost=0.00..61.00 rows=2550 width=8) Output: (SubPlan 2), card.id SubPlan 2 -> Result (cost=0.00..0.01 rows=1 width=4) Output: card.id What became of SubPlan 1? Maybe this is fine but it looks a little shaky. We should at least run down why that's happening and make sure we're not leaving dangling pointers anywhere. regards, tom lane
В списке pgsql-bugs по дате отправления: