Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized)
| От | Richard Guo |
|---|---|
| Тема | Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized) |
| Дата | |
| Msg-id | CAMbWs49fQuPPMTmFL3NrshGBYoTQL26tWtg-3pA2jkmW=C5GQg@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re:GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized) ("Haowu Ge" <gehaowu@bitmoe.com>) |
| Ответы |
Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized)
|
| Список | pgsql-bugs |
On Fri, Dec 5, 2025 at 11:17 AM Haowu Ge <gehaowu@bitmoe.com> wrote: > I tried applying the patch myself, compiled and installed it, > but it doesn’t seem to resolve the issue of GROUP BY ROLLUP leading to a full table scan. I see why that failed. In your example scenario, the operand ends up as RelabelType(PlaceHolderVar(Var)). The v1 patch attempted to strip PlaceHolderVar nodes first and then RelabelType nodes. So the outer RelabelType blocked the logic from seeing and stripping the inner PlaceHolderVar, causing the match to fail. You may think we can reverse the stripping order to fix this, but we cannot, because these nodes can be interleaved in arbitrary ways. For instance: CREATE TABLE t (a int, b int); CREATE INDEX ON t (a); SELECT * FROM (SELECT a::oid AS x FROM t) WHERE x::int = 1 GROUP BY ROLLUP(x); The operand is RT(PHV(RT(Var))). To handle such interleaving, we may need a loop that repeatedly peels off both node types from the top of the stack until the underlying operand is revealed. However, even a robust top-level loop is insufficient because PHVs can be buried deeply within an expression tree. This is particularly problematic for expression indexes. Consider an index on (a+1) with the following query: CREATE INDEX ON t ((a+1)); SELECT * FROM (SELECT a AS x FROM t) WHERE x + 1 = 2 GROUP BY ROLLUP(x); In this case, the operand becomes OpExpr(+, PHV(Var), Const(1)). The top-level stripping cannot reach this nested PHV node, which causes the match to fail at the end, because match_index_to_operand() relies on equal() to verify the match. Therefore, we may need to recurse into the operand expression tree to perform a deep strip of the PHVs, leveraging expression_tree_mutator. A significant concern with this deep-strip approach is the performance cost. match_index_to_operand() lies on the planner's hot path, and unconditionally invoking expression_tree_mutator for every check would introduce unacceptable overhead: that involves memory allocation and tree copying. To mitigate this, we can avoid the stripping logic when there are no PHVs to remove. We can check root->glob->lastPHId == 0 to determine if there's no PHVs at all. Unfortunately, the "root" parameter is not available in match_index_to_operand(), and modifying the function signature to pass it down would break ABI compatibility, which should not happen in a patch for back-patching. (We can do that on master though.) As a viable alternative, maybe we can use a lightweight, read-only expression walker on the operand tree first to detect the presence of any PHVs. We only trigger the expensive deep mutation if this preliminary check confirms that it is actually necessary. I'm wondering whether we need to deep-strip RelabelType nodes as well. Is it possible to have a case where the operand OpExpr(+, RT(a), 1) fails to match an index expression OpExpr(+, a, 1)? Even if so, maybe we do not need to deep-strip RelabelType nodes. There is a distinct difference between these two node types: PHVs are artifacts injected purely by the planner and users have no control over them, while RelabelType nodes usually reflect explicit types or casts derived from the user's query. If the user defines the index as (a::text || b), they are expected to write WHERE (a::text || b) = .... Any thoughts? Hi Tom, I wonder if you can provide some insights? - Richard
В списке pgsql-bugs по дате отправления: