Pushing down join clauses into subqueries
От | Heikki Linnakangas |
---|---|
Тема | Pushing down join clauses into subqueries |
Дата | |
Msg-id | 7105c8f3-e16f-ee54-9ce9-c4ac2384791e@iki.fi обсуждение исходный текст |
Список | pgsql-hackers |
It would be nice, if we could push down join quals into subqueries. For example: create table small_table (i int); create table big_table (i int, j int); insert into small_table values (1), (2); -- two rows insert into big_table select g/10, g from generate_series(1, 100000) g; -- million rows create index on big_table(i); select * from small_table, (select i, count(*) from big_table group by i) as sq where small_table.i = sq.i; Currently, we will fully materialize the subquery, and then filter the rows find the rows that match the outer query: QUERY PLAN ----------------------------------------------------------------------------------- Hash Join (cost=1949.50..2017.08 rows=2550 width=16) Hash Cond: (small_table.i = big_table.i) -> Seq Scan on small_table (cost=0.00..35.50 rows=2550 width=4) -> Hash (cost=1947.00..1947.00 rows=200 width=12) -> HashAggregate (cost=1943.00..1945.00 rows=200 width=12) Group Key: big_table.i -> Seq Scan on big_table (cost=0.00..1443.00 rows=100000 width=4) (7 rows) For this query, a nested loop join, with an index scan on big_table, would be a better plan. Doing that would require creating parameterized paths, planning the subquery multiple times, with and without the pushed-down join quals. However, there's one special case: What if the subquery is LATERAL? In that case, the subquery is already parameterized. We could push down join quals, which refer the same "other" relations that are already laterally referenced, without making it any more lateral. Attached is a patch to do that (the first patch). I'm not too familiar with this code; does it look sane? One thing that I wasn't clear on, is the intended behavior of ReplaceVarsFromTargetList(). In the patch, when a join qual is pushed down, the Vars in the join clause that refer to the subquery's output, are replaced with the subquery's target list entries. That's the same thing we do when pushing down regular, non-join quals, too. But when pushing down a join qual, the qual will also include Vars for other relations on the same level, not just Vars for the subquery. When the qual is pushed down to the subquery, the Vars referring to other relations need to have their varlevelsup incremented. In the patch, I solved this by first calling IncrementVarSublevelsUp() on the qual, to bump up varlevelsup for all Vars, and then ReplaceVarsFromTargetList(), with sublevels_up = 1. But that way, when ReplaceVarsFromTargetList() replaces Vars with the target list entries, it also adjusts varlevelsup in any Vars in the target list entry. I changed it to not do that anymore. All the existing callers call ReplaceVarsFromTargetList() with sublevels_up == 0, so they shouldn't be affected, but I wonder what the original intention here was? Attached is also a second, much more work-in-progress patch, that expands the push-down support to non-LATERAL subqueries. Yes, that means that the subquery is planned multiple times. I think that needs some further heuristics, or perhaps we could pass the list of "potential" join clauses to the subquery when it's planned for the first time, and have subquery_planner() tell which of them might be useful, e.g. because there are indexes to back them. - Heikki
Вложения
В списке pgsql-hackers по дате отправления: