Обсуждение: Should consider materializing the cheapest inner path in consider_parallel_nestloop()
Hi all,
I recently run benchmark[1] on master, but I found performance problem as below:
explain analyze select
subq_0.c0 as c0,
subq_0.c1 as c1,
subq_0.c2 as c2
from
(select
ref_0.l_shipmode as c0,
sample_0.l_orderkey as c1,
sample_0.l_quantity as c2,
ref_0.l_orderkey as c3,
sample_0.l_shipmode as c5,
ref_0.l_shipinstruct as c6
from
public.lineitem as ref_0
left join public.lineitem as sample_0
on ((select p_partkey from public.part order by p_partkey limit 1)
is not NULL)
where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=78.00..45267050.75 rows=1 width=27) (actual time=299695.097..299695.099 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual time=0.651..0.652 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual time=0.650..0.651 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8) (actual time=0.013..0.428 rows=2000 loops=1)
-> Nested Loop Left Join (cost=0.00..45266972.75 rows=1 width=27) (actual time=299695.096..299695.096 rows=0 loops=1)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 3621030625
-> Seq Scan on lineitem ref_0 (cost=0.00..1969.75 rows=60175 width=11) (actual time=0.026..6.225 rows=60175 loops=1)
-> Materialize (cost=0.00..2270.62 rows=60175 width=27) (actual time=0.000..2.554 rows=60175 loops=60175)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.004..8.169 rows=60175 loops=1)
Planning Time: 0.172 ms
Execution Time: 299695.501 ms
(16 rows)
After I set enable_material to off, the same query run faster, as below:
set enable_material = off;
explain analyze select
subq_0.c0 as c0,
subq_0.c1 as c1,
subq_0.c2 as c2
from
(select
ref_0.l_shipmode as c0,
sample_0.l_orderkey as c1,
sample_0.l_quantity as c2,
ref_0.l_orderkey as c3,
sample_0.l_shipmode as c5,
ref_0.l_shipinstruct as c6
from
public.lineitem as ref_0
left join public.lineitem as sample_0
on ((select p_partkey from public.part order by p_partkey limit 1)
is not NULL)
where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1078.00..91026185.57 rows=1 width=27) (actual time=192669.605..192670.425 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual time=0.662..0.663 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual time=0.661..0.662 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8) (actual time=0.017..0.430 rows=2000 loops=1)
-> Gather (cost=1000.00..91026107.57 rows=1 width=27) (actual time=192669.604..192670.422 rows=0 loops=1)
Workers Planned: 1
Params Evaluated: $0
Workers Launched: 1
-> Nested Loop Left Join (cost=0.00..91025107.47 rows=1 width=27) (actual time=192588.143..192588.144 rows=0 loops=2)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 1810515312
-> Parallel Seq Scan on lineitem ref_0 (cost=0.00..1721.97 rows=35397 width=11) (actual time=0.007..3.797 rows=30088 loops=2)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.000..2.637 rows=60175 loops=60175)
Planning Time: 0.174 ms
Execution Time: 192670.458 ms
(19 rows)
subq_0.c0 as c0,
subq_0.c1 as c1,
subq_0.c2 as c2
from
(select
ref_0.l_shipmode as c0,
sample_0.l_orderkey as c1,
sample_0.l_quantity as c2,
ref_0.l_orderkey as c3,
sample_0.l_shipmode as c5,
ref_0.l_shipinstruct as c6
from
public.lineitem as ref_0
left join public.lineitem as sample_0
on ((select p_partkey from public.part order by p_partkey limit 1)
is not NULL)
where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=78.00..45267050.75 rows=1 width=27) (actual time=299695.097..299695.099 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual time=0.651..0.652 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual time=0.650..0.651 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8) (actual time=0.013..0.428 rows=2000 loops=1)
-> Nested Loop Left Join (cost=0.00..45266972.75 rows=1 width=27) (actual time=299695.096..299695.096 rows=0 loops=1)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 3621030625
-> Seq Scan on lineitem ref_0 (cost=0.00..1969.75 rows=60175 width=11) (actual time=0.026..6.225 rows=60175 loops=1)
-> Materialize (cost=0.00..2270.62 rows=60175 width=27) (actual time=0.000..2.554 rows=60175 loops=60175)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.004..8.169 rows=60175 loops=1)
Planning Time: 0.172 ms
Execution Time: 299695.501 ms
(16 rows)
After I set enable_material to off, the same query run faster, as below:
set enable_material = off;
explain analyze select
subq_0.c0 as c0,
subq_0.c1 as c1,
subq_0.c2 as c2
from
(select
ref_0.l_shipmode as c0,
sample_0.l_orderkey as c1,
sample_0.l_quantity as c2,
ref_0.l_orderkey as c3,
sample_0.l_shipmode as c5,
ref_0.l_shipinstruct as c6
from
public.lineitem as ref_0
left join public.lineitem as sample_0
on ((select p_partkey from public.part order by p_partkey limit 1)
is not NULL)
where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1078.00..91026185.57 rows=1 width=27) (actual time=192669.605..192670.425 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual time=0.662..0.663 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual time=0.661..0.662 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8) (actual time=0.017..0.430 rows=2000 loops=1)
-> Gather (cost=1000.00..91026107.57 rows=1 width=27) (actual time=192669.604..192670.422 rows=0 loops=1)
Workers Planned: 1
Params Evaluated: $0
Workers Launched: 1
-> Nested Loop Left Join (cost=0.00..91025107.47 rows=1 width=27) (actual time=192588.143..192588.144 rows=0 loops=2)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 1810515312
-> Parallel Seq Scan on lineitem ref_0 (cost=0.00..1721.97 rows=35397 width=11) (actual time=0.007..3.797 rows=30088 loops=2)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.000..2.637 rows=60175 loops=60175)
Planning Time: 0.174 ms
Execution Time: 192670.458 ms
(19 rows)
I debug the code and find consider_parallel_nestloop() doesn't consider materialized form of the cheapest inner path.
When enable_material = true, we can see Material path won in first plan, but Parallel Seq Scan node doesn't add as outer path, which because
in try_partial_nestloop_path() , the cost of nestloop wat computed using seq scan path not material path.
[1] include test table schema and data, you can repeat above problem.
I try fix this problem in attached patch, and I found pg12.12 also had this issue. Please review my patch, thanks!
Вложения
After using patch, the result as below :
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1078.00..26630101.20 rows=1 width=27) (actual time=160571.005..160571.105 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual time=1.065..1.066 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual time=1.064..1.065 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8) (actual time=0.046..0.830 rows=2000 loops=1)
-> Gather (cost=1000.00..26630023.20 rows=1 width=27) (actual time=160571.003..160571.102 rows=0 loops=1)
Workers Planned: 1
Params Evaluated: $0
Workers Launched: 1
-> Nested Loop Left Join (cost=0.00..26629023.10 rows=1 width=27) (actual time=160549.257..160549.258 rows=0 loops=2)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 1810515312
-> Parallel Seq Scan on lineitem ref_0 (cost=0.00..1721.97 rows=35397 width=11) (actual time=0.010..3.393 rows=30088 loops=2)
-> Materialize (cost=0.00..2270.62 rows=60175 width=27) (actual time=0.000..2.839 rows=60175 loops=60175)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.008..11.381 rows=60175 loops=2)
Planning Time: 0.174 ms
Execution Time: 160571.476 ms
(20 rows)
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1078.00..26630101.20 rows=1 width=27) (actual time=160571.005..160571.105 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual time=1.065..1.066 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual time=1.064..1.065 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8) (actual time=0.046..0.830 rows=2000 loops=1)
-> Gather (cost=1000.00..26630023.20 rows=1 width=27) (actual time=160571.003..160571.102 rows=0 loops=1)
Workers Planned: 1
Params Evaluated: $0
Workers Launched: 1
-> Nested Loop Left Join (cost=0.00..26629023.10 rows=1 width=27) (actual time=160549.257..160549.258 rows=0 loops=2)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 1810515312
-> Parallel Seq Scan on lineitem ref_0 (cost=0.00..1721.97 rows=35397 width=11) (actual time=0.010..3.393 rows=30088 loops=2)
-> Materialize (cost=0.00..2270.62 rows=60175 width=27) (actual time=0.000..2.839 rows=60175 loops=60175)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.008..11.381 rows=60175 loops=2)
Planning Time: 0.174 ms
Execution Time: 160571.476 ms
(20 rows)
tender wang <tndrwang@gmail.com> 于2023年9月5日周二 16:52写道:
Hi all,I recently run benchmark[1] on master, but I found performance problem as below:explain analyze select
subq_0.c0 as c0,
subq_0.c1 as c1,
subq_0.c2 as c2
from
(select
ref_0.l_shipmode as c0,
sample_0.l_orderkey as c1,
sample_0.l_quantity as c2,
ref_0.l_orderkey as c3,
sample_0.l_shipmode as c5,
ref_0.l_shipinstruct as c6
from
public.lineitem as ref_0
left join public.lineitem as sample_0
on ((select p_partkey from public.part order by p_partkey limit 1)
is not NULL)
where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=78.00..45267050.75 rows=1 width=27) (actual time=299695.097..299695.099 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual time=0.651..0.652 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual time=0.650..0.651 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8) (actual time=0.013..0.428 rows=2000 loops=1)
-> Nested Loop Left Join (cost=0.00..45266972.75 rows=1 width=27) (actual time=299695.096..299695.096 rows=0 loops=1)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 3621030625
-> Seq Scan on lineitem ref_0 (cost=0.00..1969.75 rows=60175 width=11) (actual time=0.026..6.225 rows=60175 loops=1)
-> Materialize (cost=0.00..2270.62 rows=60175 width=27) (actual time=0.000..2.554 rows=60175 loops=60175)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.004..8.169 rows=60175 loops=1)
Planning Time: 0.172 ms
Execution Time: 299695.501 ms
(16 rows)
After I set enable_material to off, the same query run faster, as below:
set enable_material = off;
explain analyze select
subq_0.c0 as c0,
subq_0.c1 as c1,
subq_0.c2 as c2
from
(select
ref_0.l_shipmode as c0,
sample_0.l_orderkey as c1,
sample_0.l_quantity as c2,
ref_0.l_orderkey as c3,
sample_0.l_shipmode as c5,
ref_0.l_shipinstruct as c6
from
public.lineitem as ref_0
left join public.lineitem as sample_0
on ((select p_partkey from public.part order by p_partkey limit 1)
is not NULL)
where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1078.00..91026185.57 rows=1 width=27) (actual time=192669.605..192670.425 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual time=0.662..0.663 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual time=0.661..0.662 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8) (actual time=0.017..0.430 rows=2000 loops=1)
-> Gather (cost=1000.00..91026107.57 rows=1 width=27) (actual time=192669.604..192670.422 rows=0 loops=1)
Workers Planned: 1
Params Evaluated: $0
Workers Launched: 1
-> Nested Loop Left Join (cost=0.00..91025107.47 rows=1 width=27) (actual time=192588.143..192588.144 rows=0 loops=2)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 1810515312
-> Parallel Seq Scan on lineitem ref_0 (cost=0.00..1721.97 rows=35397 width=11) (actual time=0.007..3.797 rows=30088 loops=2)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.000..2.637 rows=60175 loops=60175)
Planning Time: 0.174 ms
Execution Time: 192670.458 ms
(19 rows)I debug the code and find consider_parallel_nestloop() doesn't consider materialized form of the cheapest inner path.When enable_material = true, we can see Material path won in first plan, but Parallel Seq Scan node doesn't add as outer path, which becausein try_partial_nestloop_path() , the cost of nestloop wat computed using seq scan path not material path.[1] include test table schema and data, you can repeat above problem.I try fix this problem in attached patch, and I found pg12.12 also had this issue. Please review my patch, thanks!
On Tue, Sep 5, 2023 at 4:52 PM tender wang <tndrwang@gmail.com> wrote:
I recently run benchmark[1] on master, but I found performance problem as below:...I debug the code and find consider_parallel_nestloop() doesn't consider materialized form of the cheapest inner path.
Yeah, this seems an omission in commit 45be99f8. I reviewed the patch
and here are some comments.
* I think we should not consider materializing the cheapest inner path
if we're doing JOIN_UNIQUE_INNER, because in this case we have to
unique-ify the inner path.
* I think we can check if it'd be parallel safe before creating the
material path, thus avoid the creation in unsafe cases.
* I don't think the test case you added works for the code changes.
Maybe a plan likes below is better:
explain (costs off)
select * from tenk1, tenk2 where tenk1.two = tenk2.two;
QUERY PLAN
----------------------------------------------
Gather
Workers Planned: 4
-> Nested Loop
Join Filter: (tenk1.two = tenk2.two)
-> Parallel Seq Scan on tenk1
-> Materialize
-> Seq Scan on tenk2
(7 rows)
Thanks
Richard
and here are some comments.
* I think we should not consider materializing the cheapest inner path
if we're doing JOIN_UNIQUE_INNER, because in this case we have to
unique-ify the inner path.
* I think we can check if it'd be parallel safe before creating the
material path, thus avoid the creation in unsafe cases.
* I don't think the test case you added works for the code changes.
Maybe a plan likes below is better:
explain (costs off)
select * from tenk1, tenk2 where tenk1.two = tenk2.two;
QUERY PLAN
----------------------------------------------
Gather
Workers Planned: 4
-> Nested Loop
Join Filter: (tenk1.two = tenk2.two)
-> Parallel Seq Scan on tenk1
-> Materialize
-> Seq Scan on tenk2
(7 rows)
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> 于2023年9月5日周二 18:51写道:
On Tue, Sep 5, 2023 at 4:52 PM tender wang <tndrwang@gmail.com> wrote:I recently run benchmark[1] on master, but I found performance problem as below:...I debug the code and find consider_parallel_nestloop() doesn't consider materialized form of the cheapest inner path.Yeah, this seems an omission in commit 45be99f8. I reviewed the patch
and here are some comments.
* I think we should not consider materializing the cheapest inner path
if we're doing JOIN_UNIQUE_INNER, because in this case we have to
unique-ify the inner path.
That's right. The V2 patch has been fixed.
* I think we can check if it'd be parallel safe before creating the
material path, thus avoid the creation in unsafe cases.
Agreed.
* I don't think the test case you added works for the code changes.
Maybe a plan likes below is better:
Agreed.
explain (costs off)
select * from tenk1, tenk2 where tenk1.two = tenk2.two;
QUERY PLAN
----------------------------------------------
Gather
Workers Planned: 4
-> Nested Loop
Join Filter: (tenk1.two = tenk2.two)
-> Parallel Seq Scan on tenk1
-> Materialize
-> Seq Scan on tenk2
(7 rows)
Thanks
Richard
Вложения
On Tue, Sep 5, 2023 at 8:07 AM Richard Guo <guofenglinux@gmail.com> wrote: > Yeah, this seems an omission in commit 45be99f8. It's been a while, but I think I omitted this deliberately because I didn't really understand the value of it and wanted to keep the planning cost down. The example query provided here seems rather artificial. Surely few people write a join clause that references neither of the tables being joined. Is there a more realistic case where this makes a big difference? -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Sep 8, 2023 at 3:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
The example query provided here seems rather artificial. Surely few
people write a join clause that references neither of the tables being
joined. Is there a more realistic case where this makes a big
difference?
Yes the given example query is not that convincing. I tried a query
with plans as below (after some GUC setting) which might be more
realistic in real world.
unpatched:
explain select * from partsupp join lineitem on l_partkey > ps_partkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Gather (cost=0.00..5522666.44 rows=160466667 width=301)
Workers Planned: 4
-> Nested Loop (cost=0.00..5522666.44 rows=40116667 width=301)
Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1518.44 rows=15044 width=144)
-> Seq Scan on partsupp (cost=0.00..267.00 rows=8000 width=157)
(6 rows)
patched:
explain select * from partsupp join lineitem on l_partkey > ps_partkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Gather (cost=0.00..1807085.44 rows=160466667 width=301)
Workers Planned: 4
-> Nested Loop (cost=0.00..1807085.44 rows=40116667 width=301)
Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1518.44 rows=15044 width=144)
-> Materialize (cost=0.00..307.00 rows=8000 width=157)
-> Seq Scan on partsupp (cost=0.00..267.00 rows=8000 width=157)
(7 rows)
The execution time (ms) are (avg of 3 runs):
unpatched: 71769.21
patched: 65510.04
So we can see some (~9%) performance gains in this case.
Thanks
Richard
with plans as below (after some GUC setting) which might be more
realistic in real world.
unpatched:
explain select * from partsupp join lineitem on l_partkey > ps_partkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Gather (cost=0.00..5522666.44 rows=160466667 width=301)
Workers Planned: 4
-> Nested Loop (cost=0.00..5522666.44 rows=40116667 width=301)
Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1518.44 rows=15044 width=144)
-> Seq Scan on partsupp (cost=0.00..267.00 rows=8000 width=157)
(6 rows)
patched:
explain select * from partsupp join lineitem on l_partkey > ps_partkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Gather (cost=0.00..1807085.44 rows=160466667 width=301)
Workers Planned: 4
-> Nested Loop (cost=0.00..1807085.44 rows=40116667 width=301)
Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1518.44 rows=15044 width=144)
-> Materialize (cost=0.00..307.00 rows=8000 width=157)
-> Seq Scan on partsupp (cost=0.00..267.00 rows=8000 width=157)
(7 rows)
The execution time (ms) are (avg of 3 runs):
unpatched: 71769.21
patched: 65510.04
So we can see some (~9%) performance gains in this case.
Thanks
Richard
Hi tom,
Do you have any comments or suggestions on this issue? Thanks.
Richard Guo <guofenglinux@gmail.com> 于2023年9月8日周五 14:06写道:
On Fri, Sep 8, 2023 at 3:15 AM Robert Haas <robertmhaas@gmail.com> wrote:The example query provided here seems rather artificial. Surely few
people write a join clause that references neither of the tables being
joined. Is there a more realistic case where this makes a big
difference?Yes the given example query is not that convincing. I tried a query
with plans as below (after some GUC setting) which might be more
realistic in real world.
unpatched:
explain select * from partsupp join lineitem on l_partkey > ps_partkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Gather (cost=0.00..5522666.44 rows=160466667 width=301)
Workers Planned: 4
-> Nested Loop (cost=0.00..5522666.44 rows=40116667 width=301)
Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1518.44 rows=15044 width=144)
-> Seq Scan on partsupp (cost=0.00..267.00 rows=8000 width=157)
(6 rows)
patched:
explain select * from partsupp join lineitem on l_partkey > ps_partkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Gather (cost=0.00..1807085.44 rows=160466667 width=301)
Workers Planned: 4
-> Nested Loop (cost=0.00..1807085.44 rows=40116667 width=301)
Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1518.44 rows=15044 width=144)
-> Materialize (cost=0.00..307.00 rows=8000 width=157)
-> Seq Scan on partsupp (cost=0.00..267.00 rows=8000 width=157)
(7 rows)
The execution time (ms) are (avg of 3 runs):
unpatched: 71769.21
patched: 65510.04
So we can see some (~9%) performance gains in this case.
Thanks
Richard
On Fri, 8 Sept 2023 at 09:41, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Sep 5, 2023 at 8:07 AM Richard Guo <guofenglinux@gmail.com> wrote: > > Yeah, this seems an omission in commit 45be99f8. > > It's been a while, but I think I omitted this deliberately because I > didn't really understand the value of it and wanted to keep the > planning cost down. I think the value is potentially not having to repeatedly execute some expensive rescan to the nested loop join once for each outer-side tuple. The planning cost is something to consider for sure, but it seems strange that we deemed it worthy to consider material paths for the non-parallel input paths but draw the line for the parallel/partial ones. It seems to me that the additional costs and the possible benefits are the same for both. David
On Fri, 8 Sept 2023 at 19:14, Richard Guo <guofenglinux@gmail.com> wrote: > explain select * from partsupp join lineitem on l_partkey > ps_partkey; > QUERY PLAN > -------------------------------------------------------------------------------------- > Gather (cost=0.00..1807085.44 rows=160466667 width=301) > Workers Planned: 4 > -> Nested Loop (cost=0.00..1807085.44 rows=40116667 width=301) > Join Filter: (lineitem.l_partkey > partsupp.ps_partkey) > -> Parallel Seq Scan on lineitem (cost=0.00..1518.44 rows=15044 width=144) > -> Materialize (cost=0.00..307.00 rows=8000 width=157) > -> Seq Scan on partsupp (cost=0.00..267.00 rows=8000 width=157) > (7 rows) > > The execution time (ms) are (avg of 3 runs): > > unpatched: 71769.21 > patched: 65510.04 This gap would be wider if the partsupp Seq Scan were filtering off some rows and wider still if you added more rows to lineitem. However, a clauseless seqscan is not the most compelling use case below a material node. The inner side of the nested loop could be some subquery that takes 6 days to complete. Running the 6 day query ~15044 times seems like something that would be good to avoid. It seems worth considering Material paths to me. I think that the above example could be tuned any way you like to make it look better or worse. David
Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()
От
Alena Rybakina
Дата:
Hi!
Thank you for your work on the subject.
I reviewed your patch and found that your commit message does not fully explain your code, in addition, I found several spelling mistakes.
I think it's better to change to:With parallel seqscan, we should consider materializing the cheapest inner path in
case of nested loop if it doesn't contain a unique node or it is unsafe to use it in a subquery.
Besides, I couldn't understand why we again check that material path is safe?
if (matpath != NULL && matpath->parallel_safe)
try_partial_nestloop_path(root, joinrel, outerpath, matpath,
pathkeys, jointype, extra);
-- Regards, Alena Rybakina
Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()
От
"Andrey M. Borodin"
Дата:
> On 27 Sep 2023, at 16:06, tender wang <tndrwang@gmail.com> wrote: > > Do you have any comments or suggestions on this issue? Thanks. Hi Tender, there are some review comments in the thread, that you might be interested in. I'll mark this [0] entry "Waiting on Author" and move to next CF. Thanks! Best regards, Andrey Borodin. [0]https://commitfest.postgresql.org/47/4549/
Andrey M. Borodin <x4mmm@yandex-team.ru> 于2024年4月8日周一 17:40写道:
> On 27 Sep 2023, at 16:06, tender wang <tndrwang@gmail.com> wrote:
>
> Do you have any comments or suggestions on this issue? Thanks.
Hi Tender,
there are some review comments in the thread, that you might be interested in.
I'll mark this [0] entry "Waiting on Author" and move to next CF.
Thank you for the reminder. I will update the patch later.
I also deeply hope to get more advice about this patch.
(even the advice that not worth continuint to work on this patch).
Thanks.
Thanks!
Best regards, Andrey Borodin.
[0]https://commitfest.postgresql.org/47/4549/
Tender Wang
OpenPie: https://en.openpie.com/Andrey M. Borodin <x4mmm@yandex-team.ru> 于2024年4月8日周一 17:40写道:
> On 27 Sep 2023, at 16:06, tender wang <tndrwang@gmail.com> wrote:
>
> Do you have any comments or suggestions on this issue? Thanks.
Hi Tender,
there are some review comments in the thread, that you might be interested in.
I'll mark this [0] entry "Waiting on Author" and move to next CF.
Thanks!
Best regards, Andrey Borodin.
[0]https://commitfest.postgresql.org/47/4549/
I have rebased master and fixed a plan diff case.
-- Tender Wang
OpenPie: https://en.openpie.com/