Обсуждение: explicit joins wrong planning
Hi
SELECT version();
PostgreSQL 7.3.4 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc
(GCC) 3.3.2 20031005 (Debian prerelease)
Let's say I have 3 tables:
groups ( groupid integer primary key, name varchar, begindate date
);
offsets ( offset_id integer, groupid integer references groups, offset_value integer
);
events ( offset_id integer references offsets, event_date date, primary key (offset_id,event_date)
);
explain analyze select *
from groups g join offsets o using (groupid) join events e on (e.offsetid=o.offset_id and
e.event_date=g.begindate+o.offset_value)
where g.name='some_name';
Postgres doesn't use join on these both fields and doesn't use index
scan properly.
I get: Hash Cond: ("outer".offset_id = "inner".offset_id) Join Filter: ("outer".event_date = ("inner".begindate +
"inner".offset_value))
Why?
I lost few hours trying to fix it and I found, that copying one of these
conditions into where clause solved my problem:
explain analyze select *
from groups g join offsets o using (groupid) join events e on (e.offsetid=o.offset_id and
e.event_date=g.begindate+o.offset_value)
where g.name='some_name' and e.offsetid=o.offset_id;
Join Filter: ("outer".event_date = ("inner".begindate +
"inner".offset_value)) Nested Loop... Join Filter: ("outer".offset_id = "inner".offset_id)
Why? What was I doing wrong?
Regards,
Tomasz Myrta
Tomasz Myrta <jasiek@klaster.net> writes:
> Postgres doesn't use join on these both fields and doesn't use index
> scan properly.
Hard to say much when you didn't actually show us the output of EXPLAIN
ANALYZE.
regards, tom lane
Dnia 2003-11-28 20:18, Użytkownik Tom Lane napisał:
> Hard to say much when you didn't actually show us the output of EXPLAIN
> ANALYZE.
OK, Here you are:
explain analyze select *
from plany pl join linia_trasy lt using (id_linii) join kursy k on (k.id_trasy=lt.id_trasy and
k.event_date=pl.begindate+lt.offset)
where pl.id_planu=508;
event_date - date
begindate - date
id_linii - integer
id_trasy - integer
offset - integer
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
HashJoin (cost=5.82..1210.68 rows=2 width=40) (actual
time=718.11..759.27 rows=5 loops=1) Hash Cond: ("outer".id_trasy = "inner".id_trasy) Join Filter:
("outer".event_date= ("inner".begindate + "inner".offset)) -> Seq Scan on kursy k (cost=0.00..876.07 rows=58707
width=11)
(actual time=0.02..547.65 rows=58707 loops=1) -> Hash (cost=5.81..5.81 rows=7 width=29) (actual time=0.43..0.43
rows=0 loops=1) -> Nested Loop (cost=0.00..5.81 rows=7 width=29) (actual
time=0.16..0.37 rows=7 loops=1) Join Filter: ("outer".id_linii = "inner".id_linii) ->
IndexScan using plany_pkey on plany pl
(cost=0.00..4.49 rows=1 width=17) (actual time=0.09..0.11 rows=1 loops=1) Index Cond: (id_planu =
508) -> Seq Scan on linia_trasy lt (cost=0.00..1.14 rows=14
width=12) (actual time=0.02..0.12 rows=14 loops=1)
If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the
query works fine.
Regards,
Tomasz Myrta
Tomasz Myrta <jasiek@klaster.net> writes:
> If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the
> query works fine.
Define "works fine", please (again, EXPLAIN ANALYZE would be a nice
concrete description).
regards, tom lane
Dnia 2003-11-28 20:52, Użytkownik Tom Lane napisał:
> Tomasz Myrta <jasiek@klaster.net> writes:
>
>>If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the
>>query works fine.
>
>
> Define "works fine", please (again, EXPLAIN ANALYZE would be a nice
> concrete description).
Sorry, I thought the wrong case would be enough.
In first case (the worse one) I had: Total runtime: 678.31 msec
After my changes I got:
explain analyze select *
from plany pl join linia_trasy lt using (id_linii) join kursy k on (k.event_date=pl.begindate+lt.offset and
k.id_trasy=lt.id_trasy)
where pl.id_planu=508
and k.id_trasy=lt.id_trasy;
^^^^^^^^^^^^^^^^^^^^^^^^^^^ QUERY PLAN
-------------------------------------------------------- Nested Loop (cost=0.00..94.43 rows=2 width=40) (actual
time=2.97..77.55 rows=5 loops=1) Join Filter: ("inner".event_date = ("outer".begindate + "outer".offset)) ->
NestedLoop (cost=0.00..5.81 rows=7 width=29) (actual
time=0.15..0.41 rows=7 loops=1) Join Filter: ("outer".id_linii = "inner".id_linii) -> Index Scan using
plany_pkeyon plany pl (cost=0.00..4.49
rows=1 width=17) (actual time=0.09..0.10 rows=1 loops=1) Index Cond: (id_planu = 508) -> Seq
Scanon linia_trasy lt (cost=0.00..1.14 rows=14
width=12) (actual time=0.02..0.15 rows=14 loops=1) -> Index Scan using kursy_pkey on kursy k (cost=0.00..7.62
rows=288 width=11) (actual time=0.05..8.01 rows=533 loops=7) Index Cond: ((k.id_trasy = "outer".id_trasy) AND
(k.id_trasy=
"outer".id_trasy)) Total runtime: 78.01 msec
It's much better now (10x faster), but I've just found this plan still
isn't as I want to have. I wish I could have index usage on both fields,
it means:
Index Cond: ((k.id_trasy = "outer".id_trasy) AND (("inner".event_date =
("outer".begindate + "outer".offset)
Regards,
Tomasz Myrta
Tomasz Myrta <jasiek@klaster.net> writes:
> It's much better now (10x faster), but I've just found this plan still
> isn't as I want to have. I wish I could have index usage on both fields,
FWIW, I see this plan in 7.4 using your original test case:
Nested Loop (cost=22.51..71.79 rows=1 width=56) -> Hash Join (cost=22.51..47.56 rows=5 width=48) Hash Cond:
("outer".groupid= "inner".groupid) -> Seq Scan on offsets o (cost=0.00..20.00 rows=1000 width=12) ->
Hash (cost=22.50..22.50 rows=5 width=40) -> Seq Scan on groups g (cost=0.00..22.50 rows=5 width=40)
Filter: ((name)::text = 'some_name'::text) -> Index Scan using events_pkey on events e (cost=0.00..4.83
rows=1width=8) Index Cond: ((e.offset_id = "outer".offset_id) AND (e.event_date = ("outer".begindate +
"outer".offset_value)))
I believe the difficulty in 7.3 is because begindate and offset_value
come from different relations, and specifically from different relations
than the first index condition uses. This was fixed here:
2002-11-24 16:52 tgl
* src/: backend/nodes/copyfuncs.c, backend/nodes/equalfuncs.c,backend/nodes/list.c,
backend/nodes/outfuncs.c,backend/nodes/readfuncs.c,
backend/optimizer/path/indxpath.c,backend/optimizer/path/joinpath.c,backend/optimizer/path/orindxpath.c,backend/optimizer/path/tidpath.c,backend/optimizer/plan/initsplan.c,backend/optimizer/util/pathnode.c,backend/optimizer/util/plancat.c,
backend/optimizer/util/relnode.c,backend/optimizer/util/restrictinfo.c,include/nodes/nodes.h,include/nodes/pg_list.h,
include/nodes/relation.h,include/optimizer/paths.h,include/optimizer/restrictinfo.h:Restructure planning of nestloop
innerindexscans so that the setof usable joinclauses is determined accurately for each join. Formerly, the code only
consideredjoinclauses that used all of therels from the outer side of the join; thus for example FROM (aCROSS
JOINb) JOIN c ON (c.f1 = a.x AND c.f2 = b.y) could notexploit a two-column index on c(f1,f2), since neither of the
qualclauseswould be in the joininfo list it looked in. The new codedoes this correctly, and also is able to eliminate
redundantclauses,thus fixing the problem noted 24-Oct-02 by Hans-J�rgenSch�nig.
regards, tom lane
Dnia 2003-11-28 22:57, Użytkownik Tom Lane napisał: > Restructure planning of nestloop inner indexscans so that the set > of usable joinclauses is determined accurately for each join. > Formerly, the code only considered joinclauses that used all of the > rels from the outer side of the join; thus for example FROM (a > CROSS JOIN b) JOIN c ON (c.f1 = a.x AND c.f2 = b.y) could not > exploit a two-column index on c(f1,f2), since neither of the qual > clauses would be in the joininfo list it looked in. The new code > does this correctly, and also is able to eliminate redundant > clauses, thus fixing the problem noted 24-Oct-02 by Hans-Jürgen > Schönig. Yes, this is a description of my problem. Thanks a lot. Regards, Tomasz Myrta