Обсуждение: [BUGS] BUG #14646: performance hint to remove
The following bug has been logged on the website: Bug reference: 14646 Logged by: Boshomi Email address: boshomi@gmail.com PostgreSQL version: 9.6.2 Operating system: Linux opensuse Description: the order of tables in from clause or where clause can result in different query plans, with different execution time. Begin; --DROP TABLE testjoinperf.tempsmall CASCADE; --DROP TABLE testjoinperf.tempbig; /* CREATE SCHEMA testjoinperf; CREATE TABLE testjoinperf.tempsmall ( idsmall serial NOT NULL, smallrnd double precision, CONSTRAINT pk_tempsamll PRIMARY KEY (idsmall) ); CREATE TABLE testjoinperf.tempbig ( idbig serial NOT NULL, idsmall integer, bigrnd double precision, CONSTRAINT pk_tempbig PRIMARY KEY (idbig), CONSTRAINTfk_tempsmall FOREIGN KEY (idsmall) REFERENCES testjoinperf.tempsmall (idsmall) MATCH SIMPLE ON UPDATE NOACTION ON DELETE NO ACTION ); insert into testjoinperf.tempsmall (idsmall, smallrnd) select a.a, random() from generate_series(1,50000) as a; insert into testjoinperf.tempbig (idbig, idsmall, bigrnd) select b.b, (b.b % 50000)+1 , random() from generate_series(1,1000000) as b;analyze testjoinperf.tempsmall;analyze testjoinperf.tempbig; analyze; */ /* --fast query big table left explain analyze select * from testjoinperf.tempbig join testjoinperf.tempsmall using(idsmall) where idsmall between 48000 and 49000 limit 100; */ /* -- slow query, small table left. explain analyze select * from testjoinperf.tempsmall join testjoinperf.tempbig using(idsmall) where idsmall between 48000 and 49000 limit 100; */ rollback; /* faster query QUERY PLAN Limit (cost=0.29..137.92 rows=100 width=24) (actual time=6.589..6.896 rows=100 loops=1) -> Nested Loop (cost=0.29..28656.67 rows=20821 width=24) (actual time=6.588..6.880 rows=100 loops=1) -> Seq Scan on tempbig (cost=0.00..20406.00 rows=20821 width=16) (actual time=6.576..6.609 rows=100 loops=1) Filter: ((idsmall >= 48000) AND (idsmall <= 49000)) RowsRemoved by Filter: 47998 -> Index Scan using pk_tempsamll on tempsmall (cost=0.29..0.39 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=100) Index Cond: (idsmall = tempbig.idsmall) Planning time: 0.272 ms Execution time: 6.935 ms slower query QUERY PLAN Limit (cost=52.46..153.33 rows=100 width=24) (actual time=13.063..13.123 rows=100 loops=1) -> Hash Join (cost=52.46..19399.50 rows=19180 width=24) (actual time=13.062..13.108 rows=100 loops=1) Hash Cond: (tempbig.idsmall = tempsmall.idsmall) -> Seq Scan on tempbig (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.011..5.361 rows=48098 loops=1) -> Hash (cost=40.47..40.47 rows=959 width=12) (actual time=0.406..0.406 rows=1001 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 55kB -> Index Scanusing pk_tempsamll on tempsmall (cost=0.29..40.47 rows=959 width=12) (actual time=0.019..0.239 rows=1001 loops=1) Index Cond: ((idsmall >= 48000) AND (idsmall <= 49000)) Planning time: 0.277 ms Execution time: 13.166 ms the query are eqal. it also makes a differnence if the where clause is where tempbig.idsmall between 48000 and 49000 /* fast */ or where tempsmall.idsmall between 48000 and 49000 /* slow */ to change the order of tables is a performance hint, and should be removed. (Boshomi is my username on german wikipedia (de.wikipedia.org), feel free to contact me for any question. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
The following bug has been logged on the website:
Bug reference: 14646
Logged by: Boshomi
Email address: boshomi@gmail.com
PostgreSQL version: 9.6.2
Operating system: Linux opensuse
Description:
the order of tables in from clause or where clause can result in different
query plans, with different execution time.
[...]
--fast query big table left
explain analyze select * from testjoinperf.tempbig join
testjoinperf.tempsmall using(idsmall)
where idsmall between 48000 and 49000
-- slow query, small table left.
explain analyze select * from testjoinperf.tempsmall join
testjoinperf.tempbig using(idsmall)
where idsmall between 48000 and 49000
Confirmed 9.6.2 on Ubuntu. Not sure this is properly classified as a bug but its definitely an area where improvement would seem desirable. I am a particularly heavy user of the join predicate "USING" and never really thought to look at this dynamic (without the WHERE clause it doesn't matter, both tables up end sequentially scanned).
Someone more knowledgeable than myself will need to comment on the technical aspects as to why "where idsmall" seems to be linked to the left relation instead of the one with a more favorable execution plan.
Boshomi, how did you stumble across this anyway - just with artificial data or did you come up with that after hitting the problem with real data?
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > Confirmed 9.6.2 on Ubuntu. Not sure this is properly classified as a bug > but its definitely an area where improvement would seem desirable. I am a > particularly heavy user of the join predicate "USING" and never really > thought to look at this dynamic (without the WHERE clause it doesn't > matter, both tables up end sequentially scanned). Yeah, if you use USING the parser tries to define the merged join variable as equivalent to just one or the other input variable. With an inner join it'll arbitrarily pick the left input, with a left or right join it will use the outer-side variable (since the inner-side one might go to NULL). With a FULL join, the merged variable is actually defined as COALESCE(left-input, right-input), which is necessary for semantic correctness but is pretty awful for optimization purposes. If you put an equality constraint on the merged join variable, everything works pretty well anyway because of deduction of implied equalities (ie, we have left-input = right-input from the USING's join clause, and then if we also know e.g. left-input = 42 we can deduce right-input = 42). There are some hacks that make that carry through for outer join cases as well. The OP's WHERE clause isn't a simple equality so the equivalence-class machinery doesn't help there. There's been some speculation about improving that, but there are a bunch of semantic and practical problems in the way. One of the biggest is that we can't simply duplicate the restriction for each input variable, because that would cause the planner to misestimate their selectivity and produce bad plans of a different sort (not knowing that the restrictions are redundant). The eclass machinery knows enough to compensate for that, but it's not clear how to do it for arbitrary restrictions. Anyway, this isn't a bug but a missed optimization opportunity, and I would counsel not holding your breath waiting for an improvement. It's a difficult problem and it's not tremendously high priority. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs