Re: limit in subquery causes poor selectivity estimation
От | Peter Eisentraut |
---|---|
Тема | Re: limit in subquery causes poor selectivity estimation |
Дата | |
Msg-id | 1314786124.27073.11.camel@fsopti579.F-Secure.com обсуждение исходный текст |
Ответ на | Re: limit in subquery causes poor selectivity estimation (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: limit in subquery causes poor selectivity estimation
(Robert Haas <robertmhaas@gmail.com>)
Re: limit in subquery causes poor selectivity estimation (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On lör, 2011-08-27 at 13:32 -0400, Tom Lane wrote: > > EXPLAIN SELECT * FROM test1 WHERE sha1 in (SELECT sha1 FROM test2 > LIMIT 200); > > > Here, however, it has apparently not passed this knowledge through > the > > LIMIT. > > The LIMIT prevents the subquery from being flattened entirely, ie we > don't have just "test1 SEMI JOIN test2" but "test1 SEMI JOIN (SELECT * > FROM test2 LIMIT 200)". If you look at examine_variable in selfuncs.c > you'll note that it punts for Vars coming from unflattened subqueries. > > > So what's up with that? Just a case of, we haven't thought about > > covering this case yet, or are there larger problems? > > The larger problem is that if a subquery didn't get flattened, it's > often because it's got LIMIT, or GROUP BY, or some similar clause that > makes it highly suspect whether the statistics available for the table > column are reasonable to use for the subquery outputs. It wouldn't be > that hard to grab the stats for test2.sha1, but then how do you want > to adjust them to reflect the LIMIT? It turns out that this is a regression introduced in 8.4.8; the same topic is also being discussed in http://archives.postgresql.org/pgsql-performance/2011-08/msg00248.php and http://archives.postgresql.org/pgsql-general/2011-08/msg00995.php This is the (previously posted) plan with 8.4.8: QUERY PLAN ----------------------------------------------------------------------------------Hash Join (cost=10.60..34.35 rows=500width=31) Hash Cond: (test1.sha1 = test2.sha1) -> Seq Scan on test1 (cost=0.00..18.00 rows=1000 width=31) -> Hash (cost=8.10..8.10 rows=200 width=32) -> HashAggregate (cost=6.10..8.10 rows=200 width=32) -> Limit (cost=0.00..3.60 rows=200 width=21) -> Seq Scan on test2 (cost=0.00..18.01 rows=1001 width=21) And this is the plan with 8.4.7: QUERY PLAN ----------------------------------------------------------------------------------Hash Join (cost=10.80..34.55 rows=200width=31) Hash Cond: (test1.sha1 = test2.sha1) -> Seq Scan on test1 (cost=0.00..18.00 rows=1000 width=31) -> Hash (cost=8.30..8.30 rows=200 width=32) -> HashAggregate (cost=6.30..8.30 rows=200 width=32) -> Limit (cost=0.00..3.80 rows=200 width=21) -> Seq Scan on test2 (cost=0.00..19.01 rows=1001 width=21) I liked the old one better. ;-)
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Albe Laurenz"Дата:
Сообщение: Re: postgesql-9.0.4 compile on AIX 6.1 using gcc 4.4.6