Re: ordering of 'where' sub clauses
От | Tom Lane |
---|---|
Тема | Re: ordering of 'where' sub clauses |
Дата | |
Msg-id | 18188.963935438@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: ordering of 'where' sub clauses (Steve Heaven <steve@thornet.co.uk>) |
Список | pgsql-general |
Steve Heaven <steve@thornet.co.uk> writes: > explain select m.* from main m, subset_table s > where m.stockno=s.stockno AND m.descrip ~ 'SEARCHTERM'; > NOTICE: QUERY PLAN: > Hash Join (cost=118431.88 rows=714797 width=172) > -> Seq Scan on main m (cost=79300.27 rows=714796 width=160) > -> Hash (cost=8121.17 rows=203793 width=12) > -> Seq Scan on subset_table s (cost=8121.17 rows=203793 width=12) Given the sizes of the tables, I'm not sure that's such an unreasonable plan. Your subset seems to be more than a quarter of the rows in the main table. The only way to build a plan that visits *only* those rows in main that are also in subset is to use an indexscan on main, right? The cold hard fact is that an indexscan that touches a quarter of the rows in a large table is going to be slower than a linear scan of the whole table. (Barring special cases such as a nearly-in-order table, anyway.) The cost of extra disk I/O will swamp out the extra comparisons. So I think the planner is doing the right thing to use a seqscan on main, and then its only interesting choice is how to perform the join with subset. The above looks pretty good, bearing in mind again that an indexscan that has to touch every row in subset will be mighty slow. My advice is to forget about the subset table; it's not buying you anything that's worth the trouble of doing the join. If your original subset condition is really expensive, you could run it and store the result in a field added to the main table --- otherwise, why not just evaluate it on the fly? In any case, retrieving a subset that represents more than a few percent of the total number of rows is likely best done by a sequential scan, unsexy as that may sound. BTW, it would be interesting to see if the 7.0.* planner produces a materially different plan than the pre-7.0 version I observe you are using. Its cost estimates will be different, but I expect it will arrive at the same conclusion that indexscan is not the way to do this query. regards, tom lane
В списке pgsql-general по дате отправления: