Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)
От | Tom Lane |
---|---|
Тема | Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off) |
Дата | |
Msg-id | 13031.1133203192@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Getting different number of results when using hashjoin on/off ("Mario Weilguni" <mario.weilguni@icomedias.com>) |
Ответы |
Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)
Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off) |
Список | pgsql-hackers |
"Mario Weilguni" <mario.weilguni@icomedias.com> writes: > Thanks for the quick response, I've tried the patch, but it did not work > as expected. When I set enable_hashjoin to off, everything works as > expected, but with hashjoin on I do not even get results anymore, CPU is > going up to 100% and after 3 minutes I cancelled the query (it normale > would take ~100-500 milliseconds). Try letting it run longer. I think your expectation is tuned for the broken implementation (which runs the subqueries only once instead of 26k times...) The test case I developed for this failure in the regression database is select count(*) from tenk1 a where exists (select 1 from tenk1 b, tenk1 c where b.unique1=c.unique2 and b.hundred in (4,5) andc.hundred=a.hundred+99); 8.0 prefers a nestloop for the subquery, and that plan runs in about 600 ms on my machine. If forced to a hash join, it takes about 2450 ms. 8.1 prefers the hash join to start with, but takes 11300 ms to run it :-( (after the patch that is). The reason for the differential is that 8.1 guesses wrong about which subplan to cycle first: most of the time, the inner plan is empty and so there's no need to pull any rows from the outer plan, but 8.1 pulls the first row from the outer plan anyway, and doing that 10000 times is what's eating the extra runtime. It looks from your previous message that similar things are happening with your data distribution, allowing 8.0 to run faster for you than 8.1 does. Not sure if there's much we can do about this. The presence of the upper-query parameter in the subplan makes it difficult to derive any stats at all, let alone guess how often the subplan will be completely empty, so I'm not sure the planner can help. For a query like this, where the hash join is being done repeatedly, it might be useful for the executor itself to track how often each subplan has been seen to be empty. In particular, the executor knows that the outer subplan is parameterless and therefore should deliver the same results each time (modulo volatile functions of course), so after the first cycle it could know that there's no point in trying the early fetch on that side. Dunno if this will be of wide enough use to be worth implementing though --- in simple cases the join won't be rescanned and so the executor can't help. Anyone have any other ideas? regards, tom lane
В списке pgsql-hackers по дате отправления: