Re: planner not choosing fastest estimated plan
От | Tom Lane |
---|---|
Тема | Re: planner not choosing fastest estimated plan |
Дата | |
Msg-id | 17167.1373213504@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | planner not choosing fastest estimated plan (Jeff Janes <jeff.janes@gmail.com>) |
Список | pgsql-hackers |
Jeff Janes <jeff.janes@gmail.com> writes: > I have a weird case where the planner doesn't choose the plan that it > itself believes to be the fastest plan. I poked into this a bit and found where things are going wrong. The ideal plan for this case involves a nestloop with a parameterized inner scan over an inheritance tree (an append relation). The Path for that scan has to be built by set_append_rel_pathlist(), around lines 810-860 of allpaths.c in HEAD. And what that code is doing is taking the cheapest path for each appendrel member, then "reparameterizing" it if necessary, which means modifying it to include enforcement of any available parameterized join quals that it wasn't using already. In this case what we've got for each child is a simple seqscan path (enforcing no quals) and a bitmap indexscan path that uses the parameterized joinqual perlupper(a.val1)=perlupper(b.val1). Ordinarily the bitmap path would be the cheapest and everything would be fine. However, Jeff's example assigns a very high cost to the UDF, and the bitmap scan's cost estimate includes (correctly) one evaluation of the UDF. With a high enough UDF cost, the bare seqscan is cheaper, and so it gets picked. Now, once we reparameterize the seqscan, which in this case amounts to having it evaluate the parameterized joinqual at every row, it's hugely expensive; so we end up with a very expensive parameterized append path that isn't going to look attractive when join paths are made, and the planner picks a hash or merge join instead. So in short, the error lies in assuming that the cheapest plain path will still be the cheapest one after reparameterization. I think that I did recognize that as a risk when I wrote this code, but supposed that the possible addition of extra quals to evaluate wouldn't change the outcome. Jeff's example shows that actually the added quals can make a huge difference. The simplest correct logic would involve running through all the available paths for the child relation, reparameterizing each one, and only then comparing their costs. That sounds a bit expensive though. What we could do to ameliorate the planning cost is to still search for the overall-cheapest child path, and if it is already parameterized the way we want (which will frequently be the case, I think), then we're done. Only if it requires reparameterization do we have to go through the more complicated process. Thoughts, better ideas? regards, tom lane
В списке pgsql-hackers по дате отправления: