Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
От | Tomas Vondra |
---|---|
Тема | Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? |
Дата | |
Msg-id | e9c62bf4-95cd-ad83-944d-b68ea5b717d8@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
|
Список | pgsql-hackers |
On 2/17/22 23:16, Robert Haas wrote: > On Thu, Feb 17, 2022 at 4:17 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> IMHO the whole problem is we're unable to estimate the join clause as a >> conditional probability, i.e. >> >> P(A.x = B.x | (A.x < 42) & (B.x < 42)) >> >> so maybe instead of trying to generate additional RelOptInfo items we >> should think about improving that. The extra RelOptInfos don't really >> solve this, because even if you decide to join A|x<42 to B|x<42 it does >> nothing to improve the join clause estimate. > > I guess I hadn't considered that angle. I think the extra RelOptInfos > (or whatever) actually do solve a problem, because enforcing a > high-selectivity join qual against both sides is potentially quite > wasteful, and you need some way to decide whether to do it on one > side, the other, or both. But it's also true that I was wrong to > assume independence ... and if we could avoid assuming that, then the > join selectivity would work itself out without any of the machinery > that I just proposed. > True. We kinda already have this issue for the equality clauses, and having paths with the condition pushed down (or not) seems like a natural approach. >> It actually deals with a more general form of this case, because the >> clauses don't need to reference the same attribute - so for example this >> would work too, assuming there is extended stats object on the columns >> on each side: >> >> P(A.c = B.d | (A.e < 42) & (B.f < 42)) > > That'd be cool. > Yeah, but the patch implementing this still needs more work. >> Not sure. In my experience queries with both a join clause and other >> clauses referencing the same attribute are pretty rare. But I agree if >> we can do the expensive stuff only when actually needed, with no cost in >> the 99.999% other cases, I don't see why not. Of course, code complexity >> is a cost too. > > Right. I mean, we could have a planner GUC to control whether the > optimization is used even in cases where we see that it's possible. > But Tom keeps arguing that it is possible in many queries and would > benefit few queries, and I'm not seeing why that should be so. I think > it's likely to benefit many of the queries to which it applies. > Maybe. Although the example I linked some time ago shows a pretty dramatic improvement, due to picking merge join + index scan, and not realizing we'll have to skip a lot of data. But that's just one anecdotal example. Anyway, I think the best way to deal with these (perfectly legitimate) concerns is to show how expensive it is for queries not not having such join/restriction clauses, with the cost being close to 0. And then for queries with such clauses but not benefiting from the change (a bit like a worst case). regards [1] https://www.postgresql.org/message-id/CA%2B1Wm9U_sP9237f7OH7O%3D-UTab71DWOO4Qc-vnC78DfsJQBCwQ%40mail.gmail.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: