Обсуждение: Retiring is_pushed_down
When forming an outer join's joinrel, we have the is_pushed_down flag in
RestrictInfo nodes to distinguish those quals that are in that join's
JOIN/ON condition from those that were pushed down to the joinrel and
thus act as filter quals. Since now we have the outer-join-aware-Var
infrastructure, I think we can check to see whether a qual clause's
required_relids reference the outer join(s) being formed, in order to
tell if it's a join or filter clause. This seems like a more principled
way. (Interesting that optimizer/README actually describes this way in
section 'Relation Identification and Qual Clause Placement'.)
So I give it a try to retire is_pushed_down as attached. But there are
several points that may need more thoughts.
* When we form an outer join, it's possible that more than one outer
join relid is added to the join's relid set, if there are any pushed
down outer joins per identity 3. And it's also possible that no outer
join relid is added, for an outer join that has been pushed down. So
instead of checking if a qual clause's required_relids include the outer
join's relid, I think we should check if its required_relids overlap
the outer join relids that are being formed, which means that we should
use bms_overlap(rinfo->required_relids, ojrelids) rather than
bms_is_member(ojrelid, rinfo->required_relids). And we should do this
check only for outer joins.
* This patch calculates the outer join relids that are being formed
generally in this way:
bms_difference(joinrelids, bms_union(outerrelids, innerrelids))
Of course this can only be used after the outer join relids has been
added by add_outer_joins_to_relids(). This calculation is performed
multiple times during planning. I'm not sure if this has performance
issues. Maybe we can calculate it only once and store the result in
some place (such as in JoinPath)?
* ANTI joins are treated as outer joins but sometimes they do not have
rtindex (such as ANTI joins derived from SEMI). This would be a problem
with this new check. As an example, consider query
select * from a where not exists (select 1 from b where a.i = b.i) and
CURRENT_USER = SESSION_USER;
The pseudoconstant clause 'CURRENT_USER = SESSION_USER' is supposed to
be treated as a filter clause but the new check would treat it as a join
clause because the outer join relid set being formed is empty since the
ANTI join here does not have an rtindex. To solve this problem, this
patch manually adds a RTE for a ANTI join derived from SEMI.
Any thoughts?
Thanks
Richard
RestrictInfo nodes to distinguish those quals that are in that join's
JOIN/ON condition from those that were pushed down to the joinrel and
thus act as filter quals. Since now we have the outer-join-aware-Var
infrastructure, I think we can check to see whether a qual clause's
required_relids reference the outer join(s) being formed, in order to
tell if it's a join or filter clause. This seems like a more principled
way. (Interesting that optimizer/README actually describes this way in
section 'Relation Identification and Qual Clause Placement'.)
So I give it a try to retire is_pushed_down as attached. But there are
several points that may need more thoughts.
* When we form an outer join, it's possible that more than one outer
join relid is added to the join's relid set, if there are any pushed
down outer joins per identity 3. And it's also possible that no outer
join relid is added, for an outer join that has been pushed down. So
instead of checking if a qual clause's required_relids include the outer
join's relid, I think we should check if its required_relids overlap
the outer join relids that are being formed, which means that we should
use bms_overlap(rinfo->required_relids, ojrelids) rather than
bms_is_member(ojrelid, rinfo->required_relids). And we should do this
check only for outer joins.
* This patch calculates the outer join relids that are being formed
generally in this way:
bms_difference(joinrelids, bms_union(outerrelids, innerrelids))
Of course this can only be used after the outer join relids has been
added by add_outer_joins_to_relids(). This calculation is performed
multiple times during planning. I'm not sure if this has performance
issues. Maybe we can calculate it only once and store the result in
some place (such as in JoinPath)?
* ANTI joins are treated as outer joins but sometimes they do not have
rtindex (such as ANTI joins derived from SEMI). This would be a problem
with this new check. As an example, consider query
select * from a where not exists (select 1 from b where a.i = b.i) and
CURRENT_USER = SESSION_USER;
The pseudoconstant clause 'CURRENT_USER = SESSION_USER' is supposed to
be treated as a filter clause but the new check would treat it as a join
clause because the outer join relid set being formed is empty since the
ANTI join here does not have an rtindex. To solve this problem, this
patch manually adds a RTE for a ANTI join derived from SEMI.
Any thoughts?
Thanks
Richard
Вложения
On Tue, Jul 25, 2023 at 3:39 PM Richard Guo <guofenglinux@gmail.com> wrote:
* This patch calculates the outer join relids that are being formed
generally in this way:
bms_difference(joinrelids, bms_union(outerrelids, innerrelids))
Of course this can only be used after the outer join relids has been
added by add_outer_joins_to_relids(). This calculation is performed
multiple times during planning. I'm not sure if this has performance
issues. Maybe we can calculate it only once and store the result in
some place (such as in JoinPath)?
In the v2 patch, I added a member in JoinPath to store the relid set of
any outer joins that will be calculated at this join, and this would
avoid repeating this calculation when creating nestloop/merge/hash join
plan nodes. Also fixed a comment in v2.
Thanks
Richard
any outer joins that will be calculated at this join, and this would
avoid repeating this calculation when creating nestloop/merge/hash join
plan nodes. Also fixed a comment in v2.
Thanks
Richard
Вложения
On Thu, 27 Jul 2023 at 08:25, Richard Guo <guofenglinux@gmail.com> wrote: > > > On Tue, Jul 25, 2023 at 3:39 PM Richard Guo <guofenglinux@gmail.com> wrote: >> >> * This patch calculates the outer join relids that are being formed >> generally in this way: >> >> bms_difference(joinrelids, bms_union(outerrelids, innerrelids)) >> >> Of course this can only be used after the outer join relids has been >> added by add_outer_joins_to_relids(). This calculation is performed >> multiple times during planning. I'm not sure if this has performance >> issues. Maybe we can calculate it only once and store the result in >> some place (such as in JoinPath)? > > > In the v2 patch, I added a member in JoinPath to store the relid set of > any outer joins that will be calculated at this join, and this would > avoid repeating this calculation when creating nestloop/merge/hash join > plan nodes. Also fixed a comment in v2. I'm seeing that there has been no activity in this thread for nearly 6 months, I'm planning to close this in the current commitfest unless someone is planning to take it forward. It can be opened again when there is more interest. Regards, Vignesh
On Sun, Jan 21, 2024 at 8:37 PM vignesh C <vignesh21@gmail.com> wrote:
I'm seeing that there has been no activity in this thread for nearly 6
months, I'm planning to close this in the current commitfest unless
someone is planning to take it forward. It can be opened again when
there is more interest.
I'm planning to take it forward. The v2 patch does not compile any
more. Attached is an updated patch that is rebased over master and
fixes the compiling issues. Nothing else has changed.
Thanks
Richard
more. Attached is an updated patch that is rebased over master and
fixes the compiling issues. Nothing else has changed.
Thanks
Richard
Вложения
Here is another rebase over 3af7040985. Nothing else has changed.
Thanks
Richard
Thanks
Richard
Вложения
I see following issue with the latest patch,
relnode.c:2122:32: error: use of undeclared identifier 'ojrelids'
RINFO_IS_PUSHED_DOWN(rinfo, ojrelids, joinrel->relids))
On Thu, 18 Apr 2024 at 09:34, Richard Guo <guofenglinux@gmail.com> wrote:
Here is another rebase over 3af7040985. Nothing else has changed.
Thanks
Richard
Regards,
Rafia Sabih
Rafia Sabih
Richard Guo <guofenglinux@gmail.com> writes: > When forming an outer join's joinrel, we have the is_pushed_down flag in > RestrictInfo nodes to distinguish those quals that are in that join's > JOIN/ON condition from those that were pushed down to the joinrel and > thus act as filter quals. Since now we have the outer-join-aware-Var > infrastructure, I think we can check to see whether a qual clause's > required_relids reference the outer join(s) being formed, in order to > tell if it's a join or filter clause. This seems like a more principled > way. (Interesting that optimizer/README actually describes this way in > section 'Relation Identification and Qual Clause Placement'.) Sorry for being so slow to look at this patch. The idea you're following is one that I spent a fair amount of time on while working on what became 2489d76c4 ("Make Vars be outer-join-aware"). I failed to make it work though. Digging in my notes from the time: ----- How about is_pushed_down? Would really like to get rid of that, because it's ugly/sloppily defined, and it's hard to determine the correct value for EquivClass-generated clauses once we allow derivations from OJ clauses. However, my original idea of checking for join's ojrelid present in clause's required_relids has issues: * fails if clause is not pushed as far down as it can possibly be (and lateral refs mean that that's hard to do sometimes) * getting the join's ojrelid to everywhere we need to check this is messy. I'd tolerate the mess if it worked nicely, but ... ----- So I'm worried that the point about lateral refs is still a problem in your version. To be clear, the hazard is that if a WHERE clause ends up getting placed at an outer join that's higher than any of the OJs specifically listed in its required_relids, we'd misinterpret it as being a join clause for that OJ although it should be a filter clause. The other thing I find in my old notes is speculation that we could use the concept of JoinDomains to replace is_pushed_down. That is, we'd have to label every RestrictInfo with the JoinDomain of its syntactic source location, and then we could tell if the RI was "pushed down" relative to a particular join by seeing if the JD was above or below that join. This ought to be impervious to not-pushed-down-all-the-way problems. The thing I'd not figured out was how to make this work with quals of full joins: they don't belong to either the upper JoinDomain or either of the lower ones. We could possibly fix this by giving a full join its very own JoinDomain that is understood to be a parent of both lower domains, but I ran out of energy to pursue that. If we went this route, we'd basically be replacing the is_pushed_down field with a JoinDomain field, which is surely not simpler. But it seems more crisply defined and perhaps more amenable to my long-term desire to be able to use the EquivalenceClass machinery with outer join clauses. (The idea being that an EC would describe equalities that hold within a JoinDomain, but not necessarily elsewhere.) regards, tom lane
On Fri, Sep 27, 2024 at 5:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > So I'm worried that the point about lateral refs is still a problem > in your version. To be clear, the hazard is that if a WHERE clause > ends up getting placed at an outer join that's higher than any of > the OJs specifically listed in its required_relids, we'd misinterpret > it as being a join clause for that OJ although it should be a filter > clause. Sorry it took me so long to get back to this thread. I don't quite understand how this could happen. If a WHERE clause is placed on an outer join but does not include the outer join's ojrelid in its required_relids, then it must only refer to the non-nullable side. In that case, we should be able to push this clause down to the non-nullable side of the outer join. Perhaps this issue could occur with a lateral join, but I wasn't able to construct such a query. I wonder if you happen to have an example on hand. > If we went this route, we'd basically be replacing the is_pushed_down > field with a JoinDomain field, which is surely not simpler. But it > seems more crisply defined and perhaps more amenable to my long-term > desire to be able to use the EquivalenceClass machinery with outer > join clauses. (The idea being that an EC would describe equalities > that hold within a JoinDomain, but not necessarily elsewhere.) Exactly. At first, I thought that with JoinDomain, we could avoid checking if a clause's required_relids exceeds the scope of the join in RINFO_IS_PUSHED_DOWN, so RINFO_IS_PUSHED_DOWN can be reduced to: #define RINFO_IS_PUSHED_DOWN(rinfo, joinrelids) \ (bms_is_subset(joinrelids, (rinfo)->jdomain->jd_relids)) However, if the outer join was commuted with another one according to Identity 3, simply checking whether the JoinDomain is above or below the outer join is not sufficient. We'll still need to check if the clause's required_relids exceeds the scope of the join, as we do currently. But I agree that changing to use JoinDomain is more amenable to the goal of using ECs with outer join clauses. Thanks Richard