Обсуждение: NOT IN subquery optimization
The semantics of NOT IN (SELECT ...) are subtly different from the semantics of NOT EXISTS (SELECT ...). These differences center on how NULLs are treated, and in general can result in statements that are harder to optimize and slower to execute than the apparently similar NOT EXISTS statement. A little over a year ago, Christian Antognini authored the blog "/How Well a Query Optimizer Handles Subqueries?/" summarizing his findings about the performance of PostgreSQL, MySQL, and Oracle on various subqueries: https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/ His position was that you can classify the optimizations as correct or incorrect, and based on that he provided the following comparison summary (see below). In short, PostgreSQL was the worst of the three systems: "Summary The number of queries that the query optimizers handle correctly are the following: Oracle Database 12.2: 72 out of 80 MySQL 8.0.3: 67 out of 80 PostgreSQL 10.0: 60 out of 80 Since not all queries are handled correctly, for best performance it is sometimes necessary to rewrite them." The subqueries that were found to be optimized "incorrectly" were almost entirely due to poor or absent NOT IN subquery optimization. The PostgreSQL community has been aware of the deficiencies in NOT IN optimization for quite some time. Based on an analysis of psgsql-performance posts between 2013 and 2015, Robert Haas identified NOT IN optimization as one of the common root causes of performance problems. We have been working on improved optimization of NOT IN, and we would like to share this optimizaton with the community. With respect to the test cases mentioned in the blog post mentioned above, it will elevate PostgreSQL from "worst" to "first". Generally the performance gains are large when the optimization applies, though we have found one test case where performance is worse. We are investigating this now to see if we can disable the optimization in that case. We would like to include a patch for this change in the current commitfest. This thread can be used to track comments about this optimization. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Jim Finnerty <jfinnert@amazon.com> writes: > We have been working on improved optimization of NOT IN, and we would like > to share this optimizaton with the community. The idea that's been kicked around in the past is to detect whether the subselect's output column(s) can be proved NOT NULL, and if so, convert to an antijoin just like NOT EXISTS. Is that what you're doing, or something different? > We would like to include a patch for this change in the current commitfest. Generally, people just send patches, they don't ask for permission first ;-) Having said that, we have a general policy that we don't like complex patches that first show up for the last commitfest of a dev cycle. So unless this is a pretty small patch, it's probably going to get delayed to v13. Still, we'd like to have it in the queue, so submit away ... regards, tom lane
re: The idea that's been kicked around in the past is to detect whether the subselect's output column(s) can be proved NOT NULL, and if so, convert to an antijoin just like NOT EXISTS basically, yes. this will handle nullability of both the outer and inner correlated expression(s), multiple expressions, presence or absence of predicates in the WHERE clause, and whether the correlated expressions are on the null-padded side of an outer join. If it is judged to be more efficient, then it transforms the NOT IN sublink into an anti-join. some complications enter into the decision to transform NOT IN to anti-join based on whether a bitmap plan will/not be used, or whether it will/not be eligible for PQ. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Jim Finnerty <jfinnert@amazon.com> writes: > re: The idea that's been kicked around in the past is to detect whether the > subselect's output column(s) can be proved NOT NULL, and if so, convert > to an antijoin just like NOT EXISTS > basically, yes. this will handle nullability of both the outer and inner > correlated expression(s), multiple expressions, presence or absence of > predicates in the WHERE clause, and whether the correlated expressions are > on the null-padded side of an outer join. If it is judged to be more > efficient, then it transforms the NOT IN sublink into an anti-join. Hmm, that seems overcomplicated ... > some complications enter into the decision to transform NOT IN to anti-join > based on whether a bitmap plan will/not be used, or whether it will/not be > eligible for PQ. ... and that even more so, considering that this decision really needs to be taken long before cost estimates would be available. As far as I can see, there should be no situation where we'd not want to transform to antijoin if we can prove it's semantically valid to do so. If there are cases where that comes out as a worse plan, that indicates a costing error that would be something to address separately (because it'd also be a problem for other antijoin cases). Also, as long as it nearly always wins, I'm not going to cry too hard if there are corner cases where it makes the wrong choice. That's not something that's possible to avoid completely. regards, tom lane
We can always correctly transform a NOT IN to a correlated NOT EXISTS. In almost all cases it is more efficient to do so. In the one case that we've found that is slower it does come down to a more general costing issue, so that's probably the right way to think about it. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Thu, 21 Feb 2019 at 16:27, Jim Finnerty <jfinnert@amazon.com> wrote: > We can always correctly transform a NOT IN to a correlated NOT EXISTS. In > almost all cases it is more efficient to do so. In the one case that we've > found that is slower it does come down to a more general costing issue, so > that's probably the right way to think about it. I worked on this over 4 years ago [1]. I think the patch there is not completely broken and seems just to need a few things fixed. I rebased it on top of current master and looked at it. I think the main remaining issue is fixing the code that ensures the outer side join quals can't be NULL. The code that's there looks broken still since it attempts to use quals from any inner joined rel for proofs that NULLs will be removed. That might not work so well in a case like: SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b NOT IN(select b from t3), however, I'd need to think harder about that since if there was such a qual then the planner should convert the left join into an inner join. But anyway, the function expressions_are_not_nullable() was more intended to work with targetlists to ensure exprs there can't be NULL. I just had done a poor job of trying to modify that into allowing it to take exprs from any random place, likely that should be a new function and expressions_are_not_nullable() should be put back to what Tom ended up with. I've attached the rebased and still broken version. [1] https://www.postgresql.org/message-id/CAApHDvqRB-iFBy68%3DdCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Fri, 22 Feb 2019 at 09:44, David Rowley <david.rowley@2ndquadrant.com> wrote: > I've attached the rebased and still broken version. I set about trying to make a less broken version of this. A quick reminder of the semantics of NOT IN: 1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table); If table is non-empty: will filter out rows where <nullable_column> is NULL and only show values that are not in <not null column> If table is empty: Filters nothing. 2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table); If table contains NULLs in the <null column> no records will match. The previous patch handled #2 correctly but neglected to do anything about #1. For #1 the only way we can implement this as a planner only change is to insist that the outer side expressions also are not null. If we could have somehow tested if "table" was non-empty then we could have added a IS NOT NULL clause to the outer query and converted to an anti-join, but ... can't know that during planning and can't add the IS NOT NULL regardless as, if "table" is empty we will filter NULLs when we shouldn't. In the attached, I set about fixing #1 by determining if the outer expressions could be NULL by checking 1. If expression is a Var from an inner joined relation it can't be NULL if there's a NOT NULL constraint on the column; or 2. If expression is a Var from an inner joined relation and there is a strict WHERE/ON clause, the expression can't be NULL; or 3. If expression is a Var from an outer joined relation check for quals that were specified in the same syntactical level as the NOT IN for proofs that NULL will be filtered. An example of #3 is: SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in planning, but... or; SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3); In the latter of the two, the t1.a = t2.a join conditions ensures that NULLs can't exist where the NOT IN is evaluated. I implemented #3 by passing the quals down to pull_up_sublinks_qual_recurse(). At the top level call 'node' and 'notnull_proofs' are the same, but that changes in recursive calls like the one we make inside the is_andclause() condition. Comments welcome. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
I'm attaching a working patch following the discussion. You can also find the following patch description is the commit message: NOT IN to ANTI JOIN transformation The semantics of ANTI JOIN were created to match the semantics of NOT EXISTS, which enables NOT EXISTS subqueries to be efficiently executed as a kind of join. NOT IN subqueries have different NULL semantics than NOT EXISTS, but since there is no special join operator for NOT IN it is generally executed as a nested sub-plan. It is possible, however, to transform NOT IN to a correlated NOT EXISTS so that it can be executed it as an ANTI JOIN with additional correlated predicates. A general transformation from NOT IN to NOT EXISTS for the single-expression case (the multi-expression case is just ANDs of the single-expressions) is: t1.x NOT IN (SELECT t2.y from t2 where p) <=> NOT EXISTS (select 1 from t2 where (y=x or y is NULL or x is NULL) and p). If x or y is non-nullable, we can safely remove the predicate "x is NULL" or "y is NULL", and if there is no predicate p, then "x is NULL" may be factored out of the subquery. Experiments show that if we can remove one or the other ORed predicates, or if we can factor out the "x is NULL", then execution is typically much faster. Basically, for the single expression case (we also handle the multi expression case), we try to do the following transformation: When p does not exist: t1.x not in (t2.y) => ANTI JOIN t1(Filter: x is not null), t2 on join condition: t1.x=t2.y or t2.y is null. When x is non-nullable: t1.x not in (t2.y where p) => ANTI JOIN t1, t2 on join condition: (t1.x=t2.y or t2.y is null) and p. We implemented a nullability test routine is_node_nonnullable(). Currently it handles Var, TargetEntry, CoalesceExpr and Const. Outer joins are taken into consideration in the nullability test. We adjust and apply reduce_outer_joins() before the transformation so that the outer joins have an opportunity to be converted to inner joins prior to the transformation. Using this transformation, we measured performance improvements of two to three orders of magnitude on most queries in a development environment. In our performance experiments, table s (small) has 11 rows, table l (large) has 1 million rows. s.n and l.n have NULL values. s.nn and l.nn are NOT NULL. s.n not in (l.n) 1150 ms -> 0.49 ms s.nn not in (l.nn) 1120 ms -> 0.45 ms l.n not in (l.n) over 20 min -> 1700 ms l.nn not in (l.nn) over 20 min -> 220 ms l.n not in (s.n) 63 ms -> 750 ms l.nn not in (s.nn) 58 ms -> 46 ms For the only case that performance drops - l.n not in (s.n). It is likely to be resolved by ending the nested loop anti join early as soon as we find NULL inner tuple entry/entries that satisfies the join condition during execution. This is still under investigation. Comments are welcome. With Regards, --- Zheng Li, AWS, Amazon Aurora PostgreSQL On 2/25/19, 7:32 AM, "David Rowley" <david.rowley@2ndquadrant.com> wrote: On Fri, 22 Feb 2019 at 09:44, David Rowley <david.rowley@2ndquadrant.com> wrote: > I've attached the rebased and still broken version. I set about trying to make a less broken version of this. A quick reminder of the semantics of NOT IN: 1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table); If table is non-empty: will filter out rows where <nullable_column> is NULL and only show values that are not in <not null column> If table is empty: Filters nothing. 2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table); If table contains NULLs in the <null column> no records will match. The previous patch handled #2 correctly but neglected to do anything about #1. For #1 the only way we can implement this as a planner only change is to insist that the outer side expressions also are not null. If we could have somehow tested if "table" was non-empty then we could have added a IS NOT NULL clause to the outer query and converted to an anti-join, but ... can't know that during planning and can't add the IS NOT NULL regardless as, if "table" is empty we will filter NULLs when we shouldn't. In the attached, I set about fixing #1 by determining if the outer expressions could be NULL by checking 1. If expression is a Var from an inner joined relation it can't be NULL if there's a NOT NULL constraint on the column; or 2. If expression is a Var from an inner joined relation and there is a strict WHERE/ON clause, the expression can't be NULL; or 3. If expression is a Var from an outer joined relation check for quals that were specified in the same syntactical level as the NOT IN for proofs that NULL will be filtered. An example of #3 is: SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in planning, but... or; SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3); In the latter of the two, the t1.a = t2.a join conditions ensures that NULLs can't exist where the NOT IN is evaluated. I implemented #3 by passing the quals down to pull_up_sublinks_qual_recurse(). At the top level call 'node' and 'notnull_proofs' are the same, but that changes in recursive calls like the one we make inside the is_andclause() condition. Comments welcome. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Resend the patch with a whitespace removed so that "git apply patch" works directly. --- Zheng Li, AWS, Amazon Aurora PostgreSQL On 2/25/19, 12:39 PM, "Li, Zheng" <zhelli@amazon.com> wrote: I'm attaching a working patch following the discussion. You can also find the following patch description is the commit message: NOT IN to ANTI JOIN transformation The semantics of ANTI JOIN were created to match the semantics of NOT EXISTS, which enables NOT EXISTS subqueries to be efficiently executed as a kind of join. NOT IN subqueries have different NULL semantics than NOT EXISTS, but since there is no special join operator for NOT IN it is generally executed as a nested sub-plan. It is possible, however, to transform NOT IN to a correlated NOT EXISTS so that it can be executed it as an ANTI JOIN with additional correlated predicates. A general transformation from NOT IN to NOT EXISTS for the single-expression case (the multi-expression case is just ANDs of the single-expressions) is: t1.x NOT IN (SELECT t2.y from t2 where p) <=> NOT EXISTS (select 1 from t2 where (y=x or y is NULL or x is NULL) and p). If x or y is non-nullable, we can safely remove the predicate "x is NULL" or "y is NULL", and if there is no predicate p, then "x is NULL" may be factored out of the subquery. Experiments show that if we can remove one or the other ORed predicates, or if we can factor out the "x is NULL", then execution is typically much faster. Basically, for the single expression case (we also handle the multi expression case), we try to do the following transformation: When p does not exist: t1.x not in (t2.y) => ANTI JOIN t1(Filter: x is not null), t2 on join condition: t1.x=t2.y or t2.y is null. When x is non-nullable: t1.x not in (t2.y where p) => ANTI JOIN t1, t2 on join condition: (t1.x=t2.y or t2.y is null) and p. We implemented a nullability test routine is_node_nonnullable(). Currently it handles Var, TargetEntry, CoalesceExpr and Const. Outer joins are taken into consideration in the nullability test. We adjust and apply reduce_outer_joins() before the transformation so that the outer joins have an opportunity to be converted to inner joins prior to the transformation. Using this transformation, we measured performance improvements of two to three orders of magnitude on most queries in a development environment. In our performance experiments, table s (small) has 11 rows, table l (large) has 1 million rows. s.n and l.n have NULL values. s.nn and l.nn are NOT NULL. s.n not in (l.n) 1150 ms -> 0.49 ms s.nn not in (l.nn) 1120 ms -> 0.45 ms l.n not in (l.n) over 20 min -> 1700 ms l.nn not in (l.nn) over 20 min -> 220 ms l.n not in (s.n) 63 ms -> 750 ms l.nn not in (s.nn) 58 ms -> 46 ms For the only case that performance drops - l.n not in (s.n). It is likely to be resolved by ending the nested loop anti join early as soon as we find NULL inner tuple entry/entries that satisfies the join condition during execution. This is still under investigation. Comments are welcome. With Regards, --- Zheng Li, AWS, Amazon Aurora PostgreSQL On 2/25/19, 7:32 AM, "David Rowley" <david.rowley@2ndquadrant.com> wrote: On Fri, 22 Feb 2019 at 09:44, David Rowley <david.rowley@2ndquadrant.com> wrote: > I've attached the rebased and still broken version. I set about trying to make a less broken version of this. A quick reminder of the semantics of NOT IN: 1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table); If table is non-empty: will filter out rows where <nullable_column> is NULL and only show values that are not in <not null column> If table is empty: Filters nothing. 2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table); If table contains NULLs in the <null column> no records will match. The previous patch handled #2 correctly but neglected to do anything about #1. For #1 the only way we can implement this as a planner only change is to insist that the outer side expressions also are not null. If we could have somehow tested if "table" was non-empty then we could have added a IS NOT NULL clause to the outer query and converted to an anti-join, but ... can't know that during planning and can't add the IS NOT NULL regardless as, if "table" is empty we will filter NULLs when we shouldn't. In the attached, I set about fixing #1 by determining if the outer expressions could be NULL by checking 1. If expression is a Var from an inner joined relation it can't be NULL if there's a NOT NULL constraint on the column; or 2. If expression is a Var from an inner joined relation and there is a strict WHERE/ON clause, the expression can't be NULL; or 3. If expression is a Var from an outer joined relation check for quals that were specified in the same syntactical level as the NOT IN for proofs that NULL will be filtered. An example of #3 is: SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in planning, but... or; SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3); In the latter of the two, the t1.a = t2.a join conditions ensures that NULLs can't exist where the NOT IN is evaluated. I implemented #3 by passing the quals down to pull_up_sublinks_qual_recurse(). At the top level call 'node' and 'notnull_proofs' are the same, but that changes in recursive calls like the one we make inside the is_andclause() condition. Comments welcome. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Tue, 26 Feb 2019 at 11:51, Li, Zheng <zhelli@amazon.com> wrote: > Resend the patch with a whitespace removed so that "git apply patch" works directly. I had a quick look at this and it seems to be broken for the empty table case I mentioned up thread. Quick example: Setup: create table t1 (a int); create table t2 (a int not null); insert into t1 values(NULL),(1),(2); select * from t1 where a not in(select a from t2); Patched: a --- 1 2 (2 rows) Master: a --- 1 2 (3 rows) This will be due to the fact you're adding an a IS NOT NULL qual to the scan of a: postgres=# explain select * from t1 where a not in(select a from t2); QUERY PLAN ------------------------------------------------------------------ Hash Anti Join (cost=67.38..152.18 rows=1268 width=4) Hash Cond: (t1.a = t2.a) -> Seq Scan on t1 (cost=0.00..35.50 rows=2537 width=4) Filter: (a IS NOT NULL) -> Hash (cost=35.50..35.50 rows=2550 width=4) -> Seq Scan on t2 (cost=0.00..35.50 rows=2550 width=4) (6 rows) but as I mentioned, you can't do that as t2 might be empty and there's no way to know that during planning. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Greenplum Database does this optimization. The idea is to use a new join
type, let's call it JOIN_LASJ_NOTIN, and its semantic regarding NULL is
defined as below:
1. If there is a NULL in the outer side, and the inner side is empty, the
NULL should be part of the outputs.
2. If there is a NULL in the outer side, and the inner side is not empty,
the NULL should not be part of the outputs.
3. If there is a NULL in the inner side, no outputs should be produced.
An example plan looks like:
gpadmin=# explain (costs off) select * from t1 where a not in(select a from t2);
QUERY PLAN
-----------------------------------
Hash Left Anti Semi (Not-In) Join
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2
(5 rows)
Thanks
Richard
On Tue, Feb 26, 2019 at 7:20 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Tue, 26 Feb 2019 at 11:51, Li, Zheng <zhelli@amazon.com> wrote:
> Resend the patch with a whitespace removed so that "git apply patch" works directly.
I had a quick look at this and it seems to be broken for the empty
table case I mentioned up thread.
Quick example:
Setup:
create table t1 (a int);
create table t2 (a int not null);
insert into t1 values(NULL),(1),(2);
select * from t1 where a not in(select a from t2);
Patched:
a
---
1
2
(2 rows)
Master:
a
---
1
2
(3 rows)
This will be due to the fact you're adding an a IS NOT NULL qual to
the scan of a:
postgres=# explain select * from t1 where a not in(select a from t2);
QUERY PLAN
------------------------------------------------------------------
Hash Anti Join (cost=67.38..152.18 rows=1268 width=4)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t1 (cost=0.00..35.50 rows=2537 width=4)
Filter: (a IS NOT NULL)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on t2 (cost=0.00..35.50 rows=2550 width=4)
(6 rows)
but as I mentioned, you can't do that as t2 might be empty and there's
no way to know that during planning.
--
David Rowley https://urldefense.proofpoint.com/v2/url?u=http-3A__www.2ndQuadrant.com_&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=5r3cnfZPUDOHrMiXq8Mq2g&m=dE1nglE17x3nD-oH_BrF0r4SLaFnQKzwwJBJGpDoaaA&s=dshupMomMvkDAd92918cU21AJ1E1s7QwbrxIGSRxZA8&e=
PostgreSQL Development, 24x7 Support, Training & Services
The problem is that the special optimization that was proposed for the case where the subquery has no WHERE clause isn't valid when the subquery returns no rows. That additional optimization needs to be removed, and preferably replaced with some sort of efficient run-time test. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Wed, 27 Feb 2019 at 03:07, Jim Finnerty <jfinnert@amazon.com> wrote: > > The problem is that the special optimization that was proposed for the case > where the subquery has no WHERE clause isn't valid when the subquery returns > no rows. That additional optimization needs to be removed, and preferably > replaced with some sort of efficient run-time test. That is one option, however, the join type that Richard mentions, to satisfy point #3, surely only can work for Hash joins and perhaps Merge joins that required a sort, assuming there's some way for the sort to communicate about if it found NULLs or not. Either way, we need to have looked at the entire inner side to ensure there are no nulls. Probably it would be possible to somehow team that up with a planner check to see if the inner exprs could be NULL then just implement points #1 and #2 for other join methods. If you're proposing to do that for this thread then I can take my planner only patch somewhere else. I only posted my patch as I pretty much already had what I thought you were originally talking about. However, please be aware there are current patents around adding execution time smarts in this area, so it's probably unlikely you'll find a way to do this in the executor that does not infringe on those. Probably no committer would want to touch it. I think my patch covers a good number of use cases and as far as I understand, does not go near any current patents. Please let me know if I should move my patch to another thread. I don't want to hi-jack this if you're going in another direction. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
I agree we will need some runtime smarts (such as a new anti join type as pointed out by Richard) to "ultimately" accountfor all the cases of NOT IN queries. However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out (whichI was not aware of before), we would not move that direction at the moment. I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF. The two patchesare for the same purpose and similar. However, they differ in the following ways as far as I can tell: Nullability Test: -David's patch uses strict predicates for nullability test. -Our patch doesn't use strict predicates, but it accounts for COALESCE and null-padded rows from outer join. In addition,we made reduce_outer_joins() work before the transformation which makes the nullability test more accurate. Anti Join Transformation: -Dvaid's patch does the transformation when both inner and outer outputs are non-nullable. -With the latest fix (for the empty table case), our patch does the transformation as long as the outer is non-nullable regardlessof the inner nullability, experiments show that the results are always faster. David, please let me know what you think. If you would like to collaborate, I'll start merging with your code on using strictpredicates to make a better Nullability Test. Thanks, Zheng
"Li, Zheng" <zhelli@amazon.com> writes: > However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out (whichI was not aware of before), we would not move that direction at the moment. > I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF. TBH, I think it's very unlikely that any patch for this will be seriously considered for commit in v12. It would be against project policy, and spending a lot of time reviewing the patch would be quite unfair to other patches that have been in the queue longer. Therefore, I'd suggest that you not bend things out of shape just to have some patch to submit before March 1. It'd be better to work with the goal of having a coherent patch ready for the first v13 CF, probably July-ish. regards, tom lane
On Wed, 27 Feb 2019 at 13:05, Li, Zheng <zhelli@amazon.com> wrote: > -With the latest fix (for the empty table case), our patch does the transformation as long as the outer is non-nullableregardless of the inner nullability, experiments show that the results are always faster. Hi Zheng, I'm interested to know how this works without testing for inner nullability. If any of the inner side's join exprs are NULL then no records can match. What do you propose to work around that? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, 27 Feb 2019 at 13:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Li, Zheng" <zhelli@amazon.com> writes: > > However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out (whichI was not aware of before), we would not move that direction at the moment. > > > I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF. > > TBH, I think it's very unlikely that any patch for this will be seriously > considered for commit in v12. It would be against project policy, and > spending a lot of time reviewing the patch would be quite unfair to other > patches that have been in the queue longer. Therefore, I'd suggest that > you not bend things out of shape just to have some patch to submit before > March 1. It'd be better to work with the goal of having a coherent patch > ready for the first v13 CF, probably July-ish. FWIW, I did add this to the March CF, but I set the target version to 13. I wasn't considering this for PG12. I see Zheng was, but I agree with you on PG13 being the target for this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
I'm totally fine with setting the target to PG13. -- I'm interested to know how this works without testing for inner nullability. If any of the inner side's join exprs are NULL then no records can match. What do you propose to work around that? -- We still check for inner side's nullability, when it is nullable we append a "var is NULL" to the anti join condition. So every outer tuple is going to evaluate to true on the join condition when there is indeed a null entry in the inner. Actually I think the nested loop anti join can end early in this case, but I haven't find a way to do it properly, this may be one other reason why we need a new join type for NOT IN. e.g. explain select count(*) from s where u not in (select n from l); QUERY PLAN ------------------------------------------------------------------------------------ Aggregate (cost=2892.88..2892.89 rows=1 width=8) -> Nested Loop Anti Join (cost=258.87..2892.88 rows=1 width=0) -> Seq Scan on s (cost=0.00..1.11 rows=11 width=4) -> Bitmap Heap Scan on l (cost=258.87..262.88 rows=1 width=4) Recheck Cond: ((s.u = n) OR (n IS NULL)) -> BitmapOr (cost=258.87..258.87 rows=1 width=0) -> Bitmap Index Scan on l_n (cost=0.00..4.43 rows=1 width=0) Index Cond: (s.u = n) -> Bitmap Index Scan on l_n (cost=0.00..4.43 rows=1 width=0) Index Cond: (n IS NULL) Zheng
On Wed, 27 Feb 2019 at 13:41, Li, Zheng <zhelli@amazon.com> wrote: > We still check for inner side's nullability, when it is nullable we > append a "var is NULL" to the anti join condition. So every outer > tuple is going to evaluate to true on the join condition when there > is indeed a null entry in the inner. That's possible, at least providing the var is NULL is an OR condition, but the problem there is that you force the plan into a nested loop join. That's unfortunately not going to perform very well when the number of rows to process is large. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Feb 27, 2019 at 4:52 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Wed, 27 Feb 2019 at 03:07, Jim Finnerty <jfinnert@amazon.com> wrote:
If you're proposing to do that for this thread then I can take my
planner only patch somewhere else. I only posted my patch as I pretty
much already had what I thought you were originally talking about.
However, please be aware there are current patents around adding
execution time smarts in this area, so it's probably unlikely you'll
find a way to do this in the executor that does not infringe on those.
Probably no committer would want to touch it. I think my patch covers
a good number of use cases and as far as I understand, does not go
near any current patents.
Thanks for pointing out the patent concerns. I was not aware of that before.
Could you please provide some clue where I can find more info about the patents?
Thanks
Richard
On Tue, Feb 26, 2019 at 6:51 AM Li, Zheng <zhelli@amazon.com> wrote:
Resend the patch with a whitespace removed so that "git apply patch" works directly.
Hi Zheng,
I have reviewed your patch. Good job except two issues I can find:
1. The patch would give wrong results when the inner side is empty. In this
case, the whole data from outer side should be in the outputs. But with the
patch, we will lose the NULLs from outer side.
2. Because of the new added predicate 'OR (var is NULL)', we cannot use hash
join or merge join to do the ANTI JOIN. Nested loop becomes the only choice,
which is low-efficency.
Thanks
Richard
On Fri, 1 Mar 2019 at 15:27, Richard Guo <riguo@pivotal.io> wrote: > I have reviewed your patch. Good job except two issues I can find: > > 1. The patch would give wrong results when the inner side is empty. In this > case, the whole data from outer side should be in the outputs. But with the > patch, we will lose the NULLs from outer side. > > 2. Because of the new added predicate 'OR (var is NULL)', we cannot use hash > join or merge join to do the ANTI JOIN. Nested loop becomes the only choice, > which is low-efficency. Yeah. Both of these seem pretty fundamental, so setting the patch to waiting on author. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On March 1, 2019 4:53:03 AM PST, David Rowley <david.rowley@2ndquadrant.com> wrote: >On Fri, 1 Mar 2019 at 15:27, Richard Guo <riguo@pivotal.io> wrote: >> I have reviewed your patch. Good job except two issues I can find: >> >> 1. The patch would give wrong results when the inner side is empty. >In this >> case, the whole data from outer side should be in the outputs. But >with the >> patch, we will lose the NULLs from outer side. >> >> 2. Because of the new added predicate 'OR (var is NULL)', we cannot >use hash >> join or merge join to do the ANTI JOIN. Nested loop becomes the only >choice, >> which is low-efficency. > >Yeah. Both of these seem pretty fundamental, so setting the patch to >waiting on author. I've not checked, but could we please make sure these cases are covered in the regression tests today with a single liner?Seems people had to rediscover them a number of times now, and unless this thread results in an integrated featuresoonish, it seems likely other people will again. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > On March 1, 2019 4:53:03 AM PST, David Rowley <david.rowley@2ndquadrant.com> wrote: >> On Fri, 1 Mar 2019 at 15:27, Richard Guo <riguo@pivotal.io> wrote: >>> 1. The patch would give wrong results when the inner side is empty. >>> 2. Because of the new added predicate 'OR (var is NULL)', we cannot >>> use hash join or merge join to do the ANTI JOIN. > I've not checked, but could we please make sure these cases are covered > in the regression tests today with a single liner? I'm not sure if the second one is actually a semantics bug or just a misoptimization? But yeah, +1 for putting in some simple tests for corner cases right now. Anyone want to propose a specific patch? regards, tom lane
On Sat, 2 Mar 2019 at 05:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > > I've not checked, but could we please make sure these cases are covered > > in the regression tests today with a single liner? > > I'm not sure if the second one is actually a semantics bug or just a > misoptimization? But yeah, +1 for putting in some simple tests for > corner cases right now. Anyone want to propose a specific patch? The second is just reducing the planner's flexibility to produce a good plan. The first is a bug. Proposed regression test attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
David Rowley <david.rowley@2ndquadrant.com> writes: > On Sat, 2 Mar 2019 at 05:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not sure if the second one is actually a semantics bug or just a >> misoptimization? But yeah, +1 for putting in some simple tests for >> corner cases right now. Anyone want to propose a specific patch? > The second is just reducing the planner's flexibility to produce a > good plan. The first is a bug. Proposed regression test attached. LGTM, pushed. regards, tom lane
Thanks all for the feedbacks! I'm working on a refined patch. Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it is alwaysfaster compared to the original plan. In order to enable the transformation from NOT IN to anti join when the inner/outeris nullable, we have to add some NULL test to the join condition. We could make anti join t1, t2 on (t1.x = t2.y or t2.y IS NULL) eligible for hashjoin, it would require changes in allowingthis special join quals for hash join as well as changes in hash join executor to handle NULL accordingly for thecase. Another option of transformation is to add "is not false" on top of the join condition. Regards, Zheng On 3/1/19, 5:28 PM, "David Rowley" <david.rowley@2ndquadrant.com> wrote: On Sat, 2 Mar 2019 at 05:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > > I've not checked, but could we please make sure these cases are covered > > in the regression tests today with a single liner? > > I'm not sure if the second one is actually a semantics bug or just a > misoptimization? But yeah, +1 for putting in some simple tests for > corner cases right now. Anyone want to propose a specific patch? The second is just reducing the planner's flexibility to produce a good plan. The first is a bug. Proposed regression test attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
"Li, Zheng" <zhelli@amazon.com> writes: > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it is alwaysfaster compared to the original plan. TBH, I am *really* skeptical of sweeping claims like that. The existing code will typically produce a hashed-subplan plan, which ought not be that awful as long as the subquery result doesn't blow out memory. It certainly is going to beat a naive nested loop. regards, tom lane
On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Li, Zheng" <zhelli@amazon.com> writes: > > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it isalways faster compared to the original plan. > > TBH, I am *really* skeptical of sweeping claims like that. The existing > code will typically produce a hashed-subplan plan, which ought not be > that awful as long as the subquery result doesn't blow out memory. > It certainly is going to beat a naive nested loop. It's pretty easy to show the claim is false using master and NOT EXISTS. create table small(a int not null); create table big (a int not null); insert into small select generate_Series(1,1000); insert into big select x%1000+1 from generate_Series(1,1000000) x; select count(*) from big b where not exists(select 1 from small s where s.a = b.a); Time: 178.575 ms select count(*) from big b where not exists(select 1 from small s where s.a = b.a or s.a is null); Time: 38049.969 ms (00:38.050) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
The current transformation would not add "or s.a is NULL" in the example provided since it is non-nullable. You will be comparingthese two cases in terms of the transformation: select count(*) from big b where not exists(select 1 from small s where s.a = b.a); Time: 51.416 ms select count(*) from big b where a not in (select a from s); Time: 890.088 ms But if s.a is nullable, yes, you have proved my previous statement is false... I should have used almost always. However, if s.a is nullable, we would do this transformation: select count(*) from big b where not exists(select 1 from small s where s.a = b.a or s.a is null); It's possible to stop the nested loop join early during execution once we find an inner Null entry because every outer tupleis going to evaluate to true on the join condition. Zheng On 3/1/19, 6:17 PM, "David Rowley" <david.rowley@2ndquadrant.com> wrote: On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Li, Zheng" <zhelli@amazon.com> writes: > > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, itis always faster compared to the original plan. > > TBH, I am *really* skeptical of sweeping claims like that. The existing > code will typically produce a hashed-subplan plan, which ought not be > that awful as long as the subquery result doesn't blow out memory. > It certainly is going to beat a naive nested loop. It's pretty easy to show the claim is false using master and NOT EXISTS. create table small(a int not null); create table big (a int not null); insert into small select generate_Series(1,1000); insert into big select x%1000+1 from generate_Series(1,1000000) x; select count(*) from big b where not exists(select 1 from small s where s.a = b.a); Time: 178.575 ms select count(*) from big b where not exists(select 1 from small s where s.a = b.a or s.a is null); Time: 38049.969 ms (00:38.050) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, 2 Mar 2019 at 12:39, Li, Zheng <zhelli@amazon.com> wrote: > However, if s.a is nullable, we would do this transformation: > select count(*) from big b where not exists(select 1 from small s > where s.a = b.a or s.a is null); I understand you're keen to make this work, but you're assuming again that forcing the planner into a nested loop plan is going to be a win over the current behaviour. It may well be in some cases, but it's very simple to show cases where it's a significant regression. Using the same tables from earlier, and again with master: alter table small alter column a drop not null; select * from big where a not in(select a from small); Time: 430.283 ms Here's what you're proposing: select * from big b where not exists(select 1 from small s where s.a = b.a or s.a is null); Time: 37419.646 ms (00:37.420) about 80 times slower. Making "small" a little less small would likely see that gap grow even further. I think you're fighting a losing battle here with adding OR quals to the join condition. This transformation happens so early in planning that you really can't cost it out either. I think the only way that could be made to work satisfactorily would be with some execution level support for it. Short of that, you're left with just adding checks that either side of the join cannot produce NULL values... That's what I've proposed in [1]. [1] https://www.postgresql.org/message-id/CAKJS1f_OA5VeZx8A8H8mkj3uqEgOtmHBGCUA6%2BxqgmUJ6JQURw%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > I think you're fighting a losing battle here with adding OR quals to > the join condition. Yeah --- that has a nontrivial risk of making things significantly worse, which makes it a hard sell. I think the most reasonable bet here is simply to not perform the transformation if we can't prove the inner side NOT NULL. That's going to catch most of the useful cases anyway IMO. regards, tom lane
Folks - I was away on vacation for the month of February, and can give this my attention again. I agree with Tom's comment above - when the cost of the NOT IN is dominated by the cost of the outer scan (i.e. when the cardinality of the outer relation is large relative to the cardinality of the subquery), and if the inner cardinality is small enough to fit in memory, then the current implementation does an efficient hash lookup into an in-memory structure, and that's a very fast way to do the NOT IN. It almost achieves the lower-bound cost of scanning the outer relation. It can also parallelizes easily, whether or not we currently can do that. In these cases, the current plan is the preferred plan, and we should keep it. preferred in-memory hash lookup plan: https://explain.depesz.com/s/I1kN This is a case that we would want to avoid the transform, because when both the inner and outer are nullable and the outer is large and the inner is small, the transformed plan would Scan and Materialize the inner for each row of the outer row, which is very slow compared to the untransformed plan: slow case for the transformation: https://explain.depesz.com/s/0CBB However, if the inner is too large to fit into memory, then the transformed plan is faster on all of our other test cases, although our test cases are far from complete. If the current solution supports parallel scan of the outer, for example, then PQ could have lower elapsed time than the non-PQ nested loop solution. Also, remember that the issue with the empty inner was just a bug that was the result of trying to do an additional optimization in the case where there is no WHERE clause in the subquery. That bug has been fixed. The general case transformation described in the base note produces the correct result in all cases, including the empty subquery case. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Sat, 2 Mar 2019 at 13:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <david.rowley@2ndquadrant.com> writes: > > I think you're fighting a losing battle here with adding OR quals to > > the join condition. > > Yeah --- that has a nontrivial risk of making things significantly worse, > which makes it a hard sell. I think the most reasonable bet here is > simply to not perform the transformation if we can't prove the inner side > NOT NULL. That's going to catch most of the useful cases anyway IMO. Did you mean outer side NOT NULL? The OR col IS NULL was trying to solve the outer side nullability problem when the inner side is empty. Of course, the inner side needs to not produce NULLs either, but that's due to the fact that if a NULL exists in the inner side then the anti-join should not produce any records. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > On Sat, 2 Mar 2019 at 13:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah --- that has a nontrivial risk of making things significantly worse, >> which makes it a hard sell. I think the most reasonable bet here is >> simply to not perform the transformation if we can't prove the inner side >> NOT NULL. That's going to catch most of the useful cases anyway IMO. > Did you mean outer side NOT NULL? Sorry, sloppy thinking. > Of course, the inner side needs to not produce NULLs either, but > that's due to the fact that if a NULL exists in the inner side then > the anti-join should not produce any records. Right. So the path of least resistance is to transform to antijoin only if we can prove both of those things (and maybe we need to check that the join operator is strict, too? -ENOCAFFEINE). The question before us is what is the cost-benefit ratio of trying to cope with additional cases. I'm skeptical that it's attractive: the cost certainly seems high, and I don't know that there are very many real-world cases where we'd get a win. Hmm ... thinking about the strictness angle some more: what we really need to optimize NOT IN, IIUC, is an assumption that the join operator will never return NULL. While not having NULL inputs is certainly a *necessary* condition for that (assuming a strict operator) it's not a *sufficient* condition. Any Postgres function/operator is capable of returning NULL whenever it feels like it. So checking strictness does not lead to a mathematically correct optimization. My initial thought about plugging that admittedly-academic point is to insist that the join operator be both strict and a member of a btree opclass (hash might be OK too; less sure about other index types). The system already contains assumptions that btree comparators never return NULL. I doubt that this costs us any real-world applicability, because if the join operator can neither merge nor hash, we're screwed anyway for finding a join plan that's better than nested-loop. regards, tom lane
On Sun, 3 Mar 2019 at 05:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmm ... thinking about the strictness angle some more: what we really > need to optimize NOT IN, IIUC, is an assumption that the join operator > will never return NULL. While not having NULL inputs is certainly a > *necessary* condition for that (assuming a strict operator) it's not a > *sufficient* condition. Any Postgres function/operator is capable > of returning NULL whenever it feels like it. So checking strictness > does not lead to a mathematically correct optimization. That's something I didn't think of. > My initial thought about plugging that admittedly-academic point is > to insist that the join operator be both strict and a member of a > btree opclass (hash might be OK too; less sure about other index types). > The system already contains assumptions that btree comparators never > return NULL. I doubt that this costs us any real-world applicability, > because if the join operator can neither merge nor hash, we're screwed > anyway for finding a join plan that's better than nested-loop. Why strict? If both inputs are non-NULL, then what additional guarantees does strict give us? I implemented a btree opfamily check in my version of the patch. Not so sure about hash, can you point me in the direction of a mention of how this is guarantees for btree? The attached v1.2 does this adds a regression test using the LINE type. This has an operator named '=', but no btree opfamily. A few other types are in this boat too, per: select typname from pg_type t where not exists(select 1 from pg_amop where amoplefttype = t.oid and amopmethod=403) and exists (select 1 from pg_operator where oprleft = t.oid and oprname = '='); The list of builtin types that have a hash opfamily but no btree opfamily that support NOT IN are not very exciting, so doing the same for hash might not be worth the extra code. select typname from pg_type t where exists(select 1 from pg_amop where amoplefttype = t.oid and amopmethod=405) and exists (select 1 from pg_operator where oprleft = t.oid and oprname = '=') and not exists(select 1 from pg_amop where amoplefttype = t.oid and amopmethod=403); typname --------- xid cid aclitem (3 rows) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
David Rowley <david.rowley@2ndquadrant.com> writes: > On Sun, 3 Mar 2019 at 05:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> My initial thought about plugging that admittedly-academic point is >> to insist that the join operator be both strict and a member of a >> btree opclass (hash might be OK too; less sure about other index types). > Why strict? If both inputs are non-NULL, then what additional > guarantees does strict give us? Yeah, if we're verifying the inputs are non-null, I think that probably doesn't matter. > I implemented a btree opfamily check in my version of the patch. Not > so sure about hash, can you point me in the direction of a mention of > how this is guarantees for btree? https://www.postgresql.org/docs/devel/btree-support-funcs.html quoth The comparison function must take two non-null values A and B and return an int32 value that is < 0, 0, or > 0 when A < B, A = B, or A > B, respectively. A null result is disallowed: all values of the data type must be comparable. (At the code level, this is implicit in the fact that the comparison function will be called via FunctionCall2Coll or a sibling, and those all throw an error if the called function returns NULL.) Now, it doesn't say in so many words that the comparison operators have to yield results consistent with the comparison support function, but I think that's pretty obvious ... For hash, the equivalent constraint is that the hash function has to work for every possible input value. I suppose it's possible that the associated equality operator would sometimes return null, but it's hard to think of a practical reason for doing so. I've not dug in the code, but I wouldn't be too surprised if nodeMergejoin.c or nodeHashjoin.c, or the stuff related to hash grouping or hash aggregation, also contain assumptions about the equality operators not returning null. > The list of builtin types that have a hash opfamily but no btree > opfamily that support NOT IN are not very exciting, so doing the same > for hash might not be worth the extra code. Agreed for builtin types, but there might be some extensions out there where this doesn't hold. It's not terribly hard to imagine a data type that hasn't got a linear sort order but is amenable to hashing. (The in-core xid type is an example, actually.) regards, tom lane
On Sun, 3 Mar 2019 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > (At the code level, this is implicit in the fact that the comparison > function will be called via FunctionCall2Coll or a sibling, and those > all throw an error if the called function returns NULL.) > > Now, it doesn't say in so many words that the comparison operators > have to yield results consistent with the comparison support function, > but I think that's pretty obvious ... Ah okay. I can get it to misbehave by setting fcinfo->isnull = true in the debugger from int4eq(). I see the NULL result there is not verified as that's just translated into "false" by ExecInterpExpr()'s EEOP_QUAL case. If you're saying something doing that is fundamentally broken, then I guess we're okay. > David Rowley <david.rowley@2ndquadrant.com> writes: > > The list of builtin types that have a hash opfamily but no btree > > opfamily that support NOT IN are not very exciting, so doing the same > > for hash might not be worth the extra code. > > Agreed for builtin types, but there might be some extensions out there > where this doesn't hold. It's not terribly hard to imagine a data type > that hasn't got a linear sort order but is amenable to hashing. On reflection, it seems pretty easy to add this check, so I've done so in the attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
David Rowley <david.rowley@2ndquadrant.com> writes: > On Sun, 3 Mar 2019 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (At the code level, this is implicit in the fact that the comparison >> function will be called via FunctionCall2Coll or a sibling, and those >> all throw an error if the called function returns NULL.) > Ah okay. I can get it to misbehave by setting fcinfo->isnull = true in > the debugger from int4eq(). I see the NULL result there is not > verified as that's just translated into "false" by ExecInterpExpr()'s > EEOP_QUAL case. If you're saying something doing that is > fundamentally broken, then I guess we're okay. No, what I'm thinking of is this bit in _bt_compare: result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func, scankey->sk_collation, datum, scankey->sk_argument)); You absolutely will get errors during btree insertions and searches if a datatype's btree comparison functions ever return NULL (for non-NULL inputs). For hash indexes, that kind of restriction only directly applies to hash-calculation functions, which perhaps are not as tightly tied to the opclass's user-visible operators as is the case for btree opclasses. But I think you might be able to find places in hash join or grouping that are calling the actual equality operator and not allowing for it to return NULL. regards, tom lane
On Mon, 4 Mar 2019 at 04:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <david.rowley@2ndquadrant.com> writes: > > On Sun, 3 Mar 2019 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> (At the code level, this is implicit in the fact that the comparison > >> function will be called via FunctionCall2Coll or a sibling, and those > >> all throw an error if the called function returns NULL.) > > > Ah okay. I can get it to misbehave by setting fcinfo->isnull = true in > > the debugger from int4eq(). I see the NULL result there is not > > verified as that's just translated into "false" by ExecInterpExpr()'s > > EEOP_QUAL case. If you're saying something doing that is > > fundamentally broken, then I guess we're okay. > > No, what I'm thinking of is this bit in _bt_compare: > > result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func, > scankey->sk_collation, > datum, > scankey->sk_argument)); > > You absolutely will get errors during btree insertions and searches > if a datatype's btree comparison functions ever return NULL (for > non-NULL inputs). I understand this is the case if an index happens to be used, but there's no guarantee that's going to be the case. I was looking at the case where an index was not used. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > On Mon, 4 Mar 2019 at 04:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You absolutely will get errors during btree insertions and searches >> if a datatype's btree comparison functions ever return NULL (for >> non-NULL inputs). > I understand this is the case if an index happens to be used, but > there's no guarantee that's going to be the case. I was looking at the > case where an index was not used. Not following your point? An index opclass is surely not going to be designed on the assumption that it can never be used in an index. Therefore, its support functions can't return NULL unless the index AM allows that. regards, tom lane
On Mon, 4 Mar 2019 at 11:06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <david.rowley@2ndquadrant.com> writes: > > On Mon, 4 Mar 2019 at 04:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> You absolutely will get errors during btree insertions and searches > >> if a datatype's btree comparison functions ever return NULL (for > >> non-NULL inputs). > > > I understand this is the case if an index happens to be used, but > > there's no guarantee that's going to be the case. I was looking at the > > case where an index was not used. > > Not following your point? An index opclass is surely not going to be > designed on the assumption that it can never be used in an index. > Therefore, its support functions can't return NULL unless the index AM > allows that. I agree that it makes sense that the behaviour of the two match. I was trying to hint towards that when I said: > If you're saying something doing that is > fundamentally broken, then I guess we're okay. but likely I didn't make that very clear. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2/27/19 2:26 AM, David Rowley wrote: > On Wed, 27 Feb 2019 at 13:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> "Li, Zheng" <zhelli@amazon.com> writes: >>> However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out (whichI was not aware of before), we would not move that direction at the moment. >> >>> I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF. >> >> TBH, I think it's very unlikely that any patch for this will be seriously >> considered for commit in v12. It would be against project policy, and >> spending a lot of time reviewing the patch would be quite unfair to other >> patches that have been in the queue longer. Therefore, I'd suggest that >> you not bend things out of shape just to have some patch to submit before >> March 1. It'd be better to work with the goal of having a coherent patch >> ready for the first v13 CF, probably July-ish. > > FWIW, I did add this to the March CF, but I set the target version to > 13. I wasn't considering this for PG12. I see Zheng was, but I agree > with you on PG13 being the target for this. Looks like the target version of 13 was removed but I have added it back. Regards, -- -David david@pgmasters.net
On Tue, 5 Mar 2019 at 21:21, David Steele <david@pgmasters.net> wrote: > > On 2/27/19 2:26 AM, David Rowley wrote: > > FWIW, I did add this to the March CF, but I set the target version to > > 13. I wasn't considering this for PG12. I see Zheng was, but I agree > > with you on PG13 being the target for this. > > Looks like the target version of 13 was removed but I have added it back. The problem seems to be that there are now 2 CF entries for this thread. I originally added [1], but later Zheng added [2]. From what Jim mentioned when he opened this thread I had the idea that no patch existed yet, so I posted the one I already had written for this 4 years ago thinking that might be useful to base new work on. I guess Zheng's patch already exists when Jim opened this thread as a patch appeared fairly quickly afterwards. If that's true then I understand that they wouldn't want to drop the work they'd already done in favour of picking mine up. I'm not all that sure what do to about this. It's going to cause quite a bit of confusion having two patches in one thread. Part of me feels that I've hijacked this thread and that I should just drop my patch altogether and help review Zheng's patch, but I'm struggling a bit to do that as I've not managed to find problems with my version, but a few have been pointed out with the other patch (of course, there may be some yet undiscovered issues with my version too). Alternatively, I could take my patch to another thread, but there does not seem to be much sense in that. It might not solve the confusion problem. The best thing would be that if we could work together to make this work, however, we both seem to have fairly different ideas on how it should work. Tom and I both agree that Zheng and Jim's proposal to add OR x IS NULL clauses to the join condition is most likely a no go area due to it disallowing hash and merge anti-joins. The last I can understand from Jim is that they appear to disagree with that and want to do the transformation based on costs. Perhaps they're working on some new ideas to make that more feasible. I'm interested to hear the latest on this. [1] https://commitfest.postgresql.org/22/2020/ [2] https://commitfest.postgresql.org/22/2023/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 3/5/19 10:53 AM, David Rowley wrote: > On Tue, 5 Mar 2019 at 21:21, David Steele <david@pgmasters.net> wrote: >> >> On 2/27/19 2:26 AM, David Rowley wrote: >>> FWIW, I did add this to the March CF, but I set the target version to >>> 13. I wasn't considering this for PG12. I see Zheng was, but I agree >>> with you on PG13 being the target for this. >> >> Looks like the target version of 13 was removed but I have added it back. > > The problem seems to be that there are now 2 CF entries for this > thread. I originally added [1], but later Zheng added [2]. From what > Jim mentioned when he opened this thread I had the idea that no patch > existed yet, so I posted the one I already had written for this 4 > years ago thinking that might be useful to base new work on. Yeah, I just figured this out when I got to your patch which was properly marked as PG13 and then saw they were pointing at the same thread. At the very least one of the patch entries should be closed, or moved to a new thread. I'm not sure if I have an issue with competing patches on the same thread. I've seen that before and it can lead to a good outcome. It case, as you say, also lead to confusion. Regards, -- -David david@pgmasters.net
On Sun, 3 Mar 2019 at 01:34, Jim Finnerty <jfinnert@amazon.com> wrote: > I agree with Tom's comment above - when the cost of the NOT IN is dominated > by the cost of the outer scan (i.e. when the cardinality of the outer > relation is large relative to the cardinality of the subquery), and if the > inner cardinality is small enough to fit in memory, then the current > implementation does an efficient hash lookup into an in-memory structure, > and that's a very fast way to do the NOT IN. It almost achieves the > lower-bound cost of scanning the outer relation. It can also parallelizes > easily, whether or not we currently can do that. In these cases, the > current plan is the preferred plan, and we should keep it. If you do the conversion to anti-join (without hacking at the join quals and assuming no nulls are possible), then the planner can decide what's best. The planner may choose a hash join which is not hugely different from a hashed subplan, however from the testing I've done the Hash Join is a bit faster. I imagine there's been more motivation over the years to optimise that over hashed subplans. As far as I know, there's no parallel query support for hashed subplans, but I know there is for hash joins. In short, I don't think it makes any sense to not translate into an anti-join (when possible). I think the best anti-join plan will always be a win over the subquery. The planner could make a mistake of course, but that's a different issue. We certainly don't consider keeping the subquery around for NOT EXISTS. > This is a case that we would want to avoid the transform, because when both > the inner and outer are nullable and the outer is large and the inner is > small, the transformed plan would Scan and Materialize the inner for each > row of the outer row, which is very slow compared to the untransformed plan: > > slow case for the transformation: https://explain.depesz.com/s/0CBB Well, that's because you're forcing the planner into a corner in regards to the join condition. It has no choice but to nested loop that join. > However, if the inner is too large to fit into memory, then the transformed > plan is faster on all of our other test cases, although our test cases are > far from complete. If the current solution supports parallel scan of the > outer, for example, then PQ could have lower elapsed time than the non-PQ > nested loop solution. I'm having a little trouble understanding this. From what I understand the code adds an OR .. IS NULL clause to the join condition. Is this still the case with what you've been testing here? If so, I'm surprised to hear all your test cases are faster. If there's an OR clause in the join condition then the planner has no choice but to use a nested loop join, so it's very surprising that you would find that faster with larger data sets. Or does the code your testing implement this a different way? Perhaps with some execution level support? > Also, remember that the issue with the empty inner was just a bug that was > the result of trying to do an additional optimization in the case where > there is no WHERE clause in the subquery. That bug has been fixed. The > general case transformation described in the base note produces the correct > result in all cases, including the empty subquery case. I'm not sure why lack of WHERE clause in the subquery counts for anything here. The results set from the subquery can be empty or not empty with or without a WHERE clause. The only way you'll know it's empty during planning is if some gating qual says so, but that's yet to be determined by the time the transformation should be done. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Steele <david@pgmasters.net> writes: > I'm not sure if I have an issue with competing patches on the same > thread. I've seen that before and it can lead to a good outcome. It > case, as you say, also lead to confusion. It's a bit of a shame that the cfbot will only be testing one of them at a time if we leave it like this. I kind of lean towards the two-thread, two-CF-entry approach because of that. The amount of confusion is a constant. regards, tom lane
On Wed, 6 Mar 2019 at 03:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Steele <david@pgmasters.net> writes: > > I'm not sure if I have an issue with competing patches on the same > > thread. I've seen that before and it can lead to a good outcome. It > > case, as you say, also lead to confusion. > > It's a bit of a shame that the cfbot will only be testing one of them > at a time if we leave it like this. I kind of lean towards the > two-thread, two-CF-entry approach because of that. The amount of > confusion is a constant. That sounds fine. I'll take mine elsewhere since I didn't start this thread. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, 6 Mar 2019 at 12:25, David Rowley <david.rowley@2ndquadrant.com> wrote: > That sounds fine. I'll take mine elsewhere since I didn't start this thread. Moved to https://www.postgresql.org/message-id/CAKJS1f82pqjqe3WT9_xREmXyG20aOkHc-XqkKZG_yMA7JVJ3Tw%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hey, here is our latest patch. Major changes in this patch include: 1. Use the original hashed subplan if the inner fits in memory as decided by subplan_is_hashable(). 2. Fixed the inner relation empty case by adding an inner relation existence check when we pull x out as a filter on theouter (see details below). 3. Integrate David Rowley's routine to use strict predicates and inner join conditions when checking nullability of a Var. Detailed description of the patch: NOT IN to ANTI JOIN transformation If the NOT IN subquery is not eligible for hashed subplan as decided by subplan_is_hashable(), do the following NOT IN to ANTI JOIN transformation: Single expression: When x is nullable: t1.x not in (t2.y where p) => ANTI JOIN t1 (Filter: t1.x IS NOT NULL or NOT EXISTS (select 1 from t2 where p)), t2 on join condition (t1.x=t2.y or t2.y IS NULL) and p. The predicate "t2.y IS NULL" can be removed if y is non-nullable. When x is non-nullable: t1.x not in (t2.y where p) => ANTI JOIN t1, t2 on join condition (t1.x=t2.y or t2.y IS NULL) and p. The predicate "t2.y IS NULL" can be removed if y is non-nullable. Multi expression: If all xi's are nullable: (x1, x2, ... xn) not in (y1, y2, ... yn ) => ANTI JOIN t1, t2 on join condition: ((t1.x1 = t2.y1) and ... (t1.xi = t2.yi) ... and (t1.xn = t2.yn)) IS NOT FALSE. If at least one xi is non-nuallable: (x1, x2, ... xn) not in (y1, y2, ... yn ) => ANTI JOIN t1, t2 on join condition: (t1.x1 = t2.y1 or t2.y1 IS NULL or t1.x1 IS NULL) and ... (t1.xi = t2.yi or t2.yi IS NULL) ... and (t1.xn = t2.yn or t2.yn IS NULL or t1.xn IS NULL). Add nullability testing routine is_node_nonnullable(), currently it handles Var, TargetEntry, CoalesceExpr and Const. It uses strict predicates, inner join conditions and NOT NULL constraint to check the nullability of a Var. Adjust and apply reduce_outer_joins() before the transformation so that the outer joins have an opportunity to be converted to inner joins prior to the transformation. We measured performance improvements of two to five orders of magnitude on most queries in a development environment. In our performance experiments, table s (small) has 11 rows, table l (large) has 1 million rows. s.n and l.n have NULL value. s.nn and l.nn are NOT NULL. Index is created on each column. Cases using hash anti join: l.nn not in (l.nn) 21900s -> 235ms l.nn not in (l.nn where u>0) 22000s -> 240ms l.n not in (l.nn) 21900s -> 238ms l.n not in (l.nn where u>0) 22000s -> 248ms Cases using index nested loop anti join s.n not in (l.nn) 360ms -> 0.5ms s.n not in (l.nn where u>0) 390ms -> 0.6ms s.nn not in (l.nn) 365ms -> 0.5ms s.nn not in (l.nn where u>0) 390ms -> 0.5ms Cases using bitmap heap scan on the inner and nested loop anti join: s.n not in (l.n) 360ms -> 0.7ms l.n not in (l.n) 21900s -> 1.6s l.n not in (l.n where u>0) 22000s -> 1680ms s.nn not in (l.n) 360ms -> 0.5ms l.nn not in (l.n) 21900s -> 1650ms l.nn not in (l.n where u>0) 22000s -> 1660ms Cases using the original hashed subplan: l.n not in (s.n) 63ms -> 63ms l.nn not in (s.nn) 63ms -> 63ms l.n not in (s.n where u>0) 63ms -> 63ms Comments are welcome. Regards, ----------- Zheng Li AWS, Amazon Aurora PostgreSQL
Вложения
In our patch, we only proceed with the ANTI JOIN transformation if subplan_is_hashable(subplan) is false, it requires the subquery to be planned at this point. To avoid planning the subquery again later on, I want to keep a pointer of the subplan in SubLink so that we can directly reuse the subplan when needed. However, this change breaks initdb for some reason and I'm trying to figure it out. I'll send the rebased patch in the following email since it's been a while. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Rebased patch is attached. Comments are welcome. ----------- Zheng Li AWS, Amazon Aurora PostgreSQL On 6/14/19, 5:39 PM, "zhengli" <zhelli@amazon.com> wrote: In our patch, we only proceed with the ANTI JOIN transformation if subplan_is_hashable(subplan) is false, it requires the subquery to be planned at this point. To avoid planning the subquery again later on, I want to keep a pointer of the subplan in SubLink so that we can directly reuse the subplan when needed. However, this change breaks initdb for some reason and I'm trying to figure it out. I'll send the rebased patch in the following email since it's been a while. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Вложения
----- To avoid planning the subquery again later on, I want to keep a pointer of the subplan in SubLink so that we can directly reuse the subplan when needed. However, this change breaks initdb for some reason and I'm trying to figure it out. ----- "make clean" solved the initdb issue. This new patch keeps a pointer of the subplan in SubLink so that we can directly reuse the subplan when needed. When the subplan is not hashable (too big to fit in work_mem), the NOT IN query will be flattened to an ANTI JOIN and we won't need to use subplan again. However, when the subplan is hashable, we don't do the conversion and will need to use subplan later, patch v2.1 avoids planning the subquery twice in this case. ----------- Zheng Li AWS, Amazon Aurora PostgreSQL
Вложения
Hi, I'm submitting patch v2.2. This version fixed an issue that involves CTE. Because we call subquery_planner before deciding whether to proceed with thetransformation, we need to setup access to upper level CTEs at this point if the subquery contains any CTE RangeTblEntry. Also added more test cases of NOT IN accessing CTEs, including recursive CTE. It's nice that CTE can use index now! Let me know if you have any comments. Regards, ----------- Zheng Li AWS, Amazon Aurora PostgreSQL
Вложения
On Wed, Jun 26, 2019 at 09:26:16PM +0000, Li, Zheng wrote: > Let me know if you have any comments. I have one: the latest patch visibly applies, but fails to build because of the recent API changes around lists in the backend code. So a rebase is in order. The discussion has not moved a iota in the last couple of months, still as the latest patch has not received reviews, I have moved it to next CF waiting on author. -- Michael
Вложения
Hi Michael, Here is the latest rebased patch. Regards, ----------- Zheng Li AWS, Amazon Aurora PostgreSQL On 11/30/19, 10:43 PM, "Michael Paquier" <michael@paquier.xyz> wrote: On Wed, Jun 26, 2019 at 09:26:16PM +0000, Li, Zheng wrote: > Let me know if you have any comments. I have one: the latest patch visibly applies, but fails to build because of the recent API changes around lists in the backend code. So a rebase is in order. The discussion has not moved a iota in the last couple of months, still as the latest patch has not received reviews, I have moved it to next CF waiting on author. -- Michael
Вложения
At the top of the thread your co-author argued the beginning of this work with "findings about the performance of PostgreSQL, MySQL, and Oracle on various subqueries:" https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/ I launched this test set with your "not_in ..." patch. Your optimization improves only results of D10-D13 queries. Nothing has changed for bad plans of the E20-E27 and F20-F27 queries. For example, we can replace E20 query: SELECT * FROM large WHERE n IN (SELECT n FROM small WHERE small.u = large.u); - execution time: 1370 ms, by SELECT * FROM large WHERE EXISTS (SELECT n,u FROM small WHERE (small.u = large.u) AND (large.n = small.n )) AND n IS NOT NULL; - execution time: 0.112 ms E21 query: SELECT * FROM large WHERE n IN (SELECT nn FROM small WHERE small.u = large.u); - 1553 ms, by SELECT * FROM large WHERE EXISTS (SELECT nn FROM small WHERE (small.u = large.u) AND (small.nn = large.n)); - 0.194 ms F27 query: SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small WHERE small.nu = large.u); - 1653.048 ms, by SELECT * FROM large WHERE NOT EXISTS (SELECT nn,nu FROM small WHERE (small.nu = large.u) AND (small.nn = large.nn)); - 274.019 ms Are you planning to make another patch for these cases? Also i tried to increase work_mem up to 2GB: may be hashed subqueries can improve situation? But hashing is not improved execution time of the queries significantly. On your test cases (from the comments of the patch) the subquery hashing has the same execution time with queries No.13-17. At the queries No.1-12 it is not so slow as without hashing, but works more slowly (up to 3 orders) than NOT IN optimization. On 12/2/19 9:25 PM, Li, Zheng wrote: > Here is the latest rebased patch. -- Andrey Lepikhov Postgres Professional https://postgrespro.com The Russian Postgres Company
Hi Andrey, Thanks for the comment! The unimproved cases you mentioned all fall into the category “correlated subquery”. This category is explicitly disallowedby existing code to convert to join in convert_ANY_sublink_to_join: /* * The sub-select must not refer to any Vars of the parent query. (Vars of * higher levels should be okay, though.) */ if (contain_vars_of_level((Node *) subselect, 1)) return NULL; I think this is also the reason why hashed subplan is not used for such subqueries. It's probably not always safe to convert a correlated subquery to join. We need to find out/prove when it’s safe/unsafe toconvert such ANY subquery if we were to do so. Regards, ----------- Zheng Li AWS, Amazon Aurora PostgreSQL On 1/5/20, 1:12 AM, "Andrey Lepikhov" <a.lepikhov@postgrespro.ru> wrote: At the top of the thread your co-author argued the beginning of this work with "findings about the performance of PostgreSQL, MySQL, and Oracle on various subqueries:" https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/ I launched this test set with your "not_in ..." patch. Your optimization improves only results of D10-D13 queries. Nothing has changed for bad plans of the E20-E27 and F20-F27 queries. For example, we can replace E20 query: SELECT * FROM large WHERE n IN (SELECT n FROM small WHERE small.u = large.u); - execution time: 1370 ms, by SELECT * FROM large WHERE EXISTS (SELECT n,u FROM small WHERE (small.u = large.u) AND (large.n = small.n )) AND n IS NOT NULL; - execution time: 0.112 ms E21 query: SELECT * FROM large WHERE n IN (SELECT nn FROM small WHERE small.u = large.u); - 1553 ms, by SELECT * FROM large WHERE EXISTS (SELECT nn FROM small WHERE (small.u = large.u) AND (small.nn = large.n)); - 0.194 ms F27 query: SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small WHERE small.nu = large.u); - 1653.048 ms, by SELECT * FROM large WHERE NOT EXISTS (SELECT nn,nu FROM small WHERE (small.nu = large.u) AND (small.nn = large.nn)); - 274.019 ms Are you planning to make another patch for these cases? Also i tried to increase work_mem up to 2GB: may be hashed subqueries can improve situation? But hashing is not improved execution time of the queries significantly. On your test cases (from the comments of the patch) the subquery hashing has the same execution time with queries No.13-17. At the queries No.1-12 it is not so slow as without hashing, but works more slowly (up to 3 orders) than NOT IN optimization. On 12/2/19 9:25 PM, Li, Zheng wrote: > Here is the latest rebased patch. -- Andrey Lepikhov Postgres Professional https://postgrespro.com The Russian Postgres Company
On 1/7/20 12:34 AM, Li, Zheng wrote: > Hi Andrey, > > Thanks for the comment! > > The unimproved cases you mentioned all fall into the category “correlated subquery”. This category is explicitly disallowedby existing code to convert to join in convert_ANY_sublink_to_join: > /* > * The sub-select must not refer to any Vars of the parent query. (Vars of > * higher levels should be okay, though.) > */ > if (contain_vars_of_level((Node *) subselect, 1)) > return NULL; > > I think this is also the reason why hashed subplan is not used for such subqueries. > > It's probably not always safe to convert a correlated subquery to join. We need to find out/prove when it’s safe/unsafeto convert such ANY subquery if we were to do so. > Maybe this part of code contains logical error? You optimize only the special case of the "NOT IN" expression, equal to NOT EXISTS. The convert_EXISTS_sublink_to_join() routine can contain vars of the parent query. May be you give an trivial example for this problem? -- Andrey Lepikhov Postgres Professional https://postgrespro.com The Russian Postgres Company
"Li, Zheng" <zhelli@amazon.com> writes: > Here is the latest rebased patch. I noticed that the cfbot is failing to test this because of some trivial merge conflicts, so here's a re-rebased version. I haven't reviewed this in any detail, but here's a couple of notes from having quickly looked through the patch: * I find it entirely unacceptable to stick some planner temporary fields into struct SubLink. If you need that storage you'll have to find some other place to put it. But in point of fact I don't think you need it; it doesn't look to me to be critical to generate the subquery's plan any earlier than make_subplan would have done it. Moreover, you should really strive to *not* do that, because it's likely to get in the way of other future optimizations. As the existing comment in make_subplan already suggests, we might want to delay subplan planning even further than that in future. * I'm also not too happy with the (undocumented) rearrangement of reduce_outer_joins. There's a specific sequence of processing that that's involved in, as documented at the top of prepjointree.c, and I doubt that you can just randomly call it from other places and expect good results. In particular, since JOIN alias var flattening won't have happened yet when this code is being run from pull_up_sublinks, it's unlikely that reduce_outer_joins will reliably get the same answers it would get normally. I also wonder whether it's safe to make the parsetree changes it makes earlier than normal, and whether it will be problematic to run it twice on the same tree, and whether its rather indirect connection to distribute_qual_to_rels is going to misbehave. * The proposed test additions seem to about triple the runtime of subselect.sql. This seems excessive. I also wonder why it's necessary for this test to build its own large tables; couldn't it re-use ones that already exist in the regression database? * Not really sure that we need a new planner GUC for this, but if we do, it needs to be documented. regards, tom lane diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 5da0528..16ce707 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -992,7 +992,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse, * preprocessing. */ if (hasOuterJoins) - reduce_outer_joins(root); + reduce_outer_joins(parse); /* * If we have any RTE_RESULT relations, see if they can be deleted from diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 3650e83..fa64281 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -39,6 +39,8 @@ #include "utils/syscache.h" +bool enable_not_in_transform; + typedef struct convert_testexpr_context { PlannerInfo *root; @@ -158,8 +160,7 @@ get_first_col_type(Plan *plan, Oid *coltype, int32 *coltypmod, * subquery itself is in a resjunk tlist entry whose value is uninteresting). */ static Node * -make_subplan(PlannerInfo *root, Query *orig_subquery, - SubLinkType subLinkType, int subLinkId, +make_subplan(PlannerInfo *root, SubLink *sublink, Node *testexpr, bool isTopQual) { Query *subquery; @@ -171,6 +172,9 @@ make_subplan(PlannerInfo *root, Query *orig_subquery, Plan *plan; List *plan_params; Node *result; + Query *orig_subquery = (Query *) sublink->subselect; + SubLinkType subLinkType = sublink->subLinkType; + int subLinkId = sublink->subLinkId; /* * Copy the source Query node. This is a quick and dirty kluge to resolve @@ -216,24 +220,33 @@ make_subplan(PlannerInfo *root, Query *orig_subquery, /* plan_params should not be in use in current query level */ Assert(root->plan_params == NIL); - /* Generate Paths for the subquery */ - subroot = subquery_planner(root->glob, subquery, - root, - false, tuple_fraction); + if(sublink->subroot != NULL && + sublink->subplan != NULL) + { + subroot = (PlannerInfo *) sublink->subroot; + plan = (Plan *) sublink->subplan; + } + else + { + /* Generate Paths for the subquery */ + subroot = subquery_planner(root->glob, subquery, + root, + false, tuple_fraction); + + /* + * Select best Path and turn it into a Plan. At least for now, there + * seems no reason to postpone doing that. + */ + final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL); + best_path = get_cheapest_fractional_path(final_rel, tuple_fraction); + + plan = create_plan(subroot, best_path); + } /* Isolate the params needed by this specific subplan */ plan_params = root->plan_params; root->plan_params = NIL; - /* - * Select best Path and turn it into a Plan. At least for now, there - * seems no reason to postpone doing that. - */ - final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL); - best_path = get_cheapest_fractional_path(final_rel, tuple_fraction); - - plan = create_plan(subroot, best_path); - /* And convert to SubPlan or InitPlan format. */ result = build_subplan(root, plan, subroot, plan_params, subLinkType, subLinkId, @@ -1173,6 +1186,389 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context) return expression_tree_walker(node, inline_cte_walker, context); } +/* Returns a List of Nodes from the testexpr of an Any SubLink */ +static List * +getTestExpr(SubLink *sublink) +{ + Node * testexpr = sublink->testexpr; + Assert(testexpr); + + /* single expression */ + if(IsA(testexpr, OpExpr)) + { + OpExpr *opexpr = (OpExpr *) testexpr; + Node *testnode = linitial(opexpr->args); + + return list_make1(testnode); + } + /* multi-expression */ + else if(IsA(testexpr, BoolExpr)) + { + BoolExpr *bexpr = (BoolExpr *) testexpr; + ListCell *lc; + Node *node; + List *result = NULL; + + foreach(lc, bexpr->args) + { + node = lfirst(lc); + if(IsA(node, OpExpr)) + { + OpExpr *expr = (OpExpr *) node; + result = lappend(result, linitial(expr->args)); + } + else + { + elog(ERROR, "unrecognized node type for testexpr: %d", + (int) nodeTag(node)); + } + } + return result; + } + else + { + elog(ERROR, "unrecognized node type for testexpr: %d", + (int) nodeTag(testexpr)); + } +} + +/* Try to reduce outer joins if there is one in the Query */ +static void +reduce_outer_joins_NOT_IN(Query *parse) +{ + ListCell *lc; + RangeTblEntry *rte; + + foreach(lc, parse->rtable) + { + rte = (RangeTblEntry *) lfirst(lc); + /* try to reduce outer joins if there is one */ + if (rte->rtekind == RTE_JOIN && + IS_OUTER_JOIN(rte->jointype)) + { + reduce_outer_joins(parse); + return; + } + } +} + +/* + * Make clause NOT EXISTS + * (select 1 from t2 where p) for the NOT IN to ANTI JOIN + * transformation. + */ +static +Node * +makeExistsTest_NOT_IN(Query *subselect) +{ + BoolExpr *notExpr = makeNode(BoolExpr); + SubLink *exists = makeNode(SubLink); + Query *selectOne = copyObject(subselect); + Const *oneconst; + TargetEntry *dummyte; + + /* modify subselect target list to contain a dummy const 1 */ + oneconst = makeConst(INT4OID, + -1, + InvalidOid, + sizeof(int32), + Int32GetDatum(1), + false, /* isnull */ + true); /* pass by value */ + dummyte = makeTargetEntry((Expr *) oneconst, + 1, + "one", + false /* resjunk */ ); + selectOne->targetList = list_make1(dummyte); + + /* make EXISTS(select 1 from t2 where p) */ + exists->subLinkType = EXISTS_SUBLINK; + exists->subLinkId = 0; + exists->subselect = (Node *) selectOne; + exists->location = -1; + exists->subroot = NULL; + exists->subplan = NULL; + + /* make NOT EXISTS(select 1 from t2 where p) */ + notExpr->boolop = NOT_EXPR; + notExpr->args = list_make1(exists); + notExpr->location = -1; + + return (Node *) notExpr; +} + +/* + *Allow transformation from NOT IN query to ANTI JOIN if ALL of the + * following conditions are true: + * 1. The GUC enable_not_in_transform is set to true. + * 2. the NOT IN subquery is not hashable, in which case an expensive + * subplan will be generated if we don't transform. + * 3.. the subquery does not define any CTE. + */ +static bool +allow_NOT_IN_transformation(PlannerInfo *root, + SubLink *sublink) +{ + Query *subselect = (Query *) sublink->subselect; + PlannerInfo *subroot; + double tuple_fraction; + RelOptInfo *final_rel; + Path *best_path; + Plan *plan; + + if(! enable_not_in_transform) + return false; + + /* + * Can't flatten if it contains WITH. (We could arrange to pull up the + * WITH into the parent query's cteList, but that risks changing the + * semantics, since a WITH ought to be executed once per associated query + * call.) Note that convert_ANY_sublink_to_join doesn't have to reject + * this case, since it just produces a subquery RTE that doesn't have to + * get flattened into the parent query. + */ + if(subselect->cteList) + return false; + + /* For ALL and ANY subplans, we will be + * able to stop evaluating if the test condition fails or matches, so very + * often not all the tuples will be retrieved; for lack of a better idea, + * specify 50% retrieval. + */ + tuple_fraction = 0.5; + /* + * Generate Paths for the subquery, use a copied version of the subquery + * so that the existing one doesn't get modified. + */ + subroot = subquery_planner(root->glob, copyObject(subselect), + root, false, tuple_fraction); + + /* Select best Path and turn it into a Plan. */ + final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL); + best_path = get_cheapest_fractional_path(final_rel, tuple_fraction); + + plan = create_plan(subroot, best_path); + + sublink->subroot = (Node *) subroot; + sublink->subplan = (Node *) plan; + /* + * Punt if subplan is hashable since using hashed subplan is almost like + * doing a Hash Anti Join, we probably can't do better than that. + */ + if(subplan_is_hashable(plan)) + { + return false; + } + + return true; +} + +/* + * do the following NOT IN to ANTI JOIN conversions: + * + * When x is non-nullable: + * t1.x not in (t2.y where p) => ANTI JOIN + * t1, t2 on join condition (t1.x=t2.y or t2.y IS NULL) and p. + * the above predicate "t2.y IS NULL" can be removed if y + * is also non-nullable. + * + * When x is nullable: + * t1.x not in (t2.y where p) => ANTI JOIN + * t1 (Filter: t1.x is not null or not exists (select 1 from t2 where p)), + * t2 on join condition (t1.x=t2.y or t2.y is null) and p. + * the above predicate "t2.y IS NULL" can be removed if y + * is also non-nullable. + * + * The multi-expression case is just ANDs of the single- + * expression case. + */ +static bool +convert_NOT_IN_to_join(PlannerInfo *root, Node **quals, + SubLink *sublink, List *subquery_vars, + Node **pullout) +{ + Query *parse = root->parse; + Query *subselect = (Query *) sublink->subselect; + List *testnodes = getTestExpr(sublink); + bool outerNonNull; + bool innerNonNull; + NullTest *nt; + + /* + * Try reduce_outer_joins since outer join affects the nullability test that's coming up next. + * We have to call reduce_outer_joins for outer and inner query separately because we + * don't have a global range table yet. + */ + reduce_outer_joins_NOT_IN(parse); + + reduce_outer_joins_NOT_IN(subselect); + + Assert(testnodes); + outerNonNull = + list_hasnonnullable(testnodes, parse); + innerNonNull = + list_hasnonnullable(subselect->targetList, subselect); + + /* Single-expression case, do the following: + * When x is non-nullable: + * t1.x not in (t2.y where p) => ANTI JOIN + * t1, t2 on join condition (t1.x=t2.y or t2.y IS NULL) and p. + * the above predicate "t2.y IS NULL" can be removed if y + * is also non-nullable. + * + * When x is nullable: + * t1.x not in (t2.y where p) => ANTI JOIN + * t1 (Filter: t1.x is not null or not exists (select 1 from t2 where p)), + * t2 on join condition (t1.x=t2.y or t2.y is null) and p. + * the above predicate "t2.y IS NULL" can be removed if y + * is also non-nullable. + */ + if(IsA(*quals, OpExpr)) + { + /* Add "OR y IS NULL" if y is nullable */ + if(!innerNonNull) + { + /* make expr y IS NULL */ + nt = makeNode(NullTest); + nt->arg = (Expr *)linitial(subquery_vars); + nt->nulltesttype = IS_NULL; + nt->argisrow = false; + nt->location = -1; + + /* make orclause (x = y OR y IS NULL) */ + *quals = (Node *)make_orclause(list_make2(*quals, + (Node *)nt)); + } + + /* + * if x is nullable, make the following filter for t1 : + * x IS NOT NULL or NOT EXISTS (select 1 from t2 where p) + */ + if(!outerNonNull) + { + Node * existsTest = NULL; + + /* make expr x IS NOT NULL */ + nt = makeNode(NullTest); + nt->arg = (Expr *) linitial(testnodes); + nt->nulltesttype = IS_NOT_NULL; + nt->argisrow = false; + nt->location = -1; + + existsTest = makeExistsTest_NOT_IN(subselect); + /* make x IS NOT NULL OR NOT EXISTS (select 1 from t2 where p) */ + *pullout = (Node *)make_orclause(list_make2( + (Node *) nt, existsTest)); + } + } + /* + * Multi-expression case: + * If all xi's are nullable: + * (x1, x2, ... xn) not in (y1, y2, ... yn ) => + * ANTI JOIN t1, + * t2 on join condition: + * ((t1.x1 = t2.y1) and ... (t1.xi = t2.yi) ... and + * (t1.xn = t2.yn)) is NOT FALSE. + * + * If at least one xi is non-nuallable: + * (x1, x2, ... xn) not in (y1, y2, ... yn ) => + * ANTI JOIN t1, + * t2 on join condition: + * (t1.x1 = t2.y1 or t2.y1 is NULL) and ... + * (t1.xi = t2.yi or t2.yi is NULL or t1.xi is NULL) ... and + * (t1.xn = t2.yn or t2.yn is NULL). + */ + else if(IsA(*quals, BoolExpr)) + { + /* + * Add IS NOT FALSE on top of the join condition if ALL x_i's are nullable + */ + if(!outerNonNull) + { + BooleanTest *btest; + + btest = makeNode(BooleanTest); + btest->arg = (Expr *) *quals; + btest->booltesttype = IS_NOT_FALSE; + *quals = (Node *) btest; + } + else + { + ListCell *qualc; + TargetEntry *te; + ListCell *xc = list_head(testnodes); + ListCell *yc = list_head(subquery_vars); + ListCell *ytlc = list_head(subselect->targetList); + List *quallist = ((BoolExpr *)*quals)->args; + Node *joinCondition = NULL; + bool xnonNull; + bool ynonNull; + + /* Reconstruct quals in the loop */ + *quals = NULL; + foreach(qualc, quallist) + { + te = (TargetEntry *)lfirst(ytlc); + /* x_i = y_i */ + joinCondition = lfirst(qualc); + ynonNull = is_node_nonnullable((Node*)te, subselect); + + /* append y_i IS NULL to x_i = y_i if y_i is non-nullable */ + if(!ynonNull) + { + /* make expr y_i IS NULL */ + nt = makeNode(NullTest); + nt->arg = (Expr *)lfirst(yc); + nt->nulltesttype = IS_NULL; + nt->argisrow = false; + nt->location = -1; + + /* make orclause (x_i = y_i OR y_i IS NULL) */ + joinCondition = (Node *)make_orclause(list_make2(joinCondition, + (Node *)nt)); + } + + /* + * Append "OR x_i is null" to the join condition if x_i is nullable. + * Notice at least one x_i should be non-nullable because the all + * x_i's nullable case is handled earlier by adding "IS NOT FALSE" + * on top of the join condition. + */ + xnonNull = is_node_nonnullable(lfirst(xc), parse); + if(!xnonNull) + { + /* make expr x_i IS NULL */ + nt = makeNode(NullTest); + nt->arg = (Expr *)lfirst(xc); + nt->nulltesttype = IS_NULL; + nt->argisrow = false; + nt->location = -1; + + /* make orclause (x_i = y_i OR y_i IS NULL OR x_i IS NULL) */ + joinCondition = (Node *)make_orclause(list_make2(joinCondition, + (Node *)nt)); + } + + /* + * Now append joinCondition to quals as one andclause. + * (x_i = y_i OR y_i IS NULL OR x_i IS NULL) AND + * (x_j = y_j OR y_j IS NULL OR x_j IS NULL)... + */ + *quals = (Node *)make_andclause(list_make2(*quals, joinCondition)); + xc = lnext(testnodes, xc); + yc = lnext(subquery_vars, yc); + ytlc = lnext(subselect->targetList, ytlc); + } + } + } + /* quals should be either OpExpr or BoolExpr, otherwise don't convert */ + else + { + return false; + } + + return true; +} /* * convert_ANY_sublink_to_join: try to convert an ANY SubLink to a join @@ -1210,7 +1606,8 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Node **pullout, + Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1254,6 +1651,12 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, if (contain_volatile_functions(sublink->testexpr)) return NULL; + if (under_not && + ! allow_NOT_IN_transformation(root, sublink)) + { + return NULL; + } + /* Create a dummy ParseState for addRangeTableEntryForSubquery */ pstate = make_parsestate(NULL); @@ -1293,10 +1696,28 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, quals = convert_testexpr(root, sublink->testexpr, subquery_vars); /* + * Try converting x NOT IN (y) to ANTI JOIN. + */ + if(under_not && + !convert_NOT_IN_to_join(root, &quals, + sublink, subquery_vars, pullout)) + { + /* + * In theory, we shouldn't get here since allow_NOT_IN_transformation() + * has already ruled out cases that shouldn't be transformed. In other words, + * I expect convert_NOT_IN_to_join to always return true, but just in case + * it fails, reset parse->rtable which has been changed a few lines above. + */ + parse->rtable = list_delete(parse->rtable, rte); + return NULL; + } + + /* * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + /* NOT IN will be converted to ANTI JOIN */ + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1885,9 +2306,7 @@ process_sublinks_mutator(Node *node, process_sublinks_context *context) * Now build the SubPlan node and make the expr to return. */ return make_subplan(context->root, - (Query *) sublink->subselect, - sublink->subLinkType, - sublink->subLinkId, + sublink, testexpr, context->isTopQual); } diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 1452172..cd1557c 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -113,7 +113,7 @@ static Query *pullup_replace_vars_subquery(Query *query, static reduce_outer_joins_state *reduce_outer_joins_pass1(Node *jtnode); static void reduce_outer_joins_pass2(Node *jtnode, reduce_outer_joins_state *state, - PlannerInfo *root, + Query *parse, Relids nonnullable_rels, List *nonnullable_vars, List *forced_null_vars); @@ -267,6 +267,13 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode, } /* Build the replacement FromExpr; no quals yet */ newf = makeFromExpr(newfromlist, NULL); + /* + * Replace parse->jointree with newf now because we might modify join types + * during reduce_outer_joins() in convert_NOT_IN_to_join() which is called + * in pull_up_sublinks_qual_recurse() that's coming up next. + */ + newf->quals = f->quals; + root->parse->jointree = newf; /* Set up a link representing the rebuilt jointree */ jtlink = (Node *) newf; /* Now process qual --- all children are available for use */ @@ -399,7 +406,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, /* Is it a convertible ANY or EXISTS clause? */ if (sublink->subLinkType == ANY_SUBLINK) { - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, NULL, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -425,7 +432,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, NULL, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -571,6 +578,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } } + else if (sublink->subLinkType == ANY_SUBLINK) + { + Node *pullout = NULL; + + if ((j = convert_ANY_sublink_to_join(root, sublink, true, &pullout, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Any inserted + * joins can get stacked onto either j->larg or j->rarg, + * depending on which rels they reference. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->larg, + available_rels1, + &j->rarg, + child_rels); + /* + * Return pullout predicate (x is NOT NULL) if it's not null, + * otherwise return NULL representing constant TRUE. + */ + return pullout? pullout : NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, &pullout, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Any inserted + * joins can get stacked onto either j->larg or j->rarg, + * depending on which rels they reference. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->larg, + available_rels2, + &j->rarg, + child_rels); + /* + * Return pullout predicate (x is NOT NULL) if it's not null, + * otherwise return NULL representing constant TRUE. + */ + return pullout? pullout : NULL; + } + } } /* Else return it unmodified */ return node; @@ -909,7 +978,13 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, subroot->parse = subquery; subroot->glob = root->glob; subroot->query_level = root->query_level; - subroot->parent_root = root->parent_root; + /* + * Keep a path to the top level root so that we can recursively access top level + * CTEs in root->parse->cteList, and CTE plans in root->init_plans. This hack + * won't change the original PlannerInfo tree structure because subroot is just + * a auxiliary PlannerInfo to help pulling up subquery. + */ + subroot->parent_root = root; subroot->plan_params = NIL; subroot->outer_params = NULL; subroot->planner_cxt = CurrentMemoryContext; @@ -2558,7 +2633,7 @@ flatten_simple_union_all(PlannerInfo *root) * alias-var expansion). */ void -reduce_outer_joins(PlannerInfo *root) +reduce_outer_joins(Query *parse) { reduce_outer_joins_state *state; @@ -2571,14 +2646,14 @@ reduce_outer_joins(PlannerInfo *root) * join(s) below each side of each join clause. The second pass examines * qual clauses and changes join types as it descends the tree. */ - state = reduce_outer_joins_pass1((Node *) root->parse->jointree); + state = reduce_outer_joins_pass1((Node *) parse->jointree); /* planner.c shouldn't have called me if no outer joins */ if (state == NULL || !state->contains_outer) elog(ERROR, "so where are the outer joins?"); - reduce_outer_joins_pass2((Node *) root->parse->jointree, - state, root, NULL, NIL, NIL); + reduce_outer_joins_pass2((Node *) parse->jointree, + state, parse, NULL, NIL, NIL); } /* @@ -2661,7 +2736,7 @@ reduce_outer_joins_pass1(Node *jtnode) static void reduce_outer_joins_pass2(Node *jtnode, reduce_outer_joins_state *state, - PlannerInfo *root, + Query *parse, Relids nonnullable_rels, List *nonnullable_vars, List *forced_null_vars) @@ -2700,7 +2775,7 @@ reduce_outer_joins_pass2(Node *jtnode, reduce_outer_joins_state *sub_state = lfirst(s); if (sub_state->contains_outer) - reduce_outer_joins_pass2(lfirst(l), sub_state, root, + reduce_outer_joins_pass2(lfirst(l), sub_state, parse, pass_nonnullable_rels, pass_nonnullable_vars, pass_forced_null_vars); @@ -2812,7 +2887,7 @@ reduce_outer_joins_pass2(Node *jtnode, /* Apply the jointype change, if any, to both jointree node and RTE */ if (rtindex && jointype != j->jointype) { - RangeTblEntry *rte = rt_fetch(rtindex, root->parse->rtable); + RangeTblEntry *rte = rt_fetch(rtindex, parse->rtable); Assert(rte->rtekind == RTE_JOIN); Assert(rte->jointype == j->jointype); @@ -2897,7 +2972,7 @@ reduce_outer_joins_pass2(Node *jtnode, pass_nonnullable_vars = NIL; pass_forced_null_vars = NIL; } - reduce_outer_joins_pass2(j->larg, left_state, root, + reduce_outer_joins_pass2(j->larg, left_state, parse, pass_nonnullable_rels, pass_nonnullable_vars, pass_forced_null_vars); @@ -2919,7 +2994,7 @@ reduce_outer_joins_pass2(Node *jtnode, pass_nonnullable_vars = NIL; pass_forced_null_vars = NIL; } - reduce_outer_joins_pass2(j->rarg, right_state, root, + reduce_outer_joins_pass2(j->rarg, right_state, parse, pass_nonnullable_rels, pass_nonnullable_vars, pass_forced_null_vars); diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index 0c6fe01..46a9877 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -38,10 +38,12 @@ #include "optimizer/optimizer.h" #include "optimizer/plancat.h" #include "optimizer/planmain.h" +#include "optimizer/pathnode.h" #include "parser/analyze.h" #include "parser/parse_agg.h" #include "parser/parse_coerce.h" #include "parser/parse_func.h" +#include "parser/parsetree.h" #include "rewrite/rewriteManip.h" #include "tcop/tcopprot.h" #include "utils/acl.h" @@ -2013,6 +2015,271 @@ find_forced_null_var(Node *node) } /* + * find_innerjoined_rels + * Traverse jointree to locate non-outerjoined-rels and quals above them + * + * We fill innerjoined_rels with the relids of all rels that are not below + * the nullable side of any outer join (which would cause their Vars to be + * possibly NULL regardless of what's in the catalogs). In the same scan, + * we locate all WHERE and JOIN/ON quals that constrain these rels add them to + * the usable_quals list (forming a list with implicit-AND semantics). + * + * Top-level caller must initialize innerjoined_rels/usable_quals to NULL/NIL. + */ +static void +find_innerjoined_rels(Node *jtnode, + Relids *innerjoined_rels, List **usable_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + int varno = ((RangeTblRef *) jtnode)->rtindex; + + *innerjoined_rels = bms_add_member(*innerjoined_rels, varno); + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + ListCell *lc; + + /* All elements of the FROM list are allowable */ + foreach(lc, f->fromlist) + find_innerjoined_rels((Node *) lfirst(lc), + innerjoined_rels, usable_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *usable_quals = lappend(*usable_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_innerjoined_rels(j->larg, + innerjoined_rels, usable_quals); + find_innerjoined_rels(j->rarg, + innerjoined_rels, usable_quals); + /* and grab the ON quals too */ + if (j->quals) + *usable_quals = lappend(*usable_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_innerjoined_rels(j->larg, + innerjoined_rels, usable_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_innerjoined_rels(j->rarg, + innerjoined_rels, usable_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + case JOIN_UNIQUE_OUTER: + case JOIN_UNIQUE_INNER: + /* Don't think we will see JOIN_UNIQUE_OUTER or + * JOIN_UNIQUE_INNER since they are only used internally in + * the planner in a much later phase (in standard_join_search). + */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + +/* + * Returns true if the Node passed in is nonnullable. Currently handles Var, + * TargetEntry, CoaleseExpr and Const. + * TODO: Add more supporting cases. + * A Var is nonnullable if: + * It does not appear on the null-padded side of an outer join and it has NOT NULL constraint, + * Or if it's forced non-null by inner join or other strict predicates. + * A CoalesceExpr is nonnullable if it has a non-null argument. + */ +bool +is_node_nonnullable(Node * node, Query *parse) +{ + AttrNumber attno = InvalidAttrNumber; + Oid reloid; + Var *var = NULL; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + * + * Note that we needn't consider other top-level operators such as + * DISTINCT, GROUP BY, etc, as those will not introduce nulls either. + */ + if (parse->setOperations) + return false; + + switch (nodeTag(node)) + { + case T_Var: + { + RangeTblEntry *rte; + + var = (Var *) node; + attno = var->varattno; + rte = rt_fetch(var->varno, parse->rtable); + reloid = rte->relid; + break; + } + case T_TargetEntry: + { + TargetEntry *te = (TargetEntry *)node; + switch(nodeTag(te->expr)) + { + case T_Var: + { + var = (Var *) te->expr; + attno = te->resorigcol; + reloid = te->resorigtbl; + break; + } + /* recurse into is_node_nonnullable for other types of Node */ + default: + return is_node_nonnullable((Node *)te->expr, parse); + } + break; + } + case T_CoalesceExpr: + { + ListCell *arg; + CoalesceExpr *cexpr = (CoalesceExpr *)node; + + /* handle COALESCE Function by looking for non-null argument */ + foreach(arg, cexpr->args) + { + Node *e = lfirst(arg); + + /* recurse into is_node_nonnullable */ + if (is_node_nonnullable(e, parse)) + { + return true; + } + } + break; + } + case T_Const: + { + if(!((Const *) node)->constisnull) + { + return true; + } + break; + } + /* + * TODO: handle more cases to make the nullability test more accurate + * Assume unhandled cases are nullable. + */ + default: + return false; + } + + /* + * If we have found a Var, it is non-null if: + * not on NULL-padded side of an outer join and has NOT NULL constraint + * or forced NOT NULL by inner join conditions or by other strict predicates. + */ + if(var && + reloid != InvalidOid) + { + Relids innerjoined_rels = NULL; + List *innerjoined_useful_quals = NIL; + List *nonnullable_innerjoined_vars = NIL; + + find_innerjoined_rels((Node *) parse->jointree, + &innerjoined_rels, + &innerjoined_useful_quals); + + /* + * Check if the Var is from an INNER JOINed rel, it's also guaranteed + * to not be on the null-padded side of an outer join. + */ + if (bms_is_member(var->varno, innerjoined_rels)) + { + /* + * If Var is from a plain relation and its column is marked + * NOT NULL according to the catalogs, it can't produce NULL. + */ + if (get_attnotnull(reloid, attno)) + { + return true; + } + + /* + * Otherwise check for the existance of strict predicates which filter + * out NULL values for this Var. + */ + nonnullable_innerjoined_vars = + find_nonnullable_vars((Node *) innerjoined_useful_quals); + + if (list_member(nonnullable_innerjoined_vars, var)) + { + return true; + } + } + + /* + * If we get here it means - + * The var is on null-padded side of an outer-join, since we've already + * tried reduce_outer_joins(), there isn't any strict predicates to turn this + * outer join to inner join, then there will be no strict predicates to force + * this var non-null as well. + */ + } + + return false; +} + +/* + * Returns true if at least one Node in the input list is non-nullable + */ +bool +list_hasnonnullable(List * list, Query *parse) +{ + ListCell *lc; + Node *node; + + foreach(lc, list) + { + node = lfirst(lc); + if(is_node_nonnullable(node, parse)) + { + return true; + } + } + return false; +} + +/* * Can we treat a ScalarArrayOpExpr as strict? * * If "falseOK" is true, then a "false" result can be considered strict, diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 27bbb58..61ecac7 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -934,6 +934,35 @@ get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok) return oid; } +/* + * get_attnotnull + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + Form_pg_attribute att_tup; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + bool result; + + att_tup = (Form_pg_attribute) GETSTRUCT(tp); + result = att_tup->attnotnull; + ReleaseSysCache(tp); + + return result; + } + /* Assume att is nullable if no valid heap tuple is found */ + return false; +} + /* ---------- COLLATION CACHE ---------- */ /* diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index af876d1..29e3fac 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -1122,6 +1122,15 @@ static struct config_bool ConfigureNamesBool[] = NULL, NULL, NULL }, { + {"enable_not_in_transform", PGC_USERSET, QUERY_TUNING_METHOD, + gettext_noop("Enables the planner to transform NOT IN subquery to ANTI JOIN when possible."), + NULL + }, + &enable_not_in_transform, + true, + NULL, NULL, NULL + }, + { {"geqo", PGC_USERSET, QUERY_TUNING_GEQO, gettext_noop("Enables genetic query optimization."), gettext_noop("This algorithm attempts to do planning without " diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index d73be2a..12ee3a7 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -657,6 +657,8 @@ typedef struct SubLink List *operName; /* originally specified operator name */ Node *subselect; /* subselect as Query* or raw parsetree */ int location; /* token location, or -1 if unknown */ + Node *subroot; /* PlannerInfo for the subquery */ + Node *subplan; /* best Plan for the subquery */ } SubLink; /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index b7456e3..5078cdd 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -45,6 +45,9 @@ extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *clause); extern Var *find_forced_null_var(Node *clause); +extern bool is_node_nonnullable(Node * node, Query *parse); +extern bool list_hasnonnullable(List * list, Query *parse); + extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h index 735ba09..d103415 100644 --- a/src/include/optimizer/cost.h +++ b/src/include/optimizer/cost.h @@ -60,6 +60,7 @@ extern PGDLLIMPORT bool enable_nestloop; extern PGDLLIMPORT bool enable_material; extern PGDLLIMPORT bool enable_mergejoin; extern PGDLLIMPORT bool enable_hashjoin; +extern PGDLLIMPORT bool enable_not_in_transform; extern PGDLLIMPORT bool enable_gathermerge; extern PGDLLIMPORT bool enable_partitionwise_join; extern PGDLLIMPORT bool enable_partitionwise_aggregate; diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h index 19c9230..7f807c4 100644 --- a/src/include/optimizer/prep.h +++ b/src/include/optimizer/prep.h @@ -26,7 +26,7 @@ extern void pull_up_sublinks(PlannerInfo *root); extern void preprocess_function_rtes(PlannerInfo *root); extern void pull_up_subqueries(PlannerInfo *root); extern void flatten_simple_union_all(PlannerInfo *root); -extern void reduce_outer_joins(PlannerInfo *root); +extern void reduce_outer_joins(Query *parse); extern void remove_useless_result_rtes(PlannerInfo *root); extern Relids get_relids_in_jointree(Node *jtnode, bool include_joins); extern Relids get_relids_for_join(Query *query, int joinrelid); diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index d6a872b..a4d309e 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -19,6 +19,8 @@ extern void SS_process_ctes(PlannerInfo *root); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, + Node **pullout, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 4e646c5..1e585aa 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -90,6 +90,7 @@ extern char get_attgenerated(Oid relid, AttrNumber attnum); extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 4c6cd5f..9a54f35 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -1614,3 +1614,2789 @@ select * from x for update; Output: subselect_tbl.f1, subselect_tbl.f2, subselect_tbl.f3 (2 rows) +-- test NON IN to ANTI JOIN conversion +CREATE TABLE s (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL); +insert into s (u, n, nn, p) + select + generate_series(1,3) as u, + generate_series(1,3) as n, + generate_series(1,3) as nn, + 'foo' as p; +insert into s values(1000002, 1000002, 1000002, 'foofoo'); +UPDATE s set n = NULL WHERE n = 3; +analyze s; +CREATE TABLE l (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL); +insert into l (u, n, nn, p) + select + generate_series(1,10000 ) as u, + generate_series(1,10000 ) as n, + generate_series(1,10000 ) as nn, + 'bar' as p; +UPDATE l set n = NULL WHERE n = 7; +CREATE UNIQUE INDEX l_u ON l (u); +CREATE INDEX l_n ON l (n); +CREATE INDEX l_nn ON l (nn); +analyze l; +CREATE TABLE s1 (u INTEGER NOT NULL, n INTEGER NULL, n1 INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL); +insert into s1 (u, n, n1, nn, p) + select + generate_series(1,3) as u, + generate_series(1,3) as n, + generate_series(1,3) as n1, + generate_series(1,3) as nn, + 'foo' as p; +insert into s1 values(1000003, 1000003, 1000003, 1000003, 'foofoo'); +insert into s1 values(1003, 1003, 1003, 1003, 'foofoo'); +UPDATE s1 set n = NULL WHERE n = 3; +UPDATE s1 set n1 = NULL WHERE n = 2; +UPDATE s1 set n1 = NULL WHERE n1 = 3; +analyze s1; +CREATE TABLE empty (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL); +analyze empty; +-- set work_mem to 64KB so that NOT IN to ANTI JOIN optimization will kick in +set work_mem = 64; +-- correctness test 1: inner empty, return every thing from outer including NULL +explain (costs false) select * from s where n not in (select n from empty); + QUERY PLAN +------------------------------------ + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on empty +(4 rows) + +select * from s where n not in (select n from empty); + u | n | nn | p +---------+---------+---------+-------- + 1 | 1 | 1 | foo + 2 | 2 | 2 | foo + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(4 rows) + +-- correctness test 2: inner has NULL, return empty result +explain (costs false) select * from s where n not in (select n from l); + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(12 rows) + +select * from s where n not in (select n from l); + u | n | nn | p +---+---+----+--- +(0 rows) + +-- correctness test 3: inner non-null, result has no NULL +explain (costs false) select * from s where n not in (select u from l); + QUERY PLAN +----------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Index Only Scan using l_u on l + Index Cond: (u = s.n) +(7 rows) + +select * from s where n not in (select u from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +-- correctness test 4: inner has predicate +explain (costs false) select * from s where n not in (select n from l where u > 7); + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + Filter: (u > 7) + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + Filter: (u > 7) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(14 rows) + +select * from s where n not in (select n from l where u > 7); + u | n | nn | p +---------+---------+---------+-------- + 1 | 1 | 1 | foo + 2 | 2 | 2 | foo + 1000002 | 1000002 | 1000002 | foofoo +(3 rows) + +-- correctness test 5: multi-expression, (2, 2, null, 2, foo) should be in the result +explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u >= 3); + QUERY PLAN +----------------------------------------------------------------- + Nested Loop Anti Join + Join Filter: (((s1.n = l.u) AND (s1.n1 = l.nn)) IS NOT FALSE) + -> Seq Scan on s1 + -> Materialize + -> Seq Scan on l + Filter: (u >= 3) +(6 rows) + +select * from s1 where (n,n1) not in (select u,nn from l where u >= 3); + u | n | n1 | nn | p +---------+---------+---------+---------+-------- + 1 | 1 | 1 | 1 | foo + 1000003 | 1000003 | 1000003 | 1000003 | foofoo + 2 | 2 | | 2 | foo +(3 rows) + +-- correctness test 6: multi-expression, (3, null, null, 3, foo) should not be in the result +explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u > 0); + QUERY PLAN +----------------------------------------------------------------- + Nested Loop Anti Join + Join Filter: (((s1.n = l.u) AND (s1.n1 = l.nn)) IS NOT FALSE) + -> Seq Scan on s1 + -> Materialize + -> Seq Scan on l + Filter: (u > 0) +(6 rows) + +select * from s1 where (n,n1) not in (select u,nn from l where u > 0); + u | n | n1 | nn | p +---------+---------+---------+---------+-------- + 1000003 | 1000003 | 1000003 | 1000003 | foofoo +(1 row) + +-- correctness test 6: multi-expression, (3, null, null, 3, foo) should be in the result +explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u < 0); + QUERY PLAN +------------------------------------ + Seq Scan on s1 + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Index Scan using l_u on l + Index Cond: (u < 0) +(5 rows) + +select * from s1 where (n,n1) not in (select u,nn from l where u < 0); + u | n | n1 | nn | p +---------+---------+---------+---------+-------- + 1 | 1 | 1 | 1 | foo + 1000003 | 1000003 | 1000003 | 1000003 | foofoo + 1003 | 1003 | 1003 | 1003 | foofoo + 2 | 2 | | 2 | foo + 3 | | | 3 | foo +(5 rows) + +-- test using hashed subplan when inner fits in work_mem +explain (costs false) select * from l where n not in (select n from s); + QUERY PLAN +------------------------------------ + Seq Scan on l + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on s +(4 rows) + +select * from l where n not in (select n from s); + u | n | nn | p +---+---+----+--- +(0 rows) + +-- test single expression +explain (costs false) select * from s where n not in (select n from l); + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(12 rows) + +select * from s where n not in (select n from l); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where u not in (select u from l); + QUERY PLAN +-------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_u on l + Index Cond: (u = s.u) +(4 rows) + +select * from s where u not in (select u from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where 3*n not in (select n from l); + QUERY PLAN +-------------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Seq Scan on s + Filter: (((3 * n) IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l + Recheck Cond: (((3 * s.n) = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = (3 * s.n)) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(12 rows) + +select * from s where 3*n not in (select n from l); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where n not in (select 3*n from l); + QUERY PLAN +----------------------------------------------------------- + Nested Loop Anti Join + Join Filter: ((s.n = (3 * l.n)) OR ((3 * l.n) IS NULL)) + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Materialize + -> Seq Scan on l +(8 rows) + +select * from s where n not in (select 3*n from l); + u | n | nn | p +---+---+----+--- +(0 rows) + +-- test single expression with predicates +explain (costs false) select * from s where n not in (select n from l where u > 0); + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + Filter: (u > 0) + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + Filter: (u > 0) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(14 rows) + +select * from s where n not in (select n from l where u > 0); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where n not in (select n from l where u > 100); + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + Filter: (u > 100) + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + Filter: (u > 100) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(14 rows) + +select * from s where n not in (select n from l where u > 100); + u | n | nn | p +---------+---------+---------+-------- + 1 | 1 | 1 | foo + 2 | 2 | 2 | foo + 1000002 | 1000002 | 1000002 | foofoo +(3 rows) + +-- test multi expression +explain (costs false) select * from s where (n,u) not in (select n,u from l); + QUERY PLAN +------------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Scan using l_u on l + Index Cond: (u = s.u) + Filter: ((s.n = n) OR (n IS NULL) OR (s.n IS NULL)) +(5 rows) + +select * from s where (n,u) not in (select n,u from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where (u, nn) not in (select u, nn from l); + QUERY PLAN +---------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Scan using l_nn on l + Index Cond: (nn = s.nn) + Filter: (s.u = u) +(5 rows) + +select * from s where (u, nn) not in (select u, nn from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where (n,u) not in (select u,n from l); + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Bitmap Heap Scan on l + Recheck Cond: ((s.u = n) OR (n IS NULL)) + Filter: ((s.n = u) OR (s.n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.u) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(10 rows) + +select * from s where (n,u) not in (select u,n from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l); + QUERY PLAN +------------------------------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Scan using l_nn on l + Index Cond: (nn = s.nn) + Filter: (((s.n = u) OR (s.n IS NULL)) AND ((s.u = n) OR (n IS NULL))) +(5 rows) + +select * from s where (n,u,nn) not in (select u,n,nn from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1000); + QUERY PLAN +---------------------------------------------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Scan using l_nn on l + Index Cond: (nn = s.nn) + Filter: ((u > 1000) AND ((s.n = u) OR (s.n IS NULL)) AND ((s.u = n) OR (n IS NULL))) +(5 rows) + +select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1000); + u | n | nn | p +---------+---------+---------+-------- + 1 | 1 | 1 | foo + 2 | 2 | 2 | foo + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(4 rows) + +explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 0); + QUERY PLAN +------------------------------------------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Scan using l_nn on l + Index Cond: (nn = s.nn) + Filter: ((u > 0) AND ((s.n = u) OR (s.n IS NULL)) AND ((s.u = n) OR (n IS NULL))) +(5 rows) + +select * from s where (n,u,nn) not in (select u,n,nn from l where u > 0); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1); + QUERY PLAN +------------------------------------------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Scan using l_nn on l + Index Cond: (nn = s.nn) + Filter: ((u > 1) AND ((s.n = u) OR (s.n IS NULL)) AND ((s.u = n) OR (n IS NULL))) +(5 rows) + +select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1); + u | n | nn | p +---------+---------+---------+-------- + 1 | 1 | 1 | foo + 1000002 | 1000002 | 1000002 | foofoo +(2 rows) + +-- test multi-table +explain (costs false) select count(*) from s, l where s.n not in (select n from l); + QUERY PLAN +-------------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Nested Loop + -> Nested Loop Anti Join + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l l_1 + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Seq Scan on l +(15 rows) + +select count(*) from s, l where s.n not in (select n from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s, l where s.nn not in (select nn from l); + QUERY PLAN +------------------------------------------------------- + Aggregate + -> Nested Loop + -> Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_nn on l l_1 + Index Cond: (nn = s.nn) + -> Seq Scan on l +(7 rows) + +select count(*) from s, l where s.nn not in (select nn from l); + count +------- + 10000 +(1 row) + +-- test null padded results from outer join +explain (costs false) select * from s where n not in (select s.nn from l left join s on l.nn = s.nn); + QUERY PLAN +----------------------------------------------------- + Nested Loop Anti Join + Join Filter: ((s.n = s_1.nn) OR (s_1.nn IS NULL)) + InitPlan 1 (returns $0) + -> Nested Loop Left Join + Join Filter: (l_1.nn = s_2.nn) + -> Seq Scan on l l_1 + -> Materialize + -> Seq Scan on s s_2 + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Hash Left Join + Hash Cond: (l.nn = s_1.nn) + -> Seq Scan on l + -> Hash + -> Seq Scan on s s_1 +(15 rows) + +select * from s where n not in (select s.nn from l left join s on l.nn = s.nn); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where n not in (select s.nn from s right join l on s.nn = l.nn); + QUERY PLAN +----------------------------------------------------- + Nested Loop Anti Join + Join Filter: ((s.n = s_1.nn) OR (s_1.nn IS NULL)) + InitPlan 1 (returns $0) + -> Nested Loop Left Join + Join Filter: (s_2.nn = l_1.nn) + -> Seq Scan on l l_1 + -> Materialize + -> Seq Scan on s s_2 + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Hash Left Join + Hash Cond: (l.nn = s_1.nn) + -> Seq Scan on l + -> Hash + -> Seq Scan on s s_1 +(15 rows) + +select * from s where n not in (select s.nn from s right join l on s.nn = l.nn); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select count(*) from s right join l on s.nn = l.nn where l.nn not in (select nn from s); + QUERY PLAN +------------------------------------------------ + Aggregate + -> Hash Left Join + Hash Cond: (l.nn = s.nn) + -> Seq Scan on l + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on s s_1 + -> Hash + -> Seq Scan on s +(9 rows) + +select count(*) from s right join l on s.nn = l.nn where l.nn not in (select nn from s); + count +------- + 9997 +(1 row) + +explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from s); + QUERY PLAN +------------------------------------------ + Aggregate + -> Hash Left Join + Hash Cond: (l.nn = s.nn) + Filter: (NOT (hashed SubPlan 1)) + -> Seq Scan on l + -> Hash + -> Seq Scan on s + SubPlan 1 + -> Seq Scan on s s_1 +(9 rows) + +select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from s); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s right join l on s.nn=l.nn where l.nn not in (select l.nn from l left join son l.nn = s.nn); + QUERY PLAN +-------------------------------------------------------- + Aggregate + -> Nested Loop Left Join + Join Filter: (s.nn = l.nn) + -> Hash Anti Join + Hash Cond: (l.nn = l_1.nn) + -> Seq Scan on l + -> Hash + -> Hash Left Join + Hash Cond: (l_1.nn = s_1.nn) + -> Seq Scan on l l_1 + -> Hash + -> Seq Scan on s s_1 + -> Seq Scan on s +(13 rows) + +select count(*) from s right join l on s.nn=l.nn where l.nn not in (select l.nn from l left join s on l.nn = s.nn); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s right join l on s.nn=l.nn where s.nn not in (select s.nn from l left join son l.nn = s.nn); + QUERY PLAN +------------------------------------------------------------ + Aggregate + InitPlan 1 (returns $0) + -> Nested Loop Left Join + Join Filter: (l_2.nn = s_2.nn) + -> Seq Scan on l l_2 + -> Materialize + -> Seq Scan on s s_2 + -> Nested Loop Anti Join + Join Filter: ((s.nn = s_1.nn) OR (s_1.nn IS NULL)) + -> Hash Left Join + Hash Cond: (l.nn = s.nn) + Filter: ((s.nn IS NOT NULL) OR (NOT $0)) + -> Seq Scan on l + -> Hash + -> Seq Scan on s + -> Materialize + -> Hash Left Join + Hash Cond: (l_1.nn = s_1.nn) + -> Seq Scan on l l_1 + -> Hash + -> Seq Scan on s s_1 +(21 rows) + +select count(*) from s right join l on s.nn=l.nn where s.nn not in (select s.nn from l left join s on l.nn = s.nn); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s left join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn froml); + QUERY PLAN +------------------------------------------------------------- + Aggregate + -> Nested Loop + -> Nested Loop Left Join + Join Filter: (s.u = s1.u) + -> Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_nn on l l_1 + Index Cond: (nn = s.nn) + -> Seq Scan on s1 + -> Index Only Scan using l_u on l + Index Cond: (u = s.u) +(11 rows) + +select count(*) from s left join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s left join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (selectnn from l); + QUERY PLAN +-------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Hash Anti Join + Hash Cond: (s.nn = l_1.nn) + -> Hash Left Join + Hash Cond: (l.u = s.u) + Filter: ((s.nn IS NOT NULL) OR (NOT $0)) + -> Seq Scan on l + -> Hash + -> Hash Right Join + Hash Cond: (s1.u = s.u) + -> Seq Scan on s1 + -> Hash + -> Seq Scan on s + -> Hash + -> Seq Scan on l l_1 +(17 rows) + +select count(*) from s left join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s left join s1 on s.u=s1.u left join l on s.u=l.u where s.nn not in (select nnfrom l); + QUERY PLAN +--------------------------------------------------- + Aggregate + -> Nested Loop Left Join + Join Filter: (s.u = s1.u) + -> Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_nn on l + Index Cond: (nn = s.nn) + -> Seq Scan on s1 +(8 rows) + +select count(*) from s left join s1 on s.u=s1.u left join l on s.u=l.u where s.nn not in (select nn from l); + count +------- + 1 +(1 row) + +explain (costs false) select count(*) from s right join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn froml); + QUERY PLAN +------------------------------------------------------------- + Aggregate + -> Nested Loop + -> Nested Loop + Join Filter: (s.u = s1.u) + -> Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_nn on l l_1 + Index Cond: (nn = s.nn) + -> Seq Scan on s1 + -> Index Only Scan using l_u on l + Index Cond: (u = s.u) +(11 rows) + +select count(*) from s right join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn froml); + QUERY PLAN +-------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Hash Anti Join + Hash Cond: (s.nn = l_1.nn) + -> Hash Left Join + Hash Cond: (l.u = s.u) + Filter: ((s.nn IS NOT NULL) OR (NOT $0)) + -> Seq Scan on l + -> Hash + -> Hash Join + Hash Cond: (s1.u = s.u) + -> Seq Scan on s1 + -> Hash + -> Seq Scan on s + -> Hash + -> Seq Scan on l l_1 +(17 rows) + +select * from s join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn from l); + u | n | nn | p | u | n | n1 | nn | p | u | n | nn | p +---+---+----+---+---+---+----+----+---+---+---+----+--- +(0 rows) + +explain (costs false) select count(*) from s full join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn froml); + QUERY PLAN +------------------------------------------------------------- + Aggregate + -> Nested Loop + -> Nested Loop Left Join + Join Filter: (s.u = s1.u) + -> Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_nn on l l_1 + Index Cond: (nn = s.nn) + -> Seq Scan on s1 + -> Index Only Scan using l_u on l + Index Cond: (u = s.u) +(11 rows) + +select count(*) from s full join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s join s1 on s.u=s1.u full join l on s.u=l.u where s.nn not in (select nn froml); + QUERY PLAN +-------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Hash Anti Join + Hash Cond: (s.nn = l_1.nn) + -> Hash Full Join + Hash Cond: (l.u = s.u) + Filter: ((s.nn IS NOT NULL) OR (NOT $0)) + -> Seq Scan on l + -> Hash + -> Hash Join + Hash Cond: (s1.u = s.u) + -> Seq Scan on s1 + -> Hash + -> Seq Scan on s + -> Hash + -> Seq Scan on l l_1 +(17 rows) + +select count(*) from s join s1 on s.u=s1.u full join l on s.u=l.u where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +explain (costs false) select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn left join s1 on l.nn=s1.nn); + QUERY PLAN +--------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Nested Loop Left Join + Join Filter: (l.nn = s1.nn) + -> Nested Loop Left Join + Join Filter: (l.nn = s_1.nn) + -> Index Only Scan using l_nn on l + Index Cond: (nn = s.nn) + -> Seq Scan on s s_1 + -> Seq Scan on s1 +(10 rows) + +select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn left join s1 on l.nn=s1.nn); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn right join s1 on l.nn=s1.nn); + QUERY PLAN +----------------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + Join Filter: (l.nn = s_1.nn) + -> Nested Loop Left Join + -> Seq Scan on s1 + -> Index Only Scan using l_nn on l + Index Cond: (nn = s1.nn) + -> Materialize + -> Seq Scan on s s_1 +(11 rows) + +select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn right join s1 on l.nn=s1.nn); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn); + QUERY PLAN +----------------------------------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Nested Loop Left Join + Join Filter: (l.nn = s_1.nn) + -> Index Scan using l_nn on l + Index Cond: (nn = s.nn) + Filter: (((s.n = n) OR (n IS NULL) OR (s.n IS NULL)) AND (s.u = u)) + -> Seq Scan on s s_1 +(8 rows) + +select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l right join s on l.nn = s.nn); + QUERY PLAN +------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + -> Seq Scan on s s_1 + -> Index Scan using l_nn on l + Index Cond: (nn = s_1.nn) +(7 rows) + +select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +--test reduce outer joins from outer query +explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l); + QUERY PLAN +-------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Hash Anti Join + Hash Cond: (s.nn = l_1.nn) + -> Hash Left Join + Hash Cond: (l.nn = s.nn) + Filter: ((s.nn IS NOT NULL) OR (NOT $0)) + -> Seq Scan on l + -> Hash + -> Seq Scan on s + -> Hash + -> Seq Scan on l l_1 +(13 rows) + +select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l) and s.u>0; + QUERY PLAN +------------------------------------------------------- + Aggregate + -> Nested Loop + -> Nested Loop Anti Join + -> Seq Scan on s + Filter: (u > 0) + -> Index Only Scan using l_nn on l l_1 + Index Cond: (nn = s.nn) + -> Index Only Scan using l_nn on l + Index Cond: (nn = s.nn) +(9 rows) + +select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l) and s.u>0; + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (selectnn from l); + QUERY PLAN +------------------------------------------------------------- + Aggregate + -> Nested Loop + Join Filter: (s.u = s1.u) + -> Nested Loop + -> Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_nn on l l_1 + Index Cond: (nn = s.nn) + -> Index Only Scan using l_nn on l + Index Cond: (nn = s.nn) + -> Seq Scan on s1 +(11 rows) + +select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s right join l on s.nn = l.nn right join s1 on s.u = s1.u where s.nn not in (selectnn from l); + QUERY PLAN +--------------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Nested Loop Anti Join + -> Nested Loop Left Join + Join Filter: (s.u = s1.u) + Filter: ((s.nn IS NOT NULL) OR (NOT $0)) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s + -> Index Only Scan using l_nn on l + Index Cond: (nn = s.nn) + -> Index Only Scan using l_nn on l l_1 + Index Cond: (nn = s.nn) +(15 rows) + +select count(*) from s right join l on s.nn = l.nn right join s1 on s.u = s1.u where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (selectnn from l); + QUERY PLAN +-------------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Nested Loop Left Join + Join Filter: (s.u = s1.u) + -> Hash Anti Join + Hash Cond: (s.nn = l_1.nn) + -> Hash Left Join + Hash Cond: (l.nn = s.nn) + Filter: ((s.nn IS NOT NULL) OR (NOT $0)) + -> Seq Scan on l + -> Hash + -> Seq Scan on s + -> Hash + -> Seq Scan on l l_1 + -> Seq Scan on s1 +(16 rows) + +select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (select nn from l); + count +------- + 0 +(1 row) + +--test reduce outer joins from subquery +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn); + QUERY PLAN +----------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + -> Seq Scan on s s_1 + -> Index Only Scan using l_nn on l + Index Cond: (nn = s_1.nn) +(7 rows) + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where l.u > 9); + QUERY PLAN +------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Scan using l_nn on l + Index Cond: (nn = s_1.nn) + Filter: (u > 9) +(8 rows) + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where l.u > 9); + u | n | nn | p +---------+---------+---------+-------- + 1 | 1 | 1 | foo + 2 | 2 | 2 | foo + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(4 rows) + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where s.u > 9); + QUERY PLAN +----------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + -> Seq Scan on s s_1 + Filter: (u > 9) + -> Index Only Scan using l_nn on l + Index Cond: (nn = s_1.nn) +(8 rows) + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where s.u > 9); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n); + QUERY PLAN +------------------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop + Join Filter: (l.n = s1.n) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Scan using l_nn on l + Index Cond: (nn = s_1.nn) +(11 rows) + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(2 rows) + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn right join s1 on l.n= s1.n); + QUERY PLAN +------------------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + Join Filter: (l.n = s1.n) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Scan using l_nn on l + Index Cond: (nn = s_1.nn) +(11 rows) + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn right join s1 on l.n = s1.n); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn left join s1 on l.n= s1.n); + QUERY PLAN +------------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + Join Filter: (l.n = s1.n) + -> Nested Loop Left Join + -> Seq Scan on s s_1 + -> Index Scan using l_nn on l + Index Cond: (nn = s_1.nn) + -> Materialize + -> Seq Scan on s1 +(11 rows) + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn left join s1 on l.n = s1.n); + u | n | nn | p +---+---+----+--- +(0 rows) + +--test reduce outer join on outer and sub-query +explain (costs false) select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (selectl.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n); + QUERY PLAN +---------------------------------------------------------------------------------- + Aggregate + -> Nested Loop + Join Filter: (s.u = s1.u) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop + Join Filter: (l_1.n = s1_1.n) + -> Seq Scan on s1 s1_1 + -> Materialize + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Scan using l_nn on l l_1 + Index Cond: (nn = s_1.nn) + -> Index Only Scan using l_nn on l + Index Cond: (nn = s.nn) +(19 rows) + +select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (select l.nn from l right joins on l.nn = s.nn join s1 on l.n = s1.n); + count +------- + 1 +(1 row) + +explain (costs false) select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (selectl.nn from l right join s on l.nn = s.nn left join s1 on l.n = s1.n); + QUERY PLAN +---------------------------------------------------------- + Aggregate + -> Hash Left Join + Hash Cond: (l.nn = s.nn) + Filter: (NOT (hashed SubPlan 1)) + -> Seq Scan on l + -> Hash + -> Hash Right Join + Hash Cond: (s1.u = s.u) + -> Seq Scan on s1 + -> Hash + -> Seq Scan on s + SubPlan 1 + -> Nested Loop Left Join + Join Filter: (l_1.n = s1_1.n) + -> Nested Loop Left Join + -> Seq Scan on s s_1 + -> Index Scan using l_nn on l l_1 + Index Cond: (nn = s_1.nn) + -> Materialize + -> Seq Scan on s1 s1_1 +(20 rows) + +select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (select l.nn from l rightjoin s on l.nn = s.nn left join s1 on l.n = s1.n); + count +------- + 0 +(1 row) + +-- test union all +explain (costs false) select * from s as t where not exists +(select 1 from (select n as y from l union all + select u as y from s union all + select nn as y from s) as v where t.n=v.y or v.y is null) and n is not null; + QUERY PLAN +-------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s t + Filter: (n IS NOT NULL) + -> Append + -> Bitmap Heap Scan on l + Recheck Cond: ((t.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = t.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Seq Scan on s + Filter: ((t.n = u) OR (u IS NULL)) + -> Seq Scan on s s_1 + Filter: ((t.n = nn) OR (nn IS NULL)) +(15 rows) + +select * from s as t where not exists +(select 1 from (select n as y from l union all + select u as y from s union all + select nn as y from s) as v where t.n=v.y or v.y is null) and n is not null; + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where n not in +(select n as y from l union all + select u as y from s union all + select nn as y from s); + QUERY PLAN +-------------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Result + -> Append + -> Seq Scan on l l_1 + -> Seq Scan on s s_3 + -> Seq Scan on s s_4 + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Append + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Seq Scan on s s_1 + Filter: ((s.n = u) OR (u IS NULL)) + -> Seq Scan on s s_2 + Filter: ((s.n = nn) OR (nn IS NULL)) +(21 rows) + +select * from s where n not in +(select n as y from l union all + select u as y from s union all + select nn as y from s); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select count(*) from +(select n as x from s union all select u as x from l) t where t.x not in +(select nn from l); + QUERY PLAN +----------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Hash Anti Join + Hash Cond: (s.n = l_1.nn) + -> Append + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Seq Scan on l + Filter: ((u IS NOT NULL) OR (NOT $0)) + -> Hash + -> Seq Scan on l l_1 +(12 rows) + +select count(*) from +(select n as x from s union all select u as x from l) t where t.x not in +(select nn from l); + count +------- + 1 +(1 row) + +explain (costs false) select count(*) from +(select n as x from s union all select n as x from l) t where t.x not in +(select nn from empty); + QUERY PLAN +------------------------------------------------ + Aggregate + -> Append + -> Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on empty + -> Seq Scan on l + Filter: (NOT (hashed SubPlan 1)) +(8 rows) + +select count(*) from +(select n as x from s union all select n as x from l) t where t.x not in +(select nn from empty); + count +------- + 10004 +(1 row) + +explain (costs false) select count(*) from +(select n as x from s union all select u as x from l) t where t.x not in +(select n as y from l union all + select u as y from s union all + select nn as y from s); + QUERY PLAN +-------------------------------------------------------------------------- + Finalize Aggregate + InitPlan 1 (returns $0) + -> Result + -> Append + -> Seq Scan on l l_2 + -> Seq Scan on s s_3 + -> Seq Scan on s s_4 + -> Gather + Workers Planned: 2 + Params Evaluated: $0 + -> Partial Aggregate + -> Nested Loop Anti Join + -> Parallel Append + -> Parallel Seq Scan on l + Filter: ((u IS NOT NULL) OR (NOT $0)) + -> Parallel Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Append + -> Bitmap Heap Scan on l l_1 + Recheck Cond: ((l.u = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = l.u) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Seq Scan on s s_1 + Filter: ((l.u = u) OR (u IS NULL)) + -> Seq Scan on s s_2 + Filter: ((l.u = nn) OR (nn IS NULL)) +(29 rows) + +select count(*) from +(select n as x from s union all select u as x from l) t where t.x not in +(select n as y from l union all + select u as y from s union all + select nn as y from s); + count +------- + 0 +(1 row) + +-- test multi-levels of NOT IN +explain (costs false) select * from s where n not in (select n from s where n not in (select n from l)); + QUERY PLAN +------------------------------------------------------------ + Seq Scan on s + Filter: (NOT (hashed SubPlan 2)) + SubPlan 2 + -> Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l + -> Seq Scan on s s_1 + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l l_1 + Recheck Cond: ((s_1.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s_1.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(15 rows) + +select * from s where n not in (select n from s where n not in (select n from l)); + u | n | nn | p +---------+---------+---------+-------- + 1 | 1 | 1 | foo + 2 | 2 | 2 | foo + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(4 rows) + +explain (costs false) select * from s where n not in (select n from s where n not in (select u from l)); + QUERY PLAN +------------------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 2)) + SubPlan 2 + -> Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l + -> Seq Scan on s s_1 + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Index Only Scan using l_u on l l_1 + Index Cond: (u = s_1.n) +(10 rows) + +select * from s where n not in (select n from s where n not in (select u from l)); + u | n | nn | p +---+---+----+----- + 1 | 1 | 1 | foo + 2 | 2 | 2 | foo +(2 rows) + +explain (costs false) select count(*) from s where u not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)); + QUERY PLAN +------------------------------------------------------------------------- + Aggregate + -> Seq Scan on s + Filter: (NOT (SubPlan 2)) + SubPlan 2 + -> Result + One-Time Filter: (NOT $2) + InitPlan 1 (returns $2) + -> Nested Loop Anti Join + -> Seq Scan on s1 + Filter: (n = s.n) + -> Bitmap Heap Scan on l + Recheck Cond: ((s1.u = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s1.u) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Seq Scan on s1 s1_1 +(18 rows) + +select count(*) from s where u not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)); + count +------- + 0 +(1 row) + +explain (costs false) select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (selectnn from s1); + QUERY PLAN +------------------------------------------------------------------------------------------------ + Seq Scan on s + Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND (NOT (hashed SubPlan 3))) + SubPlan 1 + -> Seq Scan on s1 + SubPlan 2 + -> Seq Scan on s1 s1_1 + SubPlan 3 + -> Seq Scan on s1 s1_2 +(8 rows) + +select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (select nn from s1); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (selectnn from l); + QUERY PLAN +------------------------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2))) + SubPlan 1 + -> Seq Scan on s1 + SubPlan 2 + -> Seq Scan on s1 s1_1 + -> Index Only Scan using l_nn on l + Index Cond: (nn = s.nn) +(9 rows) + +select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (select nn from l); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select count(*) from s where u not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)) +and nn not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)); + QUERY PLAN +--------------------------------------------------------------------------- + Aggregate + -> Seq Scan on s + Filter: ((NOT (SubPlan 2)) AND (NOT (SubPlan 4))) + SubPlan 2 + -> Result + One-Time Filter: (NOT $2) + InitPlan 1 (returns $2) + -> Nested Loop Anti Join + -> Seq Scan on s1 + Filter: (n = s.n) + -> Bitmap Heap Scan on l + Recheck Cond: ((s1.u = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s1.u) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Seq Scan on s1 s1_1 + SubPlan 4 + -> Result + One-Time Filter: (NOT $6) + InitPlan 3 (returns $6) + -> Nested Loop Anti Join + -> Seq Scan on s1 s1_2 + Filter: (n = s.n) + -> Bitmap Heap Scan on l l_1 + Recheck Cond: ((s1_2.u = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s1_2.u) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Seq Scan on s1 s1_3 +(33 rows) + +select count(*) from s where u not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)) +and nn not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)); + count +------- + 0 +(1 row) + +--test COALESCE +explain (costs false) select * from s where COALESCE(n, -1) not in (select COALESCE(n, -1) from l); + QUERY PLAN +---------------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(s.n, '-1'::integer) = COALESCE(l.n, '-1'::integer)) + -> Seq Scan on s + -> Hash + -> Seq Scan on l +(5 rows) + +select * from s where COALESCE(n, -1) not in (select COALESCE(n, -1) from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where COALESCE(n, NULL, -1) not in (select COALESCE(n, NULL, -1) from l); + QUERY PLAN +---------------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(s.n, '-1'::integer) = COALESCE(l.n, '-1'::integer)) + -> Seq Scan on s + -> Hash + -> Seq Scan on l +(5 rows) + +select * from s where COALESCE(n, NULL, -1) not in (select COALESCE(n, NULL, -1) from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where COALESCE(n, NULL, NULL) not in (select COALESCE(n, NULL, NULL) from l); + QUERY PLAN +----------------------------------------------------------------------------- + Nested Loop Anti Join + Join Filter: ((COALESCE(s.n) = COALESCE(l.n)) OR (COALESCE(l.n) IS NULL)) + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Seq Scan on s + Filter: ((COALESCE(n) IS NOT NULL) OR (NOT $0)) + -> Materialize + -> Seq Scan on l +(8 rows) + +select * from s where COALESCE(n, NULL, NULL) not in (select COALESCE(n, NULL, NULL) from l); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where COALESCE(n, nn) not in (select COALESCE(n, nn) from l); + QUERY PLAN +---------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(s.n, s.nn) = COALESCE(l.n, l.nn)) + -> Seq Scan on s + -> Hash + -> Seq Scan on l +(5 rows) + +select * from s where COALESCE(n, nn) not in (select COALESCE(n, nn) from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where COALESCE(nn, NULL) not in (select COALESCE(nn, NULL) from l); + QUERY PLAN +------------------------------------------------ + Hash Anti Join + Hash Cond: (COALESCE(s.nn) = COALESCE(l.nn)) + -> Seq Scan on s + -> Hash + -> Seq Scan on l +(5 rows) + +select * from s where COALESCE(nn, NULL) not in (select COALESCE(nn, NULL) from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where (COALESCE(n, -1), nn, COALESCE(n, u)) not in (select COALESCE(n, -1), nn, COALESCE(n,u) from l); + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Scan using l_nn on l + Index Cond: (nn = s.nn) + Filter: ((COALESCE(s.n, '-1'::integer) = COALESCE(n, '-1'::integer)) AND (COALESCE(s.n, s.u) = COALESCE(n, u))) +(5 rows) + +select * from s where (COALESCE(n, -1), nn, COALESCE(n, u)) not in (select COALESCE(n, -1), nn, COALESCE(n, u) from l); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(2 rows) + +-- test miscellaneous outer nullable cases +explain (costs false) select * from s where (n,n) not in (select n,n from l); + QUERY PLAN +------------------------------------------------------------- + Nested Loop Anti Join + Join Filter: (((s.n = l.n) AND (s.n = l.n)) IS NOT FALSE) + -> Seq Scan on s + -> Materialize + -> Seq Scan on l +(5 rows) + +select * from s where (n,n) not in (select n,n from l); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l); + QUERY PLAN +------------------------------------------------------------------------------------- + Nested Loop Anti Join + Join Filter: (((s.n = l_1.n) AND (s.u = l_1.u) AND (s.nn = l_1.nn)) IS NOT FALSE) + -> Hash Left Join + Hash Cond: (l.nn = s.nn) + -> Seq Scan on l + -> Hash + -> Seq Scan on s + -> Materialize + -> Seq Scan on l l_1 +(9 rows) + +select * from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l); + u | n | nn | p | u | n | nn | p +---+---+----+---+---+---+----+--- +(0 rows) + +explain (costs false) select count(*) from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn froml where u < 0); + QUERY PLAN +--------------------------------------------- + Aggregate + -> Hash Left Join + Hash Cond: (l.nn = s.nn) + Filter: (NOT (hashed SubPlan 1)) + -> Seq Scan on l + -> Hash + -> Seq Scan on s + SubPlan 1 + -> Index Scan using l_u on l l_1 + Index Cond: (u < 0) +(10 rows) + +select count(*) from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l where u < 0); + count +------- + 10000 +(1 row) + +explain (costs false) select * from s where (n,n,n) not in (select distinct n,n,n from l where u > 0 limit 3) order by n; + QUERY PLAN +----------------------------------------------------- + Sort + Sort Key: s.n + -> Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Limit + -> Unique + -> Index Scan using l_n on l + Filter: (u > 0) +(9 rows) + +select * from s where (n,n,n) not in (select distinct n,n,n from l where u > 0 limit 3) order by n; + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +--test outer has strict predicate or inner join +explain (costs false) select * from s where n not in (select n from l) and n > 0; + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + Filter: (n > 0) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(10 rows) + +select * from s where n not in (select n from l) and n > 0; + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where n not in (select n from l) and u > 0; + QUERY PLAN +------------------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Seq Scan on s + Filter: (((n IS NOT NULL) OR (NOT $0)) AND (u > 0)) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(12 rows) + +select * from s where n not in (select n from l) and u > 0; + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where n not in (select n from l) and n is not null; + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + Filter: (n IS NOT NULL) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(10 rows) + +select * from s where n not in (select n from l) and n is not null; + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s join l on s.n = l.n where s.n not in (select n from l); + QUERY PLAN +-------------------------------------------------------- + Nested Loop + -> Nested Loop Anti Join + -> Seq Scan on s + -> Bitmap Heap Scan on l l_1 + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Index Scan using l_n on l + Index Cond: (n = s.n) +(12 rows) + +select * from s join l on s.n = l.n where s.n not in (select n from l); + u | n | nn | p | u | n | nn | p +---+---+----+---+---+---+----+--- +(0 rows) + +explain (costs false) select count(*) from s right join l on s.n = l.n where s.n not in (select n from l); + QUERY PLAN +-------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Nested Loop Anti Join + -> Hash Left Join + Hash Cond: (l.n = s.n) + Filter: ((s.n IS NOT NULL) OR (NOT $0)) + -> Seq Scan on l + -> Hash + -> Seq Scan on s + -> Bitmap Heap Scan on l l_1 + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(17 rows) + +select count(*) from s right join l on s.n = l.n where s.n not in (select n from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select nfrom l); + QUERY PLAN +-------------------------------------------------------------------- + Aggregate + -> Nested Loop + Join Filter: (s.u = s1.u) + -> Nested Loop + -> Nested Loop Anti Join + -> Seq Scan on s + -> Bitmap Heap Scan on l l_1 + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) + -> Index Only Scan using l_n on l + Index Cond: (n = s.n) + -> Seq Scan on s1 +(16 rows) + +select count(*) from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select n from l); + count +------- + 0 +(1 row) + +explain (costs false) select count(*) from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select nfrom l); + QUERY PLAN +-------------------------------------------------------------- + Aggregate + InitPlan 1 (returns $0) + -> Seq Scan on l l_2 + -> Nested Loop Anti Join + -> Nested Loop Left Join + Join Filter: (s.u = s1.u) + Filter: ((s.n IS NOT NULL) OR (NOT $0)) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s + -> Index Only Scan using l_n on l + Index Cond: (n = s.n) + -> Bitmap Heap Scan on l l_1 + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(20 rows) + +select count(*) from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select n from l); + count +------- + 0 +(1 row) + +--test inner has strict predicate or inner join +explain (costs false) select * from s where u not in (select n from l where n > 0); + QUERY PLAN +--------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_n on l + Index Cond: ((n = s.u) AND (n > 0)) +(4 rows) + +select * from s where u not in (select n from l where n > 0); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where u not in (select n from l where u > 0); + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Bitmap Heap Scan on l + Recheck Cond: ((s.u = n) OR (n IS NULL)) + Filter: (u > 0) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.u) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(10 rows) + +select * from s where u not in (select n from l where u > 0); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where u not in (select n from l where n is not null); + QUERY PLAN +----------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_n on l + Index Cond: ((n = s.u) AND (n IS NOT NULL)) +(4 rows) + +select * from s where u not in (select n from l where n is not null); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where u not in (select l.n from l join s on l.n=s.n); + QUERY PLAN +---------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Only Scan using l_n on l + Index Cond: (n = s_1.n) +(7 rows) + +select * from s where u not in (select l.n from l join s on l.n=s.n); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(2 rows) + +explain (costs false) select * from s where u not in (select l.n from l join s on l.u=s.u); + QUERY PLAN +----------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Scan using l_u on l + Index Cond: (u = s_1.u) +(7 rows) + +select * from s where u not in (select l.n from l join s on l.u=s.u); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where u not in (select l.n from l join s on l.n = s.n); + QUERY PLAN +---------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Only Scan using l_n on l + Index Cond: (n = s_1.n) +(7 rows) + +select * from s where u not in (select l.n from l join s on l.n = s.n); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(2 rows) + +explain (costs false) select * from s where u not in (select l.n from l right join s on l.n = s.n); + QUERY PLAN +---------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + -> Seq Scan on s s_1 + -> Index Only Scan using l_n on l + Index Cond: (n = s_1.n) +(7 rows) + +select * from s where u not in (select l.n from l right join s on l.n = s.n); + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where u not in (select l.n from l right join s on l.n=s.n join s1 on l.n=s1.n); + QUERY PLAN +---------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop + -> Nested Loop + Join Filter: (s_1.n = s1.n) + -> Seq Scan on s1 + -> Materialize + -> Seq Scan on s s_1 + -> Index Only Scan using l_n on l + Index Cond: (n = s_1.n) +(11 rows) + +select * from s where u not in (select l.n from l right join s on l.n=s.n join s1 on l.n=s1.n); + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo + 3 | | 3 | foo +(2 rows) + +explain (costs false) select * from s where u not in (select l.n from l join s on l.n=s.n right join s1 on l.n=s1.n); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + Join Filter: (l.n = s1.n) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Only Scan using l_n on l + Index Cond: (n = s_1.n) +(11 rows) + +select * from s where u not in (select l.n from l join s on l.n=s.n right join s1 on l.n=s1.n); + u | n | nn | p +---+---+----+--- +(0 rows) + +--test both sides have strict predicate or inner join +explain (costs false) select * from s where n not in (select n from l where n > 0) and n > 0; + QUERY PLAN +--------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + Filter: (n > 0) + -> Index Only Scan using l_n on l + Index Cond: ((n = s.n) AND (n > 0)) +(5 rows) + +select * from s where n not in (select n from l where n > 0) and n > 0; + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s where n not in (select n from l where u > 0) and n > 0; + QUERY PLAN +-------------------------------------------------- + Nested Loop Anti Join + -> Seq Scan on s + Filter: (n > 0) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + Filter: (u > 0) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(11 rows) + +select * from s where n not in (select n from l where u > 0) and n > 0; + u | n | nn | p +---+---+----+--- +(0 rows) + +explain (costs false) select * from s where n not in (select n from l where n > 0) and u > 0; + QUERY PLAN +------------------------------------------------------------- + Nested Loop Anti Join + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + Filter: (n > 0) + -> Seq Scan on s + Filter: (((n IS NOT NULL) OR (NOT $0)) AND (u > 0)) + -> Index Only Scan using l_n on l + Index Cond: ((n = s.n) AND (n > 0)) +(8 rows) + +select * from s where n not in (select n from l where n > 0) and u > 0; + u | n | nn | p +---------+---------+---------+-------- + 1000002 | 1000002 | 1000002 | foofoo +(1 row) + +explain (costs false) select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n froml right join s on l.n=s.n join s s1 on l.n=s1.n); + QUERY PLAN +-------------------------------------------------------------------- + Nested Loop + Join Filter: (s.u = s1.u) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop + -> Nested Loop + Join Filter: (s_1.n = s1_1.n) + -> Seq Scan on s s_1 + -> Materialize + -> Seq Scan on s s1_1 + -> Index Only Scan using l_n on l l_1 + Index Cond: (n = s_1.n) + -> Index Scan using l_n on l + Index Cond: (n = s.n) +(18 rows) + +select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n from l right join s on l.n=s.njoin s s1 on l.n=s1.n); + u | n | nn | p | u | n | nn | p | u | n | n1 | nn | p +---+---+----+---+---+---+----+---+---+---+----+----+--- +(0 rows) + +explain (costs false) select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n froml join s on l.n=s.n right join s s1 on l.n=s1.n); + QUERY PLAN +-------------------------------------------------------------------------------- + Nested Loop + Join Filter: (s.u = s1.u) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Nested Loop Left Join + Join Filter: (l_1.n = s1_1.n) + -> Seq Scan on s s1_1 + -> Materialize + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Only Scan using l_n on l l_1 + Index Cond: (n = s_1.n) + -> Index Scan using l_n on l + Index Cond: (n = s.n) +(18 rows) + +select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n from l join s on l.n=s.n rightjoin s s1 on l.n=s1.n); + u | n | nn | p | u | n | nn | p | u | n | n1 | nn | p +---+---+----+---+---+---+----+---+---+---+----+----+--- +(0 rows) + +explain (costs false) select * from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select l.n froml join s on l.n=s.n right join s s1 on l.n=s1.n); + QUERY PLAN +-------------------------------------------------------------- + Nested Loop Left Join + Join Filter: (s.u = s1.u) + Filter: (NOT (hashed SubPlan 1)) + -> Seq Scan on s1 + -> Materialize + -> Nested Loop + -> Seq Scan on s + -> Index Scan using l_n on l + Index Cond: (n = s.n) + SubPlan 1 + -> Nested Loop Left Join + Join Filter: (l_1.n = s1_1.n) + -> Seq Scan on s s1_1 + -> Materialize + -> Nested Loop + -> Seq Scan on s s_1 + -> Index Only Scan using l_n on l l_1 + Index Cond: (n = s_1.n) +(18 rows) + +select * from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select l.n from l join s on l.n=s.n rightjoin s s1 on l.n=s1.n); + u | n | nn | p | u | n | nn | p | u | n | n1 | nn | p +---+---+----+---+---+---+----+---+---+---+----+----+--- +(0 rows) + +--JIRA-7279 CTE with NOT IN +create table public.testing +( +a integer, +b integer, +c integer +); +explain (costs false) with +selected(a,b,c) as (values(1,2,3)), +updated(d,e,f) as (values(4,5,6)) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated); + QUERY PLAN +--------------------------------------------------- + Insert on testing + -> Result + One-Time Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Result +(5 rows) + +with +selected(a,b,c) as (values(1,2,3)), +updated(d,e,f) as (values(4,5,6)) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated); +select * from public.testing; + a | b | c +---+---+--- + 1 | 2 | 3 +(1 row) + +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated); + QUERY PLAN +------------------------------------------------------------------------------------- + Insert on testing + -> Nested Loop Anti Join + Join Filter: (((s.u = l.u) AND (s.n = l.n) AND (s.nn = l.nn)) IS NOT FALSE) + -> Seq Scan on s + -> Materialize + -> Seq Scan on l +(6 rows) + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated); +select * from public.testing; + a | b | c +---------+---------+--------- + 1 | 2 | 3 + 1000002 | 1000002 | 1000002 +(2 rows) + +-- expect to get Hash Anti Join +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where a not in (select d from updated); + QUERY PLAN +----------------------------------------------------- + Insert on testing + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Nested Loop Anti Join + -> Seq Scan on s + Filter: ((u IS NOT NULL) OR (NOT $0)) + -> Index Only Scan using l_u on l + Index Cond: (u = s.u) +(8 rows) + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where a not in (select d from updated); +select * from public.testing; + a | b | c +---------+---------+--------- + 1 | 2 | 3 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 +(3 rows) + +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where b not in (select e from updated); + QUERY PLAN +-------------------------------------------------------- + Insert on testing + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Nested Loop Anti Join + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Bitmap Heap Scan on l + Recheck Cond: ((s.n = n) OR (n IS NULL)) + -> BitmapOr + -> Bitmap Index Scan on l_n + Index Cond: (n = s.n) + -> Bitmap Index Scan on l_n + Index Cond: (n IS NULL) +(13 rows) + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where b not in (select e from updated); +select * from public.testing; + a | b | c +---------+---------+--------- + 1 | 2 | 3 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 +(3 rows) + +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where b not in (select d from updated); + QUERY PLAN +----------------------------------------------------- + Insert on testing + InitPlan 1 (returns $0) + -> Seq Scan on l l_1 + -> Nested Loop Anti Join + -> Seq Scan on s + Filter: ((n IS NOT NULL) OR (NOT $0)) + -> Index Only Scan using l_u on l + Index Cond: (u = s.n) +(8 rows) + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where b not in (select d from updated); +select * from public.testing; + a | b | c +---------+---------+--------- + 1 | 2 | 3 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 +(4 rows) + +-- two levels of NOT IN with CTE, 2nd NOT IN +-- subquery access CTE two levels above +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated + where d not in (select a from selected)); + QUERY PLAN +--------------------------------------------------------------------------------------------------------- + Insert on testing + CTE selected + -> Seq Scan on s + -> Nested Loop Anti Join + Join Filter: (((selected.a = l.u) AND (selected.b = l.n) AND (selected.c = l.nn)) IS NOT FALSE) + -> CTE Scan on selected + -> Materialize + -> Seq Scan on l + Filter: (NOT (hashed SubPlan 3)) + SubPlan 3 + -> CTE Scan on selected selected_1 +(11 rows) + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated + where d not in (select a from selected)); +select * from public.testing; + a | b | c +---------+---------+--------- + 1 | 2 | 3 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 + 1 | 1 | 1 + 2 | 2 | 2 + 1000002 | 1000002 | 1000002 + 3 | | 3 +(8 rows) + +-- With clause inside a query block +explain select count(distinct t.a) from +(with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +select * from selected where (a,b,c) not in +(select d,e,f from updated + where d not in (select a from selected))) as t; + QUERY PLAN +--------------------------------------------------------------------------------------------------------- + Aggregate (cost=693.75..693.76 rows=1 width=8) + -> Nested Loop Anti Join (cost=1.13..693.71 rows=3 width=12) + Join Filter: (((selected.a = l.u) AND (selected.b = l.n) AND (selected.c = l.nn)) IS NOT FALSE) + CTE selected + -> Seq Scan on s (cost=0.00..1.04 rows=4 width=12) + -> CTE Scan on selected (cost=0.00..0.08 rows=4 width=12) + -> Materialize (cost=0.09..230.09 rows=5000 width=12) + -> Seq Scan on l (cost=0.09..180.09 rows=5000 width=12) + Filter: (NOT (hashed SubPlan 3)) + SubPlan 3 + -> CTE Scan on selected selected_1 (cost=0.00..0.08 rows=4 width=4) +(11 rows) + +select count(distinct t.a) from +(with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +select * from selected where (a,b,c) not in +(select d,e,f from updated + where d not in (select a from selected))) as t; + count +------- + 4 +(1 row) + +-- With clause in subquery, can't flatten subquery to anti join +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s) +insert into public.testing +select * from selected where (a,b,c) not in +(with updated (d,e,f) as (select u, n, nn from l) +select d,e,f from updated); + QUERY PLAN +----------------------------------- + Insert on testing + -> Seq Scan on s + Filter: (NOT (SubPlan 1)) + SubPlan 1 + -> Materialize + -> Seq Scan on l +(6 rows) + +with +selected(a,b,c) as (select u, n, nn from s) +insert into public.testing +select * from selected where (a,b,c) not in +(with updated (d,e,f) as (select u, n, nn from l) +select d,e,f from updated); +select * from public.testing; + a | b | c +---------+---------+--------- + 1 | 2 | 3 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 + 1 | 1 | 1 + 2 | 2 | 2 + 1000002 | 1000002 | 1000002 + 3 | | 3 + 1000002 | 1000002 | 1000002 +(9 rows) + +-- With clause in subquery, subsubquery access CTE in subquery +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s) +insert into public.testing +select * from selected where (a,b,c) not in +( +with updated(d,e,f) as (select u, n, nn from l) +select d,e,f from updated where d not in (select d from updated) +); + QUERY PLAN +------------------------------------------------------------------- + Insert on testing + -> Seq Scan on s + Filter: (NOT (SubPlan 3)) + SubPlan 3 + -> Materialize + CTE updated + -> Seq Scan on l + InitPlan 2 (returns $1) + -> CTE Scan on updated + -> Hash Anti Join + Hash Cond: (updated_1.d = updated_2.d) + -> CTE Scan on updated updated_1 + Filter: ((d IS NOT NULL) OR (NOT $1)) + -> Hash + -> CTE Scan on updated updated_2 +(15 rows) + +with +selected(a,b,c) as (select u, n, nn from s) +insert into public.testing +select * from selected where (a,b,c) not in +( +with updated(d,e,f) as (select u, n, nn from l) +select d,e,f from updated where d not in (select d from updated) +); +select * from public.testing; + a | b | c +---------+---------+--------- + 1 | 2 | 3 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 + 1000002 | 1000002 | 1000002 + 1 | 1 | 1 + 2 | 2 | 2 + 1000002 | 1000002 | 1000002 + 3 | | 3 + 1000002 | 1000002 | 1000002 + 1 | 1 | 1 + 2 | 2 | 2 + 1000002 | 1000002 | 1000002 + 3 | | 3 +(13 rows) + +-- Recursive CTE +CREATE TABLE employees ( + id serial, + name varchar(255), + manager_id int +); +INSERT INTO employees VALUES (1, 'Mark', null); +INSERT INTO employees VALUES (2, 'John', 1); +INSERT INTO employees VALUES (3, 'Dan', 2); +INSERT INTO employees VALUES (4, 'Clark', 1); +INSERT INTO employees VALUES (5, 'Linda', 2); +INSERT INTO employees VALUES (6, 'Willy', 2); +INSERT INTO employees VALUES (7, 'Barack', 2); +INSERT INTO employees VALUES (8, 'Elen', 2); +INSERT INTO employees VALUES (9, 'Kate', 3); +INSERT INTO employees VALUES (10, 'Terry', 4); +WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON mtree.id = e.manager_id +) +SELECT * +FROM managertree; + id | name | manager_id +----+--------+------------ + 2 | John | 1 + 3 | Dan | 2 + 5 | Linda | 2 + 6 | Willy | 2 + 7 | Barack | 2 + 8 | Elen | 2 + 9 | Kate | 3 +(7 rows) + +-- NOT IN subquery access Recursive CTE +EXPLAIN (COSTS FALSE) WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON mtree.id = e.manager_id +) +SELECT * +FROM managertree mt WHERE mt.manager_id NOT IN (SELECT id FROM managertree); + QUERY PLAN +--------------------------------------------------------------- + CTE Scan on managertree mt + Filter: (NOT (hashed SubPlan 2)) + CTE managertree + -> Recursive Union + -> Seq Scan on employees + Filter: (id = 2) + -> Hash Join + Hash Cond: (e.manager_id = mtree.id) + -> Seq Scan on employees e + -> Hash + -> WorkTable Scan on managertree mtree + SubPlan 2 + -> CTE Scan on managertree +(13 rows) + +WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON mtree.id = e.manager_id +) +SELECT * +FROM managertree mt WHERE mt.manager_id NOT IN (SELECT id FROM managertree); + id | name | manager_id +----+------+------------ + 2 | John | 1 +(1 row) + +-- NOT IN under UNION ALL inside Recursive CTE +EXPLAIN (COSTS FALSE) WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON + (mtree.id = e.manager_id AND + mtree.manager_id NOT IN (SELECT manager_id FROM employees) + ) +) +SELECT * +FROM managertree; + QUERY PLAN +--------------------------------------------------------------- + CTE Scan on managertree + CTE managertree + -> Recursive Union + -> Seq Scan on employees employees_1 + Filter: (id = 2) + -> Hash Join + Hash Cond: (e.manager_id = mtree.id) + -> Seq Scan on employees e + -> Hash + -> WorkTable Scan on managertree mtree + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on employees +(13 rows) + +WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON + (mtree.id = e.manager_id AND + mtree.manager_id NOT IN (SELECT manager_id FROM employees) + ) +) +SELECT * +FROM managertree; + id | name | manager_id +----+------+------------ + 2 | John | 1 +(1 row) + +--Manfred-7613 CTE NOT IN with Union All +create table cocotero as ( + select * from( + values(1,2,3)) as data(a,b,c) +); +explain (costs off) with selected as ( + select * + from cocotero +), +updated as ( + update cocotero + set a = 3 + from selected + where cocotero.a = selected.a + returning selected.a,selected.b,selected.c +), +inserted as ( + insert into cocotero + select * + from selected + where a not in (select a from updated) + returning * +) +select 'updated' as action, count(*) as lines from updated +union all +select 'inserted' as action, count(*) as lines from inserted; + QUERY PLAN +-------------------------------------------------------------------------------------- + Append + CTE selected + -> Seq Scan on cocotero + CTE updated + -> Update on cocotero cocotero_1 + -> Merge Join + Merge Cond: (cocotero_1.a = selected.a) + -> Sort + Sort Key: cocotero_1.a + -> Seq Scan on cocotero cocotero_1 + -> Materialize + -> Sort + Sort Key: selected.a + -> CTE Scan on selected + CTE inserted + -> Insert on cocotero cocotero_2 + InitPlan 3 (returns $3) + -> CTE Scan on updated updated_1 + -> Nested Loop Anti Join + Join Filter: ((selected_1.a = updated_2.a) OR (updated_2.a IS NULL)) + -> CTE Scan on selected selected_1 + Filter: ((a IS NOT NULL) OR (NOT $3)) + -> CTE Scan on updated updated_2 + -> Aggregate + -> CTE Scan on updated + -> Aggregate + -> CTE Scan on inserted +(27 rows) + +with selected as ( + select * + from cocotero +), +updated as ( + update cocotero + set a = 3 + from selected + where cocotero.a = selected.a + returning selected.a,selected.b,selected.c +), +inserted as ( + insert into cocotero + select * + from selected + where a not in (select a from updated) + returning * +) +select 'updated' as action, count(*) as lines from updated +union all +select 'inserted' as action, count(*) as lines from inserted; + action | lines +----------+------- + updated | 1 + inserted | 0 +(2 rows) + +--test enable_not_in_transform +explain (costs off) select count(*) from s where s.u not in (select l.u from l); + QUERY PLAN +-------------------------------------------- + Aggregate + -> Nested Loop Anti Join + -> Seq Scan on s + -> Index Only Scan using l_u on l + Index Cond: (u = s.u) +(5 rows) + +set enable_not_in_transform = off; +explain (costs off) select count(*) from s where s.u not in (select l.u from l); + QUERY PLAN +----------------------------------- + Aggregate + -> Seq Scan on s + Filter: (NOT (SubPlan 1)) + SubPlan 1 + -> Materialize + -> Seq Scan on l +(6 rows) + +-- clean up +reset work_mem; +reset enable_not_in_transform; +drop table s; +drop table s1; +drop table l; +drop table empty; +drop table public.testing; +drop table employees; +drop table cocotero; diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index 715842b..6e3aee8 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -83,6 +83,7 @@ select name, setting from pg_settings where name like 'enable%'; enable_material | on enable_mergejoin | on enable_nestloop | on + enable_not_in_transform | on enable_parallel_append | on enable_parallel_hash | on enable_partition_pruning | on @@ -91,7 +92,7 @@ select name, setting from pg_settings where name like 'enable%'; enable_seqscan | on enable_sort | on enable_tidscan | on -(19 rows) +(20 rows) -- Test that the pg_timezone_names and pg_timezone_abbrevs views are -- more-or-less working. We can't test their contents in any great detail diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index 893d8d0..4553a9f 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -831,3 +831,818 @@ select * from (with x as (select 2 as y) select * from x) ss; explain (verbose, costs off) with x as (select * from subselect_tbl) select * from x for update; + +-- test NON IN to ANTI JOIN conversion +CREATE TABLE s (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL); +insert into s (u, n, nn, p) + select + generate_series(1,3) as u, + generate_series(1,3) as n, + generate_series(1,3) as nn, + 'foo' as p; +insert into s values(1000002, 1000002, 1000002, 'foofoo'); +UPDATE s set n = NULL WHERE n = 3; +analyze s; + +CREATE TABLE l (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL); +insert into l (u, n, nn, p) + select + generate_series(1,10000 ) as u, + generate_series(1,10000 ) as n, + generate_series(1,10000 ) as nn, + 'bar' as p; +UPDATE l set n = NULL WHERE n = 7; + +CREATE UNIQUE INDEX l_u ON l (u); +CREATE INDEX l_n ON l (n); +CREATE INDEX l_nn ON l (nn); +analyze l; + +CREATE TABLE s1 (u INTEGER NOT NULL, n INTEGER NULL, n1 INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL); +insert into s1 (u, n, n1, nn, p) + select + generate_series(1,3) as u, + generate_series(1,3) as n, + generate_series(1,3) as n1, + generate_series(1,3) as nn, + 'foo' as p; +insert into s1 values(1000003, 1000003, 1000003, 1000003, 'foofoo'); +insert into s1 values(1003, 1003, 1003, 1003, 'foofoo'); +UPDATE s1 set n = NULL WHERE n = 3; +UPDATE s1 set n1 = NULL WHERE n = 2; +UPDATE s1 set n1 = NULL WHERE n1 = 3; +analyze s1; + +CREATE TABLE empty (u INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL); +analyze empty; + +-- set work_mem to 64KB so that NOT IN to ANTI JOIN optimization will kick in +set work_mem = 64; + +-- correctness test 1: inner empty, return every thing from outer including NULL +explain (costs false) select * from s where n not in (select n from empty); + +select * from s where n not in (select n from empty); + +-- correctness test 2: inner has NULL, return empty result +explain (costs false) select * from s where n not in (select n from l); + +select * from s where n not in (select n from l); + +-- correctness test 3: inner non-null, result has no NULL +explain (costs false) select * from s where n not in (select u from l); + +select * from s where n not in (select u from l); + +-- correctness test 4: inner has predicate +explain (costs false) select * from s where n not in (select n from l where u > 7); + +select * from s where n not in (select n from l where u > 7); + +-- correctness test 5: multi-expression, (2, 2, null, 2, foo) should be in the result +explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u >= 3); + +select * from s1 where (n,n1) not in (select u,nn from l where u >= 3); + +-- correctness test 6: multi-expression, (3, null, null, 3, foo) should not be in the result +explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u > 0); + +select * from s1 where (n,n1) not in (select u,nn from l where u > 0); + +-- correctness test 6: multi-expression, (3, null, null, 3, foo) should be in the result +explain (costs false) select * from s1 where (n,n1) not in (select u,nn from l where u < 0); + +select * from s1 where (n,n1) not in (select u,nn from l where u < 0); + +-- test using hashed subplan when inner fits in work_mem +explain (costs false) select * from l where n not in (select n from s); + +select * from l where n not in (select n from s); + +-- test single expression +explain (costs false) select * from s where n not in (select n from l); + +select * from s where n not in (select n from l); + +explain (costs false) select * from s where u not in (select u from l); + +select * from s where u not in (select u from l); + +explain (costs false) select * from s where 3*n not in (select n from l); + +select * from s where 3*n not in (select n from l); + +explain (costs false) select * from s where n not in (select 3*n from l); + +select * from s where n not in (select 3*n from l); + +-- test single expression with predicates +explain (costs false) select * from s where n not in (select n from l where u > 0); + +select * from s where n not in (select n from l where u > 0); + +explain (costs false) select * from s where n not in (select n from l where u > 100); + +select * from s where n not in (select n from l where u > 100); + +-- test multi expression +explain (costs false) select * from s where (n,u) not in (select n,u from l); + +select * from s where (n,u) not in (select n,u from l); + +explain (costs false) select * from s where (u, nn) not in (select u, nn from l); + +select * from s where (u, nn) not in (select u, nn from l); + +explain (costs false) select * from s where (n,u) not in (select u,n from l); + +select * from s where (n,u) not in (select u,n from l); + +explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l); + +select * from s where (n,u,nn) not in (select u,n,nn from l); + +explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1000); + +select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1000); + +explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 0); + +select * from s where (n,u,nn) not in (select u,n,nn from l where u > 0); + +explain (costs false) select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1); + +select * from s where (n,u,nn) not in (select u,n,nn from l where u > 1); + +-- test multi-table +explain (costs false) select count(*) from s, l where s.n not in (select n from l); + +select count(*) from s, l where s.n not in (select n from l); + +explain (costs false) select count(*) from s, l where s.nn not in (select nn from l); + +select count(*) from s, l where s.nn not in (select nn from l); + +-- test null padded results from outer join +explain (costs false) select * from s where n not in (select s.nn from l left join s on l.nn = s.nn); + +select * from s where n not in (select s.nn from l left join s on l.nn = s.nn); + +explain (costs false) select * from s where n not in (select s.nn from s right join l on s.nn = l.nn); + +select * from s where n not in (select s.nn from s right join l on s.nn = l.nn); + +explain (costs false) select count(*) from s right join l on s.nn = l.nn where l.nn not in (select nn from s); + +select count(*) from s right join l on s.nn = l.nn where l.nn not in (select nn from s); + +explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from s); + +select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from s); + +explain (costs false) select count(*) from s right join l on s.nn=l.nn where l.nn not in (select l.nn from l left join son l.nn = s.nn); + +select count(*) from s right join l on s.nn=l.nn where l.nn not in (select l.nn from l left join s on l.nn = s.nn); + +explain (costs false) select count(*) from s right join l on s.nn=l.nn where s.nn not in (select s.nn from l left join son l.nn = s.nn); + +select count(*) from s right join l on s.nn=l.nn where s.nn not in (select s.nn from l left join s on l.nn = s.nn); + +explain (costs false) select count(*) from s left join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn froml); + +select count(*) from s left join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s left join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (selectnn from l); + +select count(*) from s left join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s left join s1 on s.u=s1.u left join l on s.u=l.u where s.nn not in (select nnfrom l); + +select count(*) from s left join s1 on s.u=s1.u left join l on s.u=l.u where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s right join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn froml); + +select count(*) from s right join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn froml); + +select * from s join s1 on s.u=s1.u right join l on s.u=l.u where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s full join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn froml); + +select count(*) from s full join s1 on s.u=s1.u join l on s.u=l.u where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s join s1 on s.u=s1.u full join l on s.u=l.u where s.nn not in (select nn froml); + +select count(*) from s join s1 on s.u=s1.u full join l on s.u=l.u where s.nn not in (select nn from l); + +explain (costs false) select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn left join s1 on l.nn=s1.nn); + +select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn left join s1 on l.nn=s1.nn); + +explain (costs false) select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn right join s1 on l.nn=s1.nn); + +select * from s where s.nn not in (select l.nn from l left join s on l.nn=s.nn right join s1 on l.nn=s1.nn); + +explain (costs false) select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn); + +select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn); + +explain (costs false) select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l right join s on l.nn = s.nn); + +select * from s where (n,u,nn) not in (select l.n,l.u,l.nn from l left join s on l.nn = s.nn); + +--test reduce outer joins from outer query +explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l); + +select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l) and s.u>0; + +select count(*) from s right join l on s.nn = l.nn where s.nn not in (select nn from l) and s.u>0; + +explain (costs false) select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (selectnn from l); + +select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s right join l on s.nn = l.nn right join s1 on s.u = s1.u where s.nn not in (selectnn from l); + +select count(*) from s right join l on s.nn = l.nn right join s1 on s.u = s1.u where s.nn not in (select nn from l); + +explain (costs false) select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (selectnn from l); + +select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (select nn from l); + +--test reduce outer joins from subquery +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn); + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn); + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where l.u > 9); + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where l.u > 9); + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where s.u > 9); + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn where s.u > 9); + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n); + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n); + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn right join s1 on l.n= s1.n); + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn right join s1 on l.n = s1.n); + +explain (costs false) select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn left join s1 on l.n= s1.n); + +select * from s where nn not in (select l.nn from l right join s on l.nn = s.nn left join s1 on l.n = s1.n); + +--test reduce outer join on outer and sub-query +explain (costs false) select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (selectl.nn from l right join s on l.nn = s.nn join s1 on l.n = s1.n); + +select count(*) from s right join l on s.nn = l.nn join s1 on s.u = s1.u where s.nn not in (select l.nn from l right joins on l.nn = s.nn join s1 on l.n = s1.n); + +explain (costs false) select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (selectl.nn from l right join s on l.nn = s.nn left join s1 on l.n = s1.n); + +select count(*) from s right join l on s.nn = l.nn left join s1 on s.u = s1.u where s.nn not in (select l.nn from l rightjoin s on l.nn = s.nn left join s1 on l.n = s1.n); + +-- test union all +explain (costs false) select * from s as t where not exists +(select 1 from (select n as y from l union all + select u as y from s union all + select nn as y from s) as v where t.n=v.y or v.y is null) and n is not null; + +select * from s as t where not exists +(select 1 from (select n as y from l union all + select u as y from s union all + select nn as y from s) as v where t.n=v.y or v.y is null) and n is not null; + +explain (costs false) select * from s where n not in +(select n as y from l union all + select u as y from s union all + select nn as y from s); + +select * from s where n not in +(select n as y from l union all + select u as y from s union all + select nn as y from s); + +explain (costs false) select count(*) from +(select n as x from s union all select u as x from l) t where t.x not in +(select nn from l); + +select count(*) from +(select n as x from s union all select u as x from l) t where t.x not in +(select nn from l); + +explain (costs false) select count(*) from +(select n as x from s union all select n as x from l) t where t.x not in +(select nn from empty); + +select count(*) from +(select n as x from s union all select n as x from l) t where t.x not in +(select nn from empty); + +explain (costs false) select count(*) from +(select n as x from s union all select u as x from l) t where t.x not in +(select n as y from l union all + select u as y from s union all + select nn as y from s); + +select count(*) from +(select n as x from s union all select u as x from l) t where t.x not in +(select n as y from l union all + select u as y from s union all + select nn as y from s); + +-- test multi-levels of NOT IN +explain (costs false) select * from s where n not in (select n from s where n not in (select n from l)); + +select * from s where n not in (select n from s where n not in (select n from l)); + +explain (costs false) select * from s where n not in (select n from s where n not in (select u from l)); + +select * from s where n not in (select n from s where n not in (select u from l)); + +explain (costs false) select count(*) from s where u not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)); + +select count(*) from s where u not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)); + +explain (costs false) select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (selectnn from s1); + +select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (select nn from s1); + +explain (costs false) select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (selectnn from l); + +select * from s where n not in (select n from s1) and u not in (select u from s1) and nn not in (select nn from l); + +explain (costs false) select count(*) from s where u not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)) +and nn not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)); + +select count(*) from s where u not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)) +and nn not in +(select n from s1 where not exists + (select 1 from (select n from s1 where u not in (select n from l)) t where t.n = s.n)); + +--test COALESCE +explain (costs false) select * from s where COALESCE(n, -1) not in (select COALESCE(n, -1) from l); + +select * from s where COALESCE(n, -1) not in (select COALESCE(n, -1) from l); + +explain (costs false) select * from s where COALESCE(n, NULL, -1) not in (select COALESCE(n, NULL, -1) from l); + +select * from s where COALESCE(n, NULL, -1) not in (select COALESCE(n, NULL, -1) from l); + +explain (costs false) select * from s where COALESCE(n, NULL, NULL) not in (select COALESCE(n, NULL, NULL) from l); + +select * from s where COALESCE(n, NULL, NULL) not in (select COALESCE(n, NULL, NULL) from l); + +explain (costs false) select * from s where COALESCE(n, nn) not in (select COALESCE(n, nn) from l); + +select * from s where COALESCE(n, nn) not in (select COALESCE(n, nn) from l); + +explain (costs false) select * from s where COALESCE(nn, NULL) not in (select COALESCE(nn, NULL) from l); + +select * from s where COALESCE(nn, NULL) not in (select COALESCE(nn, NULL) from l); + +explain (costs false) select * from s where (COALESCE(n, -1), nn, COALESCE(n, u)) not in (select COALESCE(n, -1), nn, COALESCE(n,u) from l); + +select * from s where (COALESCE(n, -1), nn, COALESCE(n, u)) not in (select COALESCE(n, -1), nn, COALESCE(n, u) from l); + +-- test miscellaneous outer nullable cases + +explain (costs false) select * from s where (n,n) not in (select n,n from l); + +select * from s where (n,n) not in (select n,n from l); + +explain (costs false) select * from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l); + +select * from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l); + +explain (costs false) select count(*) from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn froml where u < 0); + +select count(*) from s right join l on s.nn = l.nn where (s.n,s.u,s.nn) not in (select n,u,nn from l where u < 0); + +explain (costs false) select * from s where (n,n,n) not in (select distinct n,n,n from l where u > 0 limit 3) order by n; + +select * from s where (n,n,n) not in (select distinct n,n,n from l where u > 0 limit 3) order by n; + +--test outer has strict predicate or inner join +explain (costs false) select * from s where n not in (select n from l) and n > 0; + +select * from s where n not in (select n from l) and n > 0; + +explain (costs false) select * from s where n not in (select n from l) and u > 0; + +select * from s where n not in (select n from l) and u > 0; + +explain (costs false) select * from s where n not in (select n from l) and n is not null; + +select * from s where n not in (select n from l) and n is not null; + +explain (costs false) select * from s join l on s.n = l.n where s.n not in (select n from l); + +select * from s join l on s.n = l.n where s.n not in (select n from l); + +explain (costs false) select count(*) from s right join l on s.n = l.n where s.n not in (select n from l); + +select count(*) from s right join l on s.n = l.n where s.n not in (select n from l); + +explain (costs false) select count(*) from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select nfrom l); + +select count(*) from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select n from l); + +explain (costs false) select count(*) from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select nfrom l); + +select count(*) from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select n from l); + +--test inner has strict predicate or inner join +explain (costs false) select * from s where u not in (select n from l where n > 0); + +select * from s where u not in (select n from l where n > 0); + +explain (costs false) select * from s where u not in (select n from l where u > 0); + +select * from s where u not in (select n from l where u > 0); + +explain (costs false) select * from s where u not in (select n from l where n is not null); + +select * from s where u not in (select n from l where n is not null); + +explain (costs false) select * from s where u not in (select l.n from l join s on l.n=s.n); + +select * from s where u not in (select l.n from l join s on l.n=s.n); + +explain (costs false) select * from s where u not in (select l.n from l join s on l.u=s.u); + +select * from s where u not in (select l.n from l join s on l.u=s.u); + +explain (costs false) select * from s where u not in (select l.n from l join s on l.n = s.n); + +select * from s where u not in (select l.n from l join s on l.n = s.n); + +explain (costs false) select * from s where u not in (select l.n from l right join s on l.n = s.n); + +select * from s where u not in (select l.n from l right join s on l.n = s.n); + +explain (costs false) select * from s where u not in (select l.n from l right join s on l.n=s.n join s1 on l.n=s1.n); + +select * from s where u not in (select l.n from l right join s on l.n=s.n join s1 on l.n=s1.n); + +explain (costs false) select * from s where u not in (select l.n from l join s on l.n=s.n right join s1 on l.n=s1.n); + +select * from s where u not in (select l.n from l join s on l.n=s.n right join s1 on l.n=s1.n); + +--test both sides have strict predicate or inner join +explain (costs false) select * from s where n not in (select n from l where n > 0) and n > 0; + +select * from s where n not in (select n from l where n > 0) and n > 0; + +explain (costs false) select * from s where n not in (select n from l where u > 0) and n > 0; + +select * from s where n not in (select n from l where u > 0) and n > 0; + +explain (costs false) select * from s where n not in (select n from l where n > 0) and u > 0; + +select * from s where n not in (select n from l where n > 0) and u > 0; + +explain (costs false) select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n froml right join s on l.n=s.n join s s1 on l.n=s1.n); + +select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n from l right join s on l.n=s.njoin s s1 on l.n=s1.n); + +explain (costs false) select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n froml join s on l.n=s.n right join s s1 on l.n=s1.n); + +select * from s right join l on s.n = l.n join s1 on s.u = s1.u where s.n not in (select l.n from l join s on l.n=s.n rightjoin s s1 on l.n=s1.n); + +explain (costs false) select * from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select l.n froml join s on l.n=s.n right join s s1 on l.n=s1.n); + +select * from s join l on s.n = l.n right join s1 on s.u = s1.u where s.n not in (select l.n from l join s on l.n=s.n rightjoin s s1 on l.n=s1.n); + +--JIRA-7279 CTE with NOT IN +create table public.testing +( +a integer, +b integer, +c integer +); + +explain (costs false) with +selected(a,b,c) as (values(1,2,3)), +updated(d,e,f) as (values(4,5,6)) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated); + +with +selected(a,b,c) as (values(1,2,3)), +updated(d,e,f) as (values(4,5,6)) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated); + +select * from public.testing; + +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated); + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated); + +select * from public.testing; + +-- expect to get Hash Anti Join +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where a not in (select d from updated); + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where a not in (select d from updated); + +select * from public.testing; + +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where b not in (select e from updated); + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where b not in (select e from updated); + +select * from public.testing; + +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where b not in (select d from updated); + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where b not in (select d from updated); + +select * from public.testing; + +-- two levels of NOT IN with CTE, 2nd NOT IN +-- subquery access CTE two levels above +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated + where d not in (select a from selected)); + +with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +insert into public.testing +select * from selected +where (a,b,c) not in (select d,e,f from updated + where d not in (select a from selected)); + +select * from public.testing; + +-- With clause inside a query block +explain select count(distinct t.a) from +(with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +select * from selected where (a,b,c) not in +(select d,e,f from updated + where d not in (select a from selected))) as t; + +select count(distinct t.a) from +(with +selected(a,b,c) as (select u, n, nn from s), +updated(d,e,f) as (select u, n, nn from l) +select * from selected where (a,b,c) not in +(select d,e,f from updated + where d not in (select a from selected))) as t; + +-- With clause in subquery, can't flatten subquery to anti join +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s) +insert into public.testing +select * from selected where (a,b,c) not in +(with updated (d,e,f) as (select u, n, nn from l) +select d,e,f from updated); + +with +selected(a,b,c) as (select u, n, nn from s) +insert into public.testing +select * from selected where (a,b,c) not in +(with updated (d,e,f) as (select u, n, nn from l) +select d,e,f from updated); + +select * from public.testing; + +-- With clause in subquery, subsubquery access CTE in subquery +explain (costs false) with +selected(a,b,c) as (select u, n, nn from s) +insert into public.testing +select * from selected where (a,b,c) not in +( +with updated(d,e,f) as (select u, n, nn from l) +select d,e,f from updated where d not in (select d from updated) +); + +with +selected(a,b,c) as (select u, n, nn from s) +insert into public.testing +select * from selected where (a,b,c) not in +( +with updated(d,e,f) as (select u, n, nn from l) +select d,e,f from updated where d not in (select d from updated) +); + +select * from public.testing; + +-- Recursive CTE +CREATE TABLE employees ( + id serial, + name varchar(255), + manager_id int +); + +INSERT INTO employees VALUES (1, 'Mark', null); +INSERT INTO employees VALUES (2, 'John', 1); +INSERT INTO employees VALUES (3, 'Dan', 2); +INSERT INTO employees VALUES (4, 'Clark', 1); +INSERT INTO employees VALUES (5, 'Linda', 2); +INSERT INTO employees VALUES (6, 'Willy', 2); +INSERT INTO employees VALUES (7, 'Barack', 2); +INSERT INTO employees VALUES (8, 'Elen', 2); +INSERT INTO employees VALUES (9, 'Kate', 3); +INSERT INTO employees VALUES (10, 'Terry', 4); + +WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON mtree.id = e.manager_id +) +SELECT * +FROM managertree; + +-- NOT IN subquery access Recursive CTE +EXPLAIN (COSTS FALSE) WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON mtree.id = e.manager_id +) +SELECT * +FROM managertree mt WHERE mt.manager_id NOT IN (SELECT id FROM managertree); + +WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON mtree.id = e.manager_id +) +SELECT * +FROM managertree mt WHERE mt.manager_id NOT IN (SELECT id FROM managertree); + +-- NOT IN under UNION ALL inside Recursive CTE +EXPLAIN (COSTS FALSE) WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON + (mtree.id = e.manager_id AND + mtree.manager_id NOT IN (SELECT manager_id FROM employees) + ) +) +SELECT * +FROM managertree; + +WITH RECURSIVE managertree AS ( + SELECT id, name, manager_id + FROM employees + WHERE id = 2 + UNION ALL + SELECT e.id, e.name, e.manager_id + FROM employees e + INNER JOIN managertree mtree ON + (mtree.id = e.manager_id AND + mtree.manager_id NOT IN (SELECT manager_id FROM employees) + ) +) +SELECT * +FROM managertree; + +--Manfred-7613 CTE NOT IN with Union All +create table cocotero as ( + select * from( + values(1,2,3)) as data(a,b,c) +); + +explain (costs off) with selected as ( + select * + from cocotero +), +updated as ( + update cocotero + set a = 3 + from selected + where cocotero.a = selected.a + returning selected.a,selected.b,selected.c +), +inserted as ( + insert into cocotero + select * + from selected + where a not in (select a from updated) + returning * +) +select 'updated' as action, count(*) as lines from updated +union all +select 'inserted' as action, count(*) as lines from inserted; + +with selected as ( + select * + from cocotero +), +updated as ( + update cocotero + set a = 3 + from selected + where cocotero.a = selected.a + returning selected.a,selected.b,selected.c +), +inserted as ( + insert into cocotero + select * + from selected + where a not in (select a from updated) + returning * +) +select 'updated' as action, count(*) as lines from updated +union all +select 'inserted' as action, count(*) as lines from inserted; + +--test enable_not_in_transform +explain (costs off) select count(*) from s where s.u not in (select l.u from l); + +set enable_not_in_transform = off; + +explain (costs off) select count(*) from s where s.u not in (select l.u from l); + +-- clean up +reset work_mem; +reset enable_not_in_transform; +drop table s; +drop table s1; +drop table l; +drop table empty; +drop table public.testing; +drop table employees; +drop table cocotero;
Hi Tom, Thanks for the feedback. * I find it entirely unacceptable to stick some planner temporary fields into struct SubLink. If you need that storage you'll have to find some other place to put it. But in point of fact I don't think you need it; it doesn't look to me to be critical to generate the subquery's plan any earlier than make_subplan would have done it. Moreover, you should really strive to *not* do that, because it's likely to get in the way of other future optimizations. As the existing comment in make_subplan already suggests, we might want to delay subplan planning even further than that in future. The reason for calling make_subplan this early is that we want to Call subplan_is_hashable(plan), to decide whether to proceed with the proposed transformation. We try to stick with the fast hashed subplan when possible to avoid potential performance degradation from the transformation which may restrict the planner to choose Nested Loop Anti Join in order to handle Null properly, the following is an example from subselect.out: explain (costs false) select * from s where n not in (select u from l); QUERY PLAN ----------------------------------------------- Nested Loop Anti Join InitPlan 1 (returns $0) -> Seq Scan on l l_1 -> Seq Scan on s Filter: ((n IS NOT NULL) OR (NOT $0)) -> Index Only Scan using l_u on l Index Cond: (u = s.n) However, if the subplan is not hashable, the above Nested Loop Anti Join is actually faster. * I'm also not too happy with the (undocumented) rearrangement of reduce_outer_joins. There's a specific sequence of processing that that's involved in, as documented at the top of prepjointree.c, and I doubt that you can just randomly call it from other places and expect good results. In particular, since JOIN alias var flattening won't have happened yet when this code is being run from pull_up_sublinks, it's unlikely that reduce_outer_joins will reliably get the same answers it would get normally. I also wonder whether it's safe to make the parsetree changes it makes earlier than normal, and whether it will be problematic to run it twice on the same tree, and whether its rather indirect connection to distribute_qual_to_rels is going to misbehave. The rearrangement of reduce_outer_joins was to make the null test function is_node_nonnullable() more accurate. Later we added strict predicates logic in is_node_nonnullable(), so I think we can get rid of the rearrangement of reduce_outer_joins now without losing accuracy. * The proposed test additions seem to about triple the runtime of subselect.sql. This seems excessive. I also wonder why it's necessary for this test to build its own large tables; couldn't it re-use ones that already exist in the regression database? I added a lot of test cases. But yes, I can reuse the existing large table if there is one that doesn't fit in 64KB work_mem. * Not really sure that we need a new planner GUC for this, but if we do, it needs to be documented. The new GUC is just in case if anything goes wrong, the user can easily turn it off. Regards, Zheng
"Li, Zheng" <zhelli@amazon.com> writes: > * I find it entirely unacceptable to stick some planner temporary > fields into struct SubLink. If you need that storage you'll have > to find some other place to put it. But in point of fact I don't > think you need it; it doesn't look to me to be critical to generate > the subquery's plan any earlier than make_subplan would have done it. > Moreover, you should really strive to *not* do that, because it's > likely to get in the way of other future optimizations. As the > existing comment in make_subplan already suggests, we might want to > delay subplan planning even further than that in future. > The reason for calling make_subplan this early is that we want to > Call subplan_is_hashable(plan), to decide whether to proceed with the proposed > transformation. Well, you're going to have to find another way, because this one won't do. If you really need to get whacked over the head with a counterexample for this approach, consider what happens if some part of the planner decides to pass the SubLink through copyObject, expression_tree_mutator, etc in between where you've done the planning and where make_subplan looks at it. Since you haven't taught copyfuncs.c about these fields, they'll semi-accidentally wind up as NULL afterwards, meaning you lost the information anyway. (In fact, I wouldn't be surprised if that's happening already in some cases; you couldn't really tell, since make_subplan will just repeat the work.) On the other hand, you can't have copyfuncs.c copying such fields either --- we don't have copyfuncs support for PlannerInfo, and if we did, the case would end up as infinite recursion. Nor would it be particularly cool to try to fake things out by copying the pointers as scalars; that will lead to dangling pointers later on. BTW, so far as I can see, the only reason you're bothering with the whole thing is to compare the size of the subquery output with work_mem, because that's all that subplan_is_hashable does. I wonder whether that consideration is even still necessary in the wake of 1f39bce02. If it is, I wonder whether there isn't a cheaper way to figure it out. (Note similar comment in make_subplan.) Also ... > We try to stick with the fast hashed subplan when possible to avoid > potential performance degradation from the transformation which may > restrict the planner to choose Nested Loop Anti Join in order to handle > Null properly, But can't you detect that case directly? It seems like you'd need to figure out the NULL situation anyway to know whether the transformation to antijoin is valid in the first place. regards, tom lane
>BTW, so far as I can see, the only reason you're bothering with the whole thing is to compare the size of the subquery output with work_mem, because that's all that subplan_is_hashable does. I wonder whether that consideration is even still necessary in the wake of 1f39bce02. If it is, I wonder whether there isn't a cheaper way to figure it out. (Note similar comment in make_subplan.) The comment in make_subplan says there is no cheaper way to figure out: /* At present, however, we can only check hashability after * we've made the subplan :-(. (Determining whether it'll fit in work_mem * is the really hard part.) */ I don't see why commit 1f39bce02 is related to this problem. Can you expand on this? >But can't you detect that case directly? It seems like you'd need to figure out the NULL situation anyway to know whether the transformation to antijoin is valid in the first place. Yes, we do need to figure out the NULL situation, and there is always valid transformation to antijoin, it's just in the NULL case we need to stuff additional clause to the anti join condition, and in these cases the transformation actually outperforms Subplan (non-hashed), but underperforms the hashed Subplan. The unmodified anti hash join has similar performance compared to hashed Subplan.
You should do small rebase (conflict with 911e7020770) and pgindent of the patch to repair problems with long lines and backspaces. I am reviewing your patch in small steps. Questions: 1. In the find_innerjoined_rels() routine you stop descending on JOIN_FULL node type. I think it is wrong because if var has NOT NULL constraint, full join can't change it to NULL. 2. The convert_NOT_IN_to_join() routine is ok, but its name is misleading. May be you can use something like make_NOT_IN_to_join_quals()? 3. pull_up_sublinks_qual_recurse(). Comment: "Return pullout predicate (x is NOT NULL)..." may be change to "Return pullout predicate (x is NOT NULL or NOT EXISTS...)"? 4. is_node_nonnullable(): I think one more case of non-nullable var may be foreign key constraint. -- Andrey Lepikhov Postgres Professional https://postgrespro.com The Russian Postgres Company
On 3/26/20 4:58 PM, Li, Zheng wrote: > >BTW, so far as I can see, the only reason you're bothering with the whole > thing is to compare the size of the subquery output with work_mem, because > that's all that subplan_is_hashable does. I wonder whether that > consideration is even still necessary in the wake of 1f39bce02. If it is, > I wonder whether there isn't a cheaper way to figure it out. (Note > similar comment in make_subplan.) > > The comment in make_subplan says there is no cheaper way to figure out: > /* At present, however, we can only check hashability after > * we've made the subplan :-(. (Determining whether it'll fit in work_mem > * is the really hard part.) > */ > > I don't see why commit 1f39bce02 is related to this problem. Can you expand on this? > > >But can't you detect that case directly? It seems like you'd need to > figure out the NULL situation anyway to know whether the transformation > to antijoin is valid in the first place. > > Yes, we do need to figure out the NULL situation, and there is always valid transformation > to antijoin, it's just in the NULL case we need to stuff additional clause to the anti join > condition, and in these cases the transformation actually outperforms Subplan (non-hashed), > but underperforms the hashed Subplan. The unmodified anti hash join has similar performance > compared to hashed Subplan. There seem to be enough questions about this implementation that I think it makes sense to mark this patch Returned with Feedback. Feel free to resubmit it to a future CF when there is more of a consensus on the implementation. Regards, -- -David david@pgmasters.net