Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
Дата
Msg-id 469843.1686326910@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1  (Richard Guo <guofenglinux@gmail.com>)
Ответы Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers
Richard Guo <guofenglinux@gmail.com> writes:
> On Wed, May 31, 2023 at 10:47 AM Richard Guo <guofenglinux@gmail.com> wrote:
>> When we transform the first form of identity 3 to the second form, we've
>> converted Pb*c to Pbc in deconstruct_distribute_oj_quals.  But we
>> neglect to consider that rel C might be a RTE_SUBQUERY and contains
>> quals that have lateral references to B.  So the B vars in such quals
>> have wrong nulling bitmaps and we'd finally notice that when we do
>> fix_upper_expr for the NestLoopParam expressions.

Right.  One question that immediately raises is whether it's even safe
to apply the identity if C has lateral references to B, because that
almost certainly means that C will produce different output when
joined to a nulled B row than when joined to a not-nulled row.
I think that's okay because if the B row will fail Pab then it doesn't
matter what row(s) C produces, but maybe I've missed something.

> We can identify in which form of identity 3 the plan is built up by
> checking the relids of the B/C join's outer rel.  If it's in the first
> form, the outer rel's relids must contain the A/B join.  Otherwise it
> should only contain B's relid.  So I'm considering that maybe we can
> adjust the nulling bitmap for nestloop parameters according to that.
> Attached is a patch for that.  Does this make sense?

Hmm.  I don't really want to do it in identify_current_nestloop_params
because that gets applied to all nestloop params, so it seems like
that risks masking bugs of other kinds.  I'd rather do it in
process_subquery_nestloop_params, which we know is only applied to
subquery LATERAL references.  So more or less as attached.

(I dropped the equal() assertions in process_subquery_nestloop_params
because they've never caught anything and it'd be too complicated to
make them still work.)

Thoughts?

            regards, tom lane

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1ca26baa25..3585a703fb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2289,9 +2289,11 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
              * the outer-join level at which they are used, Vars seen in the
              * NestLoopParam expression may have nullingrels that are just a
              * subset of those in the Vars actually available from the outer
-             * side.  Not checking this exactly is a bit grotty, but the work
-             * needed to make things match up perfectly seems well out of
-             * proportion to the value.
+             * side.  Another case that can cause that to happen is explained
+             * in the comments for process_subquery_nestloop_params.  Not
+             * checking this exactly is a bit grotty, but the work needed to
+             * make things match up perfectly seems well out of proportion to
+             * the value.
              */
             nlp->paramval = (Var *) fix_upper_expr(root,
                                                    (Node *) nlp->paramval,
diff --git a/src/backend/optimizer/util/paramassign.c b/src/backend/optimizer/util/paramassign.c
index 66534c0a78..e94f7e7563 100644
--- a/src/backend/optimizer/util/paramassign.c
+++ b/src/backend/optimizer/util/paramassign.c
@@ -421,8 +421,26 @@ replace_nestloop_param_placeholdervar(PlannerInfo *root, PlaceHolderVar *phv)
  * provide these values.  This differs from replace_nestloop_param_var in
  * that the PARAM_EXEC slots to use have already been determined.
  *
+ * An additional complication is that the subplan_params may contain
+ * nullingrel markers that need adjustment.  This occurs if we have applied
+ * outer join identity 3,
+ *        (A leftjoin B on (Pab)) leftjoin C on (Pb*c)
+ *        = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
+ * and C is a subquery containing lateral references to B.  It's still safe
+ * to apply the identity, but the parser will have created those references
+ * in the form "b*" (i.e., with varnullingrels listing the A/B join), while
+ * what we will have available from the nestloop's outer side is just "b".
+ * We deal with that here by stripping the nullingrels down to what is
+ * available from the outer side according to root->curOuterRels.
+ * That fixes matters for the case of forward application of identity 3.
+ * If the identity was applied in the reverse direction, we will have
+ * subplan_params containing too few nullingrel bits rather than too many.
+ * Currently, that causes no problems because setrefs.c applies only a
+ * subset check to nullingrels in NestLoopParams, but we'd have to work
+ * harder if we ever want to tighten that check.
+ *
  * Note that we also use root->curOuterRels as an implicit parameter for
- * sanity checks.
+ * sanity checks and nullingrel adjustments.
  */
 void
 process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
@@ -449,17 +467,19 @@ process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
                 nlp = (NestLoopParam *) lfirst(lc2);
                 if (nlp->paramno == pitem->paramId)
                 {
-                    Assert(equal(var, nlp->paramval));
                     /* Present, so nothing to do */
                     break;
                 }
             }
             if (lc2 == NULL)
             {
-                /* No, so add it */
+                /* No, so add it after adjusting its nullingrels */
+                var = copyObject(var);
+                var->varnullingrels = bms_intersect(var->varnullingrels,
+                                                    root->curOuterRels);
                 nlp = makeNode(NestLoopParam);
                 nlp->paramno = pitem->paramId;
-                nlp->paramval = copyObject(var);
+                nlp->paramval = var;
                 root->curOuterParams = lappend(root->curOuterParams, nlp);
             }
         }
@@ -480,17 +500,19 @@ process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
                 nlp = (NestLoopParam *) lfirst(lc2);
                 if (nlp->paramno == pitem->paramId)
                 {
-                    Assert(equal(phv, nlp->paramval));
                     /* Present, so nothing to do */
                     break;
                 }
             }
             if (lc2 == NULL)
             {
-                /* No, so add it */
+                /* No, so add it after adjusting its nullingrels */
+                phv = copyObject(phv);
+                phv->phnullingrels = bms_intersect(phv->phnullingrels,
+                                                   root->curOuterRels);
                 nlp = makeNode(NestLoopParam);
                 nlp->paramno = pitem->paramId;
-                nlp->paramval = (Var *) copyObject(phv);
+                nlp->paramval = (Var *) phv;
                 root->curOuterParams = lappend(root->curOuterParams, nlp);
             }
         }
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index d04648df3f..4999c99f3b 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2589,6 +2589,24 @@ on t2.q2 = 123;
                ->  Seq Scan on int8_tbl t5
 (12 rows)

+explain (costs off)
+select * from int8_tbl t1
+    left join int8_tbl t2 on true
+    left join lateral
+      (select * from int8_tbl t3 where t3.q1 = t2.q1 offset 0) s
+      on t2.q1 = 1;
+                QUERY PLAN
+-------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on int8_tbl t1
+   ->  Materialize
+         ->  Nested Loop Left Join
+               Join Filter: (t2.q1 = 1)
+               ->  Seq Scan on int8_tbl t2
+               ->  Seq Scan on int8_tbl t3
+                     Filter: (q1 = t2.q1)
+(8 rows)
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 0308258a91..56ca759772 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -514,6 +514,13 @@ select * from int8_tbl t1 left join
     left join int8_tbl t5 on t2.q1 = t5.q1
 on t2.q2 = 123;

+explain (costs off)
+select * from int8_tbl t1
+    left join int8_tbl t2 on true
+    left join lateral
+      (select * from int8_tbl t3 where t3.q1 = t2.q1 offset 0) s
+      on t2.q1 = 1;
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: Order changes in PG16 since ICU introduction
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Views no longer in rangeTabls?