Обсуждение: Clamping reulst row number of joins.

Поиск

Список

Период

Сортировка

Clamping reulst row number of joins.

От

Kyotaro HORIGUCHI

Дата:

06 марта 2015 г., 08:35:16

Hello, I had a report that a query gets wired estimated row
numbers and it makes subsequent plans wrong.

Although the detailed explanation is shown later in this mail,
the problem here was that a query like following makes the path
with apparently wrong row number.

EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t        JOIN (VALUES (1)) tt(x) ON tt.x = t.a;

0> Nested Loop  (cost=0.43..8.51 rows=68732 width=4)
1>   ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=4)
2>   ->  Append  (cost=0.43..8.48 rows=2 width=4)
3>         ->  Index Only Scan using i_t1_a on t1
4>               (cost=0.43..8.46 rows=1 width=4)
5>               Index Cond: (a = "*VALUES*".column1)
6>         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=4)
7>               Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
8>               ->  Result  (cost=0.00..0.01 rows=1 width=0)

The Nested Loop at line 0 says that it emits 68832 rows even
though the underlying nodes, Values at line 1 and Append at line
2 are giving 1 and 2 respectively as their reults. This query
actually returns only 1 row. It is seemingly wrong and causes the
plans above go wrong.

This is caused by the Append node, which don't have statistics,
so eqjoinsel takes 0.5 as the default selectivity for the nested
loop. Even though it is defficult to get statistics for Append
node, the nested loop could get more sane row nubmer if it
doesn't ignore the parameterized path selected for the scan on
t1.

I suppose that join nodes can safely clamp the number of its
result rows by the product of the row numbers of underlying two
paths. And if it is correct, the attached patch can correct the
estimation like this.

>  Nested Loop  (cost=0.43..16.51 rows=2 width=4)
>    ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=4)
>    ->  Append  (cost=0.43..16.48 rows=2 width=4)
>          ->  Index Only Scan using i_t1_a on t1 
>                        (cost=0.43..16.45 rows=1 width=4)
>                Index Cond: (a = "*VALUES*".column1)
>          ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=4)
>                Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
>                ->  Result  (cost=0.00..0.01 rows=1 width=0)

Is it correct? And Does it have no bad side effects?
And is this applicable for 9.5?


The detailed test environment is described below.

Before this fix, the inner path of the top-level join is doing
hash because the inner path says that it will give 68732
rows. (But only 1 actually).

After this fix, the outer path declaring that it gives 2 rows so
the top-level join becomes nested loop and the inner path becomes
index scan.

=======
CREATE TABLE t1 (a int);
INSERT INTO t1 (SELECT (a%2)*500000 + a / 2 FROM generate_series(0, 1000000) a);
CREATE INDEX i_t1_a ON t1 (a);
CREATE TABLE t2 AS SELECT * from t1;
ANALYZE t1;
ANALYZE t2;
-- emulating the problematic situation
SET join_collapse_limit TO 1;
SET random_page_cost TO 8.0;
SET seq_page_cost TO 0.5;

EXPLAIN SELECT tt2.a FROM (SELECT a       FROM (SELECT a FROM t1             UNION ALL SELECT 0) t       JOIN (VALUES
(1))tt(x) ON tt.x = t.a) tt2 JOIN t2 ON (tt2.a = t2.a);
 


======== The plan before fixHash Join  (cost=30832.46..36230.60 rows=68732 width=4)           (actual
time=1258.880..1260.519rows=1 loops=1)  Hash Cond: ("*VALUES*".column1 = t2.a)  ->  Nested Loop  (cost=0.43..8.51
rows=68732width=8)                   (actual time=0.071..0.104 rows=1 loops=1)        ->  Values Scan on "*VALUES*"
(cost=0.00..0.01rows=1 width=4)                                     (actual time=0.002..0.003 rows=1 loops=1)        ->
Append  (cost=0.43..8.48 rows=2 width=4)                    (actual time=0.063..0.093 rows=1 loops=1)              ->
IndexOnly Scan using i_t1_a on t1                       (cost=0.43..8.46 rows=1 width=4)                       (actual
time=0.059..0.075rows=1 loops=1)                    Index Cond: (a = "*VALUES*".column1)                    Heap
Fetches:2              ->  Subquery Scan on "*SELECT* 2"                     (cost=0.00..0.02 rows=1 width=4)
        (actual time=0.010..0.010 rows=0 loops=1)                    Filter: ("*VALUES*".column1 = "*SELECT*
2"."?column?")                   Rows Removed by Filter: 1                    ->  Result  (cost=0.00..0.01 rows=1
width=0)                               (actual time=0.001..0.002 rows=1 loops=1)  ->  Hash  (cost=14425.01..14425.01
rows=1000001width=4)            (actual time=1250.274..1250.274 rows=1000001 loops=1)        Buckets: 131072  Batches:
16 Memory Usage: 3227kB        ->  Seq Scan on t2  (cost=0.00..14425.01 rows=1000001 width=4)
(actual time=0.021..608.245 rows=1000001 loops=1)Planning time: 0.319 msExecution time: 1260.792 ms
 

======== The plan after fixNested Loop  (cost=0.86..17.43 rows=2 width=4)             (actual time=0.103..0.118 rows=1
loops=1) Join Filter: ("*VALUES*".column1 = t2.a)  ->  Nested Loop  (cost=0.43..16.51 rows=2 width=8)
(actualtime=0.062..0.073 rows=1 loops=1)        ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=4)
                            (actual time=0.003..0.004 rows=1 loops=1)        ->  Append  (cost=0.43..16.48 rows=2
width=4)                   (actual time=0.051..0.060 rows=1 loops=1)              ->  Index Only Scan using i_t1_a on
t1                   (cost=0.43..16.45 rows=1 width=4)                    (actual time=0.045..0.046 rows=1 loops=1)
              Index Cond: (a = "*VALUES*".column1)                    Heap Fetches: 1              ->  Subquery Scan on
"*SELECT*2"                     (cost=0.00..0.02 rows=1 width=4)                    (actual time=0.005..0.005 rows=0
loops=1)                   Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")                    Rows Removed by
Filter:1                    ->  Result  (cost=0.00..0.01 rows=1 width=0)                                (actual
time=0.002..0.002rows=1 loops=1)  ->  Index Only Scan using i_t2_a on t2  (cost=0.42..0.45 rows=1 width=4)
                  (actual time=0.026..0.027 rows=1 loops=1)        Index Cond: (a = t1.a)        Heap Fetches:
1Planningtime: 0.968 msExecution time: 0.239 ms
 
=========

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Clamping reulst row number of joins.

От

Tom Lane

Дата:

06 марта 2015 г., 15:08:01

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> Hello, I had a report that a query gets wired estimated row
> numbers and it makes subsequent plans wrong.

> Although the detailed explanation is shown later in this mail,
> the problem here was that a query like following makes the path
> with apparently wrong row number.

> EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
>          JOIN (VALUES (1)) tt(x) ON tt.x = t.a;

Hm, the problem evidently is that we get a default selectivity estimate
for the ON condition.  I think a proper fix for this would involve
teaching eqjoinsel (and ideally other join selectivity functions) how
to drill down into appendrels and combine estimates for the child
relations.

> I suppose that join nodes can safely clamp the number of its
> result rows by the product of the row numbers of underlying two
> paths. And if it is correct, the attached patch can correct the
> estimation like this.

Unfortunately, this patch is completely horrid, because it gives an unfair
advantage to the parameterized-nestloop plan.  (The hacks in the other
two functions are no-ops, because only a nestloop plan will have a
parameterized path for the appendrel.)  What's more, it's only a cosmetic
fix: it won't correct the planner's actual idea of the joinrel size, which
means it won't have an impact on planning any additional levels of
joining.

We really need to fix this on the selectivity-estimate side.

BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
just use it to make a compact example?  If it were something worth
optimizing, it seems like we could teach the planner to "pull up VALUES"
in the same way that it flattens sub-selects.  I'm not sure if this is
worth the trouble or not, though.
        regards, tom lane

Re: Clamping reulst row number of joins.

От

Stephen Frost

Дата:

06 марта 2015 г., 15:17:03

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
> just use it to make a compact example?  If it were something worth
> optimizing, it seems like we could teach the planner to "pull up VALUES"
> in the same way that it flattens sub-selects.  I'm not sure if this is
> worth the trouble or not, though.

I've certainly seen and used values() constructs in joins for a variety
of reasons and I do think it'd be worthwhile for the planner to know how
to pull up a VALUES construct.

Would that be a lot of effort, either code-wise or runtime-wise?  My gut
feeling is that it wouldn't be, but you're clearly in a much better
position to determine that.
Thanks!
    Stephen

Re: Clamping reulst row number of joins.

От

Tom Lane

Дата:

06 марта 2015 г., 16:48:47

Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
>> just use it to make a compact example?  If it were something worth
>> optimizing, it seems like we could teach the planner to "pull up VALUES"
>> in the same way that it flattens sub-selects.  I'm not sure if this is
>> worth the trouble or not, though.

> I've certainly seen and used values() constructs in joins for a variety
> of reasons and I do think it'd be worthwhile for the planner to know how
> to pull up a VALUES construct.

> Would that be a lot of effort, either code-wise or runtime-wise?  My gut
> feeling is that it wouldn't be, but you're clearly in a much better
> position to determine that.

My guess is that it'd be pretty simple to do if we want to do it.
I've not looked at the code yet though.
        regards, tom lane

Re: Clamping reulst row number of joins.

От

Tom Lane

Дата:

06 марта 2015 г., 21:42:07

Tom Lane <tgl@sss.pgh.pa.us> writes:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>>> BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
>>> just use it to make a compact example?  If it were something worth
>>> optimizing, it seems like we could teach the planner to "pull up VALUES"
>>> in the same way that it flattens sub-selects.  I'm not sure if this is
>>> worth the trouble or not, though.

>> I've certainly seen and used values() constructs in joins for a variety
>> of reasons and I do think it'd be worthwhile for the planner to know how
>> to pull up a VALUES construct.

>> Would that be a lot of effort, either code-wise or runtime-wise?  My gut
>> feeling is that it wouldn't be, but you're clearly in a much better
>> position to determine that.

> My guess is that it'd be pretty simple to do if we want to do it.
> I've not looked at the code yet though.

I spent a bit of time looking at this, and realized that the blocker
is the same as the reason why we don't pull up sub-selects with empty
rangetables ("SELECT expression(s)").  Namely, that the upper query's
jointree can't handle a null subtree.  (This is not only a matter of
the jointree data structure, but the fact that the whole planner
identifies relations by bitmapsets of RTE indexes, and subtrees with
empty RTE sets couldn't be told apart.)

We could probably fix both cases for order-of-a-hundred lines of new code
in prepjointree.  The plan I'm thinking about is to allow such vacuous
subquery jointrees to be pulled up, but only if they are in a place in
the upper query's jointree where it's okay to delete the subtree.  This
would basically be two cases: (1) the immediate parent is a FromExpr that
would have at least one remaining child, or (2) the immediate parent is
an INNER JOIN whose other child isn't also being deleted (so that we can
convert the JoinExpr to a nonempty FromExpr, or just use the other child
as-is if the JoinExpr has no quals).

More generally, it occurs to me that maybe Oracle wasn't being totally
silly when they invented DUAL.  If we had a jointree representation for
"dummy relation with exactly one row" then we could substitute that in
all vacuous-jointree cases.  However, it's not clear that there's any
functional advantage to replacing a VALUES Scan with a "DUAL Scan",
which is basically what would happen if we flattened a VALUES and then
had to put one of these things in instead.  And having such things
propagate all through the planner, executor, EXPLAIN, etc is way more
code churn than I want to contemplate right now.  The plan proposed in
the preceding para is a bit more ugly logically, but it would limit the
code effects to basically pull_up_subqueries() and its child routines.
        regards, tom lane

Re: Clamping reulst row number of joins.

От

Tom Lane

Дата:

06 марта 2015 г., 23:55:06

I wrote:
> Hm, the problem evidently is that we get a default selectivity estimate
> for the ON condition.  I think a proper fix for this would involve
> teaching eqjoinsel (and ideally other join selectivity functions) how
> to drill down into appendrels and combine estimates for the child
> relations.

I wrote a prototype patch for this.  The additions to examine_variable()
seem pretty reasonable.  However, the only selectivity function I've fixed
is eqjoinsel_inner().  If we do it like this, we're going to need
similar recursive-boilerplate additions in basically every selectivity
function, which seems like a PITA as well as a lot of code bloat.
Can anyone think of a cute way to minimize that overhead?

            regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4dd3f9f..aaf76be 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** static double convert_one_bytea_to_scala
*** 181,187 ****
                              int rangelo, int rangehi);
  static char *convert_string_datum(Datum value, Oid typid);
  static double convert_timevalue_to_scalar(Datum value, Oid typid);
! static void examine_simple_variable(PlannerInfo *root, Var *var,
                          VariableStatData *vardata);
  static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
                     Oid sortop, Datum *min, Datum *max);
--- 181,187 ----
                              int rangelo, int rangehi);
  static char *convert_string_datum(Datum value, Oid typid);
  static double convert_timevalue_to_scalar(Datum value, Oid typid);
! static void examine_simple_variable(PlannerInfo *root, Var *var, Index varno,
                          VariableStatData *vardata);
  static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
                     Oid sortop, Datum *min, Datum *max);
*************** eqjoinsel_inner(Oid operator,
*** 2221,2226 ****
--- 2221,2276 ----
      float4       *numbers2 = NULL;
      int            nnumbers2 = 0;

+     if (vardata1->children)
+     {
+         /* Recurse to appendrel children and compute weighted selectivity */
+         double        appendrelrows;
+         ListCell   *lc;
+
+         selec = 0;
+         appendrelrows = 0;
+         foreach(lc, vardata1->children)
+         {
+             VariableStatData *childdata = (VariableStatData *) lfirst(lc);
+             double        cselec;
+
+             if (childdata->rel == NULL)
+                 continue;        /* safety check */
+             cselec = eqjoinsel_inner(operator, childdata, vardata2);
+             selec += cselec * childdata->rel->rows;
+             appendrelrows += childdata->rel->rows;
+         }
+         if (appendrelrows > 0)
+             selec /= appendrelrows;
+         CLAMP_PROBABILITY(selec);
+         return selec;
+     }
+
+     if (vardata2->children)
+     {
+         /* Recurse to appendrel children and compute weighted selectivity */
+         double        appendrelrows;
+         ListCell   *lc;
+
+         selec = 0;
+         appendrelrows = 0;
+         foreach(lc, vardata2->children)
+         {
+             VariableStatData *childdata = (VariableStatData *) lfirst(lc);
+             double        cselec;
+
+             if (childdata->rel == NULL)
+                 continue;        /* safety check */
+             cselec = eqjoinsel_inner(operator, vardata1, childdata);
+             selec += cselec * childdata->rel->rows;
+             appendrelrows += childdata->rel->rows;
+         }
+         if (appendrelrows > 0)
+             selec /= appendrelrows;
+         CLAMP_PROBABILITY(selec);
+         return selec;
+     }
+
      nd1 = get_variable_numdistinct(vardata1, &isdefault1);
      nd2 = get_variable_numdistinct(vardata2, &isdefault2);

*************** get_restriction_variable(PlannerInfo *ro
*** 4192,4198 ****
      {
          *varonleft = true;
          *other = estimate_expression_value(root, rdata.var);
!         /* Assume we need no ReleaseVariableStats(rdata) here */
          return true;
      }

--- 4242,4248 ----
      {
          *varonleft = true;
          *other = estimate_expression_value(root, rdata.var);
!         ReleaseVariableStats(rdata);    /* usually unnecessary, but ... */
          return true;
      }

*************** get_restriction_variable(PlannerInfo *ro
*** 4200,4206 ****
      {
          *varonleft = false;
          *other = estimate_expression_value(root, vardata->var);
!         /* Assume we need no ReleaseVariableStats(*vardata) here */
          *vardata = rdata;
          return true;
      }
--- 4250,4256 ----
      {
          *varonleft = false;
          *other = estimate_expression_value(root, vardata->var);
!         ReleaseVariableStats(*vardata);
          *vardata = rdata;
          return true;
      }
*************** get_join_variables(PlannerInfo *root, Li
*** 4259,4283 ****
   *    node: the expression tree to examine
   *    varRelid: see specs for restriction selectivity functions
   *
!  * Outputs: *vardata is filled as follows:
!  *    var: the input expression (with any binary relabeling stripped, if
!  *        it is or contains a variable; but otherwise the type is preserved)
!  *    rel: RelOptInfo for relation containing variable; NULL if expression
!  *        contains no Vars (NOTE this could point to a RelOptInfo of a
!  *        subquery, not one in the current query).
!  *    statsTuple: the pg_statistic entry for the variable, if one exists;
!  *        otherwise NULL.
!  *    freefunc: pointer to a function to release statsTuple with.
!  *    vartype: exposed type of the expression; this should always match
!  *        the declared input type of the operator we are estimating for.
!  *    atttype, atttypmod: type data to pass to get_attstatsslot().  This is
!  *        commonly the same as the exposed type of the variable argument,
!  *        but can be different in binary-compatible-type cases.
!  *    isunique: TRUE if we were able to match the var to a unique index or a
!  *        single-column DISTINCT clause, implying its values are unique for
!  *        this query.  (Caution: this should be trusted for statistical
!  *        purposes only, since we do not check indimmediate nor verify that
!  *        the exact same definition of equality applies.)
   *
   * Caller is responsible for doing ReleaseVariableStats() before exiting.
   */
--- 4309,4316 ----
   *    node: the expression tree to examine
   *    varRelid: see specs for restriction selectivity functions
   *
!  * Outputs:
!  *    *vardata is filled as per the specs in selfuncs.h.
   *
   * Caller is responsible for doing ReleaseVariableStats() before exiting.
   */
*************** examine_variable(PlannerInfo *root, Node
*** 4317,4323 ****
          vardata->isunique = has_unique_index(vardata->rel, var->varattno);

          /* Try to locate some stats */
!         examine_simple_variable(root, var, vardata);

          return;
      }
--- 4350,4356 ----
          vardata->isunique = has_unique_index(vardata->rel, var->varattno);

          /* Try to locate some stats */
!         examine_simple_variable(root, var, var->varno, vardata);

          return;
      }
*************** examine_variable(PlannerInfo *root, Node
*** 4464,4478 ****
   *        Handle a simple Var for examine_variable
   *
   * This is split out as a subroutine so that we can recurse to deal with
!  * Vars referencing subqueries.
   *
   * We already filled in all the fields of *vardata except for the stats tuple.
   */
  static void
! examine_simple_variable(PlannerInfo *root, Var *var,
                          VariableStatData *vardata)
  {
!     RangeTblEntry *rte = root->simple_rte_array[var->varno];

      Assert(IsA(rte, RangeTblEntry));

--- 4497,4512 ----
   *        Handle a simple Var for examine_variable
   *
   * This is split out as a subroutine so that we can recurse to deal with
!  * Vars referencing subqueries.  "varno" is the RTE index to assume that
!  * the Var refers to; other fields of the Var can be taken at face value.
   *
   * We already filled in all the fields of *vardata except for the stats tuple.
   */
  static void
! examine_simple_variable(PlannerInfo *root, Var *var, Index varno,
                          VariableStatData *vardata)
  {
!     RangeTblEntry *rte = root->simple_rte_array[varno];

      Assert(IsA(rte, RangeTblEntry));

*************** examine_simple_variable(PlannerInfo *roo
*** 4499,4505 ****
                                                BoolGetDatum(rte->inh));
          vardata->freefunc = ReleaseSysCache;
      }
!     else if (rte->rtekind == RTE_SUBQUERY && !rte->inh)
      {
          /*
           * Plain subquery (not one that was converted to an appendrel).
--- 4533,4584 ----
                                                BoolGetDatum(rte->inh));
          vardata->freefunc = ReleaseSysCache;
      }
!     else if (rte->rtekind == RTE_SUBQUERY && rte->inh)
!     {
!         /*
!          * Parent of a UNION ALL appendrel; examine all children in hopes of
!          * acquiring useful stats.
!          */
!         ListCell   *lc;
!
!         foreach(lc, root->append_rel_list)
!         {
!             AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
!             VariableStatData *childdata;
!
!             /* append_rel_list contains all append rels; ignore others */
!             if (appinfo->parent_relid != varno)
!                 continue;
!
!             /* Initialize childdata with same contents as parent... */
!             childdata = (VariableStatData *) palloc(sizeof(VariableStatData));
!             memcpy(childdata, vardata, sizeof(VariableStatData));
!
!             /* ... then update rel and reset what mustn't be shared */
!             childdata->rel = find_base_rel(root, appinfo->child_relid);
!             childdata->statsTuple = NULL;
!             childdata->freefunc = NULL;
!
!             childdata->children = NIL;
!
!             /* Recursively look for stats for this child rel */
!             examine_simple_variable(root, var, appinfo->child_relid, childdata);
!
!             /* If we have nested appendrels, flatten them */
!             if (childdata->children)
!             {
!                 vardata->children = list_concat(vardata->children,
!                                                 childdata->children);
!                 /* be sure to clean up the intermediate appendrel's entry */
!                 childdata->children = NIL;
!                 ReleaseVariableStatsP(childdata);
!                 pfree(childdata);
!             }
!             else
!                 vardata->children = lappend(vardata->children, childdata);
!         }
!     }
!     else if (rte->rtekind == RTE_SUBQUERY)
      {
          /*
           * Plain subquery (not one that was converted to an appendrel).
*************** examine_simple_variable(PlannerInfo *roo
*** 4533,4539 ****
           * can't use that rel pointer either, but have to look up the Var's
           * rel afresh.
           */
!         rel = find_base_rel(root, var->varno);

          /* If the subquery hasn't been planned yet, we have to punt */
          if (rel->subroot == NULL)
--- 4612,4618 ----
           * can't use that rel pointer either, but have to look up the Var's
           * rel afresh.
           */
!         rel = find_base_rel(root, varno);

          /* If the subquery hasn't been planned yet, we have to punt */
          if (rel->subroot == NULL)
*************** examine_simple_variable(PlannerInfo *roo
*** 4600,4606 ****
               * if the underlying column is unique, the subquery may have
               * joined to other tables in a way that creates duplicates.
               */
!             examine_simple_variable(rel->subroot, var, vardata);
          }
      }
      else
--- 4679,4685 ----
               * if the underlying column is unique, the subquery may have
               * joined to other tables in a way that creates duplicates.
               */
!             examine_simple_variable(rel->subroot, var, var->varno, vardata);
          }
      }
      else
*************** examine_simple_variable(PlannerInfo *roo
*** 4615,4620 ****
--- 4694,4727 ----
  }

  /*
+  * ReleaseVariableStatsP
+  *      Release resources represented by a VariableStatData.
+  *
+  * For what are now somewhat historical reasons, this is typically invoked
+  * via the macro ReleaseVariableStats().  The VariableStatData struct itself
+  * is *not* freed here, since it is often a local variable in the caller.
+  */
+ void
+ ReleaseVariableStatsP(VariableStatData *vardata)
+ {
+     ListCell   *lc;
+
+     /* Release any pin we may have on a stats tuple */
+     if (HeapTupleIsValid(vardata->statsTuple))
+         (*vardata->freefunc) (vardata->statsTuple);
+     /* Recurse to handle any appendrel child nodes */
+     foreach(lc, vardata->children)
+     {
+         VariableStatData *childdata = (VariableStatData *) lfirst(lc);
+
+         ReleaseVariableStatsP(childdata);
+         pfree(childdata);
+     }
+     /* Might as well free the list cells too */
+     list_free(vardata->children);
+ }
+
+ /*
   * get_variable_numdistinct
   *      Estimate the number of distinct values of a variable.
   *
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index bf69f2a..266b4f2 100644
*** a/src/include/utils/selfuncs.h
--- b/src/include/utils/selfuncs.h
***************
*** 63,69 ****
      } while (0)


! /* Return data from examine_variable and friends */
  typedef struct VariableStatData
  {
      Node       *var;            /* the Var or expression tree */
--- 63,96 ----
      } while (0)


! /*
!  * Return data from examine_variable and friends
!  *
!  *    var: the input expression (with any binary relabeling stripped, if
!  *        it is or contains a variable; but otherwise the type is preserved).
!  *    rel: RelOptInfo for relation containing variable; NULL if expression
!  *        contains no Vars (NOTE this could point to a RelOptInfo of a
!  *        subquery, not one in the current query).
!  *    statsTuple: the pg_statistic entry for the variable, if one exists;
!  *        otherwise NULL.
!  *    freefunc: pointer to a function to release statsTuple with (this will
!  *        be called by ReleaseVariableStats(); callers don't do so directly).
!  *    vartype: exposed type of the expression; this should always match
!  *        the declared input type of the operator we are estimating for.
!  *    atttype, atttypmod: type data to pass to get_attstatsslot().  This is
!  *        commonly the same as the exposed type of the variable argument,
!  *        but can be different in binary-compatible-type cases.
!  *    isunique: TRUE if we were able to match the var to a unique index or a
!  *        single-column DISTINCT clause, implying its values are unique for
!  *        this query.  (Caution: this should be trusted for statistical
!  *        purposes only, since we do not check indimmediate nor verify that
!  *        the exact same definition of equality applies.)
!  *    children: if rel is an appendrel for which we could not get stats
!  *        directly, children is a List of VariableStatData structs for the
!  *        appendrel's child rels, which may have useful statistics.  Note that
!  *        the "var" fields will all point to the parent's "var"; they are not
!  *        transposed to match the child relid.
!  */
  typedef struct VariableStatData
  {
      Node       *var;            /* the Var or expression tree */
*************** typedef struct VariableStatData
*** 75,87 ****
      Oid            atttype;        /* type to pass to get_attstatsslot */
      int32        atttypmod;        /* typmod to pass to get_attstatsslot */
      bool        isunique;        /* matches unique index or DISTINCT clause */
  } VariableStatData;

! #define ReleaseVariableStats(vardata)  \
!     do { \
!         if (HeapTupleIsValid((vardata).statsTuple)) \
!             (* (vardata).freefunc) ((vardata).statsTuple); \
!     } while(0)


  typedef enum
--- 102,111 ----
      Oid            atttype;        /* type to pass to get_attstatsslot */
      int32        atttypmod;        /* typmod to pass to get_attstatsslot */
      bool        isunique;        /* matches unique index or DISTINCT clause */
+     List       *children;        /* List of child VariableStatData's, or NIL */
  } VariableStatData;

! #define ReleaseVariableStats(vardata)  ReleaseVariableStatsP(&(vardata))


  typedef enum
*************** extern PGDLLIMPORT get_index_stats_hook_
*** 111,116 ****
--- 135,141 ----

  extern void examine_variable(PlannerInfo *root, Node *node, int varRelid,
                   VariableStatData *vardata);
+ extern void ReleaseVariableStatsP(VariableStatData *vardata);
  extern bool get_restriction_variable(PlannerInfo *root, List *args,
                           int varRelid,
                           VariableStatData *vardata, Node **other,

Re: Clamping reulst row number of joins.

От

Tom Lane

Дата:

08 марта 2015 г., 23:30:48

I wrote:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> I've certainly seen and used values() constructs in joins for a variety
>>> of reasons and I do think it'd be worthwhile for the planner to know how
>>> to pull up a VALUES construct.

> I spent a bit of time looking at this, and realized that the blocker
> is the same as the reason why we don't pull up sub-selects with empty
> rangetables ("SELECT expression(s)").  Namely, that the upper query's
> jointree can't handle a null subtree.  (This is not only a matter of
> the jointree data structure, but the fact that the whole planner
> identifies relations by bitmapsets of RTE indexes, and subtrees with
> empty RTE sets couldn't be told apart.)

> We could probably fix both cases for order-of-a-hundred lines of new code
> in prepjointree.  The plan I'm thinking about is to allow such vacuous
> subquery jointrees to be pulled up, but only if they are in a place in
> the upper query's jointree where it's okay to delete the subtree.  This
> would basically be two cases: (1) the immediate parent is a FromExpr that
> would have at least one remaining child, or (2) the immediate parent is
> an INNER JOIN whose other child isn't also being deleted (so that we can
> convert the JoinExpr to a nonempty FromExpr, or just use the other child
> as-is if the JoinExpr has no quals).

Here's a draft patch along those lines.  Unsurprisingly, it changes the
plans generated for a number of regression-test queries.  In most cases
I felt it desirable to force the old plan shape to be retained (by
inserting "offset 0" or equivalent) because the test was trying to test
proper generation of a query plan of that shape.  I did add a couple
cases where the optimization was allowed to go through.

The patch is a bit bigger than I'd hoped (a net of about 330 lines added
to prepjointree.c), but it's not hugely ugly, and it doesn't add any
meaningful overhead in cases where no optimization happens.  Barring
objections I will commit this.

            regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 775f482..0d3db71 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outPlannerInfo(StringInfo str, const Pl
*** 1762,1767 ****
--- 1762,1768 ----
      WRITE_BOOL_FIELD(hasInheritedTarget);
      WRITE_BOOL_FIELD(hasJoinRTEs);
      WRITE_BOOL_FIELD(hasLateralRTEs);
+     WRITE_BOOL_FIELD(hasDeletedRTEs);
      WRITE_BOOL_FIELD(hasHavingQual);
      WRITE_BOOL_FIELD(hasPseudoConstantQuals);
      WRITE_BOOL_FIELD(hasRecursion);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b02a107..88b91f1 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** subquery_planner(PlannerGlobal *glob, Qu
*** 352,359 ****
       * Check to see if any subqueries in the jointree can be merged into this
       * query.
       */
!     parse->jointree = (FromExpr *)
!         pull_up_subqueries(root, (Node *) parse->jointree);

      /*
       * If this is a simple UNION ALL query, flatten it into an appendrel. We
--- 352,358 ----
       * Check to see if any subqueries in the jointree can be merged into this
       * query.
       */
!     pull_up_subqueries(root);

      /*
       * If this is a simple UNION ALL query, flatten it into an appendrel. We
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 8a0199b..50acfe4 100644
*** a/src/backend/optimizer/prep/prepjointree.c
--- b/src/backend/optimizer/prep/prepjointree.c
*************** static Node *pull_up_sublinks_qual_recur
*** 65,76 ****
  static Node *pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
                             JoinExpr *lowest_outer_join,
                             JoinExpr *lowest_nulling_outer_join,
!                            AppendRelInfo *containing_appendrel);
  static Node *pull_up_simple_subquery(PlannerInfo *root, Node *jtnode,
                          RangeTblEntry *rte,
                          JoinExpr *lowest_outer_join,
                          JoinExpr *lowest_nulling_outer_join,
!                         AppendRelInfo *containing_appendrel);
  static Node *pull_up_simple_union_all(PlannerInfo *root, Node *jtnode,
                           RangeTblEntry *rte);
  static void pull_up_union_leaf_queries(Node *setOp, PlannerInfo *root,
--- 65,78 ----
  static Node *pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
                             JoinExpr *lowest_outer_join,
                             JoinExpr *lowest_nulling_outer_join,
!                            AppendRelInfo *containing_appendrel,
!                            bool deletion_ok);
  static Node *pull_up_simple_subquery(PlannerInfo *root, Node *jtnode,
                          RangeTblEntry *rte,
                          JoinExpr *lowest_outer_join,
                          JoinExpr *lowest_nulling_outer_join,
!                         AppendRelInfo *containing_appendrel,
!                         bool deletion_ok);
  static Node *pull_up_simple_union_all(PlannerInfo *root, Node *jtnode,
                           RangeTblEntry *rte);
  static void pull_up_union_leaf_queries(Node *setOp, PlannerInfo *root,
*************** static void pull_up_union_leaf_queries(N
*** 79,85 ****
  static void make_setop_translation_list(Query *query, Index newvarno,
                              List **translated_vars);
  static bool is_simple_subquery(Query *subquery, RangeTblEntry *rte,
!                    JoinExpr *lowest_outer_join);
  static bool is_simple_union_all(Query *subquery);
  static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery,
                              List *colTypes);
--- 81,92 ----
  static void make_setop_translation_list(Query *query, Index newvarno,
                              List **translated_vars);
  static bool is_simple_subquery(Query *subquery, RangeTblEntry *rte,
!                    JoinExpr *lowest_outer_join,
!                    bool deletion_ok);
! static Node *pull_up_simple_values(PlannerInfo *root, Node *jtnode,
!                       RangeTblEntry *rte);
! static bool is_simple_values(PlannerInfo *root, RangeTblEntry *rte,
!                  bool deletion_ok);
  static bool is_simple_union_all(Query *subquery);
  static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery,
                              List *colTypes);
*************** static Node *pullup_replace_vars_callbac
*** 95,100 ****
--- 102,108 ----
                               replace_rte_variables_context *context);
  static Query *pullup_replace_vars_subquery(Query *query,
                               pullup_replace_vars_context *context);
+ static Node *pull_up_subqueries_cleanup(Node *jtnode);
  static reduce_outer_joins_state *reduce_outer_joins_pass1(Node *jtnode);
  static void reduce_outer_joins_pass2(Node *jtnode,
                           reduce_outer_joins_state *state,
*************** inline_set_returning_functions(PlannerIn
*** 593,612 ****
   *        grouping/aggregation then we can merge it into the parent's jointree.
   *        Also, subqueries that are simple UNION ALL structures can be
   *        converted into "append relations".
-  *
-  * This recursively processes the jointree and returns a modified jointree.
   */
! Node *
! pull_up_subqueries(PlannerInfo *root, Node *jtnode)
  {
!     /* Start off with no containing join nor appendrel */
!     return pull_up_subqueries_recurse(root, jtnode, NULL, NULL, NULL);
  }

  /*
   * pull_up_subqueries_recurse
   *        Recursive guts of pull_up_subqueries.
   *
   * If this jointree node is within either side of an outer join, then
   * lowest_outer_join references the lowest such JoinExpr node; otherwise
   * it is NULL.  We use this to constrain the effects of LATERAL subqueries.
--- 601,633 ----
   *        grouping/aggregation then we can merge it into the parent's jointree.
   *        Also, subqueries that are simple UNION ALL structures can be
   *        converted into "append relations".
   */
! void
! pull_up_subqueries(PlannerInfo *root)
  {
!     /* Top level of jointree must always be a FromExpr */
!     Assert(IsA(root->parse->jointree, FromExpr));
!     /* Reset flag saying we need a deletion cleanup pass */
!     root->hasDeletedRTEs = false;
!     /* Recursion starts with no containing join nor appendrel */
!     root->parse->jointree = (FromExpr *)
!         pull_up_subqueries_recurse(root, (Node *) root->parse->jointree,
!                                    NULL, NULL, NULL, false);
!     /* Apply cleanup phase if necessary */
!     if (root->hasDeletedRTEs)
!         root->parse->jointree = (FromExpr *)
!             pull_up_subqueries_cleanup((Node *) root->parse->jointree);
!     Assert(IsA(root->parse->jointree, FromExpr));
  }

  /*
   * pull_up_subqueries_recurse
   *        Recursive guts of pull_up_subqueries.
   *
+  * This recursively processes the jointree and returns a modified jointree.
+  * Or, if it's valid to drop the current node from the jointree completely,
+  * it returns NULL.
+  *
   * If this jointree node is within either side of an outer join, then
   * lowest_outer_join references the lowest such JoinExpr node; otherwise
   * it is NULL.  We use this to constrain the effects of LATERAL subqueries.
*************** pull_up_subqueries(PlannerInfo *root, No
*** 622,649 ****
   * This forces use of the PlaceHolderVar mechanism for all non-Var targetlist
   * items, and puts some additional restrictions on what can be pulled up.
   *
   * A tricky aspect of this code is that if we pull up a subquery we have
   * to replace Vars that reference the subquery's outputs throughout the
   * parent query, including quals attached to jointree nodes above the one
   * we are currently processing!  We handle this by being careful not to
!  * change the jointree structure while recursing: no nodes other than
!  * subquery RangeTblRef entries will be replaced.  Also, we can't turn
!  * pullup_replace_vars loose on the whole jointree, because it'll return a
!  * mutated copy of the tree; we have to invoke it just on the quals, instead.
!  * This behavior is what makes it reasonable to pass lowest_outer_join and
!  * lowest_nulling_outer_join as pointers rather than some more-indirect way
!  * of identifying the lowest OJs.  Likewise, we don't replace append_rel_list
!  * members but only their substructure, so the containing_appendrel reference
!  * is safe to use.
   */
  static Node *
  pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
                             JoinExpr *lowest_outer_join,
                             JoinExpr *lowest_nulling_outer_join,
!                            AppendRelInfo *containing_appendrel)
  {
!     if (jtnode == NULL)
!         return NULL;
      if (IsA(jtnode, RangeTblRef))
      {
          int            varno = ((RangeTblRef *) jtnode)->rtindex;
--- 643,681 ----
   * This forces use of the PlaceHolderVar mechanism for all non-Var targetlist
   * items, and puts some additional restrictions on what can be pulled up.
   *
+  * deletion_ok is TRUE if the caller can cope with us returning NULL for a
+  * deletable leaf node (for example, a VALUES RTE that could be pulled up).
+  * If it's FALSE, we'll avoid pullup in such cases.
+  *
   * A tricky aspect of this code is that if we pull up a subquery we have
   * to replace Vars that reference the subquery's outputs throughout the
   * parent query, including quals attached to jointree nodes above the one
   * we are currently processing!  We handle this by being careful not to
!  * change the jointree structure while recursing: no nodes other than leaf
!  * RangeTblRef entries and entirely-empty FromExprs will be replaced or
!  * deleted.  Also, we can't turn pullup_replace_vars loose on the whole
!  * jointree, because it'll return a mutated copy of the tree; we have to
!  * invoke it just on the quals, instead.  This behavior is what makes it
!  * reasonable to pass lowest_outer_join and lowest_nulling_outer_join as
!  * pointers rather than some more-indirect way of identifying the lowest
!  * OJs.  Likewise, we don't replace append_rel_list members but only their
!  * substructure, so the containing_appendrel reference is safe to use.
!  *
!  * Because of the rule that no jointree nodes with substructure can be
!  * replaced, we cannot fully handle the case of deleting nodes from the tree:
!  * when we delete one child of a JoinExpr, we need to replace the JoinExpr
!  * with a FromExpr, and that can't happen here.  Instead, we set the
!  * root->hasDeletedRTEs flag, which tells pull_up_subqueries() that an
!  * additional pass over the tree is needed to clean up.
   */
  static Node *
  pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
                             JoinExpr *lowest_outer_join,
                             JoinExpr *lowest_nulling_outer_join,
!                            AppendRelInfo *containing_appendrel,
!                            bool deletion_ok)
  {
!     Assert(jtnode != NULL);
      if (IsA(jtnode, RangeTblRef))
      {
          int            varno = ((RangeTblRef *) jtnode)->rtindex;
*************** pull_up_subqueries_recurse(PlannerInfo *
*** 657,669 ****
           * unless is_safe_append_member says so.
           */
          if (rte->rtekind == RTE_SUBQUERY &&
!             is_simple_subquery(rte->subquery, rte, lowest_outer_join) &&
              (containing_appendrel == NULL ||
               is_safe_append_member(rte->subquery)))
              return pull_up_simple_subquery(root, jtnode, rte,
                                             lowest_outer_join,
                                             lowest_nulling_outer_join,
!                                            containing_appendrel);

          /*
           * Alternatively, is it a simple UNION ALL subquery?  If so, flatten
--- 689,703 ----
           * unless is_safe_append_member says so.
           */
          if (rte->rtekind == RTE_SUBQUERY &&
!             is_simple_subquery(rte->subquery, rte,
!                                lowest_outer_join, deletion_ok) &&
              (containing_appendrel == NULL ||
               is_safe_append_member(rte->subquery)))
              return pull_up_simple_subquery(root, jtnode, rte,
                                             lowest_outer_join,
                                             lowest_nulling_outer_join,
!                                            containing_appendrel,
!                                            deletion_ok);

          /*
           * Alternatively, is it a simple UNION ALL subquery?  If so, flatten
*************** pull_up_subqueries_recurse(PlannerInfo *
*** 678,696 ****
              is_simple_union_all(rte->subquery))
              return pull_up_simple_union_all(root, jtnode, rte);

          /* Otherwise, do nothing at this node. */
      }
      else if (IsA(jtnode, FromExpr))
      {
          FromExpr   *f = (FromExpr *) jtnode;
          ListCell   *l;

          Assert(containing_appendrel == NULL);
          foreach(l, f->fromlist)
              lfirst(l) = pull_up_subqueries_recurse(root, lfirst(l),
                                                     lowest_outer_join,
                                                     lowest_nulling_outer_join,
!                                                    NULL);
      }
      else if (IsA(jtnode, JoinExpr))
      {
--- 712,779 ----
              is_simple_union_all(rte->subquery))
              return pull_up_simple_union_all(root, jtnode, rte);

+         /*
+          * Or perhaps it's a simple VALUES RTE?
+          *
+          * We don't allow VALUES pullup below an outer join nor into an
+          * appendrel (such cases are impossible anyway at the moment).
+          */
+         if (rte->rtekind == RTE_VALUES &&
+             lowest_outer_join == NULL &&
+             containing_appendrel == NULL &&
+             is_simple_values(root, rte, deletion_ok))
+             return pull_up_simple_values(root, jtnode, rte);
+
          /* Otherwise, do nothing at this node. */
      }
      else if (IsA(jtnode, FromExpr))
      {
          FromExpr   *f = (FromExpr *) jtnode;
+         bool        have_undeleted_child = false;
          ListCell   *l;

          Assert(containing_appendrel == NULL);
+
+         /*
+          * If the FromExpr has quals, it's not deletable even if its parent
+          * would allow deletion.
+          */
+         if (f->quals)
+             deletion_ok = false;
+
          foreach(l, f->fromlist)
+         {
+             /*
+              * In a non-deletable FromExpr, we can allow deletion of child
+              * nodes so long as at least one child remains; so it's okay
+              * either if any previous child survives, or if there's more to
+              * come.  If all children are deletable in themselves, we'll force
+              * the last one to remain unflattened.
+              *
+              * As a separate matter, we can allow deletion of all children of
+              * the top-level FromExpr in a query, since that's a special case
+              * anyway.
+              */
+             bool        sub_deletion_ok = (deletion_ok ||
+                                            have_undeleted_child ||
+                                            lnext(l) != NULL ||
+                                            f == root->parse->jointree);
+
              lfirst(l) = pull_up_subqueries_recurse(root, lfirst(l),
                                                     lowest_outer_join,
                                                     lowest_nulling_outer_join,
!                                                    NULL,
!                                                    sub_deletion_ok);
!             if (lfirst(l) != NULL)
!                 have_undeleted_child = true;
!         }
!
!         if (deletion_ok && !have_undeleted_child)
!         {
!             /* OK to delete this FromExpr entirely */
!             root->hasDeletedRTEs = true;        /* probably is set already */
!             return NULL;
!         }
      }
      else if (IsA(jtnode, JoinExpr))
      {
*************** pull_up_subqueries_recurse(PlannerInfo *
*** 701,714 ****
          switch (j->jointype)
          {
              case JOIN_INNER:
                  j->larg = pull_up_subqueries_recurse(root, j->larg,
                                                       lowest_outer_join,
                                                     lowest_nulling_outer_join,
!                                                      NULL);
                  j->rarg = pull_up_subqueries_recurse(root, j->rarg,
                                                       lowest_outer_join,
                                                     lowest_nulling_outer_join,
!                                                      NULL);
                  break;
              case JOIN_LEFT:
              case JOIN_SEMI:
--- 784,805 ----
          switch (j->jointype)
          {
              case JOIN_INNER:
+
+                 /*
+                  * INNER JOIN can allow deletion of either child node, but not
+                  * both.  So right child gets permission to delete only if
+                  * left child didn't get removed.
+                  */
                  j->larg = pull_up_subqueries_recurse(root, j->larg,
                                                       lowest_outer_join,
                                                     lowest_nulling_outer_join,
!                                                      NULL,
!                                                      true);
                  j->rarg = pull_up_subqueries_recurse(root, j->rarg,
                                                       lowest_outer_join,
                                                     lowest_nulling_outer_join,
!                                                      NULL,
!                                                      j->larg != NULL);
                  break;
              case JOIN_LEFT:
              case JOIN_SEMI:
*************** pull_up_subqueries_recurse(PlannerInfo *
*** 716,746 ****
                  j->larg = pull_up_subqueries_recurse(root, j->larg,
                                                       j,
                                                     lowest_nulling_outer_join,
!                                                      NULL);
                  j->rarg = pull_up_subqueries_recurse(root, j->rarg,
                                                       j,
                                                       j,
!                                                      NULL);
                  break;
              case JOIN_FULL:
                  j->larg = pull_up_subqueries_recurse(root, j->larg,
                                                       j,
                                                       j,
!                                                      NULL);
                  j->rarg = pull_up_subqueries_recurse(root, j->rarg,
                                                       j,
                                                       j,
!                                                      NULL);
                  break;
              case JOIN_RIGHT:
                  j->larg = pull_up_subqueries_recurse(root, j->larg,
                                                       j,
                                                       j,
!                                                      NULL);
                  j->rarg = pull_up_subqueries_recurse(root, j->rarg,
                                                       j,
                                                     lowest_nulling_outer_join,
!                                                      NULL);
                  break;
              default:
                  elog(ERROR, "unrecognized join type: %d",
--- 807,843 ----
                  j->larg = pull_up_subqueries_recurse(root, j->larg,
                                                       j,
                                                     lowest_nulling_outer_join,
!                                                      NULL,
!                                                      false);
                  j->rarg = pull_up_subqueries_recurse(root, j->rarg,
                                                       j,
                                                       j,
!                                                      NULL,
!                                                      false);
                  break;
              case JOIN_FULL:
                  j->larg = pull_up_subqueries_recurse(root, j->larg,
                                                       j,
                                                       j,
!                                                      NULL,
!                                                      false);
                  j->rarg = pull_up_subqueries_recurse(root, j->rarg,
                                                       j,
                                                       j,
!                                                      NULL,
!                                                      false);
                  break;
              case JOIN_RIGHT:
                  j->larg = pull_up_subqueries_recurse(root, j->larg,
                                                       j,
                                                       j,
!                                                      NULL,
!                                                      false);
                  j->rarg = pull_up_subqueries_recurse(root, j->rarg,
                                                       j,
                                                     lowest_nulling_outer_join,
!                                                      NULL,
!                                                      false);
                  break;
              default:
                  elog(ERROR, "unrecognized join type: %d",
*************** pull_up_subqueries_recurse(PlannerInfo *
*** 760,767 ****
   *
   * jtnode is a RangeTblRef that has been tentatively identified as a simple
   * subquery by pull_up_subqueries.  We return the replacement jointree node,
!  * or jtnode itself if we determine that the subquery can't be pulled up after
!  * all.
   *
   * rte is the RangeTblEntry referenced by jtnode.  Remaining parameters are
   * as for pull_up_subqueries_recurse.
--- 857,864 ----
   *
   * jtnode is a RangeTblRef that has been tentatively identified as a simple
   * subquery by pull_up_subqueries.  We return the replacement jointree node,
!  * or NULL if the subquery can be deleted entirely, or jtnode itself if we
!  * determine that the subquery can't be pulled up after all.
   *
   * rte is the RangeTblEntry referenced by jtnode.  Remaining parameters are
   * as for pull_up_subqueries_recurse.
*************** static Node *
*** 770,776 ****
  pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
                          JoinExpr *lowest_outer_join,
                          JoinExpr *lowest_nulling_outer_join,
!                         AppendRelInfo *containing_appendrel)
  {
      Query       *parse = root->parse;
      int            varno = ((RangeTblRef *) jtnode)->rtindex;
--- 867,874 ----
  pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
                          JoinExpr *lowest_outer_join,
                          JoinExpr *lowest_nulling_outer_join,
!                         AppendRelInfo *containing_appendrel,
!                         bool deletion_ok)
  {
      Query       *parse = root->parse;
      int            varno = ((RangeTblRef *) jtnode)->rtindex;
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 832,845 ****
       * pull_up_subqueries' processing is complete for its jointree and
       * rangetable.
       *
!      * Note: we should pass NULL for containing-join info even if we are
!      * within an outer join in the upper query; the lower query starts with a
!      * clean slate for outer-join semantics.  Likewise, we say we aren't
!      * handling an appendrel member.
       */
!     subquery->jointree = (FromExpr *)
!         pull_up_subqueries_recurse(subroot, (Node *) subquery->jointree,
!                                    NULL, NULL, NULL);

      /*
       * Now we must recheck whether the subquery is still simple enough to pull
--- 930,941 ----
       * pull_up_subqueries' processing is complete for its jointree and
       * rangetable.
       *
!      * Note: it's okay that the subquery's recursion starts with NULL for
!      * containing-join info, even if we are within an outer join in the upper
!      * query; the lower query starts with a clean slate for outer-join
!      * semantics.  Likewise, we needn't pass down appendrel state.
       */
!     pull_up_subqueries(subroot);

      /*
       * Now we must recheck whether the subquery is still simple enough to pull
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 849,855 ****
       * easier just to keep this "if" looking the same as the one in
       * pull_up_subqueries_recurse.
       */
!     if (is_simple_subquery(subquery, rte, lowest_outer_join) &&
          (containing_appendrel == NULL || is_safe_append_member(subquery)))
      {
          /* good to go */
--- 945,952 ----
       * easier just to keep this "if" looking the same as the one in
       * pull_up_subqueries_recurse.
       */
!     if (is_simple_subquery(subquery, rte,
!                            lowest_outer_join, deletion_ok) &&
          (containing_appendrel == NULL || is_safe_append_member(subquery)))
      {
          /* good to go */
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 1075,1082 ****

      /*
       * Return the adjusted subquery jointree to replace the RangeTblRef entry
!      * in parent's jointree.
       */
      return (Node *) subquery->jointree;
  }

--- 1172,1189 ----

      /*
       * Return the adjusted subquery jointree to replace the RangeTblRef entry
!      * in parent's jointree; or, if we're flattening a subquery with empty
!      * FROM list, return NULL to signal deletion of the subquery from the
!      * parent jointree (and set hasDeletedRTEs to ensure cleanup later).
       */
+     if (subquery->jointree->fromlist == NIL)
+     {
+         Assert(deletion_ok);
+         Assert(subquery->jointree->quals == NULL);
+         root->hasDeletedRTEs = true;
+         return NULL;
+     }
+
      return (Node *) subquery->jointree;
  }

*************** pull_up_union_leaf_queries(Node *setOp,
*** 1203,1214 ****
           * must build the AppendRelInfo first, because this will modify it.)
           * Note that we can pass NULL for containing-join info even if we're
           * actually under an outer join, because the child's expressions
!          * aren't going to propagate up to the join.
           */
          rtr = makeNode(RangeTblRef);
          rtr->rtindex = childRTindex;
          (void) pull_up_subqueries_recurse(root, (Node *) rtr,
!                                           NULL, NULL, appinfo);
      }
      else if (IsA(setOp, SetOperationStmt))
      {
--- 1310,1324 ----
           * must build the AppendRelInfo first, because this will modify it.)
           * Note that we can pass NULL for containing-join info even if we're
           * actually under an outer join, because the child's expressions
!          * aren't going to propagate up to the join.  Also, we ignore the
!          * possibility that pull_up_subqueries_recurse() returns a different
!          * jointree node than what we pass it; if it does, the important thing
!          * is that it replaced the child relid in the AppendRelInfo node.
           */
          rtr = makeNode(RangeTblRef);
          rtr->rtindex = childRTindex;
          (void) pull_up_subqueries_recurse(root, (Node *) rtr,
!                                           NULL, NULL, appinfo, false);
      }
      else if (IsA(setOp, SetOperationStmt))
      {
*************** make_setop_translation_list(Query *query
*** 1263,1272 ****
   * (Note subquery is not necessarily equal to rte->subquery; it could be a
   * processed copy of that.)
   * lowest_outer_join is the lowest outer join above the subquery, or NULL.
   */
  static bool
  is_simple_subquery(Query *subquery, RangeTblEntry *rte,
!                    JoinExpr *lowest_outer_join)
  {
      /*
       * Let's just make sure it's a valid subselect ...
--- 1373,1384 ----
   * (Note subquery is not necessarily equal to rte->subquery; it could be a
   * processed copy of that.)
   * lowest_outer_join is the lowest outer join above the subquery, or NULL.
+  * deletion_ok is TRUE if it'd be okay to delete the subquery entirely.
   */
  static bool
  is_simple_subquery(Query *subquery, RangeTblEntry *rte,
!                    JoinExpr *lowest_outer_join,
!                    bool deletion_ok)
  {
      /*
       * Let's just make sure it's a valid subselect ...
*************** is_simple_subquery(Query *subquery, Rang
*** 1315,1320 ****
--- 1427,1455 ----
          return false;

      /*
+      * Don't pull up a subquery with an empty jointree, unless it has no quals
+      * and deletion_ok is TRUE.  query_planner() will correctly generate a
+      * Result plan for a jointree that's totally empty, but we can't cope with
+      * an empty FromExpr appearing lower down in a jointree: we identify join
+      * rels via baserelid sets, so we couldn't distinguish a join containing
+      * such a FromExpr from one without it.  This would for example break the
+      * PlaceHolderVar mechanism, since we'd have no way to identify where to
+      * evaluate a PHV coming out of the subquery.  We can only handle such
+      * cases if the place where the subquery is linked is a FromExpr or inner
+      * JOIN that would still be nonempty after removal of the subquery, so
+      * that it's still identifiable via its contained baserelids.  Safe
+      * contexts are signaled by deletion_ok.  But even in a safe context, we
+      * must keep the subquery if it has any quals, because it's unclear where
+      * to put them in the upper query.  (Note that deletion of a subquery is
+      * also dependent on the check below that its targetlist contains no
+      * set-returning functions.  Deletion from a FROM list or inner JOIN is
+      * okay only if the subquery must return exactly one row.)
+      */
+     if (subquery->jointree->fromlist == NIL &&
+         (subquery->jointree->quals || !deletion_ok))
+         return false;
+
+     /*
       * If the subquery is LATERAL, check for pullup restrictions from that.
       */
      if (rte->lateral)
*************** is_simple_subquery(Query *subquery, Rang
*** 1373,1379 ****
       * Don't pull up a subquery that has any set-returning functions in its
       * targetlist.  Otherwise we might well wind up inserting set-returning
       * functions into places where they mustn't go, such as quals of higher
!      * queries.
       */
      if (expression_returns_set((Node *) subquery->targetList))
          return false;
--- 1508,1514 ----
       * Don't pull up a subquery that has any set-returning functions in its
       * targetlist.  Otherwise we might well wind up inserting set-returning
       * functions into places where they mustn't go, such as quals of higher
!      * queries.  This also ensures deletion of an empty jointree is valid.
       */
      if (expression_returns_set((Node *) subquery->targetList))
          return false;
*************** is_simple_subquery(Query *subquery, Rang
*** 1389,1407 ****
      if (contain_volatile_functions((Node *) subquery->targetList))
          return false;

      /*
!      * Don't pull up a subquery with an empty jointree.  query_planner() will
!      * correctly generate a Result plan for a jointree that's totally empty,
!      * but we can't cope with an empty FromExpr appearing lower down in a
!      * jointree: we identify join rels via baserelid sets, so we couldn't
!      * distinguish a join containing such a FromExpr from one without it. This
!      * would for example break the PlaceHolderVar mechanism, since we'd have
!      * no way to identify where to evaluate a PHV coming out of the subquery.
!      * Not worth working hard on this, just to collapse SubqueryScan/Result
!      * into Result; especially since the SubqueryScan can often be optimized
!      * away by setrefs.c anyway.
       */
!     if (subquery->jointree->fromlist == NIL)
          return false;

      return true;
--- 1524,1682 ----
      if (contain_volatile_functions((Node *) subquery->targetList))
          return false;

+     return true;
+ }
+
+ /*
+  * pull_up_simple_values
+  *        Pull up a single simple VALUES RTE.
+  *
+  * jtnode is a RangeTblRef that has been identified as a simple VALUES RTE
+  * by pull_up_subqueries.  We always return NULL indicating that the RTE
+  * can be deleted entirely (all failure cases should have been detected by
+  * is_simple_values()).
+  *
+  * rte is the RangeTblEntry referenced by jtnode.  Because of the limited
+  * possible usage of VALUES RTEs, we do not need the remaining parameters
+  * of pull_up_subqueries_recurse.
+  */
+ static Node *
+ pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte)
+ {
+     Query       *parse = root->parse;
+     int            varno = ((RangeTblRef *) jtnode)->rtindex;
+     List       *values_list;
+     List       *tlist;
+     AttrNumber    attrno;
+     pullup_replace_vars_context rvcontext;
+     ListCell   *lc;
+
+     Assert(rte->rtekind == RTE_VALUES);
+     Assert(list_length(rte->values_lists) == 1);
+
      /*
!      * Need a modifiable copy of the VALUES list to hack on, just in case it's
!      * multiply referenced.
       */
!     values_list = (List *) copyObject(linitial(rte->values_lists));
!
!     /*
!      * The VALUES RTE can't contain any Vars of level zero, let alone any that
!      * are join aliases, so no need to flatten join alias Vars.
!      */
!     Assert(!contain_vars_of_level((Node *) values_list, 0));
!
!     /*
!      * Set up required context data for pullup_replace_vars.  In particular,
!      * we have to make the VALUES list look like a subquery targetlist.
!      */
!     tlist = NIL;
!     attrno = 1;
!     foreach(lc, values_list)
!     {
!         tlist = lappend(tlist,
!                         makeTargetEntry((Expr *) lfirst(lc),
!                                         attrno,
!                                         NULL,
!                                         false));
!         attrno++;
!     }
!     rvcontext.root = root;
!     rvcontext.targetlist = tlist;
!     rvcontext.target_rte = rte;
!     rvcontext.relids = NULL;
!     rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
!     rvcontext.varno = varno;
!     rvcontext.need_phvs = false;
!     rvcontext.wrap_non_vars = false;
!     /* initialize cache array with indexes 0 .. length(tlist) */
!     rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
!                                  sizeof(Node *));
!
!     /*
!      * Replace all of the top query's references to the RTE's outputs with
!      * copies of the adjusted VALUES expressions, being careful not to replace
!      * any of the jointree structure. (This'd be a lot cleaner if we could use
!      * query_tree_mutator.)  Much of this should be no-ops in the dummy Query
!      * that surrounds a VALUES RTE, but it's not enough code to be worth
!      * removing.
!      */
!     parse->targetList = (List *)
!         pullup_replace_vars((Node *) parse->targetList, &rvcontext);
!     parse->returningList = (List *)
!         pullup_replace_vars((Node *) parse->returningList, &rvcontext);
!     replace_vars_in_jointree((Node *) parse->jointree, &rvcontext, NULL);
!     Assert(parse->setOperations == NULL);
!     parse->havingQual = pullup_replace_vars(parse->havingQual, &rvcontext);
!
!     /*
!      * There should be no appendrels to fix, nor any join alias Vars, nor any
!      * outer joins and hence no PlaceHolderVars.
!      */
!     Assert(root->append_rel_list == NIL);
!     Assert(list_length(parse->rtable) == 1);
!     Assert(root->join_info_list == NIL);
!     Assert(root->lateral_info_list == NIL);
!     Assert(root->placeholder_list == NIL);
!
!     /*
!      * Return NULL to signal deletion of the VALUES RTE from the parent
!      * jointree (and set hasDeletedRTEs to ensure cleanup later).
!      */
!     root->hasDeletedRTEs = true;
!     return NULL;
! }
!
! /*
!  * is_simple_values
!  *      Check a VALUES RTE in the range table to see if it's simple enough
!  *      to pull up into the parent query.
!  *
!  * rte is the RTE_VALUES RangeTblEntry to check.
!  * deletion_ok is TRUE if it'd be okay to delete the VALUES RTE entirely.
!  */
! static bool
! is_simple_values(PlannerInfo *root, RangeTblEntry *rte, bool deletion_ok)
! {
!     Assert(rte->rtekind == RTE_VALUES);
!
!     /*
!      * We can only pull up a VALUES RTE if deletion_ok is TRUE.  It's
!      * basically the same case as a sub-select with empty FROM list; see
!      * comments in is_simple_subquery().
!      */
!     if (!deletion_ok)
!         return false;
!
!     /*
!      * Also, there must be exactly one VALUES list, else it's not semantically
!      * correct to delete the VALUES RTE.
!      */
!     if (list_length(rte->values_lists) != 1)
!         return false;
!
!     /*
!      * Because VALUES can't appear under an outer join (or at least, we won't
!      * try to pull it up if it does), we need not worry about LATERAL.
!      */
!
!     /*
!      * Don't pull up a VALUES that contains any set-returning or volatile
!      * functions.  Again, the considerations here are basically identical to
!      * restrictions on a subquery's targetlist.
!      */
!     if (expression_returns_set((Node *) rte->values_lists) ||
!         contain_volatile_functions((Node *) rte->values_lists))
!         return false;
!
!     /*
!      * Do not pull up a VALUES that's not the only RTE in its parent query.
!      * This is actually the only case that the parser will generate at the
!      * moment, and assuming this is true greatly simplifies
!      * pull_up_simple_values().
!      */
!     if (list_length(root->parse->rtable) != 1 ||
!         rte != (RangeTblEntry *) linitial(root->parse->rtable))
          return false;

      return true;
*************** pullup_replace_vars_subquery(Query *quer
*** 1909,1914 ****
--- 2184,2248 ----
                                             NULL);
  }

+ /*
+  * pull_up_subqueries_cleanup
+  *        Recursively fix up jointree after deletion of some subqueries.
+  *
+  * The jointree now contains some NULL subtrees, which we need to get rid of.
+  * In a FromExpr, just rebuild the child-node list with null entries deleted.
+  * In an inner JOIN, replace the JoinExpr node with a one-child FromExpr.
+  */
+ static Node *
+ pull_up_subqueries_cleanup(Node *jtnode)
+ {
+     Assert(jtnode != NULL);
+     if (IsA(jtnode, RangeTblRef))
+     {
+         /* Nothing to do at leaf nodes. */
+     }
+     else if (IsA(jtnode, FromExpr))
+     {
+         FromExpr   *f = (FromExpr *) jtnode;
+         List       *newfrom = NIL;
+         ListCell   *l;
+
+         foreach(l, f->fromlist)
+         {
+             Node       *child = (Node *) lfirst(l);
+
+             if (child == NULL)
+                 continue;
+             child = pull_up_subqueries_cleanup(child);
+             newfrom = lappend(newfrom, child);
+         }
+         f->fromlist = newfrom;
+     }
+     else if (IsA(jtnode, JoinExpr))
+     {
+         JoinExpr   *j = (JoinExpr *) jtnode;
+
+         if (j->larg)
+             j->larg = pull_up_subqueries_cleanup(j->larg);
+         if (j->rarg)
+             j->rarg = pull_up_subqueries_cleanup(j->rarg);
+         if (j->larg == NULL)
+         {
+             Assert(j->jointype == JOIN_INNER);
+             Assert(j->rarg != NULL);
+             return (Node *) makeFromExpr(list_make1(j->rarg), j->quals);
+         }
+         else if (j->rarg == NULL)
+         {
+             Assert(j->jointype == JOIN_INNER);
+             return (Node *) makeFromExpr(list_make1(j->larg), j->quals);
+         }
+     }
+     else
+         elog(ERROR, "unrecognized node type: %d",
+              (int) nodeTag(jtnode));
+     return jtnode;
+ }
+

  /*
   * flatten_simple_union_all
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..b221086 100644
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
*************** typedef struct PlannerInfo
*** 245,250 ****
--- 245,251 ----
                                           * inheritance child rel */
      bool        hasJoinRTEs;    /* true if any RTEs are RTE_JOIN kind */
      bool        hasLateralRTEs; /* true if any RTEs are marked LATERAL */
+     bool        hasDeletedRTEs; /* true if any RTE was deleted from jointree */
      bool        hasHavingQual;    /* true if havingQual was non-null */
      bool        hasPseudoConstantQuals; /* true if any RestrictInfo has
                                           * pseudoconstant = true */
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 385f4a8..05e46c5 100644
*** a/src/include/optimizer/prep.h
--- b/src/include/optimizer/prep.h
***************
*** 23,29 ****
   */
  extern void pull_up_sublinks(PlannerInfo *root);
  extern void inline_set_returning_functions(PlannerInfo *root);
! extern Node *pull_up_subqueries(PlannerInfo *root, Node *jtnode);
  extern void flatten_simple_union_all(PlannerInfo *root);
  extern void reduce_outer_joins(PlannerInfo *root);
  extern Relids get_relids_in_jointree(Node *jtnode, bool include_joins);
--- 23,29 ----
   */
  extern void pull_up_sublinks(PlannerInfo *root);
  extern void inline_set_returning_functions(PlannerInfo *root);
! extern void pull_up_subqueries(PlannerInfo *root);
  extern void flatten_simple_union_all(PlannerInfo *root);
  extern void reduce_outer_joins(PlannerInfo *root);
  extern Relids get_relids_in_jointree(Node *jtnode, bool include_joins);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index ca3a17b..88a373b 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** select * from generate_series(100,200) g
*** 3595,3600 ****
--- 3595,3620 ----
  explain (costs off)
    select count(*) from tenk1 a,
      tenk1 b join lateral (values(a.unique1)) ss(x) on b.unique2 = ss.x;
+                          QUERY PLAN
+ ------------------------------------------------------------
+  Aggregate
+    ->  Merge Join
+          Merge Cond: (a.unique1 = b.unique2)
+          ->  Index Only Scan using tenk1_unique1 on tenk1 a
+          ->  Index Only Scan using tenk1_unique2 on tenk1 b
+ (5 rows)
+
+ select count(*) from tenk1 a,
+   tenk1 b join lateral (values(a.unique1)) ss(x) on b.unique2 = ss.x;
+  count
+ -------
+  10000
+ (1 row)
+
+ -- lateral with VALUES, no flattening possible
+ explain (costs off)
+   select count(*) from tenk1 a,
+     tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
                              QUERY PLAN
  ------------------------------------------------------------------
   Aggregate
*************** explain (costs off)
*** 3608,3614 ****
  (8 rows)

  select count(*) from tenk1 a,
!   tenk1 b join lateral (values(a.unique1)) ss(x) on b.unique2 = ss.x;
   count
  -------
   10000
--- 3628,3634 ----
  (8 rows)

  select count(*) from tenk1 a,
!   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
   count
  -------
   10000
*************** select * from
*** 4176,4182 ****
      cross join
      lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
    ) on c.q2 = ss2.q1,
!   lateral (select ss2.y) ss3;
                                                                                    QUERY PLAN
                                                        

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Nested Loop
--- 4196,4202 ----
      cross join
      lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
    ) on c.q2 = ss2.q1,
!   lateral (select ss2.y offset 0) ss3;
                                                                                    QUERY PLAN
                                                        

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Nested Loop
*************** select c.*,a.*,ss1.q1,ss2.q1,ss3.* from
*** 4258,4266 ****
  -- check processing of postponed quals (bug #9041)
  explain (verbose, costs off)
  select * from
!   (select 1 as x) x cross join (select 2 as y) y
    left join lateral (
!     select * from (select 3 as z) z where z.z = x.x
    ) zz on zz.z = y.y;
                    QUERY PLAN
  ----------------------------------------------
--- 4278,4286 ----
  -- check processing of postponed quals (bug #9041)
  explain (verbose, costs off)
  select * from
!   (select 1 as x offset 0) x cross join (select 2 as y offset 0) y
    left join lateral (
!     select * from (select 3 as z offset 0) z where z.z = x.x
    ) zz on zz.z = y.y;
                    QUERY PLAN
  ----------------------------------------------
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index 7991e99..6dabe50 100644
*** a/src/test/regress/expected/rangefuncs.out
--- b/src/test/regress/expected/rangefuncs.out
*************** select x from int8_tbl, extractq2(int8_t
*** 2034,2040 ****
  (5 rows)

  create function extractq2_2(t int8_tbl) returns table(ret1 int8) as $$
!   select extractq2(t)
  $$ language sql immutable;
  explain (verbose, costs off)
  select x from int8_tbl, extractq2_2(int8_tbl) f(x);
--- 2034,2040 ----
  (5 rows)

  create function extractq2_2(t int8_tbl) returns table(ret1 int8) as $$
!   select extractq2(t) offset 0
  $$ language sql immutable;
  explain (verbose, costs off)
  select x from int8_tbl, extractq2_2(int8_tbl) f(x);
*************** select x from int8_tbl, extractq2_2(int8
*** 2058,2060 ****
--- 2058,2082 ----
   -4567890123456789
  (5 rows)

+ -- without the "offset 0", this function gets optimized quite differently
+ create function extractq2_2_opt(t int8_tbl) returns table(ret1 int8) as $$
+   select extractq2(t)
+ $$ language sql immutable;
+ explain (verbose, costs off)
+ select x from int8_tbl, extractq2_2_opt(int8_tbl) f(x);
+          QUERY PLAN
+ -----------------------------
+  Seq Scan on public.int8_tbl
+    Output: int8_tbl.q2
+ (2 rows)
+
+ select x from int8_tbl, extractq2_2_opt(int8_tbl) f(x);
+          x
+ -------------------
+                456
+   4567890123456789
+                123
+   4567890123456789
+  -4567890123456789
+ (5 rows)
+
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6005476..82e5789 100644
*** a/src/test/regress/sql/join.sql
--- b/src/test/regress/sql/join.sql
*************** explain (costs off)
*** 1113,1118 ****
--- 1113,1125 ----
  select count(*) from tenk1 a,
    tenk1 b join lateral (values(a.unique1)) ss(x) on b.unique2 = ss.x;

+ -- lateral with VALUES, no flattening possible
+ explain (costs off)
+   select count(*) from tenk1 a,
+     tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
+ select count(*) from tenk1 a,
+   tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
+
  -- lateral injecting a strange outer join condition
  explain (costs off)
    select * from int8_tbl a,
*************** select * from
*** 1223,1229 ****
      cross join
      lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
    ) on c.q2 = ss2.q1,
!   lateral (select ss2.y) ss3;

  -- case that breaks the old ph_may_need optimization
  explain (verbose, costs off)
--- 1230,1236 ----
      cross join
      lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
    ) on c.q2 = ss2.q1,
!   lateral (select ss2.y offset 0) ss3;

  -- case that breaks the old ph_may_need optimization
  explain (verbose, costs off)
*************** select c.*,a.*,ss1.q1,ss2.q1,ss3.* from
*** 1241,1249 ****
  -- check processing of postponed quals (bug #9041)
  explain (verbose, costs off)
  select * from
!   (select 1 as x) x cross join (select 2 as y) y
    left join lateral (
!     select * from (select 3 as z) z where z.z = x.x
    ) zz on zz.z = y.y;

  -- test some error cases where LATERAL should have been used but wasn't
--- 1248,1256 ----
  -- check processing of postponed quals (bug #9041)
  explain (verbose, costs off)
  select * from
!   (select 1 as x offset 0) x cross join (select 2 as y offset 0) y
    left join lateral (
!     select * from (select 3 as z offset 0) z where z.z = x.x
    ) zz on zz.z = y.y;

  -- test some error cases where LATERAL should have been used but wasn't
diff --git a/src/test/regress/sql/rangefuncs.sql b/src/test/regress/sql/rangefuncs.sql
index 470571b..9484023 100644
*** a/src/test/regress/sql/rangefuncs.sql
--- b/src/test/regress/sql/rangefuncs.sql
*************** select x from int8_tbl, extractq2(int8_t
*** 621,630 ****
  select x from int8_tbl, extractq2(int8_tbl) f(x);

  create function extractq2_2(t int8_tbl) returns table(ret1 int8) as $$
!   select extractq2(t)
  $$ language sql immutable;

  explain (verbose, costs off)
  select x from int8_tbl, extractq2_2(int8_tbl) f(x);

  select x from int8_tbl, extractq2_2(int8_tbl) f(x);
--- 621,641 ----
  select x from int8_tbl, extractq2(int8_tbl) f(x);

  create function extractq2_2(t int8_tbl) returns table(ret1 int8) as $$
!   select extractq2(t) offset 0
  $$ language sql immutable;

  explain (verbose, costs off)
  select x from int8_tbl, extractq2_2(int8_tbl) f(x);

  select x from int8_tbl, extractq2_2(int8_tbl) f(x);
+
+ -- without the "offset 0", this function gets optimized quite differently
+
+ create function extractq2_2_opt(t int8_tbl) returns table(ret1 int8) as $$
+   select extractq2(t)
+ $$ language sql immutable;
+
+ explain (verbose, costs off)
+ select x from int8_tbl, extractq2_2_opt(int8_tbl) f(x);
+
+ select x from int8_tbl, extractq2_2_opt(int8_tbl) f(x);

Re: Clamping reulst row number of joins.

От

Kyotaro HORIGUCHI

Дата:

09 марта 2015 г., 07:43:48

Hello, thank you for the considerations by all of you, but I have
described incorrectly the situation. I'm terribly sorry for the
imprecise description and for your additional work based on it.

The point of this issue is not VAULES but the nature of Append
and NestLoop.  Nested Loop can fail to have correct estimation
when it contains Apeend node on insufficiently analyzed
inhertance parent or UNION ALL. The latter is not saved by Tom's
patch nor proper analyzing.

Let me try to redescribe the issue.

At Sun, 08 Mar 2015 19:30:36 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <21519.1425857436@sss.pgh.pa.us>
> I wrote:
> >> Stephen Frost <sfrost@snowman.net> writes:
> >>> I've certainly seen and used values() constructs in joins for a variety
> >>> of reasons and I do think it'd be worthwhile for the planner to know how
> >>> to pull up a VALUES construct.

The query reported to me didn't included VALUES.  I implicitly
used it as an shortcut to make Append node that cannot retrive
statistics on itself. It was the main cause of this confision.

Addition to that, there was somehow no statistics on the
inheritance parent (not as a child), which I didn't mentioned. I
don't know how the situation was made but I can guess manual
analyzes as a cause.

After all, the exact situation will emerge by following steps.

CREATE TABLE p (a int);
CREATE TABLE c1 (like p) INHERITS (p);
INSERT INTO c1 (SELECT (a%2)*500000 + a / 2 FROM generate_series(0, 1000000) a);
CREATE INDEX i_c1_a on c1 (a);
CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1000000);
DELETE FROM pg_statistics;
ANALYZE t1;
ANALYZE c1;
SET join_collapse_limit TO 1;
SET random_page_cost TO 8.0;
SET seq_page_cost TO 0.5;
EXPLAIN analyzE SELECT a FROM p t WHERE a IN (select a from t1);
-----------------------------------------------------------------
> Nested Loop  (cost=1.01..9.49 rows=500001 width=4)
>              (actual time=0.041..0.041 rows=0 loops=1)
>    ->  HashAggregate  (cost=1.01..1.02 rows=1 width=4)
>                       (actual time=0.017..0.017 rows=1 loops=1)
>          Group Key: t1.a
>          ->  Seq Scan on t1  (cost=0.00..1.01 rows=1 width=4)
>                              (actual time=0.007..0.009 rows=1 loops=1)
>    ->  Append  (cost=0.00..8.45 rows=2 width=4)
>                (actual time=0.018..0.018 rows=0 loops=1)
>          ->  Seq Scan on p t  (cost=0.00..0.00 rows=1 width=4)
>                               (actual time=0.000..0.000 rows=0 loops=1)
>                Filter: (t1.a = a)
>          ->  Index Only Scan using i_c1_a on c1 t_1
>                   (cost=0.42..8.45 rows=1 width=4)
>                   (actual time=0.012..0.012 rows=0 loops=1)
>                Index Cond: (a = t1.a)
>                Heap Fetches: 0
>  Planning time: 0.408 ms
>  Execution time: 0.135 ms
> (12 rows)


The nested loop above decided that rows to be 500001 because
seleqfunc_semi couldn't get MVC list for p (as the parent) and
returns 0.5 as the default join selectivity.

Therefore, ANALYZE p makes this work sane for the case.

ANALYZE p;

EXPLAIN analyzE SELECT a FROM p t WHERE a IN (select a from t1);
-----------------------------------------------------------------
> Nested Loop  (cost=1.01..9.49 rows=1 width=4)

I thought that the case of apparently wrong row numbers would be
saved by clampling the number of result rows as I mentioned but
since it is saved by the proper analyzes, the necessity is rather
low if analyze works always. Still, the following example cannot
be saved even by analyze.

CREATE TABLE c2 (LIKE p);
ANALYZE p;
EXPLAIN analyzE select a FROM (SELECT a FROM c1 UNION ALL SELECT a from c2) t                        WHERE a IN (select
afrom t1);                                  QUERY PLAN
 
---------------------------------------------------------------------------Nested Loop  (cost=1.44..51.48 rows=501276
width=4)            (actual time=0.046..0.046 rows=0 loops=1)  ->  HashAggregate  (cost=1.01..1.02 rows=1 width=4)
              (actual time=0.018..0.019 rows=1 loops=1)        Group Key: t1.a        ->  Seq Scan on t1
(cost=0.00..1.01rows=1 width=4)                            (actual time=0.008..0.010 rows=1 loops=1)  ->  Append
(cost=0.42..50.32rows=14 width=4)              (actual time=0.023..0.023 rows=0 loops=1)        ->  Index Only Scan
usingi_c1_a on c1               (cost=0.42..8.45 rows=1 width=4)               (actual time=0.016..0.016 rows=0
loops=1)             Index Cond: (a = t1.a)              Heap Fetches: 0        ->  Seq Scan on c2  (cost=0.00..41.88
rows=13width=4)                            (actual time=0.001..0.001 rows=0 loops=1)              Filter: (t1.a =
a)Planningtime: 0.384 msExecution time: 0.185 ms
 
(12 rows)

14 * 1 = 501276... Of course this cannot be saved by the
values-flatten patch unfortunately...

I think this is a not so rare case..



==============================================================
> > I spent a bit of time looking at this, and realized that the blocker
> > is the same as the reason why we don't pull up sub-selects with empty
> > rangetables ("SELECT expression(s)").  Namely, that the upper query's
> > jointree can't handle a null subtree.  (This is not only a matter of
> > the jointree data structure, but the fact that the whole planner
> > identifies relations by bitmapsets of RTE indexes, and subtrees with
> > empty RTE sets couldn't be told apart.)
> 
> > We could probably fix both cases for order-of-a-hundred lines of new code
> > in prepjointree.  The plan I'm thinking about is to allow such vacuous
> > subquery jointrees to be pulled up, but only if they are in a place in
> > the upper query's jointree where it's okay to delete the subtree.  This
> > would basically be two cases: (1) the immediate parent is a FromExpr that
> > would have at least one remaining child, or (2) the immediate parent is
> > an INNER JOIN whose other child isn't also being deleted (so that we can
> > convert the JoinExpr to a nonempty FromExpr, or just use the other child
> > as-is if the JoinExpr has no quals).
> 
> Here's a draft patch along those lines.  Unsurprisingly, it changes the
> plans generated for a number of regression-test queries.  In most cases
> I felt it desirable to force the old plan shape to be retained (by
> inserting "offset 0" or equivalent) because the test was trying to test
> proper generation of a query plan of that shape.  I did add a couple
> cases where the optimization was allowed to go through.
> 
> The patch is a bit bigger than I'd hoped (a net of about 330 lines added
> to prepjointree.c), but it's not hugely ugly, and it doesn't add any
> meaningful overhead in cases where no optimization happens.  Barring
> objections I will commit this.

However, I feel I'm in charge of looking this:) as far as I can.

The patch removes inner joins which contain a child which can be
treated as a value converting the join condition into a simple(?)
qual. The intension looks (as far as I can see) ok and to work as
intended.

It works for very limited simple cases, inner join with VALUES
with one values list, inner join with subselect without from list
and having stable tlist. As such, my first example,

> EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
>          JOIN (VALUES (1)) tt(x) ON tt.x = t.a;
...
>  Append  (cost=0.00..1.01 rows=1 width=4)
>    ->  Seq Scan on t1  (cost=0.00..1.01 rows=1 width=4)
>          Filter: (1 = a)
> (3 rows)

is the one which is affected. The parallel example of subselect
is,


> EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
>          JOIN (SELECT int8in('0')) tt(x) ON tt.x = t.a;

I found nothing to be commented in the code.

I dont have a solid opinion on the balance between the usability
and the code size, but there's one concern about planning time.

By this patch, planner scanns the whole jointree once always, and
more once if deletable subqueries. But I don't understand it's
worth while for the gain or not..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Clamping reulst row number of joins.