Re: PostgreSQL 9.6 behavior change with set returning (funct).*

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: PostgreSQL 9.6 behavior change with set returning (funct).*
Дата
Msg-id 25750.1458767514@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: PostgreSQL 9.6 behavior change with set returning (funct).*  (Stephen Frost <sfrost@snowman.net>)
Ответы Re: PostgreSQL 9.6 behavior change with set returning (funct).*  ("David G. Johnston" <david.g.johnston@gmail.com>)
Re: PostgreSQL 9.6 behavior change with set returning (funct).*  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Stephen Frost <sfrost@snowman.net> writes:
> * Merlin Moncure (mmoncure@gmail.com) wrote:
>> I'm something of a backwards compatibility zealot, but I've become one
>> for very good reasons.  Personally, I'd rather we'd define precisely
>> the usages that are deprecated (I guess SRF-tlist in the presence of
>> FROM) and force them to error out with an appropriate HINT rather than
>> give a different answer than they used to.  The problem here is that
>> LATERAL is still fairly new and there is a huge body of code out there
>> leveraging the 'bad' way, as it was for years and years the only way
>> to do a number of useful things.

> I have to side with what I believe is Tom's position on this one.  I do
> like the notion of throwing an error in cases where someone sent us
> something that we're pretty sure is wrong, but I don't agree that we
> should continue to carry on bug-compatibility with things that are
> already one foot in the grave and really just need to be shoved all the
> way in.

FWIW, I think the case that we're looking at only comes up if you have
more than one SRF expression in the tlist (Regina's example has that
after *-expansion, though it might not've looked so to start with),
and some of them are sort/group columns while others are not.

So the question at hand is whether that falls within the scope of "huge
body of code leveraging the old way" or whether it's too much of a corner
case to want to tie ourselves down to.

I wrote a patch that fixes this for the sort-column case, thus getting us
back to the historical behavior; see attached.  If we really wanted to be
consistent, we'd have to look at pushing all SRFs clear back to the
scanjoin_target list if any SRFs appear in grouping columns.  That would
be significantly more code and it would deviate from the historical
behavior, as I illustrated upthread, so I'm not really inclined to do it.
But the historical behavior in this area is pretty unprincipled.

It's also worth noting that we've had multiple complaints about the
ExecTargetList least-common-multiple behavior over the years; it seems
sane enough in examples like these where all the SRFs have the same
period, but it gets way less so as soon as they don't.  I'd love to
toss the entire SRF-in-tlist feature overboard one of these years,
both because of semantic issues like these and because a fair amount
of code and overhead could be ripped out if it were disallowed.
But I don't know how we get to that --- as Merlin says, there's a
pretty large amount of application code depending on the feature.
In the meantime I suppose there's a case to be made for preserving
bug compatibility as much as possible.

So anyway the question is whether to commit the attached or not.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5229c84..db347b8 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** make_pathkeys_for_window(PlannerInfo *ro
*** 4615,4625 ****
   *
   * Our current policy is to postpone volatile expressions till after the sort
   * unconditionally (assuming that that's possible, ie they are in plain tlist
!  * columns and not ORDER BY/GROUP BY/DISTINCT columns).  We also postpone
!  * set-returning expressions unconditionally (if possible), because running
!  * them beforehand would bloat the sort dataset, and because it might cause
!  * unexpected output order if the sort isn't stable.  Expensive expressions
!  * are postponed if there is a LIMIT, or if root->tuple_fraction shows that
   * partial evaluation of the query is possible (if neither is true, we expect
   * to have to evaluate the expressions for every row anyway), or if there are
   * any volatile or set-returning expressions (since once we've put in a
--- 4615,4630 ----
   *
   * Our current policy is to postpone volatile expressions till after the sort
   * unconditionally (assuming that that's possible, ie they are in plain tlist
!  * columns and not ORDER BY/GROUP BY/DISTINCT columns).  We also prefer to
!  * postpone set-returning expressions, because running them beforehand would
!  * bloat the sort dataset, and because it might cause unexpected output order
!  * if the sort isn't stable.  However there's a constraint on that: all SRFs
!  * in the tlist should be evaluated at the same plan step, so that they can
!  * run in sync in ExecTargetList.  So if any SRFs are in sort columns, we
!  * mustn't postpone any SRFs.  (Note that in principle that policy should
!  * probably get applied to the group/window input targetlists too, but we
!  * have not done that historically.)  Lastly, expensive expressions are
!  * postponed if there is a LIMIT, or if root->tuple_fraction shows that
   * partial evaluation of the query is possible (if neither is true, we expect
   * to have to evaluate the expressions for every row anyway), or if there are
   * any volatile or set-returning expressions (since once we've put in a
*************** make_sort_input_target(PlannerInfo *root
*** 4664,4673 ****
--- 4669,4681 ----
      Query       *parse = root->parse;
      PathTarget *input_target;
      int            ncols;
+     bool       *col_is_srf;
      bool       *postpone_col;
      bool        have_srf;
      bool        have_volatile;
      bool        have_expensive;
+     bool        have_srf_sortcols;
+     bool        postpone_srfs;
      List       *postponable_cols;
      List       *postponable_vars;
      int            i;
*************** make_sort_input_target(PlannerInfo *root
*** 4680,4687 ****

      /* Inspect tlist and collect per-column information */
      ncols = list_length(final_target->exprs);
      postpone_col = (bool *) palloc0(ncols * sizeof(bool));
!     have_srf = have_volatile = have_expensive = false;

      i = 0;
      foreach(lc, final_target->exprs)
--- 4688,4696 ----

      /* Inspect tlist and collect per-column information */
      ncols = list_length(final_target->exprs);
+     col_is_srf = (bool *) palloc0(ncols * sizeof(bool));
      postpone_col = (bool *) palloc0(ncols * sizeof(bool));
!     have_srf = have_volatile = have_expensive = have_srf_sortcols = false;

      i = 0;
      foreach(lc, final_target->exprs)
*************** make_sort_input_target(PlannerInfo *root
*** 4699,4715 ****
          if (final_target->sortgrouprefs[i] == 0)
          {
              /*
!              * If it returns a set or is volatile, that's an unconditional
!              * reason to postpone.  Check the SRF case first because we must
!              * know whether we have any postponed SRFs.
               */
              if (expression_returns_set((Node *) expr))
              {
!                 postpone_col[i] = true;
                  have_srf = true;
              }
              else if (contain_volatile_functions((Node *) expr))
              {
                  postpone_col[i] = true;
                  have_volatile = true;
              }
--- 4708,4725 ----
          if (final_target->sortgrouprefs[i] == 0)
          {
              /*
!              * Check for SRF or volatile functions.  Check the SRF case first
!              * because we must know whether we have any postponed SRFs.
               */
              if (expression_returns_set((Node *) expr))
              {
!                 /* We'll decide below whether these are postponable */
!                 col_is_srf[i] = true;
                  have_srf = true;
              }
              else if (contain_volatile_functions((Node *) expr))
              {
+                 /* Unconditionally postpone */
                  postpone_col[i] = true;
                  have_volatile = true;
              }
*************** make_sort_input_target(PlannerInfo *root
*** 4736,4749 ****
                  }
              }
          }

          i++;
      }

      /*
       * If we don't need a post-sort projection, just return final_target.
       */
!     if (!(have_srf || have_volatile ||
            (have_expensive &&
             (parse->limitCount || root->tuple_fraction > 0))))
          return final_target;
--- 4746,4771 ----
                  }
              }
          }
+         else
+         {
+             /* For sortgroupref cols, just check if any contain SRFs */
+             if (!have_srf_sortcols &&
+                 expression_returns_set((Node *) expr))
+                 have_srf_sortcols = true;
+         }

          i++;
      }

      /*
+      * We can postpone SRFs if we have some but none are in sortgroupref cols.
+      */
+     postpone_srfs = (have_srf && !have_srf_sortcols);
+
+     /*
       * If we don't need a post-sort projection, just return final_target.
       */
!     if (!(postpone_srfs || have_volatile ||
            (have_expensive &&
             (parse->limitCount || root->tuple_fraction > 0))))
          return final_target;
*************** make_sort_input_target(PlannerInfo *root
*** 4754,4760 ****
       * rely on the query's LIMIT (if any) to bound the number of rows it needs
       * to return.
       */
!     *have_postponed_srfs = have_srf;

      /*
       * Construct the sort-input target, taking all non-postponable columns and
--- 4776,4782 ----
       * rely on the query's LIMIT (if any) to bound the number of rows it needs
       * to return.
       */
!     *have_postponed_srfs = postpone_srfs;

      /*
       * Construct the sort-input target, taking all non-postponable columns and
*************** make_sort_input_target(PlannerInfo *root
*** 4769,4775 ****
      {
          Expr       *expr = (Expr *) lfirst(lc);

!         if (postpone_col[i])
              postponable_cols = lappend(postponable_cols, expr);
          else
              add_column_to_pathtarget(input_target, expr,
--- 4791,4797 ----
      {
          Expr       *expr = (Expr *) lfirst(lc);

!         if (postpone_col[i] || (postpone_srfs && col_is_srf[i]))
              postponable_cols = lappend(postponable_cols, expr);
          else
              add_column_to_pathtarget(input_target, expr,
diff --git a/src/test/regress/expected/limit.out b/src/test/regress/expected/limit.out
index d079887..659a101 100644
*** a/src/test/regress/expected/limit.out
--- b/src/test/regress/expected/limit.out
*************** select unique1, unique2, generate_series
*** 258,260 ****
--- 258,298 ----
         0 |    9998 |               7
  (7 rows)

+ -- use of random() is to keep planner from folding the expressions together
+ explain (verbose, costs off)
+ select generate_series(0,2) as s1, generate_series((random()*.1)::int,2) as s2;
+                                               QUERY PLAN
+ ------------------------------------------------------------------------------------------------------
+  Result
+    Output: generate_series(0, 2), generate_series(((random() * '0.1'::double precision))::integer, 2)
+ (2 rows)
+
+ select generate_series(0,2) as s1, generate_series((random()*.1)::int,2) as s2;
+  s1 | s2
+ ----+----
+   0 |  0
+   1 |  1
+   2 |  2
+ (3 rows)
+
+ explain (verbose, costs off)
+ select generate_series(0,2) as s1, generate_series((random()*.1)::int,2) as s2
+ order by s2 desc;
+                                                  QUERY PLAN
+ ------------------------------------------------------------------------------------------------------------
+  Sort
+    Output: (generate_series(0, 2)), (generate_series(((random() * '0.1'::double precision))::integer, 2))
+    Sort Key: (generate_series(((random() * '0.1'::double precision))::integer, 2)) DESC
+    ->  Result
+          Output: generate_series(0, 2), generate_series(((random() * '0.1'::double precision))::integer, 2)
+ (5 rows)
+
+ select generate_series(0,2) as s1, generate_series((random()*.1)::int,2) as s2
+ order by s2 desc;
+  s1 | s2
+ ----+----
+   2 |  2
+   1 |  1
+   0 |  0
+ (3 rows)
+
diff --git a/src/test/regress/sql/limit.sql b/src/test/regress/sql/limit.sql
index 4b05221..ef5686c 100644
*** a/src/test/regress/sql/limit.sql
--- b/src/test/regress/sql/limit.sql
*************** select unique1, unique2, generate_series
*** 78,80 ****
--- 78,93 ----

  select unique1, unique2, generate_series(1,10)
    from tenk1 order by tenthous limit 7;
+
+ -- use of random() is to keep planner from folding the expressions together
+ explain (verbose, costs off)
+ select generate_series(0,2) as s1, generate_series((random()*.1)::int,2) as s2;
+
+ select generate_series(0,2) as s1, generate_series((random()*.1)::int,2) as s2;
+
+ explain (verbose, costs off)
+ select generate_series(0,2) as s1, generate_series((random()*.1)::int,2) as s2
+ order by s2 desc;
+
+ select generate_series(0,2) as s1, generate_series((random()*.1)::int,2) as s2
+ order by s2 desc;

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Rationalizing code-sharing among src/bin/ directories
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: PostgreSQL 9.6 behavior change with set returning (funct).*