Обсуждение: Making CASE error handling less surprising

Поиск
Список
Период
Сортировка

Making CASE error handling less surprising

От
Tom Lane
Дата:
Every so often we get a complaint like [1] about how a CASE should have
prevented a run-time error and didn't, because constant-folding tried
to evaluate a subexpression that would not have been entered at run-time.

It struck me that it would not be hard to improve this situation a great
deal.  If, within a CASE subexpression that isn't certain to be executed
at runtime, we refuse to pre-evaluate *any* function (essentially, treat
them all as volatile), then we should largely get the semantics that
users expect.  There's some potential for query slowdown if a CASE
contains a constant subexpression that we formerly reduced at plan time
and now do not, but that doesn't seem to me to be a very big deal.

Attached is a draft patch that handles CASE and COALESCE this way.

This is not a complete fix, because if you write a sub-SELECT the
contents of the sub-SELECT are not processed by the outer query's
eval_const_expressions pass; instead, we look at it within the
sub-SELECT itself, and in that context there's no apparent reason
to avoid const-folding.  So
   CASE WHEN x < 0 THEN (SELECT 1/0) END
fails even if x is never less than zero.  I don't see any great way
to avoid that, and I'm not particularly concerned about it anyhow;
usually the point of a sub-SELECT like this is to be decoupled from
outer query evaluation, so that the behavior should not be that
surprising.

One interesting point is that the join regression test contains a
number of uses of "coalesce(int8-variable, int4-constant)" which is
treated a little differently than before: we no longer constant-fold
the int4 constant to int8.  That causes the run-time cost of the
expression to be estimated slightly higher, which changes plans in
a couple of these tests; and in any case the EXPLAIN output looks
different since it shows the runtime coercion explicitly.  To avoid
those changes I made all these examples quote the constants, so that
the parser resolves them as int8 out of the gate.  (Perhaps it'd be
okay to just accept the changes, but I didn't feel like trying to
analyze in detail what each test case had been meant to prove.)

Also, I didn't touch the docs yet.  Sections 4.2.14 and 9.18.1
contain some weasel wording that could be backed off, but in light
of the sub-SELECT exception we can't just remove the issue
altogether I think.  Not quite sure how to word it.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/16549-4991fbf36fcec234%40postgresql.org

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index e04b144072..8a41dce235 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -61,6 +61,15 @@ typedef struct
     AggClauseCosts *costs;
 } get_agg_clause_costs_context;

+typedef enum
+{
+    /* Ordering is important here! */
+    ece_eval_nothing,            /* be unconditionally safe */
+    ece_eval_immutable,            /* eval immutable functions */
+    ece_eval_stable,            /* eval stable functions too */
+    ece_eval_volatile            /* eval volatile functions too */
+} ece_level;
+
 typedef struct
 {
     ParamListInfo boundParams;
@@ -68,6 +77,7 @@ typedef struct
     List       *active_fns;
     Node       *case_val;
     bool        estimate;
+    ece_level    eval_level;
 } eval_const_expressions_context;

 typedef struct
@@ -119,6 +129,8 @@ static Node *eval_const_expressions_mutator(Node *node,
 static bool contain_non_const_walker(Node *node, void *context);
 static bool ece_function_is_safe(Oid funcid,
                                  eval_const_expressions_context *context);
+static bool ece_provolatile_is_safe(char provolatile,
+                                    eval_const_expressions_context *context);
 static Node *apply_const_relabel(Node *arg, Oid rtype,
                                  int32 rtypmod, Oid rcollid,
                                  CoercionForm rformat, int rlocation);
@@ -2264,6 +2276,7 @@ eval_const_expressions(PlannerInfo *root, Node *node)
     context.active_fns = NIL;    /* nothing being recursively simplified */
     context.case_val = NULL;    /* no CASE being examined */
     context.estimate = false;    /* safe transformations only */
+    context.eval_level = ece_eval_immutable;    /* eval immutable functions */
     return eval_const_expressions_mutator(node, &context);
 }

@@ -2280,8 +2293,11 @@ eval_const_expressions(PlannerInfo *root, Node *node)
  *      available by the caller of planner(), even if the Param isn't marked
  *      constant.  This effectively means that we plan using the first supplied
  *      value of the Param.
- * 2. Fold stable, as well as immutable, functions to constants.
+ * 2. Fold stable, as well as immutable, functions to constants.  The risk
+ *      that the result might change from planning time to execution time is
+ *      worth taking, as we otherwise couldn't get an estimate at all.
  * 3. Reduce PlaceHolderVar nodes to their contained expressions.
+ * 4. Ignore domain constraints, assuming that CoerceToDomain will succeed.
  *--------------------
  */
 Node *
@@ -2295,6 +2311,7 @@ estimate_expression_value(PlannerInfo *root, Node *node)
     context.active_fns = NIL;    /* nothing being recursively simplified */
     context.case_val = NULL;    /* no CASE being examined */
     context.estimate = true;    /* unsafe transformations OK */
+    context.eval_level = ece_eval_stable;    /* eval stable functions */
     return eval_const_expressions_mutator(node, &context);
 }

@@ -2960,8 +2977,9 @@ eval_const_expressions_mutator(Node *node,
                  * CaseTestExpr placeholder nodes, so that we have the
                  * opportunity to reduce constant test conditions.  For
                  * example this allows
-                 *        CASE 0 WHEN 0 THEN 1 ELSE 1/0 END
-                 * to reduce to 1 rather than drawing a divide-by-0 error.
+                 *        CASE 0 WHEN 0 THEN 1 ELSE 0 END
+                 * to reduce to just 1.
+                 *
                  * Note that when the test expression is constant, we don't
                  * have to include it in the resulting CASE; for example
                  *        CASE 0 WHEN x THEN y ELSE z END
@@ -2973,10 +2991,20 @@ eval_const_expressions_mutator(Node *node,
                  * expression when executing the CASE, since any contained
                  * CaseTestExprs that might have referred to it will have been
                  * replaced by the constant.
+                 *
+                 * An additional consideration is that the user might be
+                 * expecting the CASE to prevent run-time errors, such as
+                 *        CASE 0 WHEN 0 THEN 1 ELSE 1/0 END
+                 * Since division is immutable, we'd ordinarily simplify the
+                 * division and hence draw the divide-by-zero error at plan
+                 * time.  To avoid that, reduce eval_level to ece_eval_nothing
+                 * whenever we are considering a test condition or result
+                 * value that will not certainly be evaluated at run-time.
                  *----------
                  */
                 CaseExpr   *caseexpr = (CaseExpr *) node;
                 CaseExpr   *newcase;
+                ece_level    save_eval_level = context->eval_level;
                 Node       *save_case_val;
                 Node       *newarg;
                 List       *newargs;
@@ -3027,6 +3055,15 @@ eval_const_expressions_mutator(Node *node,
                         const_true_cond = true;
                     }

+                    /*
+                     * Unless the test condition is constant TRUE, we can't be
+                     * sure the result value will be evaluated, so back off
+                     * the evaluation safety level.  This change will also
+                     * apply to subsequent test conditions and result values.
+                     */
+                    if (!const_true_cond)
+                        context->eval_level = ece_eval_nothing;
+
                     /* Simplify this alternative's result value */
                     caseresult = eval_const_expressions_mutator((Node *) oldcasewhen->result,
                                                                 context);
@@ -3058,6 +3095,7 @@ eval_const_expressions_mutator(Node *node,
                                                                context);

                 context->case_val = save_case_val;
+                context->eval_level = save_eval_level;

                 /*
                  * If no non-FALSE alternatives, CASE reduces to the default
@@ -3113,6 +3151,7 @@ eval_const_expressions_mutator(Node *node,
             {
                 CoalesceExpr *coalesceexpr = (CoalesceExpr *) node;
                 CoalesceExpr *newcoalesce;
+                ece_level    save_eval_level = context->eval_level;
                 List       *newargs;
                 ListCell   *arg;

@@ -3137,13 +3176,25 @@ eval_const_expressions_mutator(Node *node,
                         if (((Const *) e)->constisnull)
                             continue;    /* drop null constant */
                         if (newargs == NIL)
+                        {
+                            context->eval_level = save_eval_level;
                             return e;    /* first expr */
+                        }
                         newargs = lappend(newargs, e);
                         break;
                     }
                     newargs = lappend(newargs, e);
+
+                    /*
+                     * Arguments following a non-constant argument may or may
+                     * not get evaluated at run-time, so don't risk doing any
+                     * not-100%-safe computations within them.
+                     */
+                    context->eval_level = ece_eval_nothing;
                 }

+                context->eval_level = save_eval_level;
+
                 /*
                  * If all the arguments were constant null, the result is just
                  * null
@@ -3163,13 +3214,12 @@ eval_const_expressions_mutator(Node *node,
         case T_SQLValueFunction:
             {
                 /*
-                 * All variants of SQLValueFunction are stable, so if we are
-                 * estimating the expression's value, we should evaluate the
-                 * current function value.  Otherwise just copy.
+                 * All variants of SQLValueFunction are stable, so evaluate if
+                 * we are evaluating stable functions.  Otherwise just copy.
                  */
                 SQLValueFunction *svf = (SQLValueFunction *) node;

-                if (context->estimate)
+                if (context->eval_level >= ece_eval_stable)
                     return (Node *) evaluate_expr((Expr *) svf,
                                                   svf->type,
                                                   svf->typmod,
@@ -3565,20 +3615,28 @@ contain_non_const_walker(Node *node, void *context)
 static bool
 ece_function_is_safe(Oid funcid, eval_const_expressions_context *context)
 {
-    char        provolatile = func_volatile(funcid);
+    return ece_provolatile_is_safe(func_volatile(funcid), context);
+}

-    /*
-     * Ordinarily we are only allowed to simplify immutable functions. But for
-     * purposes of estimation, we consider it okay to simplify functions that
-     * are merely stable; the risk that the result might change from planning
-     * time to execution time is worth taking in preference to not being able
-     * to estimate the value at all.
-     */
+/*
+ * Same, when we have the provolatile value directly at hand
+ */
+static bool
+ece_provolatile_is_safe(char provolatile,
+                        eval_const_expressions_context *context)
+{
+    ece_level    f_level;
+
+    /* Must map the provolatile letter codes to an ordered enum */
     if (provolatile == PROVOLATILE_IMMUTABLE)
-        return true;
-    if (context->estimate && provolatile == PROVOLATILE_STABLE)
-        return true;
-    return false;
+        f_level = ece_eval_immutable;
+    else if (provolatile == PROVOLATILE_STABLE)
+        f_level = ece_eval_stable;
+    else
+        f_level = ece_eval_volatile;
+
+    /* Now, does eval_level allow evaluation of this function? */
+    return (context->eval_level >= f_level);
 }

 /*
@@ -4238,9 +4296,8 @@ recheck_cast_function_args(List *args, Oid result_type, HeapTuple func_tuple)
  * evaluate_function: try to pre-evaluate a function call
  *
  * We can do this if the function is strict and has any constant-null inputs
- * (just return a null constant), or if the function is immutable and has all
- * constant inputs (call it and return the result as a Const node).  In
- * estimation mode we are willing to pre-evaluate stable functions too.
+ * (just return a null constant), or if the function is safe to evaluate and
+ * has all constant inputs (call it and return the result as a Const node).
  *
  * Returns a simplified expression if successful, or NULL if cannot
  * simplify the function.
@@ -4293,7 +4350,7 @@ evaluate_function(Oid funcid, Oid result_type, int32 result_typmod,
      * If the function is strict and has a constant-NULL input, it will never
      * be called at all, so we can replace the call by a NULL constant, even
      * if there are other inputs that aren't constant, and even if the
-     * function is not otherwise immutable.
+     * function is not otherwise safe to evaluate.
      */
     if (funcform->proisstrict && has_null_input)
         return (Expr *) makeNullConst(result_type, result_typmod,
@@ -4308,17 +4365,9 @@ evaluate_function(Oid funcid, Oid result_type, int32 result_typmod,
         return NULL;

     /*
-     * Ordinarily we are only allowed to simplify immutable functions. But for
-     * purposes of estimation, we consider it okay to simplify functions that
-     * are merely stable; the risk that the result might change from planning
-     * time to execution time is worth taking in preference to not being able
-     * to estimate the value at all.
+     * Are we permitted to evaluate functions of this volatility level?
      */
-    if (funcform->provolatile == PROVOLATILE_IMMUTABLE)
-         /* okay */ ;
-    else if (context->estimate && funcform->provolatile == PROVOLATILE_STABLE)
-         /* okay */ ;
-    else
+    if (!ece_provolatile_is_safe(funcform->provolatile, context))
         return NULL;

     /*
diff --git a/src/test/regress/expected/case.out b/src/test/regress/expected/case.out
index c0c8acf035..3326ebd5be 100644
--- a/src/test/regress/expected/case.out
+++ b/src/test/regress/expected/case.out
@@ -93,9 +93,17 @@ SELECT CASE 1 WHEN 0 THEN 1/0 WHEN 1 THEN 1 ELSE 2/0 END;
     1
 (1 row)

--- However we do not currently suppress folding of potentially
--- reachable subexpressions
 SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl;
+ case
+------
+    0
+    0
+    0
+    0
+(4 rows)
+
+-- However, that guarantee doesn't extend into sub-selects
+SELECT CASE WHEN i > 100 THEN (select 1/0) ELSE 0 END FROM case_tbl;
 ERROR:  division by zero
 -- Test for cases involving untyped literals in test expression
 SELECT CASE 'a' WHEN 'a' THEN 1 ELSE 2 END;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..6a411008dd 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2860,9 +2860,9 @@ ON sub1.key1 = sub2.key3;
 EXPLAIN (COSTS OFF)
 SELECT qq, unique1
   FROM
-  ( SELECT COALESCE(q1, 0) AS qq FROM int8_tbl a ) AS ss1
+  ( SELECT COALESCE(q1, '0') AS qq FROM int8_tbl a ) AS ss1
   FULL OUTER JOIN
-  ( SELECT COALESCE(q2, -1) AS qq FROM int8_tbl b ) AS ss2
+  ( SELECT COALESCE(q2, '-1') AS qq FROM int8_tbl b ) AS ss2
   USING (qq)
   INNER JOIN tenk1 c ON qq = unique2;
                                                QUERY PLAN
@@ -2879,9 +2879,9 @@ SELECT qq, unique1

 SELECT qq, unique1
   FROM
-  ( SELECT COALESCE(q1, 0) AS qq FROM int8_tbl a ) AS ss1
+  ( SELECT COALESCE(q1, '0') AS qq FROM int8_tbl a ) AS ss1
   FULL OUTER JOIN
-  ( SELECT COALESCE(q2, -1) AS qq FROM int8_tbl b ) AS ss2
+  ( SELECT COALESCE(q2, '-1') AS qq FROM int8_tbl b ) AS ss2
   USING (qq)
   INNER JOIN tenk1 c ON qq = unique2;
  qq  | unique1
@@ -4299,14 +4299,14 @@ explain (costs off)
 --
 -- test that quals attached to an outer join have correct semantics,
 -- specifically that they don't re-use expressions computed below the join;
--- we force a mergejoin so that coalesce(b.q1, 1) appears as a join input
+-- we force a mergejoin so that coalesce(b.q1, '1') appears as a join input
 --
 set enable_hashjoin to off;
 set enable_nestloop to off;
 explain (verbose, costs off)
   select a.q2, b.q1
-    from int8_tbl a left join int8_tbl b on a.q2 = coalesce(b.q1, 1)
-    where coalesce(b.q1, 1) > 0;
+    from int8_tbl a left join int8_tbl b on a.q2 = coalesce(b.q1, '1')
+    where coalesce(b.q1, '1') > 0;
                        QUERY PLAN
 ---------------------------------------------------------
  Merge Left Join
@@ -4326,8 +4326,8 @@ explain (verbose, costs off)
 (14 rows)

 select a.q2, b.q1
-  from int8_tbl a left join int8_tbl b on a.q2 = coalesce(b.q1, 1)
-  where coalesce(b.q1, 1) > 0;
+  from int8_tbl a left join int8_tbl b on a.q2 = coalesce(b.q1, '1')
+  where coalesce(b.q1, '1') > 0;
         q2         |        q1
 -------------------+------------------
  -4567890123456789 |
@@ -5077,7 +5077,7 @@ select * from (values(1)) x(lb),
 (5 rows)

 select * from
-  int8_tbl x left join (select q1,coalesce(q2,0) q2 from int8_tbl) y on x.q2 = y.q1,
+  int8_tbl x left join (select q1,coalesce(q2,'0') q2 from int8_tbl) y on x.q2 = y.q1,
   lateral (values(x.q1,y.q1,y.q2)) v(xq1,yq1,yq2);
         q1        |        q2         |        q1        |        q2         |       xq1        |       yq1        |
    yq2         

------------------+-------------------+------------------+-------------------+------------------+------------------+-------------------
@@ -5094,7 +5094,7 @@ select * from
 (10 rows)

 select * from
-  int8_tbl x left join (select q1,coalesce(q2,0) q2 from int8_tbl) y on x.q2 = y.q1,
+  int8_tbl x left join (select q1,coalesce(q2,'0') q2 from int8_tbl) y on x.q2 = y.q1,
   lateral (select x.q1,y.q1,y.q2) v(xq1,yq1,yq2);
         q1        |        q2         |        q1        |        q2         |       xq1        |       yq1        |
    yq2         

------------------+-------------------+------------------+-------------------+------------------+------------------+-------------------
@@ -5111,7 +5111,7 @@ select * from
 (10 rows)

 select x.* from
-  int8_tbl x left join (select q1,coalesce(q2,0) q2 from int8_tbl) y on x.q2 = y.q1,
+  int8_tbl x left join (select q1,coalesce(q2,'0') q2 from int8_tbl) y on x.q2 = y.q1,
   lateral (select x.q1,y.q1,y.q2) v(xq1,yq1,yq2);
         q1        |        q2
 ------------------+-------------------
@@ -5128,7 +5128,7 @@ select x.* from
 (10 rows)

 select v.* from
-  (int8_tbl x left join (select q1,coalesce(q2,0) q2 from int8_tbl) y on x.q2 = y.q1)
+  (int8_tbl x left join (select q1,coalesce(q2,'0') q2 from int8_tbl) y on x.q2 = y.q1)
   left join int4_tbl z on z.f1 = x.q2,
   lateral (select x.q1,y.q1 union all select x.q2,y.q2) v(vx,vy);
         vx         |        vy
@@ -5156,7 +5156,7 @@ select v.* from
 (20 rows)

 select v.* from
-  (int8_tbl x left join (select q1,(select coalesce(q2,0)) q2 from int8_tbl) y on x.q2 = y.q1)
+  (int8_tbl x left join (select q1,(select coalesce(q2,'0')) q2 from int8_tbl) y on x.q2 = y.q1)
   left join int4_tbl z on z.f1 = x.q2,
   lateral (select x.q1,y.q1 union all select x.q2,y.q2) v(vx,vy);
         vx         |        vy
@@ -5184,7 +5184,7 @@ select v.* from
 (20 rows)

 select v.* from
-  (int8_tbl x left join (select q1,(select coalesce(q2,0)) q2 from int8_tbl) y on x.q2 = y.q1)
+  (int8_tbl x left join (select q1,(select coalesce(q2,'0')) q2 from int8_tbl) y on x.q2 = y.q1)
   left join int4_tbl z on z.f1 = x.q2,
   lateral (select x.q1,y.q1 from onerow union all select x.q2,y.q2 from onerow) v(vx,vy);
         vx         |        vy
@@ -5246,7 +5246,7 @@ select * from
 explain (verbose, costs off)
 select * from
   int8_tbl a left join
-  lateral (select *, coalesce(a.q2, 42) as x from int8_tbl b) ss on a.q2 = ss.q1;
+  lateral (select *, coalesce(a.q2, '42') as x from int8_tbl b) ss on a.q2 = ss.q1;
                             QUERY PLAN
 ------------------------------------------------------------------
  Nested Loop Left Join
@@ -5260,7 +5260,7 @@ select * from

 select * from
   int8_tbl a left join
-  lateral (select *, coalesce(a.q2, 42) as x from int8_tbl b) ss on a.q2 = ss.q1;
+  lateral (select *, coalesce(a.q2, '42') as x from int8_tbl b) ss on a.q2 = ss.q1;
         q1        |        q2         |        q1        |        q2         |        x
 ------------------+-------------------+------------------+-------------------+------------------
               123 |               456 |                  |                   |
@@ -5462,7 +5462,7 @@ select * from
 explain (verbose, costs off)
 select * from
   int8_tbl c left join (
-    int8_tbl a left join (select q1, coalesce(q2,42) as x from int8_tbl b) ss1
+    int8_tbl a left join (select q1, coalesce(q2,'42') as x from int8_tbl b) ss1
       on a.q2 = ss1.q1
     cross join
     lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
diff --git a/src/test/regress/sql/case.sql b/src/test/regress/sql/case.sql
index 17436c524a..cfad0b815c 100644
--- a/src/test/regress/sql/case.sql
+++ b/src/test/regress/sql/case.sql
@@ -66,11 +66,11 @@ SELECT '7' AS "None",
 -- Constant-expression folding shouldn't evaluate unreachable subexpressions
 SELECT CASE WHEN 1=0 THEN 1/0 WHEN 1=1 THEN 1 ELSE 2/0 END;
 SELECT CASE 1 WHEN 0 THEN 1/0 WHEN 1 THEN 1 ELSE 2/0 END;
-
--- However we do not currently suppress folding of potentially
--- reachable subexpressions
 SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl;

+-- However, that guarantee doesn't extend into sub-selects
+SELECT CASE WHEN i > 100 THEN (select 1/0) ELSE 0 END FROM case_tbl;
+
 -- Test for cases involving untyped literals in test expression
 SELECT CASE 'a' WHEN 'a' THEN 1 ELSE 2 END;

diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..5c868a3dae 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -811,17 +811,17 @@ ON sub1.key1 = sub2.key3;
 EXPLAIN (COSTS OFF)
 SELECT qq, unique1
   FROM
-  ( SELECT COALESCE(q1, 0) AS qq FROM int8_tbl a ) AS ss1
+  ( SELECT COALESCE(q1, '0') AS qq FROM int8_tbl a ) AS ss1
   FULL OUTER JOIN
-  ( SELECT COALESCE(q2, -1) AS qq FROM int8_tbl b ) AS ss2
+  ( SELECT COALESCE(q2, '-1') AS qq FROM int8_tbl b ) AS ss2
   USING (qq)
   INNER JOIN tenk1 c ON qq = unique2;

 SELECT qq, unique1
   FROM
-  ( SELECT COALESCE(q1, 0) AS qq FROM int8_tbl a ) AS ss1
+  ( SELECT COALESCE(q1, '0') AS qq FROM int8_tbl a ) AS ss1
   FULL OUTER JOIN
-  ( SELECT COALESCE(q2, -1) AS qq FROM int8_tbl b ) AS ss2
+  ( SELECT COALESCE(q2, '-1') AS qq FROM int8_tbl b ) AS ss2
   USING (qq)
   INNER JOIN tenk1 c ON qq = unique2;

@@ -1455,7 +1455,7 @@ explain (costs off)
 --
 -- test that quals attached to an outer join have correct semantics,
 -- specifically that they don't re-use expressions computed below the join;
--- we force a mergejoin so that coalesce(b.q1, 1) appears as a join input
+-- we force a mergejoin so that coalesce(b.q1, '1') appears as a join input
 --

 set enable_hashjoin to off;
@@ -1463,11 +1463,11 @@ set enable_nestloop to off;

 explain (verbose, costs off)
   select a.q2, b.q1
-    from int8_tbl a left join int8_tbl b on a.q2 = coalesce(b.q1, 1)
-    where coalesce(b.q1, 1) > 0;
+    from int8_tbl a left join int8_tbl b on a.q2 = coalesce(b.q1, '1')
+    where coalesce(b.q1, '1') > 0;
 select a.q2, b.q1
-  from int8_tbl a left join int8_tbl b on a.q2 = coalesce(b.q1, 1)
-  where coalesce(b.q1, 1) > 0;
+  from int8_tbl a left join int8_tbl b on a.q2 = coalesce(b.q1, '1')
+  where coalesce(b.q1, '1') > 0;

 reset enable_hashjoin;
 reset enable_nestloop;
@@ -1766,24 +1766,24 @@ select * from (values(1)) x(lb),
 select * from (values(1)) x(lb),
   lateral (select lb from int4_tbl) y(lbcopy);
 select * from
-  int8_tbl x left join (select q1,coalesce(q2,0) q2 from int8_tbl) y on x.q2 = y.q1,
+  int8_tbl x left join (select q1,coalesce(q2,'0') q2 from int8_tbl) y on x.q2 = y.q1,
   lateral (values(x.q1,y.q1,y.q2)) v(xq1,yq1,yq2);
 select * from
-  int8_tbl x left join (select q1,coalesce(q2,0) q2 from int8_tbl) y on x.q2 = y.q1,
+  int8_tbl x left join (select q1,coalesce(q2,'0') q2 from int8_tbl) y on x.q2 = y.q1,
   lateral (select x.q1,y.q1,y.q2) v(xq1,yq1,yq2);
 select x.* from
-  int8_tbl x left join (select q1,coalesce(q2,0) q2 from int8_tbl) y on x.q2 = y.q1,
+  int8_tbl x left join (select q1,coalesce(q2,'0') q2 from int8_tbl) y on x.q2 = y.q1,
   lateral (select x.q1,y.q1,y.q2) v(xq1,yq1,yq2);
 select v.* from
-  (int8_tbl x left join (select q1,coalesce(q2,0) q2 from int8_tbl) y on x.q2 = y.q1)
+  (int8_tbl x left join (select q1,coalesce(q2,'0') q2 from int8_tbl) y on x.q2 = y.q1)
   left join int4_tbl z on z.f1 = x.q2,
   lateral (select x.q1,y.q1 union all select x.q2,y.q2) v(vx,vy);
 select v.* from
-  (int8_tbl x left join (select q1,(select coalesce(q2,0)) q2 from int8_tbl) y on x.q2 = y.q1)
+  (int8_tbl x left join (select q1,(select coalesce(q2,'0')) q2 from int8_tbl) y on x.q2 = y.q1)
   left join int4_tbl z on z.f1 = x.q2,
   lateral (select x.q1,y.q1 union all select x.q2,y.q2) v(vx,vy);
 select v.* from
-  (int8_tbl x left join (select q1,(select coalesce(q2,0)) q2 from int8_tbl) y on x.q2 = y.q1)
+  (int8_tbl x left join (select q1,(select coalesce(q2,'0')) q2 from int8_tbl) y on x.q2 = y.q1)
   left join int4_tbl z on z.f1 = x.q2,
   lateral (select x.q1,y.q1 from onerow union all select x.q2,y.q2 from onerow) v(vx,vy);

@@ -1797,10 +1797,10 @@ select * from
 explain (verbose, costs off)
 select * from
   int8_tbl a left join
-  lateral (select *, coalesce(a.q2, 42) as x from int8_tbl b) ss on a.q2 = ss.q1;
+  lateral (select *, coalesce(a.q2, '42') as x from int8_tbl b) ss on a.q2 = ss.q1;
 select * from
   int8_tbl a left join
-  lateral (select *, coalesce(a.q2, 42) as x from int8_tbl b) ss on a.q2 = ss.q1;
+  lateral (select *, coalesce(a.q2, '42') as x from int8_tbl b) ss on a.q2 = ss.q1;

 -- lateral can result in join conditions appearing below their
 -- real semantic level
@@ -1841,7 +1841,7 @@ select * from
 explain (verbose, costs off)
 select * from
   int8_tbl c left join (
-    int8_tbl a left join (select q1, coalesce(q2,42) as x from int8_tbl b) ss1
+    int8_tbl a left join (select q1, coalesce(q2,'42') as x from int8_tbl b) ss1
       on a.q2 = ss1.q1
     cross join
     lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2

Re: Making CASE error handling less surprising

От
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Every so often we get a complaint like [1] about how a CASE should have
> prevented a run-time error and didn't, because constant-folding tried
> to evaluate a subexpression that would not have been entered at run-time.
>
> It struck me that it would not be hard to improve this situation a great
> deal.  If, within a CASE subexpression that isn't certain to be executed
> at runtime, we refuse to pre-evaluate *any* function (essentially, treat
> them all as volatile), then we should largely get the semantics that
> users expect.  There's some potential for query slowdown if a CASE
> contains a constant subexpression that we formerly reduced at plan time
> and now do not, but that doesn't seem to me to be a very big deal.
[…]
> Thoughts?

Would it be feasible to set up an exception handler when constant-
folding cases that might not be reached, and leave the expression
unfolded only if an error was thrown, or does that have too much
overhead to be worthwhile?

- ilmari
-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen



Re: Making CASE error handling less surprising

От
Andres Freund
Дата:
Hi,

On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > Every so often we get a complaint like [1] about how a CASE should have
> > prevented a run-time error and didn't, because constant-folding tried
> > to evaluate a subexpression that would not have been entered at run-time.
> >
> > It struck me that it would not be hard to improve this situation a great
> > deal.  If, within a CASE subexpression that isn't certain to be executed
> > at runtime, we refuse to pre-evaluate *any* function (essentially, treat
> > them all as volatile), then we should largely get the semantics that
> > users expect.  There's some potential for query slowdown if a CASE
> > contains a constant subexpression that we formerly reduced at plan time
> > and now do not, but that doesn't seem to me to be a very big deal.
> […]
> > Thoughts?
> 
> Would it be feasible to set up an exception handler when constant-
> folding cases that might not be reached, and leave the expression
> unfolded only if an error was thrown, or does that have too much
> overhead to be worthwhile?

That'd require using a subtransaction for expression
simplification. That'd be way too high overhead.

Given how often we've had a need to call functions while handling
errors, I do wonder if it'd be worthwhile and feasible to mark functions
as being safe to call without subtransactions, or mark them as not
erroring out (e.g. comparators would usually be safe).

Greetings,

Andres Freund



Re: Making CASE error handling less surprising

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
>> Would it be feasible to set up an exception handler when constant-
>> folding cases that might not be reached, and leave the expression
>> unfolded only if an error was thrown, or does that have too much
>> overhead to be worthwhile?

> That'd require using a subtransaction for expression
> simplification. That'd be way too high overhead.

That's my opinion as well.  It'd be a subtransaction for *each*
operator/function call we need to simplify, which seems completely
disastrous.

> Given how often we've had a need to call functions while handling
> errors, I do wonder if it'd be worthwhile and feasible to mark functions
> as being safe to call without subtransactions, or mark them as not
> erroring out (e.g. comparators would usually be safe).

Yeah.  I was wondering whether the existing "leakproof" marking would
be adequate for this purpose.  It's a little stronger than what we
need, but the pain-in-the-rear factor for adding YA function property
is high enough that I'm inclined to just use it anyway.

We do have to assume that "leakproof" includes "cannot throw any
input-dependent error", but it seems to me that that's true.

            regards, tom lane



Re: Making CASE error handling less surprising

От
Pavel Stehule
Дата:


čt 23. 7. 2020 v 21:43 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Andres Freund <andres@anarazel.de> writes:
> On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
>> Would it be feasible to set up an exception handler when constant-
>> folding cases that might not be reached, and leave the expression
>> unfolded only if an error was thrown, or does that have too much
>> overhead to be worthwhile?

> That'd require using a subtransaction for expression
> simplification. That'd be way too high overhead.

That's my opinion as well.  It'd be a subtransaction for *each*
operator/function call we need to simplify, which seems completely
disastrous.

> Given how often we've had a need to call functions while handling
> errors, I do wonder if it'd be worthwhile and feasible to mark functions
> as being safe to call without subtransactions, or mark them as not
> erroring out (e.g. comparators would usually be safe).

Yeah.  I was wondering whether the existing "leakproof" marking would
be adequate for this purpose.  It's a little stronger than what we
need, but the pain-in-the-rear factor for adding YA function property
is high enough that I'm inclined to just use it anyway.

We do have to assume that "leakproof" includes "cannot throw any
input-dependent error", but it seems to me that that's true.

I am afraid of a performance impact. 

lot of people expects constant folding everywhere now and I can imagine query like

SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye')  END FROM ...

Now, it is optimized well, but with the proposed patch, this query can be slow.

We should introduce planner safe functions for some usual functions, or maybe better explain the behaviour, the costs, and benefits.  I don't think this issue is too common.

Regards

Pavel



                        regards, tom lane


Re: Making CASE error handling less surprising

От
Pavel Stehule
Дата:


čt 23. 7. 2020 v 21:56 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


čt 23. 7. 2020 v 21:43 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Andres Freund <andres@anarazel.de> writes:
> On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:
>> Would it be feasible to set up an exception handler when constant-
>> folding cases that might not be reached, and leave the expression
>> unfolded only if an error was thrown, or does that have too much
>> overhead to be worthwhile?

> That'd require using a subtransaction for expression
> simplification. That'd be way too high overhead.

That's my opinion as well.  It'd be a subtransaction for *each*
operator/function call we need to simplify, which seems completely
disastrous.

> Given how often we've had a need to call functions while handling
> errors, I do wonder if it'd be worthwhile and feasible to mark functions
> as being safe to call without subtransactions, or mark them as not
> erroring out (e.g. comparators would usually be safe).

Yeah.  I was wondering whether the existing "leakproof" marking would
be adequate for this purpose.  It's a little stronger than what we
need, but the pain-in-the-rear factor for adding YA function property
is high enough that I'm inclined to just use it anyway.

We do have to assume that "leakproof" includes "cannot throw any
input-dependent error", but it seems to me that that's true.

I am afraid of a performance impact. 

lot of people expects constant folding everywhere now and I can imagine query like

SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye')  END FROM ...

Now, it is optimized well, but with the proposed patch, this query can be slow.

We should introduce planner safe functions for some usual functions, or maybe better explain the behaviour, the costs, and benefits.  I don't think this issue is too common.

what about different access. We can introduce function

create or replace function volatile_expr(anyelement) returns anyelement as $$ begin return $1; end $$ language plpgsql;

and this can be used as a constant folding optimization fence.

select case col when 1 then volatile_expr(1/$1) else $1 end;

I don't think so people have a problem with this behaviour - the problem is unexpected behaviour change between major releases without really illustrative explanation in documentation.

Regards

Pavel


Regards

Pavel



                        regards, tom lane


Re: Making CASE error handling less surprising

От
Andres Freund
Дата:
Hi,

On 2020-07-23 21:56:26 +0200, Pavel Stehule wrote:
> I am afraid of a performance impact.
> 
> lot of people expects constant folding everywhere now and I can imagine
> query like
> 
> SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye')  END FROM ...
> 
> Now, it is optimized well, but with the proposed patch, this query can be
> slow.

I'd be more concerned about thinks like conditional expressions that
involve both columns and non-comparison operations on constants. Where
right now we'd simplify the constant part of the expression, but
wouldn't at all anymore after this.

Is there an argument to continue simplifying expressions within case
when only involving "true" constants even with not leakproof functions,
but only simplify "pseudo" constants like parameters with leakproof
functions?  I.e CASE WHEN ... THEN 1 / 0 would still raise an error
during simplification but CASE WHEN ... THEN 1 / $1 wouldn't, because $1
is not a real constant (even if PARAM_FLAG_CONST).

It doesn't seem like it'd be too hard to implement that, but that it'd
probably be fairly bulky because we'd need to track more state across
recursive expression_tree_mutator() calls.

Greetings,

Andres Freund



Re: Making CASE error handling less surprising

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> Is there any concern about having to do additional lookups for
> leakproofness? It doesn't seem likely to me since we already need to do
> lookups for the FmgrInfo?

No, we could easily fix it so that one syscache lookup gets both
the provolatile and proleakproof markings.

            regards, tom lane



Re: Making CASE error handling less surprising

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> Is there an argument to continue simplifying expressions within case
> when only involving "true" constants even with not leakproof functions,
> but only simplify "pseudo" constants like parameters with leakproof
> functions?  I.e CASE WHEN ... THEN 1 / 0 would still raise an error
> during simplification but CASE WHEN ... THEN 1 / $1 wouldn't, because $1
> is not a real constant (even if PARAM_FLAG_CONST).

Hmm, interesting idea.  That might fix all the practical cases in plpgsql,
but it wouldn't do anything to make the behavior more explainable.  Not
sure if we care about that though.

If we go this way I'd be inclined to do this instead of, not in addition
to, what I originally proposed.  Not sure if that was how you envisioned
it, but I think this is probably sufficient for its purpose and we would
not need any additional lobotomization of const-simplification.

> It doesn't seem like it'd be too hard to implement that, but that it'd
> probably be fairly bulky because we'd need to track more state across
> recursive expression_tree_mutator() calls.

It wouldn't be any harder than what I posted upthread; it would
just be a different flag getting passed down in the context struct
and getting tested in a different place.

            regards, tom lane



Re: Making CASE error handling less surprising

От
Andres Freund
Дата:
Hi,

On 2020-07-23 16:34:25 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Is there an argument to continue simplifying expressions within case
> > when only involving "true" constants even with not leakproof functions,
> > but only simplify "pseudo" constants like parameters with leakproof
> > functions?  I.e CASE WHEN ... THEN 1 / 0 would still raise an error
> > during simplification but CASE WHEN ... THEN 1 / $1 wouldn't, because $1
> > is not a real constant (even if PARAM_FLAG_CONST).
> 
> Hmm, interesting idea.  That might fix all the practical cases in plpgsql,
> but it wouldn't do anything to make the behavior more explainable.  Not
> sure if we care about that though.

I've probably done too much compiler stuff, but to me it doesn't seem
too hard to understand that purely constant expressions may get
evaluated unconditionally even when inside a CASE, but everything else
won't. The fact that we sometimes optimize params to be essentially
constants isn't really exposed to users, so shouldn't be confusing.


> If we go this way I'd be inclined to do this instead of, not in addition
> to, what I originally proposed.  Not sure if that was how you envisioned
> it, but I think this is probably sufficient for its purpose and we would
> not need any additional lobotomization of const-simplification.

Yea, I would assume that we'd not need anything else. I've not thought
about the subquery case yet, so perhaps it'd be desirable to do
something additional there.


> > It doesn't seem like it'd be too hard to implement that, but that it'd
> > probably be fairly bulky because we'd need to track more state across
> > recursive expression_tree_mutator() calls.
> 
> It wouldn't be any harder than what I posted upthread; it would
> just be a different flag getting passed down in the context struct
> and getting tested in a different place.

Cool.

Greetings,

Andres Freund



Re: Making CASE error handling less surprising

От
Andres Freund
Дата:
Hi,

On 2020-07-23 13:42:08 -0700, Andres Freund wrote:
> On 2020-07-23 16:34:25 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > It doesn't seem like it'd be too hard to implement that, but that it'd
> > > probably be fairly bulky because we'd need to track more state across
> > > recursive expression_tree_mutator() calls.
> > 
> > It wouldn't be any harder than what I posted upthread; it would
> > just be a different flag getting passed down in the context struct
> > and getting tested in a different place.
> 
> Cool.

Hm. Would SQL function inlining be a problem? It looks like that just
substitutes parameters. Before calling
eval_const_expressions_mutator(). So we'd not know not to evaluate such
"pseudo constants".  And that'd probably be confusing, especially
because it's not exactly obvious when inlining happens.

Greetings,

Andres Freund



Re: Making CASE error handling less surprising

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> Hm. Would SQL function inlining be a problem? It looks like that just
> substitutes parameters. Before calling
> eval_const_expressions_mutator(). So we'd not know not to evaluate such
> "pseudo constants".  And that'd probably be confusing, especially
> because it's not exactly obvious when inlining happens.

Hm, interesting question.  I think it might be all right without any
further hacking, because the parameters we care about substituting
would have been handled (or not) before inlining.  But the interactions
would be ticklish, and surely worthy of a test case or three.

            regards, tom lane



Re: Making CASE error handling less surprising

От
Andres Freund
Дата:
Hi,

On 2020-07-23 16:56:44 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Hm. Would SQL function inlining be a problem? It looks like that just
> > substitutes parameters. Before calling
> > eval_const_expressions_mutator(). So we'd not know not to evaluate such
> > "pseudo constants".  And that'd probably be confusing, especially
> > because it's not exactly obvious when inlining happens.
> 
> Hm, interesting question.  I think it might be all right without any
> further hacking, because the parameters we care about substituting
> would have been handled (or not) before inlining.  But the interactions
> would be ticklish, and surely worthy of a test case or three.

I'm a bit worried about a case like:

SELECT foo(17);
CREATE FUNCTION yell(int, int)
RETURNS int
IMMUTABLE
LANGUAGE SQL AS $$
   SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
$$;

EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);

I don't think the parameters here would have been handled before
inlining, right?

Greetings,

Andres Freund



Re: Making CASE error handling less surprising

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I'm a bit worried about a case like:

> CREATE FUNCTION yell(int, int)
> RETURNS int
> IMMUTABLE
> LANGUAGE SQL AS $$
>    SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
> $$;

> EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);

> I don't think the parameters here would have been handled before
> inlining, right?

Ah, I see what you mean.  Yeah, that throws an error today, and it
still would with the patch I was envisioning (attached), because
inlining does Param substitution in a different way.  I'm not
sure that we could realistically fix the inlining case with this
sort of approach.

I think this bears out the comment I made before that this approach
still leaves us with a very complicated behavior.  Maybe we should
stick with the previous approach, possibly supplemented with a
leakproofness exception.

            regards, tom lane

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index e04b144072..4ff7a69908 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -61,6 +61,13 @@ typedef struct
     AggClauseCosts *costs;
 } get_agg_clause_costs_context;

+typedef enum
+{
+    ece_param_never,            /* never inject values for Params */
+    ece_param_const,            /* inject values for Params if marked CONST */
+    ece_param_always            /* always inject values for Params */
+} ece_p_mode;
+
 typedef struct
 {
     ParamListInfo boundParams;
@@ -68,6 +75,7 @@ typedef struct
     List       *active_fns;
     Node       *case_val;
     bool        estimate;
+    ece_p_mode    param_mode;
 } eval_const_expressions_context;

 typedef struct
@@ -2264,6 +2272,7 @@ eval_const_expressions(PlannerInfo *root, Node *node)
     context.active_fns = NIL;    /* nothing being recursively simplified */
     context.case_val = NULL;    /* no CASE being examined */
     context.estimate = false;    /* safe transformations only */
+    context.param_mode = ece_param_const;    /* inject only CONST Params */
     return eval_const_expressions_mutator(node, &context);
 }

@@ -2280,8 +2289,11 @@ eval_const_expressions(PlannerInfo *root, Node *node)
  *      available by the caller of planner(), even if the Param isn't marked
  *      constant.  This effectively means that we plan using the first supplied
  *      value of the Param.
- * 2. Fold stable, as well as immutable, functions to constants.
+ * 2. Fold stable, as well as immutable, functions to constants.  The risk
+ *      that the result might change from planning time to execution time is
+ *      worth taking, as we otherwise couldn't get an estimate at all.
  * 3. Reduce PlaceHolderVar nodes to their contained expressions.
+ * 4. Ignore domain constraints, assuming that CoerceToDomain will succeed.
  *--------------------
  */
 Node *
@@ -2295,6 +2307,7 @@ estimate_expression_value(PlannerInfo *root, Node *node)
     context.active_fns = NIL;    /* nothing being recursively simplified */
     context.case_val = NULL;    /* no CASE being examined */
     context.estimate = true;    /* unsafe transformations OK */
+    context.param_mode = ece_param_always;    /* inject all Param values */
     return eval_const_expressions_mutator(node, &context);
 }

@@ -2372,8 +2385,22 @@ eval_const_expressions_mutator(Node *node,
                         prm->ptype == param->paramtype)
                     {
                         /* OK to substitute parameter value? */
-                        if (context->estimate ||
-                            (prm->pflags & PARAM_FLAG_CONST))
+                        bool        subst = false;
+
+                        switch (context->param_mode)
+                        {
+                            case ece_param_never:
+                                /* subst is already false */
+                                break;
+                            case ece_param_const:
+                                subst = (prm->pflags & PARAM_FLAG_CONST) != 0;
+                                break;
+                            case ece_param_always:
+                                subst = true;
+                                break;
+                        }
+
+                        if (subst)
                         {
                             /*
                              * Return a Const representing the param value.
@@ -2973,10 +3000,26 @@ eval_const_expressions_mutator(Node *node,
                  * expression when executing the CASE, since any contained
                  * CaseTestExprs that might have referred to it will have been
                  * replaced by the constant.
+                 *
+                 * An additional consideration is that the user might be
+                 * expecting the CASE to prevent run-time errors, as in
+                 *        CASE ... THEN 1/$1 ELSE ...
+                 * If a constant value of zero is available for $1, we would
+                 * end up trying to simplify the division, thus drawing a
+                 * divide-by-zero error even if this CASE result would not be
+                 * reached at runtime.  This is problematic because it can
+                 * occur even if the user has not written anything as silly as
+                 * a constant "1/0" expression: substitution of values for
+                 * Params is standard in plpgsql, for instance.  To ameliorate
+                 * this problem without giving up const-simplification of CASE
+                 * subexpressions entirely, we prevent substitution of Param
+                 * values within test condition or result expressions that
+                 * will not certainly be evaluated at run-time.
                  *----------
                  */
                 CaseExpr   *caseexpr = (CaseExpr *) node;
                 CaseExpr   *newcase;
+                ece_p_mode    save_param_mode = context->param_mode;
                 Node       *save_case_val;
                 Node       *newarg;
                 List       *newargs;
@@ -3027,6 +3070,15 @@ eval_const_expressions_mutator(Node *node,
                         const_true_cond = true;
                     }

+                    /*
+                     * Unless the test condition is constant TRUE, we can't be
+                     * sure the result value will be evaluated, so back off
+                     * the Param safety level.  This change will also apply to
+                     * subsequent test conditions and result values.
+                     */
+                    if (!const_true_cond)
+                        context->param_mode = ece_param_never;
+
                     /* Simplify this alternative's result value */
                     caseresult = eval_const_expressions_mutator((Node *) oldcasewhen->result,
                                                                 context);
@@ -3058,6 +3110,7 @@ eval_const_expressions_mutator(Node *node,
                                                                context);

                 context->case_val = save_case_val;
+                context->param_mode = save_param_mode;

                 /*
                  * If no non-FALSE alternatives, CASE reduces to the default
@@ -3113,6 +3166,7 @@ eval_const_expressions_mutator(Node *node,
             {
                 CoalesceExpr *coalesceexpr = (CoalesceExpr *) node;
                 CoalesceExpr *newcoalesce;
+                ece_p_mode    save_param_mode = context->param_mode;
                 List       *newargs;
                 ListCell   *arg;

@@ -3137,13 +3191,25 @@ eval_const_expressions_mutator(Node *node,
                         if (((Const *) e)->constisnull)
                             continue;    /* drop null constant */
                         if (newargs == NIL)
+                        {
+                            context->param_mode = save_param_mode;
                             return e;    /* first expr */
+                        }
                         newargs = lappend(newargs, e);
                         break;
                     }
                     newargs = lappend(newargs, e);
+
+                    /*
+                     * Arguments following a non-constant argument may or may
+                     * not get evaluated at run-time.  As in CASE expressions,
+                     * avoid substituting Param values within such arguments.
+                     */
+                    context->param_mode = ece_param_never;
                 }

+                context->param_mode = save_param_mode;
+
                 /*
                  * If all the arguments were constant null, the result is just
                  * null

Re: Making CASE error handling less surprising

От
Chris Travers
Дата:


On Fri, Jul 24, 2020 at 4:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
> I'm a bit worried about a case like:

> CREATE FUNCTION yell(int, int)
> RETURNS int
> IMMUTABLE
> LANGUAGE SQL AS $$
>    SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
> $$;

> EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);

> I don't think the parameters here would have been handled before
> inlining, right?

Ah, I see what you mean.  Yeah, that throws an error today, and it
still would with the patch I was envisioning (attached), because
inlining does Param substitution in a different way.  I'm not
sure that we could realistically fix the inlining case with this
sort of approach.

I think this bears out the comment I made before that this approach
still leaves us with a very complicated behavior.  Maybe we should
stick with the previous approach, possibly supplemented with a
leakproofness exception.


I am actually not so sure this is a good idea. Here are two doubts I have.

1.  The problem of when a given SQL expression is evaluated crops up in a wide variety of different contexts and, worst case, causes far more damage than queries which always error.  Removing the lower hanging fruit while leaving cases like:

select lock_foo(id), * from foo where somefield > 100; -- which rows does lock_foo(id) run on?  Does it matter?

is going to legitimize these complaints in a way which will be very hard to do unless we also want to eventually be able to specify when volatile functions may be run. The two cases don't look the same but they are manifestations of the same problem which is that when you execute a SQL query you have no control over when expressions are actually run.

2.  The refusal to fold immutables within case statements here mean either we do more tricks to get around the planner if we hit a pathological cases in performance.  I am not convinced this is a net win.

If we go this route, would it be too much to ask to allow a GUC variable to preserve the old behavior?


                        regards, tom lane



--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Making CASE error handling less surprising

От
Tom Lane
Дата:
I wrote:
> Ah, I see what you mean.  Yeah, that throws an error today, and it
> still would with the patch I was envisioning (attached), because
> inlining does Param substitution in a different way.  I'm not
> sure that we could realistically fix the inlining case with this
> sort of approach.

Here's another example that we can't possibly fix with Param substitution
hacking, because there are no Params involved in the first place:

select f1, case when f1 = 42 then 1/i else null end
from (select f1, 0 as i from int4_tbl) ss;

Pulling up the subquery results in "1/0", so this fails today,
even though "f1 = 42" is never true.

Attached is a v3 patch that incorporates the leakproofness idea.
As shown in the new case.sql tests, this does fix both the SQL
function and subquery-pullup cases.

To keep the join regression test results the same, I marked int48()
as leakproof, which is surely safe enough.  Probably we should make
a push to mark all unconditionally-safe implicit coercions as
leakproof, but that's a separate matter.

            regards, tom lane

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index e04b144072..48e31b16d6 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -61,6 +61,16 @@ typedef struct
     AggClauseCosts *costs;
 } get_agg_clause_costs_context;

+typedef enum
+{
+    /* Ordering is important here! */
+    ece_eval_nothing,            /* be unconditionally safe */
+    ece_eval_leakproof,            /* eval leakproof immutable functions */
+    ece_eval_immutable,            /* eval immutable functions too */
+    ece_eval_stable,            /* eval stable functions too */
+    ece_eval_volatile            /* eval volatile functions too */
+} ece_eval_mode;
+
 typedef struct
 {
     ParamListInfo boundParams;
@@ -68,6 +78,7 @@ typedef struct
     List       *active_fns;
     Node       *case_val;
     bool        estimate;
+    ece_eval_mode eval_mode;
 } eval_const_expressions_context;

 typedef struct
@@ -119,6 +130,8 @@ static Node *eval_const_expressions_mutator(Node *node,
 static bool contain_non_const_walker(Node *node, void *context);
 static bool ece_function_is_safe(Oid funcid,
                                  eval_const_expressions_context *context);
+static bool ece_function_form_is_safe(Form_pg_proc func_form,
+                                      eval_const_expressions_context *context);
 static Node *apply_const_relabel(Node *arg, Oid rtype,
                                  int32 rtypmod, Oid rcollid,
                                  CoercionForm rformat, int rlocation);
@@ -2264,6 +2277,7 @@ eval_const_expressions(PlannerInfo *root, Node *node)
     context.active_fns = NIL;    /* nothing being recursively simplified */
     context.case_val = NULL;    /* no CASE being examined */
     context.estimate = false;    /* safe transformations only */
+    context.eval_mode = ece_eval_immutable; /* eval immutable functions */
     return eval_const_expressions_mutator(node, &context);
 }

@@ -2280,8 +2294,11 @@ eval_const_expressions(PlannerInfo *root, Node *node)
  *      available by the caller of planner(), even if the Param isn't marked
  *      constant.  This effectively means that we plan using the first supplied
  *      value of the Param.
- * 2. Fold stable, as well as immutable, functions to constants.
+ * 2. Fold stable, as well as immutable, functions to constants.  The risk
+ *      that the result might change from planning time to execution time is
+ *      worth taking, as we otherwise couldn't get an estimate at all.
  * 3. Reduce PlaceHolderVar nodes to their contained expressions.
+ * 4. Ignore domain constraints, assuming that CoerceToDomain will succeed.
  *--------------------
  */
 Node *
@@ -2295,6 +2312,7 @@ estimate_expression_value(PlannerInfo *root, Node *node)
     context.active_fns = NIL;    /* nothing being recursively simplified */
     context.case_val = NULL;    /* no CASE being examined */
     context.estimate = true;    /* unsafe transformations OK */
+    context.eval_mode = ece_eval_stable;    /* eval stable functions */
     return eval_const_expressions_mutator(node, &context);
 }

@@ -2961,7 +2979,22 @@ eval_const_expressions_mutator(Node *node,
                  * opportunity to reduce constant test conditions.  For
                  * example this allows
                  *        CASE 0 WHEN 0 THEN 1 ELSE 1/0 END
-                 * to reduce to 1 rather than drawing a divide-by-0 error.
+                 * to reduce to 1 rather than drawing a divide-by-0 error,
+                 * since we'll throw away the ELSE without processing it.
+                 *
+                 * Even in not-all-constant cases, the user might be expecting
+                 * the CASE to prevent run-time errors, for example in
+                 *        CASE WHEN j > 0 THEN 1 ELSE 1/0 END
+                 * Since division is immutable, we'd ordinarily simplify the
+                 * division and hence draw the divide-by-zero error at plan
+                 * time, even if j is never zero at run time.  To avoid that,
+                 * reduce eval_mode to ece_eval_leakproof while processing any
+                 * test condition or result value that will not certainly be
+                 * evaluated at run-time.  We expect that leakproof immutable
+                 * functions will not throw any errors (except perhaps in
+                 * corner cases such as OOM, which we need not guard against
+                 * for this purpose).
+                 *
                  * Note that when the test expression is constant, we don't
                  * have to include it in the resulting CASE; for example
                  *        CASE 0 WHEN x THEN y ELSE z END
@@ -2977,6 +3010,7 @@ eval_const_expressions_mutator(Node *node,
                  */
                 CaseExpr   *caseexpr = (CaseExpr *) node;
                 CaseExpr   *newcase;
+                ece_eval_mode save_eval_mode = context->eval_mode;
                 Node       *save_case_val;
                 Node       *newarg;
                 List       *newargs;
@@ -3027,6 +3061,16 @@ eval_const_expressions_mutator(Node *node,
                         const_true_cond = true;
                     }

+                    /*
+                     * Unless the test condition is constant TRUE, we can't be
+                     * sure the result value will be evaluated, so become more
+                     * conservative about which functions to execute.  This
+                     * change will also apply to subsequent test conditions
+                     * and result values.
+                     */
+                    if (!const_true_cond)
+                        context->eval_mode = ece_eval_leakproof;
+
                     /* Simplify this alternative's result value */
                     caseresult = eval_const_expressions_mutator((Node *) oldcasewhen->result,
                                                                 context);
@@ -3058,6 +3102,7 @@ eval_const_expressions_mutator(Node *node,
                                                                context);

                 context->case_val = save_case_val;
+                context->eval_mode = save_eval_mode;

                 /*
                  * If no non-FALSE alternatives, CASE reduces to the default
@@ -3113,6 +3158,7 @@ eval_const_expressions_mutator(Node *node,
             {
                 CoalesceExpr *coalesceexpr = (CoalesceExpr *) node;
                 CoalesceExpr *newcoalesce;
+                ece_eval_mode save_eval_mode = context->eval_mode;
                 List       *newargs;
                 ListCell   *arg;

@@ -3137,13 +3183,25 @@ eval_const_expressions_mutator(Node *node,
                         if (((Const *) e)->constisnull)
                             continue;    /* drop null constant */
                         if (newargs == NIL)
+                        {
+                            context->eval_mode = save_eval_mode;
                             return e;    /* first expr */
+                        }
                         newargs = lappend(newargs, e);
                         break;
                     }
                     newargs = lappend(newargs, e);
+
+                    /*
+                     * Arguments following a non-constant argument may or may
+                     * not get evaluated at run-time, so as in CASE, become
+                     * more conservative about which functions to execute.
+                     */
+                    context->eval_mode = ece_eval_leakproof;
                 }

+                context->eval_mode = save_eval_mode;
+
                 /*
                  * If all the arguments were constant null, the result is just
                  * null
@@ -3163,13 +3221,12 @@ eval_const_expressions_mutator(Node *node,
         case T_SQLValueFunction:
             {
                 /*
-                 * All variants of SQLValueFunction are stable, so if we are
-                 * estimating the expression's value, we should evaluate the
-                 * current function value.  Otherwise just copy.
+                 * All variants of SQLValueFunction are stable, so evaluate if
+                 * we are evaluating stable functions.  Otherwise just copy.
                  */
                 SQLValueFunction *svf = (SQLValueFunction *) node;

-                if (context->estimate)
+                if (context->eval_mode >= ece_eval_stable)
                     return (Node *) evaluate_expr((Expr *) svf,
                                                   svf->type,
                                                   svf->typmod,
@@ -3565,20 +3622,42 @@ contain_non_const_walker(Node *node, void *context)
 static bool
 ece_function_is_safe(Oid funcid, eval_const_expressions_context *context)
 {
-    char        provolatile = func_volatile(funcid);
+    bool        result;
+    HeapTuple    tp;

-    /*
-     * Ordinarily we are only allowed to simplify immutable functions. But for
-     * purposes of estimation, we consider it okay to simplify functions that
-     * are merely stable; the risk that the result might change from planning
-     * time to execution time is worth taking in preference to not being able
-     * to estimate the value at all.
-     */
-    if (provolatile == PROVOLATILE_IMMUTABLE)
-        return true;
-    if (context->estimate && provolatile == PROVOLATILE_STABLE)
-        return true;
-    return false;
+    tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+    if (!HeapTupleIsValid(tp))
+        elog(ERROR, "cache lookup failed for function %u", funcid);
+
+    result = ece_function_form_is_safe((Form_pg_proc) GETSTRUCT(tp), context);
+    ReleaseSysCache(tp);
+    return result;
+}
+
+/*
+ * Same, when we have the pg_proc entry directly at hand
+ */
+static bool
+ece_function_form_is_safe(Form_pg_proc func_form,
+                          eval_const_expressions_context *context)
+{
+    ece_eval_mode f_mode;
+
+    /* Must map the function properties to an ordered enum */
+    if (func_form->provolatile == PROVOLATILE_IMMUTABLE)
+    {
+        if (func_form->proleakproof)
+            f_mode = ece_eval_leakproof;
+        else
+            f_mode = ece_eval_immutable;
+    }
+    else if (func_form->provolatile == PROVOLATILE_STABLE)
+        f_mode = ece_eval_stable;
+    else
+        f_mode = ece_eval_volatile;
+
+    /* Now, does eval_mode allow evaluation of this function? */
+    return (context->eval_mode >= f_mode);
 }

 /*
@@ -4238,9 +4317,8 @@ recheck_cast_function_args(List *args, Oid result_type, HeapTuple func_tuple)
  * evaluate_function: try to pre-evaluate a function call
  *
  * We can do this if the function is strict and has any constant-null inputs
- * (just return a null constant), or if the function is immutable and has all
- * constant inputs (call it and return the result as a Const node).  In
- * estimation mode we are willing to pre-evaluate stable functions too.
+ * (just return a null constant), or if the function is safe to evaluate and
+ * has all constant inputs (call it and return the result as a Const node).
  *
  * Returns a simplified expression if successful, or NULL if cannot
  * simplify the function.
@@ -4293,7 +4371,7 @@ evaluate_function(Oid funcid, Oid result_type, int32 result_typmod,
      * If the function is strict and has a constant-NULL input, it will never
      * be called at all, so we can replace the call by a NULL constant, even
      * if there are other inputs that aren't constant, and even if the
-     * function is not otherwise immutable.
+     * function is not otherwise safe to evaluate.
      */
     if (funcform->proisstrict && has_null_input)
         return (Expr *) makeNullConst(result_type, result_typmod,
@@ -4308,17 +4386,9 @@ evaluate_function(Oid funcid, Oid result_type, int32 result_typmod,
         return NULL;

     /*
-     * Ordinarily we are only allowed to simplify immutable functions. But for
-     * purposes of estimation, we consider it okay to simplify functions that
-     * are merely stable; the risk that the result might change from planning
-     * time to execution time is worth taking in preference to not being able
-     * to estimate the value at all.
+     * Are we permitted to evaluate this function in the current context?
      */
-    if (funcform->provolatile == PROVOLATILE_IMMUTABLE)
-         /* okay */ ;
-    else if (context->estimate && funcform->provolatile == PROVOLATILE_STABLE)
-         /* okay */ ;
-    else
+    if (!ece_function_form_is_safe(funcform, context))
         return NULL;

     /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4b5af32440..5a3fdcc136 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1338,7 +1338,7 @@
   proname => 'int4', prorettype => 'int4', proargtypes => 'int8',
   prosrc => 'int84' },
 { oid => '481', descr => 'convert int4 to int8',
-  proname => 'int8', prorettype => 'int8', proargtypes => 'int4',
+  proname => 'int8', proleakproof => 't', prorettype => 'int8', proargtypes => 'int4',
   prosrc => 'int48' },
 { oid => '482', descr => 'convert int8 to float8',
   proname => 'float8', prorettype => 'float8', proargtypes => 'int8',
diff --git a/src/test/regress/expected/case.out b/src/test/regress/expected/case.out
index c0c8acf035..0b23057ee4 100644
--- a/src/test/regress/expected/case.out
+++ b/src/test/regress/expected/case.out
@@ -93,10 +93,62 @@ SELECT CASE 1 WHEN 0 THEN 1/0 WHEN 1 THEN 1 ELSE 2/0 END;
     1
 (1 row)

--- However we do not currently suppress folding of potentially
--- reachable subexpressions
 SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl;
+ case
+------
+    0
+    0
+    0
+    0
+(4 rows)
+
+-- However, that guarantee doesn't extend into sub-selects
+SELECT CASE WHEN i > 100 THEN (select 1/0) ELSE 0 END FROM case_tbl;
 ERROR:  division by zero
+-- Inline-ing a SQL function shouldn't cause a failure
+CREATE FUNCTION case_func(double precision, integer)
+RETURNS integer
+LANGUAGE SQL AS $$
+   SELECT CASE WHEN $1 = 42 THEN 1/$2 ELSE NULL END
+$$;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT case_func(f, 0) FROM case_tbl;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Seq Scan on public.case_tbl
+   Output: CASE WHEN (f = '42'::double precision) THEN (1 / 0) ELSE NULL::integer END
+(2 rows)
+
+SELECT case_func(f, 0) FROM case_tbl;
+ case_func
+-----------
+
+
+
+
+(4 rows)
+
+DROP FUNCTION case_func(double precision, integer);
+-- Subquery flattening shouldn't result in such errors, either
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT f, CASE WHEN f = 42 THEN 1/j ELSE NULL END
+FROM (SELECT f, 0 AS j FROM case_tbl) ss;
+                                                QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Seq Scan on public.case_tbl
+   Output: case_tbl.f, CASE WHEN (case_tbl.f = '42'::double precision) THEN (1 / 0) ELSE NULL::integer END
+(2 rows)
+
+SELECT f, CASE WHEN f = 42 THEN 1/j ELSE NULL END
+FROM (SELECT f, 0 AS j FROM case_tbl) ss;
+   f   | case
+-------+------
+  10.1 |
+  20.2 |
+ -30.3 |
+       |
+(4 rows)
+
 -- Test for cases involving untyped literals in test expression
 SELECT CASE 'a' WHEN 'a' THEN 1 ELSE 2 END;
  case
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 27056d70d3..7b576b0208 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -634,6 +634,7 @@ int84lt(bigint,integer)
 int84gt(bigint,integer)
 int84le(bigint,integer)
 int84ge(bigint,integer)
+int8(integer)
 oidvectorne(oidvector,oidvector)
 namelt(name,name)
 namele(name,name)
diff --git a/src/test/regress/sql/case.sql b/src/test/regress/sql/case.sql
index 17436c524a..07ee4f8651 100644
--- a/src/test/regress/sql/case.sql
+++ b/src/test/regress/sql/case.sql
@@ -66,11 +66,33 @@ SELECT '7' AS "None",
 -- Constant-expression folding shouldn't evaluate unreachable subexpressions
 SELECT CASE WHEN 1=0 THEN 1/0 WHEN 1=1 THEN 1 ELSE 2/0 END;
 SELECT CASE 1 WHEN 0 THEN 1/0 WHEN 1 THEN 1 ELSE 2/0 END;
-
--- However we do not currently suppress folding of potentially
--- reachable subexpressions
 SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl;

+-- However, that guarantee doesn't extend into sub-selects
+SELECT CASE WHEN i > 100 THEN (select 1/0) ELSE 0 END FROM case_tbl;
+
+-- Inline-ing a SQL function shouldn't cause a failure
+CREATE FUNCTION case_func(double precision, integer)
+RETURNS integer
+LANGUAGE SQL AS $$
+   SELECT CASE WHEN $1 = 42 THEN 1/$2 ELSE NULL END
+$$;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT case_func(f, 0) FROM case_tbl;
+
+SELECT case_func(f, 0) FROM case_tbl;
+
+DROP FUNCTION case_func(double precision, integer);
+
+-- Subquery flattening shouldn't result in such errors, either
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT f, CASE WHEN f = 42 THEN 1/j ELSE NULL END
+FROM (SELECT f, 0 AS j FROM case_tbl) ss;
+
+SELECT f, CASE WHEN f = 42 THEN 1/j ELSE NULL END
+FROM (SELECT f, 0 AS j FROM case_tbl) ss;
+
 -- Test for cases involving untyped literals in test expression
 SELECT CASE 'a' WHEN 'a' THEN 1 ELSE 2 END;


Re: Making CASE error handling less surprising

От
Robert Haas
Дата:
On Thu, Jul 23, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Every so often we get a complaint like [1] about how a CASE should have
> prevented a run-time error and didn't, because constant-folding tried
> to evaluate a subexpression that would not have been entered at run-time.

Yes, I've heard such complaints from other sources as well.

> It struck me that it would not be hard to improve this situation a great
> deal.  If, within a CASE subexpression that isn't certain to be executed
> at runtime, we refuse to pre-evaluate *any* function (essentially, treat
> them all as volatile), then we should largely get the semantics that
> users expect.  There's some potential for query slowdown if a CASE
> contains a constant subexpression that we formerly reduced at plan time
> and now do not, but that doesn't seem to me to be a very big deal.

Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
believe this. Pavel's example is a good one. The leakproof exception
helps, but it doesn't cover everything. Users I've encountered throw
things like date_trunc() and lpad() into SQL code and expect them to
behave (from a performance point of view) like constants, but they
also expect 1/0 not to get evaluated too early when e.g. CASE is used.
It's difficult to meet both sets of expectations at the same time and
we're probably never going to have a perfect solution, but I think
you're minimizing the concern too much here.

> This is not a complete fix, because if you write a sub-SELECT the
> contents of the sub-SELECT are not processed by the outer query's
> eval_const_expressions pass; instead, we look at it within the
> sub-SELECT itself, and in that context there's no apparent reason
> to avoid const-folding.  So
>    CASE WHEN x < 0 THEN (SELECT 1/0) END
> fails even if x is never less than zero.  I don't see any great way
> to avoid that, and I'm not particularly concerned about it anyhow;
> usually the point of a sub-SELECT like this is to be decoupled from
> outer query evaluation, so that the behavior should not be that
> surprising.

I don't think I believe this either. I don't think an average user is
going to expect <expression> to behave differently from (SELECT
<expression>). This one actually bothers me more than the previous
one. How would we even document it? Sometimes things get inlined,
sometimes they don't. Sometimes subqueries get pulled up, sometimes
not. The current behavior isn't great, but at least it handles these
cases consistently. Getting the easy cases "right" while making the
behavior in more complex cases harder to understand is not necessarily
a win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Making CASE error handling less surprising

От
Andres Freund
Дата:
Hi,

On 2020-07-24 12:31:05 -0400, Robert Haas wrote:
> On Thu, Jul 23, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Every so often we get a complaint like [1] about how a CASE should have
> > prevented a run-time error and didn't, because constant-folding tried
> > to evaluate a subexpression that would not have been entered at run-time.
> 
> Yes, I've heard such complaints from other sources as well.
> 
> > It struck me that it would not be hard to improve this situation a great
> > deal.  If, within a CASE subexpression that isn't certain to be executed
> > at runtime, we refuse to pre-evaluate *any* function (essentially, treat
> > them all as volatile), then we should largely get the semantics that
> > users expect.  There's some potential for query slowdown if a CASE
> > contains a constant subexpression that we formerly reduced at plan time
> > and now do not, but that doesn't seem to me to be a very big deal.
> 
> Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
> believe this. Pavel's example is a good one. The leakproof exception
> helps, but it doesn't cover everything. Users I've encountered throw
> things like date_trunc() and lpad() into SQL code and expect them to
> behave (from a performance point of view) like constants, but they
> also expect 1/0 not to get evaluated too early when e.g. CASE is used.
> It's difficult to meet both sets of expectations at the same time and
> we're probably never going to have a perfect solution, but I think
> you're minimizing the concern too much here.

Wouldn't the rule that I proposed earlier, namely that sub-expressions
that involve only "proper" constants continue to get evaluated even
within CASE, largely address that?


> I don't think I believe this either. I don't think an average user is
> going to expect <expression> to behave differently from (SELECT
> <expression>). This one actually bothers me more than the previous
> one. How would we even document it? Sometimes things get inlined,
> sometimes they don't. Sometimes subqueries get pulled up, sometimes
> not. The current behavior isn't great, but at least it handles these
> cases consistently. Getting the easy cases "right" while making the
> behavior in more complex cases harder to understand is not necessarily
> a win.

Well, if we formalize the desired behaviour it's probably a lot easier
to work towards implementing it in additional cases (like
subselects). It doesn't seem to hard to keep track of whether a specific
subquery can be evaluate constants in a certain way, if that's what we
need.

Greetings,

Andres Freund



Re: Making CASE error handling less surprising

От
Pavel Stehule
Дата:


pá 24. 7. 2020 v 18:49 odesílatel Andres Freund <andres@anarazel.de> napsal:
Hi,

On 2020-07-24 12:31:05 -0400, Robert Haas wrote:
> On Thu, Jul 23, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Every so often we get a complaint like [1] about how a CASE should have
> > prevented a run-time error and didn't, because constant-folding tried
> > to evaluate a subexpression that would not have been entered at run-time.
>
> Yes, I've heard such complaints from other sources as well.
>
> > It struck me that it would not be hard to improve this situation a great
> > deal.  If, within a CASE subexpression that isn't certain to be executed
> > at runtime, we refuse to pre-evaluate *any* function (essentially, treat
> > them all as volatile), then we should largely get the semantics that
> > users expect.  There's some potential for query slowdown if a CASE
> > contains a constant subexpression that we formerly reduced at plan time
> > and now do not, but that doesn't seem to me to be a very big deal.
>
> Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
> believe this. Pavel's example is a good one. The leakproof exception
> helps, but it doesn't cover everything. Users I've encountered throw
> things like date_trunc() and lpad() into SQL code and expect them to
> behave (from a performance point of view) like constants, but they
> also expect 1/0 not to get evaluated too early when e.g. CASE is used.
> It's difficult to meet both sets of expectations at the same time and
> we're probably never going to have a perfect solution, but I think
> you're minimizing the concern too much here.

Wouldn't the rule that I proposed earlier, namely that sub-expressions
that involve only "proper" constants continue to get evaluated even
within CASE, largely address that?

It doesn't solve a possible performance problem with one shot (EXECUTE stmt plpgsql) queries, or with parameterized queries






> I don't think I believe this either. I don't think an average user is
> going to expect <expression> to behave differently from (SELECT
> <expression>). This one actually bothers me more than the previous
> one. How would we even document it? Sometimes things get inlined,
> sometimes they don't. Sometimes subqueries get pulled up, sometimes
> not. The current behavior isn't great, but at least it handles these
> cases consistently. Getting the easy cases "right" while making the
> behavior in more complex cases harder to understand is not necessarily
> a win.

Well, if we formalize the desired behaviour it's probably a lot easier
to work towards implementing it in additional cases (like
subselects). It doesn't seem to hard to keep track of whether a specific
subquery can be evaluate constants in a certain way, if that's what we
need.

Greetings,

Andres Freund


Re: Making CASE error handling less surprising

От
Andres Freund
Дата:
Hi,

On 2020-07-24 19:03:30 +0200, Pavel Stehule wrote:
> pá 24. 7. 2020 v 18:49 odesílatel Andres Freund <andres@anarazel.de> napsal:
> > Wouldn't the rule that I proposed earlier, namely that sub-expressions
> > that involve only "proper" constants continue to get evaluated even
> > within CASE, largely address that?
> >
> 
> It doesn't solve a possible performance problem with one shot (EXECUTE stmt
> plpgsql) queries, or with parameterized queries

What precisely are you thinking of here? Most expressions involving
parameters would still get constant evaluated - it'd just be inside CASE
etc that they wouldn't anymore? Do you think it's that common to have a
parameter reference inside an expression inside a CASE where it's
crucial that that parameter reference gets constant evaluated? I'd think
that's a bit of a stretch.

Your earlier example of a WHEN ... THEN upper('constant') ... would
still have the upper('constant') be evaluated, because it doesn't
involve a parameter. And e.g. THEN upper('constant') * $1 would also
still have the upper('constant') be evaluated, just the multiplication
with $1 wouldn't get evaluated.


I'm not sure what you're concerned about with the one-shot bit?

Greetings,

Andres Freund



Re: Making CASE error handling less surprising

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
> believe this. Pavel's example is a good one. The leakproof exception
> helps, but it doesn't cover everything. Users I've encountered throw
> things like date_trunc() and lpad() into SQL code and expect them to
> behave (from a performance point of view) like constants, but they
> also expect 1/0 not to get evaluated too early when e.g. CASE is used.
> It's difficult to meet both sets of expectations at the same time and
> we're probably never going to have a perfect solution, but I think
> you're minimizing the concern too much here.

I've quoted this point before, but ... we can make queries arbitrarily
fast, if we don't have to give the right answer.  I think we've seen
enough complaints on this topic now to make it clear that what we're
doing today with CASE is the wrong answer.

The performance argument can be made to cut both ways, too.  If somebody's
got a very expensive function in a CASE arm that they don't expect to
reach, having it be evaluated anyway because it's got constant inputs
isn't going to make them happy.

The real bottom line is: if you don't want to do this, how else do
you want to fix the problem?  I'm no longer willing to deny that
there is a problem.

> I don't think I believe this either. I don't think an average user is
> going to expect <expression> to behave differently from (SELECT
> <expression>).

Agreed, that's poorly (or not at all?) documented.  But it's been
true all along, and this patch isn't changing that behavior at all.
I'm not sure if we should do anything more than improve the docs,
but in any case it seems independent of the CASE issue.

> The current behavior isn't great, but at least it handles these
> cases consistently.

Really?

            regards, tom lane



Re: Making CASE error handling less surprising

От
Andres Freund
Дата:
Hi,

On 2020-07-23 22:34:53 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I'm a bit worried about a case like:
> 
> > CREATE FUNCTION yell(int, int)
> > RETURNS int
> > IMMUTABLE
> > LANGUAGE SQL AS $$
> >    SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
> > $$;
> 
> > EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);
> 
> > I don't think the parameters here would have been handled before
> > inlining, right?
> 
> Ah, I see what you mean.  Yeah, that throws an error today, and it
> still would with the patch I was envisioning (attached), because
> inlining does Param substitution in a different way.  I'm not
> sure that we could realistically fix the inlining case with this
> sort of approach.

Thinking about it a bit it seems we could solve that fairly easy if we
don't replace function parameter with actual Const nodes, but instead
with a PseudoConst parameter. If we map that to the same expression
evaluation step that should be fairly cheap to implement, basically
adding a bunch of 'case PseudoConst:' to the Const cases.  That way we
could take the type of constness into account before constant folding.

Alternatively we could add a new field to Const, indicating the 'source'
or 'context of the Const, which we then could take into account during
constant evaluation.


> I think this bears out the comment I made before that this approach
> still leaves us with a very complicated behavior.  Maybe we should
> stick with the previous approach, possibly supplemented with a
> leakproofness exception.

ISTM that most of the complication has to be dealt with in either
approach?

Greetings,

Andres Freund



Re: Making CASE error handling less surprising

От
Pavel Stehule
Дата:


pá 24. 7. 2020 v 19:13 odesílatel Andres Freund <andres@anarazel.de> napsal:
Hi,

On 2020-07-24 19:03:30 +0200, Pavel Stehule wrote:
> pá 24. 7. 2020 v 18:49 odesílatel Andres Freund <andres@anarazel.de> napsal:
> > Wouldn't the rule that I proposed earlier, namely that sub-expressions
> > that involve only "proper" constants continue to get evaluated even
> > within CASE, largely address that?
> >
>
> It doesn't solve a possible performance problem with one shot (EXECUTE stmt
> plpgsql) queries, or with parameterized queries

What precisely are you thinking of here? Most expressions involving
parameters would still get constant evaluated - it'd just be inside CASE
etc that they wouldn't anymore? Do you think it's that common to have a
parameter reference inside an expression inside a CASE where it's
crucial that that parameter reference gets constant evaluated? I'd think
that's a bit of a stretch.

Your earlier example of a WHEN ... THEN upper('constant') ... would
still have the upper('constant') be evaluated, because it doesn't
involve a parameter. And e.g. THEN upper('constant') * $1 would also
still have the upper('constant') be evaluated, just the multiplication
with $1 wouldn't get evaluated.


I'm not sure what you're concerned about with the one-shot bit?

Now query parameters are evaluated like constant.

I can imagine WHERE clause like WHERE col = CASE  $1 WHEN true THEN upper($2) ELSE $2 END

I remember applications  that use these strange queries to support parameterized behaviour - like case sensitive or case insensitive searching.



Greetings,

Andres Freund

Re: Making CASE error handling less surprising

От
Andres Freund
Дата:

On July 24, 2020 10:30:37 AM PDT, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>pá 24. 7. 2020 v 19:13 odesílatel Andres Freund <andres@anarazel.de>
>napsal:
>> Your earlier example of a WHEN ... THEN upper('constant') ... would
>> still have the upper('constant') be evaluated, because it doesn't
>> involve a parameter. And e.g. THEN upper('constant') * $1 would also
>> still have the upper('constant') be evaluated, just the
>multiplication
>> with $1 wouldn't get evaluated.
>>
>>
>> I'm not sure what you're concerned about with the one-shot bit?
>>
>
>Now query parameters are evaluated like constant.

How's that related to oneeshot plans?

>I can imagine WHERE clause like WHERE col = CASE  $1 WHEN true THEN
>upper($2) ELSE $2 END
>
>I remember applications  that use these strange queries to support
>parameterized behaviour - like case sensitive or case insensitive
>searching.

I don't buy this as a significant issue. This could much more sensibly be written as an OR. If the policy is that we
cannotregress anything then there's no way we can improve at all. 

Andres

Andres

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Making CASE error handling less surprising

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> Wouldn't the rule that I proposed earlier, namely that sub-expressions
> that involve only "proper" constants continue to get evaluated even
> within CASE, largely address that?

The more I think about that the less I like it.  It'd make the behavior
even harder to reason about than it is now, and it doesn't fix the issue
for subquery pullup cases.

Basically this seems like a whole lot of thrashing to try to preserve
all the details of a behavior that is kind of accidental to begin with.
The argument that it's a performance issue seems hypothetical too,
rather than founded on any observed results.

BTW, to the extent that there is a performance issue, we could perhaps
fix it if we resurrected the "cache stable subexpressions" patch that
was kicking around a year or two ago.  That'd give us both
at-most-one-evaluation and no-evaluation-until-necessary behaviors,
if we made sure to apply it to stable CASE arms.

            regards, tom lane



Re: Making CASE error handling less surprising

От
Pavel Stehule
Дата:


pá 24. 7. 2020 v 19:46 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Andres Freund <andres@anarazel.de> writes:
> Wouldn't the rule that I proposed earlier, namely that sub-expressions
> that involve only "proper" constants continue to get evaluated even
> within CASE, largely address that?

The more I think about that the less I like it.  It'd make the behavior
even harder to reason about than it is now, and it doesn't fix the issue
for subquery pullup cases.

Basically this seems like a whole lot of thrashing to try to preserve
all the details of a behavior that is kind of accidental to begin with.
The argument that it's a performance issue seems hypothetical too,
rather than founded on any observed results.

BTW, to the extent that there is a performance issue, we could perhaps
fix it if we resurrected the "cache stable subexpressions" patch that
was kicking around a year or two ago.  That'd give us both
at-most-one-evaluation and no-evaluation-until-necessary behaviors,
if we made sure to apply it to stable CASE arms.

+1

regards

Pavel


                        regards, tom lane


Re: Making CASE error handling less surprising

От
Chris Travers
Дата:


On Fri, Jul 24, 2020 at 7:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
> Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
> believe this. Pavel's example is a good one. The leakproof exception
> helps, but it doesn't cover everything. Users I've encountered throw
> things like date_trunc() and lpad() into SQL code and expect them to
> behave (from a performance point of view) like constants, but they
> also expect 1/0 not to get evaluated too early when e.g. CASE is used.
> It's difficult to meet both sets of expectations at the same time and
> we're probably never going to have a perfect solution, but I think
> you're minimizing the concern too much here.

I've quoted this point before, but ... we can make queries arbitrarily
fast, if we don't have to give the right answer.  I think we've seen
enough complaints on this topic now to make it clear that what we're
doing today with CASE is the wrong answer.

So here's my concern in a little more detail. 

For small databases, these performance concerns are not big deals. But for large, heavily loaded databases one tends to run into all of the pathological cases more frequently.  In other words the overhead for the largest users will likely not be proportional to the gains of the newer users who are surprised by the current behavior.  The more complex we make exceptions as to how the planner works, the more complex the knowledge required to work on the high end of the database is.  So the complexity here is such that I just don't think is worth it.


The performance argument can be made to cut both ways, too.  If somebody's
got a very expensive function in a CASE arm that they don't expect to
reach, having it be evaluated anyway because it's got constant inputs
isn't going to make them happy.

However in this case we would be evaluating the expensive case arm every time it is invoked (i.e. for every row matched), right?  It is hard to see this as even being close to a performance gain or even approximately neutral because the cases where you have a significant gain are likely to be extremely rare, and the penalties for when the cost applies will be many multiples of the maximum gain.

The real bottom line is: if you don't want to do this, how else do
you want to fix the problem?  I'm no longer willing to deny that
there is a problem.

I see three ways forward.

The first (probably the best) would be a solution along the lines of yours along with a session-level GUC variable which could determine whether case branches can fold constants.  This has several important benefits:

1.  It gets a fix in shortly for those who want it.
2.  It ensures this is optional behavior for the more experienced users (where one can better decide which direction to go), and
3.  It makes the behavior explicit, documented, and thus more easily understood.

A third approach would be to allow some sort of "constant evaluation mechanism" maybe with its own memory context where constants could be cached on first evaluation under the statement memory context.  That would solve the problem more gneerally.
 

> I don't think I believe this either. I don't think an average user is
> going to expect <expression> to behave differently from (SELECT
> <expression>).

Agreed, that's poorly (or not at all?) documented.  But it's been
true all along, and this patch isn't changing that behavior at all.
I'm not sure if we should do anything more than improve the docs,
but in any case it seems independent of the CASE issue.

> The current behavior isn't great, but at least it handles these
> cases consistently.

Really?

                        regards, tom lane




--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Making CASE error handling less surprising

От
Robert Haas
Дата:
On Sun, Jul 26, 2020 at 1:27 PM Chris Travers <chris.travers@adjust.com> wrote:
> The first (probably the best) would be a solution along the lines of yours along with a session-level GUC variable
whichcould determine whether case branches can fold constants.
 

Bluntly, that seems like a terrible idea. It's great if you are an
expert DBA, because then you can adjust the behavior on your own
system according to what works best for you. But if you are trying to
write portable code that will work on any PostgreSQL instance, you now
have to remember to test it with every possible value of the GUC and
make sure it behaves the same way under all of them. That is a major
burden on authors of tools and extensions, and if we add even three or
four such GUCs with three or four possible values each, there are
suddenly dozens or even hundreds of possible combinations to test. I
think that adding GUCs for this kind of thing is a complete
non-starter for that reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Making CASE error handling less surprising

От
Robert Haas
Дата:
On Fri, Jul 24, 2020 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The real bottom line is: if you don't want to do this, how else do
> you want to fix the problem?  I'm no longer willing to deny that
> there is a problem.

That's the wrong question. The right question is whether we're
sufficiently certain that a particular proposal is an improvement over
the status quo to justify changing something. It's better to do
nothing than to do something that makes some cases better and other
cases worse, because then instead of users having a problem with this,
they have a variety of different problems depending on which release
they are running.  IMHO, changing the semantics of something like this
is really scary and should be approached with great caution.

You don't have to deny that something is a problem in order to admit
that you might not have a perfect solution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company