Обсуждение: Strange result with LATERAL query

Поиск
Список
Период
Сортировка

Strange result with LATERAL query

От
Jeevan Chalke
Дата:
Hi,

While playing with LATERAL along with some aggregates in sub-query, I have
observed somewhat unusual behavior.

Consider following steps:

create table tab1(c1 int, c2 int);
insert into tab1 select id, 1 from generate_series(1, 3) id;
create function sum_tab1(extra int) returns setof bigint as $$
  select sum(c1 + extra) sum from tab1 group by c1
$$ language sql;

-- This gives wrong output
select c1, c2, sum from tab1 t1, lateral
  (select sum(t2.c1 + t1.c1) sum from tab1 t2 group by t2.c1) qry
  order by 1, 2, 3;
-- This gives correct output
select c1, c2, sum from tab1 t1, lateral
  (select sum_tab1 sum from sum_tab1(c1)) qry
  order by 1, 2, 3;


I would expect same result from both these queries, but unfortunately not.
Per my understanding, second query involving function gives me correct result
where as first query's output seems wrong.

Is this an expected behavior OR we are giving wrong result in case of first
query?

Thanks
--
Jeevan B Chalke

Re: Strange result with LATERAL query

От
Andrew Gierth
Дата:
>>>>> "Jeevan" == Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
Jeevan> Hi,Jeevan> While playing with LATERAL along with some aggregates inJeevan> sub-query, I have observed somewhat
unusualbehavior.
 

Simpler example not needing LATERAL:

select array(select sum(x+y) from generate_series(1,3) y group by y) from generate_series(1,3) x;?column? 
----------{2,4,3}{2,4,3}{2,4,3}
(3 rows)

This is broken all the way back, it's not a recent bug.

Something is wrong with the way chgParam is being handled in Agg nodes.
The code in ExecReScanAgg seems to assume that if the lefttree doesn't
have any parameter changes then it suffices to re-project the data from
the existing hashtable; but of course this is nonsense if the parameter
is in an input to an aggregate function.

-- 
Andrew (irc:RhodiumToad)



Re: Strange result with LATERAL query

От
Andrew Gierth
Дата:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>> "Jeevan" == Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
Jeevan> Hi,Jeevan> While playing with LATERAL along with some aggregates inJeevan> sub-query, I have observed somewhat
unusualbehavior.
 
Andrew> Simpler example not needing LATERAL:
Andrew> select array(select sum(x+y) from generate_series(1,3) y group by y)Andrew>   from generate_series(1,3) x;

Oh, and another way to verify that it's a bug and not expected behavior
is that it goes away with set enable_hashagg=false;

-- 
Andrew (irc:RhodiumToad)



Re: Strange result with LATERAL query

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Something is wrong with the way chgParam is being handled in Agg nodes.
> The code in ExecReScanAgg seems to assume that if the lefttree doesn't
> have any parameter changes then it suffices to re-project the data from
> the existing hashtable; but of course this is nonsense if the parameter
> is in an input to an aggregate function.

It looks like it's sufficient to do this:

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 1ec2515..f468fad 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** ExecReScanAgg(AggState *node)
*** 3425,3435 ****             return;          /*
!          * If we do have the hash table and the subplan does not have any
!          * parameter changes, then we can just rescan the existing hash table;
!          * no need to build it again.          */
!         if (outerPlan->chgParam == NULL)         {             ResetTupleHashIterator(node->hashtable,
&node->hashiter);            return;
 
--- 3425,3436 ----             return;          /*
!          * If we do have the hash table and there are no relevant parameter
!          * changes, then we can just rescan the existing hash table; no need
!          * to build it again.          */
!         if (node->ss.ps.chgParam == NULL &&
!             outerPlan->chgParam == NULL)         {             ResetTupleHashIterator(node->hashtable,
&node->hashiter);            return;
 


I'm not sure if it's worth trying to distinguish whether the Param is
inside any aggregate calls or not.  The existing code gets the right
answer for

select array(select x+sum(y) from generate_series(1,3) y group by y)  from generate_series(1,3) x;

and we'd be losing some efficiency for cases like that if we fix
it as above.  But is it worth the trouble?
        regards, tom lane



Re: Strange result with LATERAL query

От
Pavel Stehule
Дата:


2016-08-24 17:08 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Something is wrong with the way chgParam is being handled in Agg nodes.
> The code in ExecReScanAgg seems to assume that if the lefttree doesn't
> have any parameter changes then it suffices to re-project the data from
> the existing hashtable; but of course this is nonsense if the parameter
> is in an input to an aggregate function.

It looks like it's sufficient to do this:

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 1ec2515..f468fad 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** ExecReScanAgg(AggState *node)
*** 3425,3435 ****
                        return;

                /*
!                * If we do have the hash table and the subplan does not have any
!                * parameter changes, then we can just rescan the existing hash table;
!                * no need to build it again.
                 */
!               if (outerPlan->chgParam == NULL)
                {
                        ResetTupleHashIterator(node->hashtable, &node->hashiter);
                        return;
--- 3425,3436 ----
                        return;

                /*
!                * If we do have the hash table and there are no relevant parameter
!                * changes, then we can just rescan the existing hash table; no need
!                * to build it again.
                 */
!               if (node->ss.ps.chgParam == NULL &&
!                       outerPlan->chgParam == NULL)
                {
                        ResetTupleHashIterator(node->hashtable, &node->hashiter);
                        return;


I'm not sure if it's worth trying to distinguish whether the Param is
inside any aggregate calls or not.  The existing code gets the right
answer for

select array(select x+sum(y) from generate_series(1,3) y group by y)
  from generate_series(1,3) x;

and we'd be losing some efficiency for cases like that if we fix
it as above.  But is it worth the trouble?


The result should not depend on GUC - hashagg on/off changing output - it is error.

Regards

Pavel
 
                        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Strange result with LATERAL query

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> I'm not sure if it's worth trying to distinguish whether the ParamTom> is inside any aggregate calls or not.  The
existingcode gets theTom> right answer for
 
Tom> select array(select x+sum(y) from generate_series(1,3) y group by y) Tom>   from generate_series(1,3) x;
Tom> and we'd be losing some efficiency for cases like that if we fixTom> it as above.  But is it worth the trouble?

The loss of efficiency could be significant, since it's forcing a rescan
of what could be an expensive plan.

-- 
Andrew (irc:RhodiumToad)



Re: Strange result with LATERAL query

От
Andrew Gierth
Дата:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> I'm not sure if it's worth trying to distinguish whether the Param
 Tom> is inside any aggregate calls or not.

How about:

--
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 1ec2515..4a9fefb 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3426,10 +3426,11 @@ ExecReScanAgg(AggState *node)

         /*
          * If we do have the hash table and the subplan does not have any
-         * parameter changes, then we can just rescan the existing hash table;
-         * no need to build it again.
+         * parameter changes, we might still need to rebuild the hash if any of
+         * the parameter changes affect the input to aggregate functions.
          */
-        if (outerPlan->chgParam == NULL)
+        if (outerPlan->chgParam == NULL
+            && !bms_overlap(node->ss.ps.chgParam, aggnode->aggParam))
         {
             ResetTupleHashIterator(node->hashtable, &node->hashiter);
             return;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7a0644..7b5eb86 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -871,6 +871,7 @@ _copyAgg(const Agg *from)
     COPY_SCALAR_FIELD(aggstrategy);
     COPY_SCALAR_FIELD(aggsplit);
     COPY_SCALAR_FIELD(numCols);
+    COPY_BITMAPSET_FIELD(aggParam);
     if (from->numCols > 0)
     {
         COPY_POINTER_FIELD(grpColIdx, from->numCols * sizeof(AttrNumber));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1fab807..5e48edd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -706,6 +706,7 @@ _outAgg(StringInfo str, const Agg *node)
     WRITE_ENUM_FIELD(aggstrategy, AggStrategy);
     WRITE_ENUM_FIELD(aggsplit, AggSplit);
     WRITE_INT_FIELD(numCols);
+    WRITE_BITMAPSET_FIELD(aggParam);

     appendStringInfoString(str, " :grpColIdx");
     for (i = 0; i < node->numCols; i++)
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c83063e..67dcf17 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2004,6 +2004,7 @@ _readAgg(void)
     READ_ENUM_FIELD(aggstrategy, AggStrategy);
     READ_ENUM_FIELD(aggsplit, AggSplit);
     READ_INT_FIELD(numCols);
+    READ_BITMAPSET_FIELD(aggParam);
     READ_ATTRNUMBER_ARRAY(grpColIdx, local_node->numCols);
     READ_OID_ARRAY(grpOperators, local_node->numCols);
     READ_LONG_FIELD(numGroups);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a46cc10..2e56484 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -82,6 +82,7 @@ static Bitmapset *finalize_plan(PlannerInfo *root,
               Bitmapset *valid_params,
               Bitmapset *scan_params);
 static bool finalize_primnode(Node *node, finalize_primnode_context *context);
+static bool finalize_agg_primnode(Node *node, finalize_primnode_context *context);


 /*
@@ -2659,8 +2660,30 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
                               &context);
             break;

-        case T_Hash:
         case T_Agg:
+            {
+                Agg           *agg = (Agg *) plan;
+                finalize_primnode_context aggcontext;
+
+                /*
+                 * We need to know which params are referenced in inputs to
+                 * aggregate calls, so that we know whether we need to rebuild
+                 * the hashtable in the AGG_HASHED case. Rescan the tlist and
+                 * qual for this purpose.
+                 */
+
+                aggcontext = context;
+                aggcontext.paramids = NULL;
+
+                finalize_agg_primnode((Node *) agg->plan.targetlist, &aggcontext);
+                finalize_agg_primnode((Node *) agg->plan.qual, &aggcontext);
+
+                /* remember results for execution */
+                agg->aggParam = aggcontext.paramids;
+            }
+            break;
+
+        case T_Hash:
         case T_Material:
         case T_Sort:
         case T_Unique:
@@ -2812,6 +2835,24 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 }

 /*
+ * finalize_agg_primnode: add IDs of all PARAM_EXEC params appearing inside
+ * Aggref nodes in the given expression tree to the result set.
+ */
+static bool
+finalize_agg_primnode(Node *node, finalize_primnode_context *context)
+{
+    if (node == NULL)
+        return false;
+    if (IsA(node, Aggref))
+    {
+        finalize_primnode(node, context);
+        return false;            /* no more to do here */
+    }
+    return expression_tree_walker(node, finalize_agg_primnode,
+                                  (void *) context);
+}
+
+/*
  * SS_make_initplan_output_param - make a Param for an initPlan's output
  *
  * The plan is expected to return a scalar value of the given type/collation.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 369179f..3d5e4df 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -712,6 +712,7 @@ typedef struct Agg
     AggStrategy aggstrategy;    /* basic strategy, see nodes.h */
     AggSplit    aggsplit;        /* agg-splitting mode, see nodes.h */
     int            numCols;        /* number of grouping columns */
+    Bitmapset  *aggParam;        /* params used in Aggref inputs */
     AttrNumber *grpColIdx;        /* their indexes in the target list */
     Oid           *grpOperators;    /* equality operators to compare with */
     long        numGroups;        /* estimated number of groups in input */

Re: Strange result with LATERAL query

От
Andrew Gierth
Дата:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:
Pavel> The result should not depend on GUC - hashagg on/off changingPavel> output - it is error.

I don't think anyone's suggesting leaving it unfixed, just whether the
fix should introduce unnecessary rescans of the aggregate input.

-- 
Andrew (irc:RhodiumToad)



Re: Strange result with LATERAL query

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> How about:

Hm, I was just working on inserting something of the sort into
ExecInitAgg.  But I guess we could do it in the planner too.
Will run with your approach.

I think it's a bit too stupid as-is, though.  We don't need to
recalculate for Params in aggdirectargs, do we?
        regards, tom lane



Re: Strange result with LATERAL query

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Hm, I was just working on inserting something of the sort intoTom> ExecInitAgg.  But I guess we could do it in the
plannertoo.  WillTom> run with your approach.
 
Tom> I think it's a bit too stupid as-is, though.  We don't need toTom> recalculate for Params in aggdirectargs, do
we?

In theory we would need to. But in practice we don't, because we don't
allow ordered aggs in AGG_HASHED mode anyway.

We could skip filling in aggParam at all if not in AGG_HASHED mode I
guess.

-- 
Andrew (irc:RhodiumToad)



Re: Strange result with LATERAL query

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> I think it's a bit too stupid as-is, though.  We don't need to
>  Tom> recalculate for Params in aggdirectargs, do we?

> In theory we would need to.

How come?  Those are only passed to the final function, no?  They
shouldn't affect what is in the hash table.

> But in practice we don't, because we don't
> allow ordered aggs in AGG_HASHED mode anyway.

True, it's moot at the moment.

> We could skip filling in aggParam at all if not in AGG_HASHED mode I
> guess.

Yeah, I'm planning to make it do that.
        regards, tom lane