Обсуждение: Aggregate-function space leakage

Поиск
Список
Период
Сортировка

Aggregate-function space leakage

От
Tom Lane
Дата:
I looked into Chris Spotts' recent report of massive memory leakage in
8.4, in a case involving array_agg() executed in a GROUP BY query:
http://archives.postgresql.org/pgsql-general/2009-07/msg00858.php
The reason for that turns out to be that we deliberately lobotomized
array_agg that way, just last month:
http://archives.postgresql.org/pgsql-committers/2009-06/msg00259.php
in response to this problem:
http://archives.postgresql.org/pgsql-hackers/2009-06/msg01186.php

We need a better idea.

One possibility is to have nodeAgg.c reset the "aggcontext" between
groups when in group mode.  I think this would be more than a one-liner
fix because it probably has pointers to transvalues that are in that
context, but it's surely doable.

Another possibility is to modify array_agg (and other aggregates that
try to use aggcontext) so that they distinguish hash- and group-mode
aggregation.  If we had array_agg attach its working context to the Agg
node's per-output-tuple context instead of aggcontext when in group
mode, I think everything would work as desired.  The main disadvantage
of this is modifying a coding convention that third-party code probably
is using.  (But OTOH, we have already required any such code to change
for 8.4.)

I haven't tested either of the above ideas --- just brainstorming
at this point.

Comments, better ideas?
        regards, tom lane


Re: Aggregate-function space leakage

От
Jeff Davis
Дата:
On Wed, 2009-07-22 at 17:14 -0400, Tom Lane wrote:
> One possibility is to have nodeAgg.c reset the "aggcontext" between
> groups when in group mode.  I think this would be more than a one-liner
> fix because it probably has pointers to transvalues that are in that
> context, but it's surely doable.

Is there a disadvantage to this approach, or is it just more work?

Regards,Jeff Davis



Re: Aggregate-function space leakage

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> On Wed, 2009-07-22 at 17:14 -0400, Tom Lane wrote:
>> One possibility is to have nodeAgg.c reset the "aggcontext" between
>> groups when in group mode.  I think this would be more than a one-liner
>> fix because it probably has pointers to transvalues that are in that
>> context, but it's surely doable.

> Is there a disadvantage to this approach, or is it just more work?

Don't know ... haven't tried to do it yet.
        regards, tom lane


Re: Aggregate-function space leakage

От
Greg Stark
Дата:
On Wed, Jul 22, 2009 at 10:14 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> The reason for that turns out to be that we deliberately lobotomized
> array_agg that way, just last month:
> http://archives.postgresql.org/pgsql-committers/2009-06/msg00259.php
> in response to this problem:
> http://archives.postgresql.org/pgsql-hackers/2009-06/msg01186.php
>
> We need a better idea.


Rereading your diagnosis of Merlin Moncure's original problem I'm a
bit puzzled. Why do we have to rerun the final function when we rescan
the hash table? Surely the logical thing to do is to store the final
value in the hash table with some flag saying that value has been
finalized rather than to reexecute the final function every time it's
rescanned.

I'm not sure that really solves anything though since there's no
guarantee that the first scan was finished when it's reset so there
could still be unfinalized elements in the hash table. Would it be too
costly to finalize all the hash elements in a single pass before
returning any?

-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: Aggregate-function space leakage

От
Hitoshi Harada
Дата:
2009/7/23 Greg Stark <gsstark@mit.edu>:
> On Wed, Jul 22, 2009 at 10:14 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> The reason for that turns out to be that we deliberately lobotomized
>> array_agg that way, just last month:
>> http://archives.postgresql.org/pgsql-committers/2009-06/msg00259.php
>> in response to this problem:
>> http://archives.postgresql.org/pgsql-hackers/2009-06/msg01186.php
>>
>> We need a better idea.
>
>
> Rereading your diagnosis of Merlin Moncure's original problem I'm a
> bit puzzled. Why do we have to rerun the final function when we rescan
> the hash table? Surely the logical thing to do is to store the final
> value in the hash table with some flag saying that value has been
> finalized rather than to reexecute the final function every time it's
> rescanned.
>
> I'm not sure that really solves anything though since there's no
> guarantee that the first scan was finished when it's reset so there
> could still be unfinalized elements in the hash table. Would it be too
> costly to finalize all the hash elements in a single pass before
> returning any?
>

It looks like Agg node builds whole of the hash table before returning
a tuple in hash-mode. If it stores all the results somewhere and just
return them on rescan, an issue is volatile final functions (and I
know it's so rare case), but except for that it sounds sane, though I
don't know exact requirement of rescaning and reexecuting in the
Exectuor. If it really needs to release anything when rescan, this
approach also fails.

So two ideas from Tom seem to me a little worse than that. Modifying
Agg.c might add overhead to reset context group by group and forcing
array_agg() (i.e. user aggregates) to distinguish hash-mode and
group-mode is definitely heavy for users. The real problem here is
how/when to release transvalue stored by aggregates in new method
introduced in 8.4 with array_agg(), which is to pass pointers by
transfunc's arguments. Maybe array_agg should not do that thing
introduced in 8.4. We may go back to array_accum() possibly.


Regards,

> --
> greg
> http://mit.edu/~gsstark/resume.pdf
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Hitoshi Harada


Re: Aggregate-function space leakage

От
Tom Lane
Дата:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> So two ideas from Tom seem to me a little worse than that. Modifying
> Agg.c might add overhead to reset context group by group and forcing
> array_agg() (i.e. user aggregates) to distinguish hash-mode and
> group-mode is definitely heavy for users.

I agree that the second choice would be a pain.  I think you are
overestimating the cost of the first choice though.  We have already
taken steps to ensure that MemoryContextReset is *extremely* cheap
when there is nothing for it to do.  If there is something for it to
do, well, that's the case that we have a memory leak now.  Also,
resetting the context should be cheaper than retail pfree's anyway.

Anyway, I'll go take a look at exactly what would be involved in the
first choice.
        regards, tom lane


Re: Aggregate-function space leakage

От
Tom Lane
Дата:
Greg Stark <gsstark@mit.edu> writes:
> Rereading your diagnosis of Merlin Moncure's original problem I'm a
> bit puzzled. Why do we have to rerun the final function when we rescan
> the hash table? Surely the logical thing to do is to store the final
> value in the hash table with some flag saying that value has been
> finalized rather than to reexecute the final function every time it's
> rescanned.

In the normal case where we're not going to do a rescan, this would very
likely make things slower because we'd have to make a never-used extra
copy of the function's output.  It might be worth doing if we knew we
were likely to get rescanned; but I'm not eager to have two
significantly different operational modes for that.  nodeAgg is
complicated enough already...
        regards, tom lane


Re: Aggregate-function space leakage

От
Tom Lane
Дата:
I wrote:
> Anyway, I'll go take a look at exactly what would be involved in the
> first choice.

Actually, it seems this way results in a net *savings* of code, because
we can simply remove the code that was responsible for retail pfree'ing
of the transition values.  I suppose that code must have predated the
introduction of MemoryContextReset, or it would have occurred to us to
do it like this to begin with.

I think that WindowAgg does not need any changes because it already does
MemoryContextResetAndDeleteChildren(winstate->wincontext) at partition
boundaries.  Hitoshi, do you agree?

            regards, tom lane

Index: src/backend/executor/nodeAgg.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeAgg.c,v
retrieving revision 1.167
diff -c -r1.167 nodeAgg.c
*** src/backend/executor/nodeAgg.c    17 Jun 2009 16:05:34 -0000    1.167
--- src/backend/executor/nodeAgg.c    23 Jul 2009 19:19:33 -0000
***************
*** 55,60 ****
--- 55,62 ----
   *      in either case its value need not be preserved.  See int8inc() for an
   *      example.    Notice that advance_transition_function() is coded to avoid a
   *      data copy step when the previous transition value pointer is returned.
+  *      Also, some transition functions make use of the aggcontext to store
+  *      working state.
   *
   *
   * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
***************
*** 273,290 ****
          }

          /*
-          * If we are reinitializing after a group boundary, we have to free
-          * any prior transValue to avoid memory leakage.  We must check not
-          * only the isnull flag but whether the pointer is NULL; since
-          * pergroupstate is initialized with palloc0, the initial condition
-          * has isnull = 0 and null pointer.
-          */
-         if (!peraggstate->transtypeByVal &&
-             !pergroupstate->transValueIsNull &&
-             DatumGetPointer(pergroupstate->transValue) != NULL)
-             pfree(DatumGetPointer(pergroupstate->transValue));
-
-         /*
           * (Re)set transValue to the initial value.
           *
           * Note that when the initial value is pass-by-ref, we must copy it
--- 275,280 ----
***************
*** 911,920 ****
          }

          /*
!          * Clear the per-output-tuple context for each group
           */
          ResetExprContext(econtext);

          /*
           * Initialize working state for a new input tuple group
           */
--- 901,915 ----
          }

          /*
!          * Clear the per-output-tuple context for each group, as well as
!          * aggcontext (which contains any pass-by-ref transvalues of the
!          * old group).  We also clear any child contexts of the aggcontext;
!          * some aggregate functions store working state in such contexts.
           */
          ResetExprContext(econtext);

+         MemoryContextResetAndDeleteChildren(aggstate->aggcontext);
+
          /*
           * Initialize working state for a new input tuple group
           */
***************
*** 1234,1240 ****
       * structures and transition values.  NOTE: the details of what is stored
       * in aggcontext and what is stored in the regular per-query memory
       * context are driven by a simple decision: we want to reset the
!      * aggcontext in ExecReScanAgg to recover no-longer-wanted space.
       */
      aggstate->aggcontext =
          AllocSetContextCreate(CurrentMemoryContext,
--- 1229,1236 ----
       * structures and transition values.  NOTE: the details of what is stored
       * in aggcontext and what is stored in the regular per-query memory
       * context are driven by a simple decision: we want to reset the
!      * aggcontext at group boundaries (if not hashing) and in ExecReScanAgg
!      * to recover no-longer-wanted space.
       */
      aggstate->aggcontext =
          AllocSetContextCreate(CurrentMemoryContext,
Index: src/backend/utils/adt/array_userfuncs.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/array_userfuncs.c,v
retrieving revision 1.31
diff -c -r1.31 array_userfuncs.c
*** src/backend/utils/adt/array_userfuncs.c    20 Jun 2009 18:45:28 -0000    1.31
--- src/backend/utils/adt/array_userfuncs.c    23 Jul 2009 19:19:33 -0000
***************
*** 539,545 ****

      /*
       * Make the result.  We cannot release the ArrayBuildState because
!      * sometimes aggregate final functions are re-executed.
       */
      result = makeMdArrayResult(state, 1, dims, lbs,
                                 CurrentMemoryContext,
--- 539,547 ----

      /*
       * Make the result.  We cannot release the ArrayBuildState because
!      * sometimes aggregate final functions are re-executed.  Rather, it
!      * is nodeAgg.c's responsibility to reset the aggcontext when it's
!      * safe to do so.
       */
      result = makeMdArrayResult(state, 1, dims, lbs,
                                 CurrentMemoryContext,

Re: Aggregate-function space leakage

От
Hitoshi Harada
Дата:
2009/7/24 Tom Lane <tgl@sss.pgh.pa.us>:
> I think that WindowAgg does not need any changes because it already does
> MemoryContextResetAndDeleteChildren(winstate->wincontext) at partition
> boundaries.  Hitoshi, do you agree?
>

I do. Looking closer, temporal space management of Agg is getting
similar to WindowAgg's partition localMemory strategy. We might be
able to get them one. Anyway, the choice sounds better now, though I
will test array_agg more.

Regards,

--
Hitoshi Harada