On Sat, Jun 20, 2009 at 1:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> I've identified the cause of Merlin Moncure's crash report here:
> http://archives.postgresql.org/pgsql-hackers/2009-06/msg01171.php
> (Thanks to Merlin for making some proprietary stuff available to me
> for testing it.)
>
> The problem query generates a plan in which a HashAggregate node is on
> the inside of a NestLoop, where it can be (and is) executed more than
> once. ExecReScanAgg decides that it can rescan the existing hash table
> instead of forcing it to be rebuilt. What that means is that the
> aggregate final-functions will be re-executed again on the latest
> transition values. And array_agg_finalfn has got a couple of issues
> with that. The obvious one is that it thinks it can release the
> ArrayBuildState when called by an Agg node. The less obvious one
> is that if any of the original input values were toasted,
> construct_md_array does this:
> if (elmlen == -1)
> elems[i] = PointerGetDatum(PG_DETOAST_DATUM(elems[i]));
> which means it's clobbering an element of the ArrayBuildState's elems
> array with a pointer to short-lived data. So on the re-execution it
> may find elems[] pointing to bogus data.
>
> There are basically two ways that we could respond to this:
>
> (1) Decide that the bug is array_agg_finalfn's, and make the necessary
> fixes so that it doesn't modify or free its argument ArrayBuildState.
> This would amount to legislating that aggregate final functions cannot
> scribble on their inputs *ever*, rather than the compromise position
> we thought we had adopted last fall, namely that they can do so when
> called by an Agg node but not when called by WindowAgg.
This definitely sounds like the right answer.
merlin