Обсуждение: Recursive calls to functions that return sets

Поиск
Список
Период
Сортировка

Recursive calls to functions that return sets

От
Thomas Hallgren
Дата:
Imagine the following scenario:

Function 'A' returns SETOF 'x'. It will issue a query using SPI that 
calls function 'B'. This function returns SETOF 'y'.
Each tuple of 'x' is formed from some data in 'y'.
There will be millions of tuples so building a set of 'y' in memory is 
not an option.

What would the recommended use of MemoryContexts in an SRF function be 
in order to make this work? The SPI_connect must be issued during the 
SRF_IS_FIRST_CALL() phase. The Portal that it creates must remain alive 
until it's time for the SRF_RETURN_DONE(). What would the recommended 
approach be to accomplish this efficiently (and without introducing a 
major memory leak)?

The problem I'm trying to solve is a generic one. It's very possible 
that the recursion is is of arbitrary depth.

Regards,
Thomas Hallgren



Re: Recursive calls to functions that return sets

От
Martijn van Oosterhout
Дата:
On Wed, Mar 22, 2006 at 03:31:59PM +0100, Thomas Hallgren wrote:
> Imagine the following scenario:
>
> Function 'A' returns SETOF 'x'. It will issue a query using SPI that
> calls function 'B'. This function returns SETOF 'y'.
> Each tuple of 'x' is formed from some data in 'y'.
> There will be millions of tuples so building a set of 'y' in memory is
> not an option.

I think you're running into a small limitation of set functions here.
If you look at nodeFunctionScan.c that handles this, you can see that
the code is written in such a way as to collect all the tuples first
before returning anything. Not sure why it does that, probably to
handle mark/restore, though that isn't stated anywhere in the code.

> What would the recommended use of MemoryContexts in an SRF function be
> in order to make this work? The SPI_connect must be issued during the
> SRF_IS_FIRST_CALL() phase. The Portal that it creates must remain alive
> until it's time for the SRF_RETURN_DONE(). What would the recommended
> approach be to accomplish this efficiently (and without introducing a
> major memory leak)?

Well, I think this is done the normal way. The function returning
values allocates them in it's own context and does a RETURN NEXT. Once
it has returned them it can free it, or reset the context if it
prefers. The caller is always responsible for copying (since it isn't
often needed).

Have you read the executor/README ?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Recursive calls to functions that return sets

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I think you're running into a small limitation of set functions here.
> If you look at nodeFunctionScan.c that handles this, you can see that
> the code is written in such a way as to collect all the tuples first
> before returning anything.

I don't think Thomas intended to go through nodeFunctionScan, so this
needn't apply to him.

> Not sure why it does that,

plpgsql and similar languages will return a tuplestore anyway, so it has
to handle that case, and it was convenient to make all the cases look
alike for starters.  Nobody's yet gone back to improve it for the case
of languages that return a tuple per call.
        regards, tom lane


Re: Recursive calls to functions that return sets

От
Andrew Dunstan
Дата:
Tom Lane wrote:

>
>plpgsql and similar languages will return a tuplestore anyway, so it has
>to handle that case, and it was convenient to make all the cases look
>alike for starters.  Nobody's yet gone back to improve it for the case
>of languages that return a tuple per call.
>
>  
>



This would be hard to do in the plperl case, at least, and I would be 
surprised if it weren't in most others too. So what plperl does is to 
fetch the whole set on the first call and then fudges all the other 
calls to get the next element from the result set. We save out the 
intermediate tuple store on each call and restore it afterwards, so I 
think recursion shouldn't be a difficulty.

cheers

andrew


Re: Recursive calls to functions that return sets

От
Thomas Hallgren
Дата:
Recursive calls works in PL/Java. No problem there. But the larger the 
set, the more memory it consumes. Do I read your answers correctly if I 
conclude this is a known limitation when SPI is used? I.e. there's no 
way to stream one row at a time without ever building the full set?

Regards,
Thomas Hallgren


Andrew Dunstan wrote:
> Tom Lane wrote:
>
>>
>> plpgsql and similar languages will return a tuplestore anyway, so it has
>> to handle that case, and it was convenient to make all the cases look
>> alike for starters.  Nobody's yet gone back to improve it for the case
>> of languages that return a tuple per call.
>>
>>  
>>
>
>
>
> This would be hard to do in the plperl case, at least, and I would be 
> surprised if it weren't in most others too. So what plperl does is to 
> fetch the whole set on the first call and then fudges all the other 
> calls to get the next element from the result set. We save out the 
> intermediate tuple store on each call and restore it afterwards, so I 
> think recursion shouldn't be a difficulty.
>
> cheers
>
> andrew



Re: Recursive calls to functions that return sets

От
Alvaro Herrera
Дата:
Thomas Hallgren wrote:
> Recursive calls works in PL/Java. No problem there. But the larger the 
> set, the more memory it consumes. Do I read your answers correctly if I 
> conclude this is a known limitation when SPI is used? I.e. there's no 
> way to stream one row at a time without ever building the full set?

Hmm, are you using a tuplestore?  The PL/php code for return_next looks
like this:

ZEND_FUNCTION(return_next)
{   ... some stuff ...
   /* Use the per-query context so that the tuplestore survives */   oldcxt =
MemoryContextSwitchTo(rsi->econtext->ecxt_per_query_memory);
   /* Form the tuple */   tup = plphp_srf_htup_from_zval(param, current_attinmeta, current_memcxt);
   /* First call?  Create the tuplestore. */   if (!current_tuplestore)       current_tuplestore =
tuplestore_begin_heap(true,false, work_mem);
 
   /* Save the tuple and clean up */   tuplestore_puttuple(current_tuplestore, tup);   heap_freetuple(tup);
   MemoryContextSwitchTo(oldcxt);
}

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Recursive calls to functions that return sets

От
Tom Lane
Дата:
Thomas Hallgren <thomas@tada.se> writes:
> Recursive calls works in PL/Java. No problem there. But the larger the 
> set, the more memory it consumes. Do I read your answers correctly if I 
> conclude this is a known limitation when SPI is used? I.e. there's no 
> way to stream one row at a time without ever building the full set?

By no means.  The point is that there are some callers of SRFs that are
going to materialize the result set, as well as some SRFs that are going
to hand back a materialized result set anyway.  The interface can handle
a tuple-per-call but that's not the way everybody chooses to use it.
        regards, tom lane


Re: Recursive calls to functions that return sets

От
Andrew Dunstan
Дата:
Thomas Hallgren wrote:

> Recursive calls works in PL/Java. No problem there. But the larger the 
> set, the more memory it consumes. Do I read your answers correctly if 
> I conclude this is a known limitation when SPI is used? I.e. there's 
> no way to stream one row at a time without ever building the full set?


plperl stashes the results in a tuplestore object, which spills to disk. 
So memory use is not unbounded. Before 8.1 we had no return_next and no 
intermediate tuplestore, so we had serious memory problems with 
returning large sets.

As for SPI calls, we also had problems there but now we provide a cursor 
interface that works much more nicely.

cheers

andrew





Re: Recursive calls to functions that return sets

От
Thomas Hallgren
Дата:
Tom Lane wrote:
> Thomas Hallgren <thomas@tada.se> writes:
>   
>> Recursive calls works in PL/Java. No problem there. But the larger the 
>> set, the more memory it consumes. Do I read your answers correctly if I 
>> conclude this is a known limitation when SPI is used? I.e. there's no 
>> way to stream one row at a time without ever building the full set?
>>     
>
> By no means.  The point is that there are some callers of SRFs that are
> going to materialize the result set, as well as some SRFs that are going
> to hand back a materialized result set anyway.  The interface can handle
> a tuple-per-call but that's not the way everybody chooses to use it.
>
>   
OK. I've managed to get rid of my last memory-leak (i hope). I followed 
Martijn's suggestion to create the returned tuple in my own context. Now 
even the nastiest recursive chains using huge sets of data seems to 
behave ok :-)

There's one thing that's still a bit fuzzy to me. If I don't use SPI, 
the context that is current when my SRF function is called seems to be 
reset between each call. I can palloc stuff in it as much as I like. I 
can even create the tuple that I return using this context. No memory 
leak. But if I, during the SPI_IS_FIRST_CALL phase, do an SPI_connect 
(done when the 'multi_call_memory_ctx' is current), then the leak seem 
to occur immediately. Will that connect somehow alter the durability for 
the context that is current on each call to my SRF?

Regards,
Thomas Hallgren



Re: Recursive calls to functions that return sets

От
Martijn van Oosterhout
Дата:
On Wed, Mar 22, 2006 at 09:09:34PM +0100, Thomas Hallgren wrote:
> There's one thing that's still a bit fuzzy to me. If I don't use SPI,
> the context that is current when my SRF function is called seems to be
> reset between each call. I can palloc stuff in it as much as I like. I
> can even create the tuple that I return using this context. No memory
> leak. But if I, during the SPI_IS_FIRST_CALL phase, do an SPI_connect
> (done when the 'multi_call_memory_ctx' is current), then the leak seem
> to occur immediately. Will that connect somehow alter the durability for
> the context that is current on each call to my SRF?

Ok, I'm not sure if I understand the reasoning but I think it's like
this:

- When the results of an SRF are accumulated by
ExecMakeTableFunctionResult, that function is reseting your context
each time.

- When you call SPI_connect it creates a new context and switches to
it. It switches back on SPI_finish. SPI_finish switches to the context
active at SPI_connect, maybe this is not what you expect?

The ExecMakeTableFunctionResult only resets the one context, the one
provided when your function starts, anything created in other contexts
is Somebody Else's Problem. So the question, which context are you
allocating in?

Hope this clarifies it,
>

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Recursive calls to functions that return sets

От
Tom Lane
Дата:
Thomas Hallgren <thomas@tada.se> writes:
> But if I, during the SPI_IS_FIRST_CALL phase, do an SPI_connect 
> (done when the 'multi_call_memory_ctx' is current), then the leak seem 
> to occur immediately. Will that connect somehow alter the durability for 
> the context that is current on each call to my SRF?

Are you remembering to SPI_finish when you're done?  Maybe what you are
leaking is the SPI state information.

You could look at the MemoryContextStats printout for clues --- easiest
way is to deliberately run the backend out of memory, and after the "out
of memory" error occurs, look in the postmaster log.
        regards, tom lane