Thread: Endless loop calling PL/Python set returning functions

Endless loop calling PL/Python set returning functions

From

Alexey Grishchenko

Date:

March 10, 2016, 17:08:01

Hello

There is a bug in the implementation of set-returning functions in PL/Python. When you call the same set-returning function twice in a single query, the executor falls into an infinite loop, which eventually causes an OOM. Here is a simple reproduction of this issue:

CREATE OR REPLACE FUNCTION func(iter int) RETURNS SETOF int AS $$
return xrange(iter)
$$ LANGUAGE plpythonu;

select func(3), func(4);

The endless loop is caused by the fact that PL/Python uses a PLyProcedure structure for each function, containing information specific to that function. This structure is also used to store the result set iterator returned by the Python function call. But when we call the same function twice, PL/Python uses the same structure for both calls, and therefore the same result set iterator (PLyProcedure.setof), which is constantly updated by one call after the other. When the iterator reaches the end, the first call sets it to NULL. Then Postgres invokes the second call, which finds a NULL iterator and calls the Python function once again, obtaining another iterator. This is an endless loop.

For set-returning functions in Postgres we should instead use the set of SRF_* functions, which gives us access to the function call context (FuncCallContext). In my implementation this context is used to store the iterator over the function's result set, so the two calls have separate iterators and the query succeeds.
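The difference between a shared iterator slot and a per-call context can be illustrated outside the server. The following is a minimal pure-Python sketch, not the actual C code: `Procedure`, `shared_handler`, `per_call_handler` and `run_rounds` are hypothetical names, `range` stands in for `xrange`, and the driver simply calls both call sites for a fixed number of rounds instead of modelling the executor's real termination rule.

```python
DONE = "done"  # marker for "this call site reported end of set"


class Procedure:
    """Stand-in for PLyProcedure: one object per function, shared by all call sites."""

    def __init__(self, body):
        self.body = body
        self.setof = None      # shared result iterator (the problematic slot)
        self.invocations = 0   # how many times the Python body has been run


def shared_handler(proc, arg):
    """Old behaviour: the iterator lives in the shared procedure object."""
    if proc.setof is None:
        proc.invocations += 1
        proc.setof = iter(proc.body(arg))
    try:
        return next(proc.setof)
    except StopIteration:
        proc.setof = None
        return DONE


def per_call_handler(proc, arg, ctx):
    """Patched idea: the iterator lives in a context private to the call site."""
    if ctx.get("iter") is None:
        proc.invocations += 1
        ctx["iter"] = iter(proc.body(arg))
    try:
        return next(ctx["iter"])
    except StopIteration:
        ctx["iter"] = None
        return DONE


def run_rounds(call_sites, rounds):
    """Call every call site once per round and record what each one saw."""
    seen = [[] for _ in call_sites]
    for _ in range(rounds):
        for i, call in enumerate(call_sites):
            seen[i].append(call())
    return seen


def func(n):
    return range(n)


# select func(3), func(4) with one shared iterator slot
shared = Procedure(func)
shared_seen = run_rounds(
    [lambda: shared_handler(shared, 3), lambda: shared_handler(shared, 4)], 6)

# the same query with one context per call site
separate = Procedure(func)
ctx_a, ctx_b = {}, {}
separate_seen = run_rounds(
    [lambda: per_call_handler(separate, 3, ctx_a),
     lambda: per_call_handler(separate, 4, ctx_b)], 4)

print(shared_seen[0])    # the first call site never reports DONE
print(separate_seen[0])  # the first call site reports DONE after three values
```

In the shared variant the first call site keeps receiving values, and the Python body is re-run every time the second call site exhausts the iterator, which mirrors the loop described above.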

Another issue with calling the same set-returning function twice in the same query is that the function's input parameter is deleted from the global variables dictionary at the end of execution. When the function is called twice, this code attempts to delete the same entry from the globals dictionary twice, causing a KeyError. This is why PLy_function_delete_args is modified as well, to check whether the key we intend to delete is present in the globals dictionary.
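The double-delete failure is easy to reproduce with a plain dictionary. This sketch only mimics the check described above; `delete_args` is a hypothetical helper, not the C function from the patch.

```python
def delete_args(globals_dict, argnames):
    """Remove argument names from the dict, skipping keys that are already gone."""
    for name in argnames:
        if name in globals_dict:
            del globals_dict[name]


shared_globals = {"iter": 3}

# Unguarded: the second delete of the same key raises KeyError.
unguarded = dict(shared_globals)
del unguarded["iter"]
try:
    del unguarded["iter"]
    second_delete_failed = False
except KeyError:
    second_delete_failed = True

# Guarded: deleting twice is harmless.
delete_args(shared_globals, ["iter"])
delete_args(shared_globals, ["iter"])

print(second_delete_failed, shared_globals)
```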

A new regression test is included in the patch.

Best regards,

Alexey Grishchenko

Attachments

0001-Fix-endless-loop-in-plpython-set-returning-function.patch

Re: Endless loop calling PL/Python set returning functions

From

Tom Lane

Date:

March 10, 2016, 18:36:03

Alexey Grishchenko <agrishchenko@pivotal.io> writes:
> There is a bug in implementation of set-returning functions in PL/Python.
> When you call the same set-returning function twice in a single query, the
> executor falls to infinite loop which causes OOM.

Ugh.

> Another issue with calling the same set-returning function twice in the
> same query, is that it would delete the input parameter of the function
> from the global variables dictionary at the end of execution. With calling
> the function twice, this code attempts to delete the same entry from global
> variables dict twice, thus causing KeyError. This is why the
> function PLy_function_delete_args is modified as well to check whether the
> key we intend to delete is in the globals dictionary.

That whole business with putting a function's parameters into a global
dictionary makes me itch.  Doesn't it mean problems if one plpython
function calls another (presumably via SPI)?
        regards, tom lane

Re: Endless loop calling PL/Python set returning functions

From

Alexey Grishchenko

Date:

March 10, 2016, 19:20:38

I agree that passing function parameters through globals is not the best solution.

It works in the following way: custom code (in our case, the Python function invocation) is executed in Python with PyEval_EvalCode. As an input to this C function you specify the dictionary of globals that will be available to the code. The PLyProcedure structure stores "PyObject *globals;", which is the dictionary of globals for a specific function. So SPI works fine, as each function has a separate dictionary of globals and they don't conflict with each other.
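At the Python level, running code against an explicit globals dictionary looks roughly like the following. This is an analogy using the built-in `exec`, not the PL/Python C code; the function bodies and the `x` argument are made up for illustration.

```python
# Each "procedure" gets its own globals dictionary, as PLyProcedure.globals does.
globals_f = {}
globals_g = {}

# Compile two unrelated function bodies that both read a global named "x".
exec("def body():\n    return x + 1\n", globals_f)
exec("def body():\n    return x * 10\n", globals_g)

# Arguments are passed by writing them into the respective dictionary.
globals_f["x"] = 1
globals_g["x"] = 5

result_f = globals_f["body"]()
result_g = globals_g["body"]()

# Removing the argument from one dictionary does not disturb the other.
del globals_f["x"]
result_g_again = globals_g["body"]()

print(result_f, result_g, result_g_again)
```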

One scenario in which the problem occurs is when you call the same set-returning function twice in a single query. The two calls share the same "globals", which is not a bad thing in itself, but when one call finishes execution and deallocates the input parameter's global, the second fails trying to do the same. I included the fix for this problem in my patch.

The second scenario in which the problem occurs is when you call the same PL/Python function recursively. For example, this code will not work:

create or replace function test(a int) returns int as $BODY$
r = 0
if a > 1:
    r = plpy.execute("SELECT test(%d) as a" % (a-1))[0]['a']
return a + r
$BODY$ language plpythonu;

select test(10);

The function "test" has a single PLyProcedure object allocated to handle it, and thus a single "globals" dictionary. When the inner function call finishes, it removes the key "a" from the dictionary, and the outer function fails with "NameError: global name 'a' is not defined" when it tries to execute "return a + r".

But the second issue is a separate story and I think it is worth a separate patch.

Best regards,

Alexey Grishchenko

Re: Endless loop calling PL/Python set returning functions

From

Tom Lane

Date:

March 10, 2016, 19:33:23

Alexey Grishchenko <agrishchenko@pivotal.io> writes:
> One scenario when the problem occurs, is when you are calling the same
> set-returning function in a single query twice. This way they share the
> same "globals" which is not a bad thing, but when one function finishes
> execution and deallocates input parameter's global, the second will fail
> trying to do the same. I included the fix for this problem in my patch

> The second scenario when the problem occurs is when you want to call the
> same PL/Python function in recursion. For example, this code will not work:

Right, the recursion case is what's not being covered by this patch.
I would rather have a single patch that deals with both of those cases,
perhaps by *not* sharing the same dictionary across calls.  I think
what you've done here is not so much a fix as a band-aid.  In fact,
it doesn't even really fix the problem for the two-calls-per-query
case does it?  It'll work if the first execution of the SRF is run to
completion before starting the second one, but not if the two executions
are interleaved.  I believe you can test that type of scenario with
something like
 select set_returning_function_1(...), set_returning_function_2(...);
        regards, tom lane

Re: Endless loop calling PL/Python set returning functions

From

Alexey Grishchenko

Date:

March 10, 2016, 19:57:15

No, my fix handles this well.

With the first function call you allocate the global variables representing the Python function's input parameters, call the function, and receive an iterator over the function's results. Then, in a series of Postgres calls to the PL/Python handler, you just fetch the next value from the iterator; you are not calling the Python function anymore. When the iterator reaches the end, the PL/Python call handler deallocates the global variable representing the function's input parameter.

Regardless of the number of parallel invocations of the same function, with my patch each of them sets its own input parameters for the Python function, calls the function, and receives a separate iterator. When the first call's result iterator reaches its end, it deallocates the input global variable. But this doesn't affect the other calls, as they no longer need to invoke any Python code. Even if they did, they would reallocate the global variable (it is set before the Python function is invoked). The issue was that they tried to deallocate the global input variable multiple times independently, which caused the error that I fixed.

Regarding a patch for the second case, recursion: not caching the "globals" between function calls would have a performance impact, as you would have to construct the "globals" object before each function call. And you need globals, as it contains references to the "plpy" module, the global variables, and the global dictionary ("GD"). I will think about this; there may be a better design for this scenario. But I still think the second scenario requires a separate patch.

Best regards,

Alexey Grishchenko

Re: Endless loop calling PL/Python set returning functions

From

Tom Lane

Date:

March 10, 2016, 20:20:37

Alexey Grishchenko <agrishchenko@pivotal.io> writes:
> No, my fix handles this well.
> In fact, with the first function call you allocate global variables
> representing Python function input parameters, call the function and
> receive iterator over the function results. Then in a series of Postgres
> calls to PL/Python handler you just fetch next value from the iterator, you
> are not calling the Python function anymore. When the iterator reaches the
> end, PL/Python call handler deallocates the global variable representing
> function input parameter.

> Regardless of the number of parallel invocations of the same function, each
> of them in my patch would set its own input parameters to the Python
> function, call the function and receive separate iterators. When the first
> function's result iterator would reach its end, it would deallocate the
> input global variable. But it won't affect other functions as they no
> longer need to invoke any Python code.

Well, if you think that works, why not undo the global-dictionary changes
at the end of the first call, rather than later?  Then there's certainly
no overlap in their lifespan.
        regards, tom lane

Re: Endless loop calling PL/Python set returning functions

From

Alexey Grishchenko

Date:

March 10, 2016, 20:57:48

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, if you think that works, why not undo the global-dictionary changes
at the end of the first call, rather than later? Then there's certainly
no overlap in their lifespan.

regards, tom lane

Could you elaborate on this? In general, a stack-like solution would work: if, before the function call, there is already a global variable whose name matches the input variable's name, push its value onto a stack, and pop it after the function finishes. I will implement it tomorrow and see how it works.

--

Sent from handheld device

Re: Endless loop calling PL/Python set returning functions

From

Alexey Grishchenko

Date:

March 11, 2016, 13:09:35

I have improved the code using the proposed approach. The second version of the patch is attached.

It works in the following way: the procedure object PLyProcedure stores the call stack depth (the calldepth field) and the stack itself (the argstack field). When the call stack depth is zero we don't do any additional processing, i.e. there is no performance impact for existing end-user functions. Stack manipulation kicks in only when calldepth is greater than zero, which happens either when the function is called recursively via SPI, or when the same set-returning function is called twice or more in a single query.
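The save/restore idea can be sketched in Python as follows. This is an illustration of the approach described above under simplifying assumptions, not the patch itself: the class and method names are hypothetical, only the `calldepth` and `argstack` names come from the message, and a direct recursive `call` stands in for SPI.

```python
SOURCE = """
def body():
    r = 0
    if a > 1:
        r = call(a - 1)   # stands in for plpy.execute("SELECT test(...)")
    return a + r
"""


class Procedure:
    """Toy procedure object with a call depth counter and an argument stack."""

    def __init__(self, source):
        self.calldepth = 0
        self.argstack = []
        self.globals = {"call": self.call}
        exec(source, self.globals)

    def call(self, arg):
        if self.calldepth > 0:
            # Nested call: save the caller's argument before overwriting it.
            self.argstack.append(self.globals.get("a"))
        self.calldepth += 1
        self.globals["a"] = arg
        try:
            return self.globals["body"]()
        finally:
            self.calldepth -= 1
            if self.calldepth > 0:
                # Returning to a caller: restore its argument.
                self.globals["a"] = self.argstack.pop()
            else:
                # Outermost call finished: drop the argument entirely.
                self.globals.pop("a", None)


test = Procedure(SOURCE)
print(test.call(10))
```

At depth zero nothing is pushed or popped, so a non-recursive call pays only for the depth counter.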

Example of multiple calls to an SRF within a single query:

CREATE OR REPLACE FUNCTION func(iter int) RETURNS SETOF int AS $$
return xrange(iter)
$$ LANGUAGE plpythonu;

select func(3), func(4);

Before the patch, this query caused an endless loop ending in OOM. Now it works as it should.

Example of recursion with SPI:

CREATE OR REPLACE FUNCTION test(a int) RETURNS int AS $BODY$
r = 0
if a > 1:
    r = plpy.execute("SELECT test(%d) as a" % (a-1))[0]['a']
return a + r
$BODY$ LANGUAGE plpythonu;

select test(10);

Before the patch, this query failed with "NameError: global name 'a' is not defined". Now it works correctly and returns 55.

Best regards,

Alexey Grishchenko

Attachments

0002-Fix-endless-loop-in-plpython-set-returning-function.patch

Re: Endless loop calling PL/Python set returning functions

From

Alexey Grishchenko

Date:

March 22, 2016, 13:15:41

Any comments on this patch?

Regarding passing parameters to the Python function using globals: this was part of the initial design of PL/Python (code, documentation). Originally you had to work with the "args" global list of input parameters and could not access named parameters directly. You can still do so even in the latest release. Moving away from global input parameters would require switching to PyObject_CallFunctionObjArgs, which should be possible by changing the function declaration to include the input parameters plus "args" (for backward compatibility). However, triggers are a bit different: they depend on modifying the global "TD" dictionary inside the Python function, and they return only a status string. For them, there is no way to modify the code to avoid global input parameters without breaking backward compatibility with old end-user code.
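For comparison, passing arguments explicitly avoids the shared-state problem altogether, because each invocation gets its own local variables. The sketch below is plain Python showing the idea only; the `body(a, args)` signature is an assumption about how such a declaration might look, not something taken from PL/Python, and `call` again stands in for SPI.

```python
SOURCE = """
def body(a, args):
    # "a" is a local parameter here; "args" is kept for backward compatibility
    r = 0
    if a > 1:
        r = call(a - 1)
    return a + r
"""

namespace = {}


def call(*values):
    """Invoke the body with explicit positional arguments plus the args list."""
    return namespace["body"](*values, list(values))


namespace["call"] = call
exec(SOURCE, namespace)

print(call(10))
```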

Best regards,

Alexey Grishchenko

Re: Endless loop calling PL/Python set returning functions

From

Tom Lane

Date:

April 5, 2016, 22:00:12

Alexey Grishchenko <agrishchenko@pivotal.io> writes:
> Any comments on this patch?

I felt that this was more nearly a bug fix than a new feature, so I picked
it up even though it's nominally in the next commitfest not the current
one.  I did not like the code too much as it stood: you were not being
paranoid enough about ensuring that the callstack data structure stayed
in sync with the actual control flow.  Also, it didn't work for functions
that modify their argument values (cf the committed regression tests);
you have to explicitly save named arguments not only the "args" version,
and you have to do it for SRF suspend/resume not just recursion cases.
But I cleaned all that up and committed it.

> triggers are a bit different - they depend on modifying the global "TD"
> dictionary inside the Python function, and they return only the status
> string. For them, there is no option of modifying the code to avoid global
> input parameters without breaking the backward compatibility with the old
> enduser code.

Yeah.  It might be worth the trouble to include triggers in the
save/restore logic, since at least in principle they can be invoked
recursively; but there's not that much practical use for such cases.
I didn't bother with that in the patch as-committed, but if you want
to follow up with an adjustment for it, I'd take a look.
        regards, tom lane
