Re: [PATCH] plpython function causes server panic

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [PATCH] plpython function causes server panic
Дата
Msg-id 301509.1701479055@sss.pgh.pa.us
обсуждение исходный текст
Ответ на [PATCH] plpython function causes server panic  (Hao Zhang <zhrt1446384557@gmail.com>)
Ответы Re: [PATCH] plpython function causes server panic  (Andres Freund <andres@anarazel.de>)
Re: [PATCH] plpython function causes server panic  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hao Zhang <zhrt1446384557@gmail.com> writes:
> I found a problem when executing the plpython function:
> After the plpython function returns an error, in the same session, if we
> continue to execute
> plpython function, the server panic will be caused.

Thanks for the report!  I see the problem is that we're not expecting
BeginInternalSubTransaction to fail.  However, I'm not sure I like
this solution, mainly because it's only covering a fraction of the
problem.  There are similarly unsafe usages in plperl, pltcl, and
very possibly a lot of third-party PLs.  I wonder if there's a way
to deal with this issue without changing these API assumptions.

The only readily-reachable error case in BeginInternalSubTransaction
is this specific one about IsInParallelMode, which was added later
than the original design and evidently with not a lot of thought or
testing.  The comment for it speculates about whether we could get
rid of it, so I wonder if our thoughts about this ought to go in that
direction.

In any case, if we do proceed along the lines of catching errors
from BeginInternalSubTransaction, I think your patch is a bit shy
of a load because it doesn't do all the same things that other callers
of PLy_spi_exception_set do.  Looking at that made me wonder why
the PLy_spi_exceptions lookup business was being duplicated by every
caller rather than being done once in PLy_spi_exception_set.  So
0001 attached is a quick refactoring patch to remove that code
duplication, and then 0002 is your patch adapted to that.

I also attempted to include a test case in 0002, but I'm not very
satisfied with that.  Your original test case seemed pretty expensive
for the amount of code coverage it adds, so I tried to make it happen
with debug_parallel_query instead.  That does exercise the new code,
but it does not exhibit the crash if run against unpatched code.
That's because with this test case the error is only thrown in worker
processes not the leader, so we don't end up with corrupted Python
state in the leader.  That result also points up that the original
test case isn't very reliable for this either: you have to have
parallel_leader_participation on, and you have to have the leader
process at least one row, which makes it pretty timing-sensitive.
On top of all that, the test would become useless if we do eventually
get rid of the !IsInParallelMode restriction.  So I'm kind of inclined
to not bother with a test case if this gets to be committed in this
form.

Thoughts anyone?

            regards, tom lane

diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c
index ff87b27de0..64ca47f218 100644
--- a/src/pl/plpython/plpy_spi.c
+++ b/src/pl/plpython/plpy_spi.c
@@ -28,7 +28,7 @@
 static PyObject *PLy_spi_execute_query(char *query, long limit);
 static PyObject *PLy_spi_execute_fetch_result(SPITupleTable *tuptable,
                                               uint64 rows, int status);
-static void PLy_spi_exception_set(PyObject *excclass, ErrorData *edata);
+static void PLy_spi_exception_set(ErrorData *edata);


 /* prepare(query="select * from foo")
@@ -470,8 +470,6 @@ PLy_commit(PyObject *self, PyObject *args)
     PG_CATCH();
     {
         ErrorData  *edata;
-        PLyExceptionEntry *entry;
-        PyObject   *exc;

         /* Save error info */
         MemoryContextSwitchTo(oldcontext);
@@ -481,17 +479,8 @@ PLy_commit(PyObject *self, PyObject *args)
         /* was cleared at transaction end, reset pointer */
         exec_ctx->scratch_ctx = NULL;

-        /* Look up the correct exception */
-        entry = hash_search(PLy_spi_exceptions, &(edata->sqlerrcode),
-                            HASH_FIND, NULL);
-
-        /*
-         * This could be a custom error code, if that's the case fallback to
-         * SPIError
-         */
-        exc = entry ? entry->exc : PLy_exc_spi_error;
         /* Make Python raise the exception */
-        PLy_spi_exception_set(exc, edata);
+        PLy_spi_exception_set(edata);
         FreeErrorData(edata);

         return NULL;
@@ -517,8 +506,6 @@ PLy_rollback(PyObject *self, PyObject *args)
     PG_CATCH();
     {
         ErrorData  *edata;
-        PLyExceptionEntry *entry;
-        PyObject   *exc;

         /* Save error info */
         MemoryContextSwitchTo(oldcontext);
@@ -528,17 +515,8 @@ PLy_rollback(PyObject *self, PyObject *args)
         /* was cleared at transaction end, reset pointer */
         exec_ctx->scratch_ctx = NULL;

-        /* Look up the correct exception */
-        entry = hash_search(PLy_spi_exceptions, &(edata->sqlerrcode),
-                            HASH_FIND, NULL);
-
-        /*
-         * This could be a custom error code, if that's the case fallback to
-         * SPIError
-         */
-        exc = entry ? entry->exc : PLy_exc_spi_error;
         /* Make Python raise the exception */
-        PLy_spi_exception_set(exc, edata);
+        PLy_spi_exception_set(edata);
         FreeErrorData(edata);

         return NULL;
@@ -594,8 +572,6 @@ void
 PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner)
 {
     ErrorData  *edata;
-    PLyExceptionEntry *entry;
-    PyObject   *exc;

     /* Save error info */
     MemoryContextSwitchTo(oldcontext);
@@ -607,17 +583,8 @@ PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner)
     MemoryContextSwitchTo(oldcontext);
     CurrentResourceOwner = oldowner;

-    /* Look up the correct exception */
-    entry = hash_search(PLy_spi_exceptions, &(edata->sqlerrcode),
-                        HASH_FIND, NULL);
-
-    /*
-     * This could be a custom error code, if that's the case fallback to
-     * SPIError
-     */
-    exc = entry ? entry->exc : PLy_exc_spi_error;
     /* Make Python raise the exception */
-    PLy_spi_exception_set(exc, edata);
+    PLy_spi_exception_set(edata);
     FreeErrorData(edata);
 }

@@ -626,12 +593,21 @@ PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner)
  * internal query and error position.
  */
 static void
-PLy_spi_exception_set(PyObject *excclass, ErrorData *edata)
+PLy_spi_exception_set(ErrorData *edata)
 {
+    PLyExceptionEntry *entry;
+    PyObject   *excclass;
     PyObject   *args = NULL;
     PyObject   *spierror = NULL;
     PyObject   *spidata = NULL;

+    /* Look up the correct exception */
+    entry = hash_search(PLy_spi_exceptions, &(edata->sqlerrcode),
+                        HASH_FIND, NULL);
+
+    /* Fall back to SPIError if no entry */
+    excclass = entry ? entry->exc : PLy_exc_spi_error;
+
     args = Py_BuildValue("(s)", edata->message);
     if (!args)
         goto failure;
diff --git a/src/pl/plpython/expected/plpython_spi.out b/src/pl/plpython/expected/plpython_spi.out
index 8853e2540d..26e1cc3c9a 100644
--- a/src/pl/plpython/expected/plpython_spi.out
+++ b/src/pl/plpython/expected/plpython_spi.out
@@ -464,3 +464,29 @@ SELECT plan_composite_args();
  (3,label)
 (1 row)

+--
+-- Error cases
+--
+-- Check recovery from a subtransaction start failure; to make one
+-- happen repeatably, try to do a SPI operation in an allegedly
+-- parallel-safe function
+CREATE FUNCTION falsely_marked_parallel_safe() RETURNS SETOF int AS
+$$
+plpy.execute("select 2+2")
+for i in range(0, 5):
+    yield (i)
+$$ LANGUAGE plpython3u parallel safe;
+SET debug_parallel_query = 1;
+SELECT falsely_marked_parallel_safe();
+ERROR:  error fetching next item from iterator
+DETAIL:  spiexceptions.InvalidTransactionState: cannot start subtransactions during a parallel operation
+CONTEXT:  Traceback (most recent call last):
+PL/Python function "falsely_marked_parallel_safe"
+parallel worker
+SELECT falsely_marked_parallel_safe();
+ERROR:  error fetching next item from iterator
+DETAIL:  spiexceptions.InvalidTransactionState: cannot start subtransactions during a parallel operation
+CONTEXT:  Traceback (most recent call last):
+PL/Python function "falsely_marked_parallel_safe"
+parallel worker
+RESET debug_parallel_query;
diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c
index 57e8f8ec21..8bfe05815b 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -98,7 +98,8 @@ PLy_cursor_query(const char *query)
     oldcontext = CurrentMemoryContext;
     oldowner = CurrentResourceOwner;

-    PLy_spi_subtransaction_begin(oldcontext, oldowner);
+    if (!PLy_spi_subtransaction_begin(oldcontext, oldowner))
+        return NULL;

     PG_TRY();
     {
@@ -196,7 +197,8 @@ PLy_cursor_plan(PyObject *ob, PyObject *args)
     oldcontext = CurrentMemoryContext;
     oldowner = CurrentResourceOwner;

-    PLy_spi_subtransaction_begin(oldcontext, oldowner);
+    if (!PLy_spi_subtransaction_begin(oldcontext, oldowner))
+        return NULL;

     PG_TRY();
     {
@@ -333,7 +335,8 @@ PLy_cursor_iternext(PyObject *self)
     oldcontext = CurrentMemoryContext;
     oldowner = CurrentResourceOwner;

-    PLy_spi_subtransaction_begin(oldcontext, oldowner);
+    if (!PLy_spi_subtransaction_begin(oldcontext, oldowner))
+        return NULL;

     PG_TRY();
     {
@@ -403,7 +406,8 @@ PLy_cursor_fetch(PyObject *self, PyObject *args)
     oldcontext = CurrentMemoryContext;
     oldowner = CurrentResourceOwner;

-    PLy_spi_subtransaction_begin(oldcontext, oldowner);
+    if (!PLy_spi_subtransaction_begin(oldcontext, oldowner))
+        return NULL;

     PG_TRY();
     {
diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c
index 64ca47f218..ca97f2eca6 100644
--- a/src/pl/plpython/plpy_spi.c
+++ b/src/pl/plpython/plpy_spi.c
@@ -77,7 +77,8 @@ PLy_spi_prepare(PyObject *self, PyObject *args)
     oldcontext = CurrentMemoryContext;
     oldowner = CurrentResourceOwner;

-    PLy_spi_subtransaction_begin(oldcontext, oldowner);
+    if (!PLy_spi_subtransaction_begin(oldcontext, oldowner))
+        return NULL;

     PG_TRY();
     {
@@ -217,7 +218,8 @@ PLy_spi_execute_plan(PyObject *ob, PyObject *list, long limit)
     oldcontext = CurrentMemoryContext;
     oldowner = CurrentResourceOwner;

-    PLy_spi_subtransaction_begin(oldcontext, oldowner);
+    if (!PLy_spi_subtransaction_begin(oldcontext, oldowner))
+        return NULL;

     PG_TRY();
     {
@@ -313,7 +315,8 @@ PLy_spi_execute_query(char *query, long limit)
     oldcontext = CurrentMemoryContext;
     oldowner = CurrentResourceOwner;

-    PLy_spi_subtransaction_begin(oldcontext, oldowner);
+    if (!PLy_spi_subtransaction_begin(oldcontext, oldowner))
+        return NULL;

     PG_TRY();
     {
@@ -534,7 +537,9 @@ PLy_rollback(PyObject *self, PyObject *args)
  *    MemoryContext oldcontext = CurrentMemoryContext;
  *    ResourceOwner oldowner = CurrentResourceOwner;
  *
- *    PLy_spi_subtransaction_begin(oldcontext, oldowner);
+ *    if (!PLy_spi_subtransaction_begin(oldcontext, oldowner))
+ *        return NULL;
+ *
  *    PG_TRY();
  *    {
  *        <call SPI functions>
@@ -551,12 +556,38 @@ PLy_rollback(PyObject *self, PyObject *args)
  * These utilities take care of restoring connection to the SPI manager and
  * setting a Python exception in case of an abort.
  */
-void
+bool
 PLy_spi_subtransaction_begin(MemoryContext oldcontext, ResourceOwner oldowner)
 {
-    BeginInternalSubTransaction(NULL);
-    /* Want to run inside function's memory context */
-    MemoryContextSwitchTo(oldcontext);
+    PG_TRY();
+    {
+        /* Start subtransaction (could fail) */
+        BeginInternalSubTransaction(NULL);
+        /* Want to run inside function's memory context */
+        MemoryContextSwitchTo(oldcontext);
+    }
+    PG_CATCH();
+    {
+        ErrorData  *edata;
+
+        /* Save error info */
+        MemoryContextSwitchTo(oldcontext);
+        edata = CopyErrorData();
+        FlushErrorState();
+
+        /* Ensure we restore original context and owner */
+        MemoryContextSwitchTo(oldcontext);
+        CurrentResourceOwner = oldowner;
+
+        /* Make Python raise the exception */
+        PLy_spi_exception_set(edata);
+        FreeErrorData(edata);
+
+        return false;
+    }
+    PG_END_TRY();
+
+    return true;
 }

 void
@@ -580,6 +611,8 @@ PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner)

     /* Abort the inner transaction */
     RollbackAndReleaseCurrentSubTransaction();
+
+    /* Ensure we restore original context and owner */
     MemoryContextSwitchTo(oldcontext);
     CurrentResourceOwner = oldowner;

diff --git a/src/pl/plpython/plpy_spi.h b/src/pl/plpython/plpy_spi.h
index 98ccd21093..a43d803928 100644
--- a/src/pl/plpython/plpy_spi.h
+++ b/src/pl/plpython/plpy_spi.h
@@ -22,7 +22,7 @@ typedef struct PLyExceptionEntry
 } PLyExceptionEntry;

 /* handling of SPI operations inside subtransactions */
-extern void PLy_spi_subtransaction_begin(MemoryContext oldcontext, ResourceOwner oldowner);
+extern bool PLy_spi_subtransaction_begin(MemoryContext oldcontext, ResourceOwner oldowner);
 extern void PLy_spi_subtransaction_commit(MemoryContext oldcontext, ResourceOwner oldowner);
 extern void PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner);

diff --git a/src/pl/plpython/sql/plpython_spi.sql b/src/pl/plpython/sql/plpython_spi.sql
index fcd113acaa..69a448cd68 100644
--- a/src/pl/plpython/sql/plpython_spi.sql
+++ b/src/pl/plpython/sql/plpython_spi.sql
@@ -320,3 +320,22 @@ SELECT cursor_fetch_next_empty();
 SELECT cursor_plan();
 SELECT cursor_plan_wrong_args();
 SELECT plan_composite_args();
+
+--
+-- Error cases
+--
+
+-- Check recovery from a subtransaction start failure; to make one
+-- happen repeatably, try to do a SPI operation in an allegedly
+-- parallel-safe function
+CREATE FUNCTION falsely_marked_parallel_safe() RETURNS SETOF int AS
+$$
+plpy.execute("select 2+2")
+for i in range(0, 5):
+    yield (i)
+$$ LANGUAGE plpython3u parallel safe;
+
+SET debug_parallel_query = 1;
+SELECT falsely_marked_parallel_safe();
+SELECT falsely_marked_parallel_safe();
+RESET debug_parallel_query;

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: logical decoding and replication of sequences, take 2
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: processes stuck in shutdown following OOM/recovery