Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array
Дата
Msg-id 2073074.1682871878@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array  (Alexander Lakhin <exclusion@gmail.com>)
Ответы Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array  (Alexander Lakhin <exclusion@gmail.com>)
Список pgsql-bugs
Alexander Lakhin <exclusion@gmail.com> writes:
> 29.04.2023 20:16, Tom Lane wrote:
>> I'm a little uncomfortable with changing the semantics of non-failing
>> cases in the back branches.

> I agree that we shouldn't introduce a new behavior in the released branches.
> For that moment I was thinking about the choice between v16 and v17
> for that patch to be committed to.
> But as you see the better solution now, the patch will be different and more
> extensive, I suppose, so I'd vote for postponing it for v17.

Here's a version that adopts plperl's logic, causing it to treat
empty sub-lists as being zero-length dimensions.  Most of the new
test cases are borrowed from plperl, too, and most of them act
differently before and after the code change.  So I'm pretty
hesitant to put this into stable branches.  OTOH, maybe it's not
too late for v16?

I noticed one inarguable bug here, too: PLySequence_ToArray_recurse
leaks Python object refcounts after errors, because it has no
PG_TRY to ensure that Py_XDECREF() gets done.  I'm not sure if
it's worth doing something about that in the back branches, given
the lack of complaints.

            regards, tom lane

diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
index c4e749a5a8..8a680e15c1 100644
--- a/src/pl/plpython/expected/plpython_types.out
+++ b/src/pl/plpython/expected/plpython_types.out
@@ -687,23 +687,74 @@ SELECT * FROM test_type_conversion_array_mixed2();
 ERROR:  invalid input syntax for type integer: "abc"
 CONTEXT:  while creating return value
 PL/Python function "test_type_conversion_array_mixed2"
-CREATE FUNCTION test_type_conversion_array_mixed3() RETURNS text[] AS $$
-return [[], 'a']
+-- check output of multi-dimensional arrays
+CREATE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [['a'], ['b'], ['c']]
 $$ LANGUAGE plpython3u;
-SELECT * FROM test_type_conversion_array_mixed3();
- test_type_conversion_array_mixed3
+select test_type_conversion_md_array_out();
+ test_type_conversion_md_array_out
 -----------------------------------
- {[],a}
+ {{a},{b},{c}}
 (1 row)

+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], []]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();
+ test_type_conversion_md_array_out
+-----------------------------------
+ {}
+(1 row)
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], [1]]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();  -- fail
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_md_array_out"
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], 1]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();  -- fail
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_md_array_out"
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [1, []]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();  -- fail
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_md_array_out"
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[1], [[]]]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();  -- fail
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_md_array_out"
 CREATE FUNCTION test_type_conversion_mdarray_malformed() RETURNS int[] AS $$
 return [[1,2,3],[4,5]]
 $$ LANGUAGE plpython3u;
 SELECT * FROM test_type_conversion_mdarray_malformed();
-ERROR:  wrong length of inner sequence: has length 2, but 3 was expected
-DETAIL:  To construct a multidimensional array, the inner sequences must all have the same length.
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
 CONTEXT:  while creating return value
 PL/Python function "test_type_conversion_mdarray_malformed"
+CREATE FUNCTION test_type_conversion_mdarray_malformed2() RETURNS text[] AS $$
+return [[1,2,3], "abc"]
+$$ LANGUAGE plpython3u;
+SELECT * FROM test_type_conversion_mdarray_malformed2();
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_mdarray_malformed2"
+CREATE FUNCTION test_type_conversion_mdarray_malformed3() RETURNS text[] AS $$
+return ["abc", [1,2,3]]
+$$ LANGUAGE plpython3u;
+SELECT * FROM test_type_conversion_mdarray_malformed3();
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_mdarray_malformed3"
 CREATE FUNCTION test_type_conversion_mdarray_toodeep() RETURNS int[] AS $$
 return [[[[[[[1]]]]]]]
 $$ LANGUAGE plpython3u;
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index 864b5f1765..0bfd3059f2 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -54,9 +54,10 @@ static Datum PLyObject_ToTransform(PLyObToDatum *arg, PyObject *plrv,
                                    bool *isnull, bool inarray);
 static Datum PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
                                  bool *isnull, bool inarray);
-static void PLySequence_ToArray_recurse(PLyObToDatum *elm, PyObject *list,
-                                        int *dims, int ndim, int dim,
-                                        Datum *elems, bool *nulls, int *currelem);
+static void PLySequence_ToArray_recurse(PyObject *obj,
+                                        ArrayBuildState **astatep,
+                                        int *ndims, int *dims, int cur_depth,
+                                        PLyObToDatum *elm, Oid elmbasetype);

 /* conversion from Python objects to composite Datums */
 static Datum PLyUnicode_ToComposite(PLyObToDatum *arg, PyObject *string, bool inarray);
@@ -1126,23 +1127,16 @@ PLyObject_ToTransform(PLyObToDatum *arg, PyObject *plrv,


 /*
- * Convert Python sequence to SQL array.
+ * Convert Python sequence (or list of lists) to SQL array.
  */
 static Datum
 PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
                     bool *isnull, bool inarray)
 {
-    ArrayType  *array;
-    int            i;
-    Datum       *elems;
-    bool       *nulls;
-    int            len;
-    int            ndim;
+    ArrayBuildState *astate = NULL;
+    int            ndims = 1;
     int            dims[MAXDIM];
     int            lbs[MAXDIM];
-    int            currelem;
-    PyObject   *pyptr = plrv;
-    PyObject   *next;

     if (plrv == Py_None)
     {
@@ -1152,128 +1146,130 @@ PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
     *isnull = false;

     /*
-     * Determine the number of dimensions, and their sizes.
+     * For historical reasons, we allow any sequence (not only a list) at the
+     * top level when converting a Python object to a SQL array.  However, a
+     * multi-dimensional array is recognized only when the object contains
+     * true lists.
      */
-    ndim = 0;
-
-    Py_INCREF(plrv);
-
-    for (;;)
-    {
-        if (!PyList_Check(pyptr))
-            break;
-
-        if (ndim == MAXDIM)
-            ereport(ERROR,
-                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                     errmsg("number of array dimensions exceeds the maximum allowed (%d)",
-                            MAXDIM)));
-
-        dims[ndim] = PySequence_Length(pyptr);
-        if (dims[ndim] < 0)
-            PLy_elog(ERROR, "could not determine sequence length for function return value");
-
-        if (dims[ndim] == 0)
-        {
-            /* empty sequence */
-            break;
-        }
-
-        ndim++;
-
-        next = PySequence_GetItem(pyptr, 0);
-        Py_XDECREF(pyptr);
-        pyptr = next;
-    }
-    Py_XDECREF(pyptr);
-
-    /*
-     * Check for zero dimensions. This happens if the object is a tuple or a
-     * string, rather than a list, or is not a sequence at all. We don't map
-     * tuples or strings to arrays in general, but in the first level, be
-     * lenient, for historical reasons. So if the object is a sequence of any
-     * kind, treat it as a one-dimensional array.
-     */
-    if (ndim == 0)
-    {
-        if (!PySequence_Check(plrv))
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("return value of function with array return type is not a Python sequence")));
-
-        ndim = 1;
-        dims[0] = PySequence_Length(plrv);
-    }
+    if (!PySequence_Check(plrv))
+        ereport(ERROR,
+                (errcode(ERRCODE_DATATYPE_MISMATCH),
+                 errmsg("return value of function with array return type is not a Python sequence")));

-    /* Allocate space for work arrays, after detecting array size overflow */
-    len = ArrayGetNItems(ndim, dims);
-    elems = palloc(sizeof(Datum) * len);
-    nulls = palloc(sizeof(bool) * len);
+    /* Initialize dimensionality info with first-level dimension */
+    memset(dims, 0, sizeof(dims));
+    dims[0] = PySequence_Length(plrv);

     /*
      * Traverse the Python lists, in depth-first order, and collect all the
-     * elements at the bottom level into 'elems'/'nulls' arrays.
+     * elements at the bottom level into an ArrayBuildState.
      */
-    currelem = 0;
-    PLySequence_ToArray_recurse(arg->u.array.elm, plrv,
-                                dims, ndim, 0,
-                                elems, nulls, &currelem);
+    PLySequence_ToArray_recurse(plrv, &astate,
+                                &ndims, dims, 1,
+                                arg->u.array.elm,
+                                arg->u.array.elmbasetype);
+
+    /* ensure we get zero-D array for no inputs, as per PG convention */
+    if (astate == NULL)
+        return PointerGetDatum(construct_empty_array(arg->u.array.elmbasetype));

-    for (i = 0; i < ndim; i++)
+    for (int i = 0; i < ndims; i++)
         lbs[i] = 1;

-    array = construct_md_array(elems,
-                               nulls,
-                               ndim,
-                               dims,
-                               lbs,
-                               arg->u.array.elmbasetype,
-                               arg->u.array.elm->typlen,
-                               arg->u.array.elm->typbyval,
-                               arg->u.array.elm->typalign);
-
-    return PointerGetDatum(array);
+    return makeMdArrayResult(astate, ndims, dims, lbs,
+                             CurrentMemoryContext, true);
 }

 /*
  * Helper function for PLySequence_ToArray. Traverse a Python list of lists in
- * depth-first order, storing the elements in 'elems'.
+ * depth-first order, storing the elements in *astatep.
+ *
+ * The ArrayBuildState is created only when we first find a scalar element;
+ * if we didn't do it like that, we'd need some other convention for knowing
+ * whether we'd already found any scalars (and thus the number of dimensions
+ * is frozen).
  */
 static void
-PLySequence_ToArray_recurse(PLyObToDatum *elm, PyObject *list,
-                            int *dims, int ndim, int dim,
-                            Datum *elems, bool *nulls, int *currelem)
+PLySequence_ToArray_recurse(PyObject *obj, ArrayBuildState **astatep,
+                            int *ndims, int *dims, int cur_depth,
+                            PLyObToDatum *elm, Oid elmbasetype)
 {
     int            i;
+    int            len = PySequence_Length(obj);

-    if (PySequence_Length(list) != dims[dim])
-        ereport(ERROR,
-                (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
-                 errmsg("wrong length of inner sequence: has length %d, but %d was expected",
-                        (int) PySequence_Length(list), dims[dim]),
-                 (errdetail("To construct a multidimensional array, the inner sequences must all have the same
length."))));
+    /* We should not get here with a non-sequence object */
+    if (len < 0)
+        PLy_elog(ERROR, "could not determine sequence length for function return value");

-    if (dim < ndim - 1)
+    for (i = 0; i < len; i++)
     {
-        for (i = 0; i < dims[dim]; i++)
-        {
-            PyObject   *sublist = PySequence_GetItem(list, i);
+        /* fetch the array element */
+        PyObject   *subobj = PySequence_GetItem(obj, i);

-            PLySequence_ToArray_recurse(elm, sublist, dims, ndim, dim + 1,
-                                        elems, nulls, currelem);
-            Py_XDECREF(sublist);
+        /* need PG_TRY to ensure we release the subobj's refcount */
+        PG_TRY();
+        {
+            /* multi-dimensional array? */
+            if (PyList_Check(subobj))
+            {
+                /* set size when at first element in this level, else compare */
+                if (i == 0 && *ndims == cur_depth)
+                {
+                    /* array after some scalars at same level? */
+                    if (*astatep != NULL)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                 errmsg("multidimensional arrays must have array expressions with matching
dimensions")));
+                    /* too many dimensions? */
+                    if (cur_depth + 1 > MAXDIM)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                                 errmsg("number of array dimensions exceeds the maximum allowed (%d)",
+                                        MAXDIM)));
+                    /* OK, add a dimension */
+                    dims[*ndims] = PySequence_Length(subobj);
+                    (*ndims)++;
+                }
+                else if (cur_depth >= *ndims ||
+                         PySequence_Length(subobj) != dims[cur_depth])
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                             errmsg("multidimensional arrays must have array expressions with matching dimensions")));
+
+                /* recurse to fetch elements of this sub-array */
+                PLySequence_ToArray_recurse(subobj, astatep,
+                                            ndims, dims, cur_depth + 1,
+                                            elm, elmbasetype);
+            }
+            else
+            {
+                Datum        dat;
+                bool        isnull;
+
+                /* scalar after some sub-arrays at same level? */
+                if (*ndims != cur_depth)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                             errmsg("multidimensional arrays must have array expressions with matching dimensions")));
+
+                /* convert non-list object to Datum */
+                dat = elm->func(elm, subobj, &isnull, true);
+
+                /* create ArrayBuildState if we didn't already */
+                if (*astatep == NULL)
+                    *astatep = initArrayResult(elmbasetype,
+                                               CurrentMemoryContext, true);
+
+                /* ... and save the element value in it */
+                (void) accumArrayResult(*astatep, dat, isnull,
+                                        elmbasetype, CurrentMemoryContext);
+            }
         }
-    }
-    else
-    {
-        for (i = 0; i < dims[dim]; i++)
+        PG_FINALLY();
         {
-            PyObject   *obj = PySequence_GetItem(list, i);
-
-            elems[*currelem] = elm->func(elm, obj, &nulls[*currelem], true);
-            Py_XDECREF(obj);
-            (*currelem)++;
+            Py_XDECREF(subobj);
         }
+        PG_END_TRY();
     }
 }

diff --git a/src/pl/plpython/sql/plpython_types.sql b/src/pl/plpython/sql/plpython_types.sql
index 9702a10a72..0985a9cca2 100644
--- a/src/pl/plpython/sql/plpython_types.sql
+++ b/src/pl/plpython/sql/plpython_types.sql
@@ -328,11 +328,43 @@ $$ LANGUAGE plpython3u;

 SELECT * FROM test_type_conversion_array_mixed2();

-CREATE FUNCTION test_type_conversion_array_mixed3() RETURNS text[] AS $$
-return [[], 'a']
+
+-- check output of multi-dimensional arrays
+CREATE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [['a'], ['b'], ['c']]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], []]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], [1]]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();  -- fail
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], 1]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();  -- fail
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [1, []]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();  -- fail
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[1], [[]]]
 $$ LANGUAGE plpython3u;

-SELECT * FROM test_type_conversion_array_mixed3();
+select test_type_conversion_md_array_out();  -- fail


 CREATE FUNCTION test_type_conversion_mdarray_malformed() RETURNS int[] AS $$
@@ -341,6 +373,18 @@ $$ LANGUAGE plpython3u;

 SELECT * FROM test_type_conversion_mdarray_malformed();

+CREATE FUNCTION test_type_conversion_mdarray_malformed2() RETURNS text[] AS $$
+return [[1,2,3], "abc"]
+$$ LANGUAGE plpython3u;
+
+SELECT * FROM test_type_conversion_mdarray_malformed2();
+
+CREATE FUNCTION test_type_conversion_mdarray_malformed3() RETURNS text[] AS $$
+return ["abc", [1,2,3]]
+$$ LANGUAGE plpython3u;
+
+SELECT * FROM test_type_conversion_mdarray_malformed3();
+
 CREATE FUNCTION test_type_conversion_mdarray_toodeep() RETURNS int[] AS $$
 return [[[[[[[1]]]]]]]
 $$ LANGUAGE plpython3u;

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Alexander Lakhin
Дата:
Сообщение: Re: BUG #17798: Incorrect memory access occurs when using BEFORE ROW UPDATE trigger
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: BUG #17798: Incorrect memory access occurs when using BEFORE ROW UPDATE trigger