Обсуждение: BUG #17912: Invalid memory access when converting plpython' array containing empty array

Поиск
Список
Период
Сортировка

BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17912
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 15.2
Operating system:   Ubuntu 22.04
Description:

When the following query executed:
CREATE EXTENSION plpython3u;
CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
return [[], "a"]
$$ LANGUAGE plpython3u;
SELECT test();

valgrind detects an incorrect memory access:
==00:00:00:05.073 1489859== Invalid write of size 1
==00:00:00:05.073 1489859==    at 0x4878C38: PLyObject_ToScalar
(plpy_typeio.c:1083)
==00:00:00:05.073 1489859==    by 0x4877267: PLySequence_ToArray_recurse
(plpy_typeio.c:1282)
==00:00:00:05.073 1489859==    by 0x48776AF: PLySequence_ToArray
(plpy_typeio.c:1227)
==00:00:00:05.073 1489859==    by 0x4877E9C: PLy_output_convert
(plpy_typeio.c:122)
==00:00:00:05.073 1489859==    by 0x487101E: PLy_exec_function
(plpy_exec.c:235)
==00:00:00:05.073 1489859==    by 0x487201B: plpython3_call_handler
(plpy_main.c:247)
==00:00:00:05.073 1489859==    by 0x401A95: ExecInterpExpr
(execExprInterp.c:727)
==00:00:00:05.073 1489859==    by 0x3FE2A6: ExecInterpExprStillValid
(execExprInterp.c:1826)
==00:00:00:05.073 1489859==    by 0x440563: ExecEvalExprSwitchContext
(executor.h:341)
==00:00:00:05.073 1489859==    by 0x440563: ExecProject (executor.h:375)
==00:00:00:05.073 1489859==    by 0x440563: ExecResult (nodeResult.c:136)
==00:00:00:05.073 1489859==    by 0x40EBA2: ExecProcNodeFirst
(execProcnode.c:464)
==00:00:00:05.073 1489859==    by 0x407196: ExecProcNode (executor.h:259)
==00:00:00:05.073 1489859==    by 0x407196: ExecutePlan (execMain.c:1636)
==00:00:00:05.073 1489859==    by 0x407376: standard_ExecutorRun
(execMain.c:363)
==00:00:00:05.073 1489859==  Address 0x112e9340 is 320 bytes inside a block
of size 8,192 alloc'd
==00:00:00:05.073 1489859==    at 0x4848899: malloc
(vg_replace_malloc.c:381)
==00:00:00:05.073 1489859==    by 0x73ACFA: AllocSetContextCreateInternal
(aset.c:469)
==00:00:00:05.073 1489859==    by 0x415DFF: CreateExprContextInternal
(execUtils.c:259)
==00:00:00:05.073 1489859==    by 0x41623E: CreateExprContext
(execUtils.c:309)
==00:00:00:05.073 1489859==    by 0x41648A: ExecAssignExprContext
(execUtils.c:488)
==00:00:00:05.073 1489859==    by 0x44075F: ExecInitResult
(nodeResult.c:205)
==00:00:00:05.073 1489859==    by 0x40ED32: ExecInitNode
(execProcnode.c:167)
==00:00:00:05.073 1489859==    by 0x407AA9: InitPlan (execMain.c:938)
==00:00:00:05.073 1489859==    by 0x407C85: standard_ExecutorStart
(execMain.c:265)
==00:00:00:05.073 1489859==    by 0x407DDD: ExecutorStart (execMain.c:144)
==00:00:00:05.073 1489859==    by 0x5C6723: PortalStart (pquery.c:517)
==00:00:00:05.073 1489859==    by 0x5C32DF: exec_simple_query
(postgres.c:1211)

Without valgrind, but with asserts enabled, I get:
WARNING:  problem in alloc set ExprContext: detected write past chunk end in
block 0x562777dfbeb0, chunk 0x562777dfbed8
WARNING:  problem in alloc set ExprContext: req size > alloc size for chunk
0x562777dfbef0 in block 0x562777dfbeb0
  test  
--------
 {[],a}
(1 row)

When the function returns '["a", []]', I see no anomalies.

As I can see, for the first case we get len = 0 in PLySequence_ToArray();
elems, nulls palloc'ed with zero elements, but PLyObject_ToScalar() tries
to
write a value into nulls[0]...

Reproduced on REL_11_STABLE..master.


Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> CREATE EXTENSION plpython3u;
> CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
> return [[], "a"]
> $$ LANGUAGE plpython3u;
> SELECT test();

> As I can see, for the first case we get len = 0 in PLySequence_ToArray();
> elems, nulls palloc'ed with zero elements, but PLyObject_ToScalar() tries
> to write a value into nulls[0]...

Yeah.  The calculation of the array size is being done in the wrong
place, so that we may update len to zero before breaking out of the
loop.  But really it's poor coding for this function to be doing
its own calculation of the array size at all, rather than consulting
the authoritative ArrayGetNItems().  I think we need something like
the attached.

            regards, tom lane

diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index 7018c9d404..864b5f1765 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -1136,7 +1136,7 @@ PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
     int            i;
     Datum       *elems;
     bool       *nulls;
-    int64        len;
+    int            len;
     int            ndim;
     int            dims[MAXDIM];
     int            lbs[MAXDIM];
@@ -1155,7 +1155,6 @@ PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
      * Determine the number of dimensions, and their sizes.
      */
     ndim = 0;
-    len = 1;

     Py_INCREF(plrv);

@@ -1174,17 +1173,6 @@ PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
         if (dims[ndim] < 0)
             PLy_elog(ERROR, "could not determine sequence length for function return value");

-        if (dims[ndim] > MaxAllocSize)
-            ereport(ERROR,
-                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                     errmsg("array size exceeds the maximum allowed")));
-
-        len *= dims[ndim];
-        if (len > MaxAllocSize)
-            ereport(ERROR,
-                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                     errmsg("array size exceeds the maximum allowed")));
-
         if (dims[ndim] == 0)
         {
             /* empty sequence */
@@ -1214,15 +1202,18 @@ PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
                      errmsg("return value of function with array return type is not a Python sequence")));

         ndim = 1;
-        len = dims[0] = PySequence_Length(plrv);
+        dims[0] = PySequence_Length(plrv);
     }

+    /* Allocate space for work arrays, after detecting array size overflow */
+    len = ArrayGetNItems(ndim, dims);
+    elems = palloc(sizeof(Datum) * len);
+    nulls = palloc(sizeof(bool) * len);
+
     /*
      * Traverse the Python lists, in depth-first order, and collect all the
      * elements at the bottom level into 'elems'/'nulls' arrays.
      */
-    elems = palloc(sizeof(Datum) * len);
-    nulls = palloc(sizeof(bool) * len);
     currelem = 0;
     PLySequence_ToArray_recurse(arg->u.array.elm, plrv,
                                 dims, ndim, 0,

Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Alexander Lakhin
Дата:
Hello Tom,

28.04.2023 18:14, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> CREATE EXTENSION plpython3u;
>> CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
>> return [[], "a"]
>> $$ LANGUAGE plpython3u;
>> SELECT test();
>> As I can see, for the first case we get len = 0 in PLySequence_ToArray();
>> elems, nulls palloc'ed with zero elements, but PLyObject_ToScalar() tries
>> to write a value into nulls[0]...
> Yeah.  The calculation of the array size is being done in the wrong
> place, so that we may update len to zero before breaking out of the
> loop.  But really it's poor coding for this function to be doing
> its own calculation of the array size at all, rather than consulting
> the authoritative ArrayGetNItems().  I think we need something like
> the attached.

Thank you! I've read the message of the commit you just pushed,
and I confirm that there are another oddities in that area. For example:
CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
return [1, [2]]
$$ LANGUAGE plpython3u;
SELECT test();
---------
  {1,[2]}

But:
CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
return [[1], 2]
$$ LANGUAGE plpython3u;
SELECT test();
ERROR:  wrong length of inner sequence: has length -1, but 1 was expected
DETAIL:  To construct a multidimensional array, the inner sequences must all have the same length.
CONTEXT:  while creating return value
PL/Python function "test"

Or:
CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
return [["1"], "abc"]
$$ LANGUAGE plpython3u;
SELECT test();
ERROR:  wrong length of inner sequence: has length 3, but 1 was expected
DETAIL:  To construct a multidimensional array, the inner sequences must all have the same length.
CONTEXT:  while creating return value
PL/Python function "test"

Though may be it's okay, considering python's concepts of
strings/arrays/sequences, or at least deserves a separate discussion.

Best regards,
Alexander



Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> Thank you! I've read the message of the commit you just pushed,
> and I confirm that there are another oddities in that area. For example:

> CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
> return [[1], 2]
> $$ LANGUAGE plpython3u;
> SELECT test();
> ERROR:  wrong length of inner sequence: has length -1, but 1 was expected
> DETAIL:  To construct a multidimensional array, the inner sequences must all have the same length.

Yeah.  AFAICT, the idea of the existing code is to descend through
the first item at each nest level till we hit a non-list, then take
the lengths of those lists as the array dimensions, and then complain
if we find any later items that don't fit those dimensions.  That leads
to some symmetry problems, in that the error you get for inconsistent
sequence lengths depends on the order in which the items are presented.

Also, it seems like there is something odd here:

regression=# CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
regression$# return ["abc"]
regression$# $$ LANGUAGE plpython3u;
CREATE FUNCTION
regression=# select test();
 test
-------
 {abc}
(1 row)

regression=# CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
return [[1,2,3], "abc"]
$$ LANGUAGE plpython3u;
CREATE FUNCTION
regression=# select test();
       test
-------------------
 {{1,2,3},{a,b,c}}
(1 row)

I think it's weird that "abc" is taken as a scalar in the first
case and a list in the second case.  Even worse,

regression=# CREATE OR REPLACE FUNCTION test() RETURNS text[] AS $$
return ["abc", [1,2,3]]
$$ LANGUAGE plpython3u;
CREATE FUNCTION
regression=# select test();
       test
-------------------
 {abc,"[1, 2, 3]"}
(1 row)

This might be something that used to work more sanely with
Python 2 and got broken in Python 3 for the same reasons
discussed in bug #17908.  I've not poked at it more though.
I don't think I have a Python 2 installation to test with
anymore :-(

            regards, tom lane



Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Tom Lane
Дата:
I wrote:
> Yeah.  AFAICT, the idea of the existing code is to descend through
> the first item at each nest level till we hit a non-list, then take
> the lengths of those lists as the array dimensions, and then complain
> if we find any later items that don't fit those dimensions.  That leads
> to some symmetry problems, in that the error you get for inconsistent
> sequence lengths depends on the order in which the items are presented.

The real problem here is that we don't check that the list nesting
depth is the same throughout the array: if lists are more deeply
nested in later elements, we'll treat those sub-lists as scalars,
leading to inconsistent results.  Conversely, a less-deeply-nested
list structure in a later element might still work, if it can be
treated as a sequence.  I think the second and third examples
I gave should both throw errors.

I also notice that the error messages in this area speak of "sequences",
but it is more correct to call them "lists", because Python draws a
distinction.  (All lists are sequences, but not vice versa, eg a
string is a sequence but not a list.)

So I'm thinking about the attached.  I do not propose this for
back-patching, because it could break applications that work today.
But it seems like good tightening-up for HEAD, or maybe we should
wait for v17 at this point?

            regards, tom lane

diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
index c4e749a5a8..f477e239bc 100644
--- a/src/pl/plpython/expected/plpython_types.out
+++ b/src/pl/plpython/expected/plpython_types.out
@@ -700,10 +700,26 @@ CREATE FUNCTION test_type_conversion_mdarray_malformed() RETURNS int[] AS $$
 return [[1,2,3],[4,5]]
 $$ LANGUAGE plpython3u;
 SELECT * FROM test_type_conversion_mdarray_malformed();
-ERROR:  wrong length of inner sequence: has length 2, but 3 was expected
-DETAIL:  To construct a multidimensional array, the inner sequences must all have the same length.
+ERROR:  wrong length of inner list: has length 2, but 3 was expected
+DETAIL:  To construct a multidimensional array, the inner lists must all have the same length.
 CONTEXT:  while creating return value
 PL/Python function "test_type_conversion_mdarray_malformed"
+CREATE FUNCTION test_type_conversion_mdarray_malformed2() RETURNS text[] AS $$
+return [[1,2,3], "abc"]
+$$ LANGUAGE plpython3u;
+SELECT * FROM test_type_conversion_mdarray_malformed2();
+ERROR:  array element is a scalar, but should be a list
+DETAIL:  To construct a multidimensional array, the inner lists must be nested to a uniform depth.
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_mdarray_malformed2"
+CREATE FUNCTION test_type_conversion_mdarray_malformed3() RETURNS text[] AS $$
+return ["abc", [1,2,3]]
+$$ LANGUAGE plpython3u;
+SELECT * FROM test_type_conversion_mdarray_malformed3();
+ERROR:  array element is a list, but should be a scalar
+DETAIL:  To construct a multidimensional array, the inner lists must be nested to a uniform depth.
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_mdarray_malformed3"
 CREATE FUNCTION test_type_conversion_mdarray_toodeep() RETURNS int[] AS $$
 return [[[[[[[1]]]]]]]
 $$ LANGUAGE plpython3u;
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index 864b5f1765..eddedec3d4 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -1126,7 +1126,7 @@ PLyObject_ToTransform(PLyObToDatum *arg, PyObject *plrv,


 /*
- * Convert Python sequence to SQL array.
+ * Convert Python sequence (or list of lists) to SQL array.
  */
 static Datum
 PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
@@ -1205,8 +1205,15 @@ PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
         dims[0] = PySequence_Length(plrv);
     }

-    /* Allocate space for work arrays, after detecting array size overflow */
     len = ArrayGetNItems(ndim, dims);
+    if (len == 0)
+    {
+        /* If no elements, ensure we return a zero-D array */
+        array = construct_empty_array(arg->u.array.elmbasetype);
+        return PointerGetDatum(array);
+    }
+
+    /* Allocate space for work arrays, after detecting array size overflow */
     elems = palloc(sizeof(Datum) * len);
     nulls = palloc(sizeof(bool) * len);

@@ -1246,35 +1253,55 @@ PLySequence_ToArray_recurse(PLyObToDatum *elm, PyObject *list,
 {
     int            i;

-    if (PySequence_Length(list) != dims[dim])
-        ereport(ERROR,
-                (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
-                 errmsg("wrong length of inner sequence: has length %d, but %d was expected",
-                        (int) PySequence_Length(list), dims[dim]),
-                 (errdetail("To construct a multidimensional array, the inner sequences must all have the same
length."))));
-
-    if (dim < ndim - 1)
+    /*
+     * The list nesting depth in all array items must match what
+     * PLySequence_ToArray saw in the first item.  At the first dimension, we
+     * might have a non-list sequence, but recurse anyway.
+     */
+    if (dim == 0 || PyList_Check(list))
     {
-        for (i = 0; i < dims[dim]; i++)
-        {
-            PyObject   *sublist = PySequence_GetItem(list, i);
+        int            thisdim = PySequence_Length(list);

-            PLySequence_ToArray_recurse(elm, sublist, dims, ndim, dim + 1,
-                                        elems, nulls, currelem);
-            Py_XDECREF(sublist);
-        }
-    }
-    else
-    {
-        for (i = 0; i < dims[dim]; i++)
+        if (thisdim < 0)
+            PLy_elog(ERROR, "could not determine sequence length for function return value");
+
+        /* like PLySequence_ToArray, we treat 0-length list as a scalar */
+        if (thisdim > 0)
         {
-            PyObject   *obj = PySequence_GetItem(list, i);
+            /* Check for uniform array structure */
+            if (dim >= ndim)
+                ereport(ERROR,
+                        (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+                         errmsg("array element is a list, but should be a scalar"),
+                         (errdetail("To construct a multidimensional array, the inner lists must be nested to a
uniformdepth.")))); 
+            if (thisdim != dims[dim])
+                ereport(ERROR,
+                        (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+                         errmsg("wrong length of inner list: has length %d, but %d was expected",
+                                thisdim, dims[dim]),
+                         (errdetail("To construct a multidimensional array, the inner lists must all have the same
length."))));
+            /* OK, so recurse */
+            for (i = 0; i < thisdim; i++)
+            {
+                PyObject   *sublist = PySequence_GetItem(list, i);

-            elems[*currelem] = elm->func(elm, obj, &nulls[*currelem], true);
-            Py_XDECREF(obj);
-            (*currelem)++;
+                PLySequence_ToArray_recurse(elm, sublist, dims, ndim, dim + 1,
+                                            elems, nulls, currelem);
+                Py_XDECREF(sublist);
+            }
+            return;
         }
     }
+
+    /* The "list" is, or should be treated as, a scalar */
+    if (dim != ndim)
+        ereport(ERROR,
+                (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+                 errmsg("array element is a scalar, but should be a list"),
+                 (errdetail("To construct a multidimensional array, the inner lists must be nested to a uniform
depth."))));
+
+    elems[*currelem] = elm->func(elm, list, &nulls[*currelem], true);
+    (*currelem)++;
 }


diff --git a/src/pl/plpython/sql/plpython_types.sql b/src/pl/plpython/sql/plpython_types.sql
index 9702a10a72..b57b9a6463 100644
--- a/src/pl/plpython/sql/plpython_types.sql
+++ b/src/pl/plpython/sql/plpython_types.sql
@@ -341,6 +341,18 @@ $$ LANGUAGE plpython3u;

 SELECT * FROM test_type_conversion_mdarray_malformed();

+CREATE FUNCTION test_type_conversion_mdarray_malformed2() RETURNS text[] AS $$
+return [[1,2,3], "abc"]
+$$ LANGUAGE plpython3u;
+
+SELECT * FROM test_type_conversion_mdarray_malformed2();
+
+CREATE FUNCTION test_type_conversion_mdarray_malformed3() RETURNS text[] AS $$
+return ["abc", [1,2,3]]
+$$ LANGUAGE plpython3u;
+
+SELECT * FROM test_type_conversion_mdarray_malformed3();
+
 CREATE FUNCTION test_type_conversion_mdarray_toodeep() RETURNS int[] AS $$
 return [[[[[[[1]]]]]]]
 $$ LANGUAGE plpython3u;

Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Alexander Lakhin
Дата:
28.04.2023 22:17, Tom Lane wrote:
> The real problem here is that we don't check that the list nesting
> depth is the same throughout the array: if lists are more deeply
> nested in later elements, we'll treat those sub-lists as scalars,
> leading to inconsistent results.  Conversely, a less-deeply-nested
> list structure in a later element might still work, if it can be
> treated as a sequence.  I think the second and third examples
> I gave should both throw errors.
>
> I also notice that the error messages in this area speak of "sequences",
> but it is more correct to call them "lists", because Python draws a
> distinction.  (All lists are sequences, but not vice versa, eg a
> string is a sequence but not a list.)
>
> So I'm thinking about the attached.

Thanks for fixing this!
Now python's handling of arrays is much nicer and is aligned with the plperl's behaviour:
CREATE FUNCTION test_pl() RETURNS text[] AS $$ return [1, [2, 3]]; $$ LANGUAGE plperl;
SELECT * FROM test_pl();
ERROR:  multidimensional arrays must have array expressions with matching dimensions
CONTEXT:  PL/Perl function "test_pl"

I observed another light-hearted case (without the patch, of course):
CREATE FUNCTION test_py() RETURNS text[] AS $$ return [1, [2, 3]]; $$ LANGUAGE plpython3u;
SELECT * FROM test_py();
  {1,"[2, 3]"}

So the patch looks more like a bug fix.

Though I still see some discrepancy between plperl and plpython:
CREATE OR REPLACE FUNCTION test_pl() RETURNS text[] AS $$ return [[],1]; $$ LANGUAGE plperl;
SELECT * FROM test_pl();
ERROR:  multidimensional arrays must have array expressions with matching dimensions
CONTEXT:  PL/Perl function "test_pl"
vs
CREATE OR REPLACE FUNCTION test_py() RETURNS text[] AS $$ return [[],1]; $$ LANGUAGE plpython3u;
SELECT * FROM test_py();
  {[],1}

It seems that [] was recognized as "[]" here.

While playing with plperl, I found that it handles arrays not perfectly too:
CREATE OR REPLACE FUNCTION test_pl() RETURNS text[] AS $$return [1, []];$$ LANGUAGE plperl;
SELECT * FROM test_pl();
triggers:
==00:00:08:45.537 2325687== Conditional jump or move depends on uninitialised value(s)
==00:00:08:45.537 2325687==    at 0x61FEDC: construct_md_array (arrayfuncs.c:3500)
==00:00:08:45.537 2325687==    by 0x625917: makeMdArrayResult (arrayfuncs.c:5428)
==00:00:08:45.537 2325687==    by 0x486DEF3: plperl_array_to_datum (plperl.c:1278)
==00:00:08:45.537 2325687==    by 0x486D8E8: plperl_sv_to_datum (plperl.c:1347)
==00:00:08:45.537 2325687==    by 0x4872FA6: plperl_func_handler (plperl.c:2483)
==00:00:08:45.537 2325687==    by 0x4873CE5: plperl_call_handler (plperl.c:1858)
==00:00:08:45.537 2325687==    by 0x41406F: ExecMakeTableFunctionResult (execSRF.c:235)
==00:00:08:45.538 2325687==    by 0x426F1C: FunctionNext (nodeFunctionscan.c:95)
==00:00:08:45.538 2325687==    by 0x414AA7: ExecScanFetch (execScan.c:133)
==00:00:08:45.538 2325687==    by 0x414B42: ExecScan (execScan.c:182)
==00:00:08:45.538 2325687==    by 0x426E2E: ExecFunctionScan (nodeFunctionscan.c:270)
==00:00:08:45.538 2325687==    by 0x4117F1: ExecProcNodeFirst (execProcnode.c:464)
==00:00:08:45.538 2325687==

Here, nelems = 2 (from ArrayGetNItems(ndims, dims)), but
array_to_datum_internal() generated only one datum.

And yet another case:
CREATE OR REPLACE FUNCTION test_py() RETURNS text[] AS $$return [[1],[[]]];$$ LANGUAGE plpython3u;
regression=# SELECT * FROM test_py();
  {{1},{[]}}
vs
CREATE OR REPLACE FUNCTION test_pl() RETURNS text[] AS $$return [[1],[[]]];$$ LANGUAGE plperl;
SELECT * FROM test_pl();
  {}

> I do not propose this for
> back-patching, because it could break applications that work today.
> But it seems like good tightening-up for HEAD, or maybe we should
> wait for v17 at this point?

I suppose that waiting for v17 is preferable if the patch is considered as
bringing a new feature (it's not the case) or could require extra time to
stabilize. But I'm afraid that anomalies, that would require additional
fixes for the change stabilization, could be related to the existing
code, and thus that extra time will be invested in improving v16- too.

Best regards,
Alexander



Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> Thanks for fixing this!
> Now python's handling of arrays is much nicer and is aligned with the plperl's behaviour:

Well, mostly.  As you noticed, it's a bit weird about zero-length
sub-lists, treating those as scalars.  I had been intending to keep
the existing behavior there, but now that I see that plperl does it
the other way (that is, you end up with an overall empty output array)
I think we ought to make plpython do likewise.

> While playing with plperl, I found that it handles arrays not perfectly too:
> CREATE OR REPLACE FUNCTION test_pl() RETURNS text[] AS $$return [1, []];$$ LANGUAGE plperl;
> SELECT * FROM test_pl();

Ugh.  I pushed a fix for that.

On the whole, I like plperl's implementation of this better, that is
it scans the data structure just once and uses an ArrayBuildState to
accumulate the datums.  So now I'm thinking about throwing out
the code in PLySequence_ToArray[_recurse] altogether and borrowing
plperl's logic.  I think that would make it easier to deal with
zero-length sublists correctly.  Haven't written the patch yet though.

> I suppose that waiting for v17 is preferable if the patch is considered as
> bringing a new feature (it's not the case) or could require extra time to
> stabilize. But I'm afraid that anomalies, that would require additional
> fixes for the change stabilization, could be related to the existing
> code, and thus that extra time will be invested in improving v16- too.

I'm a little uncomfortable with changing the semantics of non-failing
cases in the back branches.

            regards, tom lane



Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Alexander Lakhin
Дата:
29.04.2023 20:16, Tom Lane wrote:
>> I suppose that waiting for v17 is preferable if the patch is considered as
>> bringing a new feature (it's not the case) or could require extra time to
>> stabilize. But I'm afraid that anomalies, that would require additional
>> fixes for the change stabilization, could be related to the existing
>> code, and thus that extra time will be invested in improving v16- too.
> I'm a little uncomfortable with changing the semantics of non-failing
> cases in the back branches.

I agree that we shouldn't introduce a new behavior in the released branches.
For that moment I was thinking about the choice between v16 and v17
for that patch to be committed to.
But as you see the better solution now, the patch will be different and more
extensive, I suppose, so I'd vote for postponing it for v17.

Best regards,
Alexander



Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> 29.04.2023 20:16, Tom Lane wrote:
>> I'm a little uncomfortable with changing the semantics of non-failing
>> cases in the back branches.

> I agree that we shouldn't introduce a new behavior in the released branches.
> For that moment I was thinking about the choice between v16 and v17
> for that patch to be committed to.
> But as you see the better solution now, the patch will be different and more
> extensive, I suppose, so I'd vote for postponing it for v17.

Here's a version that adopts plperl's logic, causing it to treat
empty sub-lists as being zero-length dimensions.  Most of the new
test cases are borrowed from plperl, too, and most of them act
differently before and after the code change.  So I'm pretty
hesitant to put this into stable branches.  OTOH, maybe it's not
too late for v16?

I noticed one inarguable bug here, too: PLySequence_ToArray_recurse
leaks Python object refcounts after errors, because it has no
PG_TRY to ensure that Py_XDECREF() gets done.  I'm not sure if
it's worth doing something about that in the back branches, given
the lack of complaints.

            regards, tom lane

diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
index c4e749a5a8..8a680e15c1 100644
--- a/src/pl/plpython/expected/plpython_types.out
+++ b/src/pl/plpython/expected/plpython_types.out
@@ -687,23 +687,74 @@ SELECT * FROM test_type_conversion_array_mixed2();
 ERROR:  invalid input syntax for type integer: "abc"
 CONTEXT:  while creating return value
 PL/Python function "test_type_conversion_array_mixed2"
-CREATE FUNCTION test_type_conversion_array_mixed3() RETURNS text[] AS $$
-return [[], 'a']
+-- check output of multi-dimensional arrays
+CREATE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [['a'], ['b'], ['c']]
 $$ LANGUAGE plpython3u;
-SELECT * FROM test_type_conversion_array_mixed3();
- test_type_conversion_array_mixed3
+select test_type_conversion_md_array_out();
+ test_type_conversion_md_array_out
 -----------------------------------
- {[],a}
+ {{a},{b},{c}}
 (1 row)

+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], []]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();
+ test_type_conversion_md_array_out
+-----------------------------------
+ {}
+(1 row)
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], [1]]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();  -- fail
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_md_array_out"
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], 1]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();  -- fail
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_md_array_out"
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [1, []]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();  -- fail
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_md_array_out"
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[1], [[]]]
+$$ LANGUAGE plpython3u;
+select test_type_conversion_md_array_out();  -- fail
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_md_array_out"
 CREATE FUNCTION test_type_conversion_mdarray_malformed() RETURNS int[] AS $$
 return [[1,2,3],[4,5]]
 $$ LANGUAGE plpython3u;
 SELECT * FROM test_type_conversion_mdarray_malformed();
-ERROR:  wrong length of inner sequence: has length 2, but 3 was expected
-DETAIL:  To construct a multidimensional array, the inner sequences must all have the same length.
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
 CONTEXT:  while creating return value
 PL/Python function "test_type_conversion_mdarray_malformed"
+CREATE FUNCTION test_type_conversion_mdarray_malformed2() RETURNS text[] AS $$
+return [[1,2,3], "abc"]
+$$ LANGUAGE plpython3u;
+SELECT * FROM test_type_conversion_mdarray_malformed2();
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_mdarray_malformed2"
+CREATE FUNCTION test_type_conversion_mdarray_malformed3() RETURNS text[] AS $$
+return ["abc", [1,2,3]]
+$$ LANGUAGE plpython3u;
+SELECT * FROM test_type_conversion_mdarray_malformed3();
+ERROR:  multidimensional arrays must have array expressions with matching dimensions
+CONTEXT:  while creating return value
+PL/Python function "test_type_conversion_mdarray_malformed3"
 CREATE FUNCTION test_type_conversion_mdarray_toodeep() RETURNS int[] AS $$
 return [[[[[[[1]]]]]]]
 $$ LANGUAGE plpython3u;
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index 864b5f1765..0bfd3059f2 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -54,9 +54,10 @@ static Datum PLyObject_ToTransform(PLyObToDatum *arg, PyObject *plrv,
                                    bool *isnull, bool inarray);
 static Datum PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
                                  bool *isnull, bool inarray);
-static void PLySequence_ToArray_recurse(PLyObToDatum *elm, PyObject *list,
-                                        int *dims, int ndim, int dim,
-                                        Datum *elems, bool *nulls, int *currelem);
+static void PLySequence_ToArray_recurse(PyObject *obj,
+                                        ArrayBuildState **astatep,
+                                        int *ndims, int *dims, int cur_depth,
+                                        PLyObToDatum *elm, Oid elmbasetype);

 /* conversion from Python objects to composite Datums */
 static Datum PLyUnicode_ToComposite(PLyObToDatum *arg, PyObject *string, bool inarray);
@@ -1126,23 +1127,16 @@ PLyObject_ToTransform(PLyObToDatum *arg, PyObject *plrv,


 /*
- * Convert Python sequence to SQL array.
+ * Convert Python sequence (or list of lists) to SQL array.
  */
 static Datum
 PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
                     bool *isnull, bool inarray)
 {
-    ArrayType  *array;
-    int            i;
-    Datum       *elems;
-    bool       *nulls;
-    int            len;
-    int            ndim;
+    ArrayBuildState *astate = NULL;
+    int            ndims = 1;
     int            dims[MAXDIM];
     int            lbs[MAXDIM];
-    int            currelem;
-    PyObject   *pyptr = plrv;
-    PyObject   *next;

     if (plrv == Py_None)
     {
@@ -1152,128 +1146,130 @@ PLySequence_ToArray(PLyObToDatum *arg, PyObject *plrv,
     *isnull = false;

     /*
-     * Determine the number of dimensions, and their sizes.
+     * For historical reasons, we allow any sequence (not only a list) at the
+     * top level when converting a Python object to a SQL array.  However, a
+     * multi-dimensional array is recognized only when the object contains
+     * true lists.
      */
-    ndim = 0;
-
-    Py_INCREF(plrv);
-
-    for (;;)
-    {
-        if (!PyList_Check(pyptr))
-            break;
-
-        if (ndim == MAXDIM)
-            ereport(ERROR,
-                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                     errmsg("number of array dimensions exceeds the maximum allowed (%d)",
-                            MAXDIM)));
-
-        dims[ndim] = PySequence_Length(pyptr);
-        if (dims[ndim] < 0)
-            PLy_elog(ERROR, "could not determine sequence length for function return value");
-
-        if (dims[ndim] == 0)
-        {
-            /* empty sequence */
-            break;
-        }
-
-        ndim++;
-
-        next = PySequence_GetItem(pyptr, 0);
-        Py_XDECREF(pyptr);
-        pyptr = next;
-    }
-    Py_XDECREF(pyptr);
-
-    /*
-     * Check for zero dimensions. This happens if the object is a tuple or a
-     * string, rather than a list, or is not a sequence at all. We don't map
-     * tuples or strings to arrays in general, but in the first level, be
-     * lenient, for historical reasons. So if the object is a sequence of any
-     * kind, treat it as a one-dimensional array.
-     */
-    if (ndim == 0)
-    {
-        if (!PySequence_Check(plrv))
-            ereport(ERROR,
-                    (errcode(ERRCODE_DATATYPE_MISMATCH),
-                     errmsg("return value of function with array return type is not a Python sequence")));
-
-        ndim = 1;
-        dims[0] = PySequence_Length(plrv);
-    }
+    if (!PySequence_Check(plrv))
+        ereport(ERROR,
+                (errcode(ERRCODE_DATATYPE_MISMATCH),
+                 errmsg("return value of function with array return type is not a Python sequence")));

-    /* Allocate space for work arrays, after detecting array size overflow */
-    len = ArrayGetNItems(ndim, dims);
-    elems = palloc(sizeof(Datum) * len);
-    nulls = palloc(sizeof(bool) * len);
+    /* Initialize dimensionality info with first-level dimension */
+    memset(dims, 0, sizeof(dims));
+    dims[0] = PySequence_Length(plrv);

     /*
      * Traverse the Python lists, in depth-first order, and collect all the
-     * elements at the bottom level into 'elems'/'nulls' arrays.
+     * elements at the bottom level into an ArrayBuildState.
      */
-    currelem = 0;
-    PLySequence_ToArray_recurse(arg->u.array.elm, plrv,
-                                dims, ndim, 0,
-                                elems, nulls, &currelem);
+    PLySequence_ToArray_recurse(plrv, &astate,
+                                &ndims, dims, 1,
+                                arg->u.array.elm,
+                                arg->u.array.elmbasetype);
+
+    /* ensure we get zero-D array for no inputs, as per PG convention */
+    if (astate == NULL)
+        return PointerGetDatum(construct_empty_array(arg->u.array.elmbasetype));

-    for (i = 0; i < ndim; i++)
+    for (int i = 0; i < ndims; i++)
         lbs[i] = 1;

-    array = construct_md_array(elems,
-                               nulls,
-                               ndim,
-                               dims,
-                               lbs,
-                               arg->u.array.elmbasetype,
-                               arg->u.array.elm->typlen,
-                               arg->u.array.elm->typbyval,
-                               arg->u.array.elm->typalign);
-
-    return PointerGetDatum(array);
+    return makeMdArrayResult(astate, ndims, dims, lbs,
+                             CurrentMemoryContext, true);
 }

 /*
  * Helper function for PLySequence_ToArray. Traverse a Python list of lists in
- * depth-first order, storing the elements in 'elems'.
+ * depth-first order, storing the elements in *astatep.
+ *
+ * The ArrayBuildState is created only when we first find a scalar element;
+ * if we didn't do it like that, we'd need some other convention for knowing
+ * whether we'd already found any scalars (and thus the number of dimensions
+ * is frozen).
  */
 static void
-PLySequence_ToArray_recurse(PLyObToDatum *elm, PyObject *list,
-                            int *dims, int ndim, int dim,
-                            Datum *elems, bool *nulls, int *currelem)
+PLySequence_ToArray_recurse(PyObject *obj, ArrayBuildState **astatep,
+                            int *ndims, int *dims, int cur_depth,
+                            PLyObToDatum *elm, Oid elmbasetype)
 {
     int            i;
+    int            len = PySequence_Length(obj);

-    if (PySequence_Length(list) != dims[dim])
-        ereport(ERROR,
-                (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
-                 errmsg("wrong length of inner sequence: has length %d, but %d was expected",
-                        (int) PySequence_Length(list), dims[dim]),
-                 (errdetail("To construct a multidimensional array, the inner sequences must all have the same
length."))));
+    /* We should not get here with a non-sequence object */
+    if (len < 0)
+        PLy_elog(ERROR, "could not determine sequence length for function return value");

-    if (dim < ndim - 1)
+    for (i = 0; i < len; i++)
     {
-        for (i = 0; i < dims[dim]; i++)
-        {
-            PyObject   *sublist = PySequence_GetItem(list, i);
+        /* fetch the array element */
+        PyObject   *subobj = PySequence_GetItem(obj, i);

-            PLySequence_ToArray_recurse(elm, sublist, dims, ndim, dim + 1,
-                                        elems, nulls, currelem);
-            Py_XDECREF(sublist);
+        /* need PG_TRY to ensure we release the subobj's refcount */
+        PG_TRY();
+        {
+            /* multi-dimensional array? */
+            if (PyList_Check(subobj))
+            {
+                /* set size when at first element in this level, else compare */
+                if (i == 0 && *ndims == cur_depth)
+                {
+                    /* array after some scalars at same level? */
+                    if (*astatep != NULL)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                 errmsg("multidimensional arrays must have array expressions with matching
dimensions")));
+                    /* too many dimensions? */
+                    if (cur_depth + 1 > MAXDIM)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                                 errmsg("number of array dimensions exceeds the maximum allowed (%d)",
+                                        MAXDIM)));
+                    /* OK, add a dimension */
+                    dims[*ndims] = PySequence_Length(subobj);
+                    (*ndims)++;
+                }
+                else if (cur_depth >= *ndims ||
+                         PySequence_Length(subobj) != dims[cur_depth])
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                             errmsg("multidimensional arrays must have array expressions with matching dimensions")));
+
+                /* recurse to fetch elements of this sub-array */
+                PLySequence_ToArray_recurse(subobj, astatep,
+                                            ndims, dims, cur_depth + 1,
+                                            elm, elmbasetype);
+            }
+            else
+            {
+                Datum        dat;
+                bool        isnull;
+
+                /* scalar after some sub-arrays at same level? */
+                if (*ndims != cur_depth)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                             errmsg("multidimensional arrays must have array expressions with matching dimensions")));
+
+                /* convert non-list object to Datum */
+                dat = elm->func(elm, subobj, &isnull, true);
+
+                /* create ArrayBuildState if we didn't already */
+                if (*astatep == NULL)
+                    *astatep = initArrayResult(elmbasetype,
+                                               CurrentMemoryContext, true);
+
+                /* ... and save the element value in it */
+                (void) accumArrayResult(*astatep, dat, isnull,
+                                        elmbasetype, CurrentMemoryContext);
+            }
         }
-    }
-    else
-    {
-        for (i = 0; i < dims[dim]; i++)
+        PG_FINALLY();
         {
-            PyObject   *obj = PySequence_GetItem(list, i);
-
-            elems[*currelem] = elm->func(elm, obj, &nulls[*currelem], true);
-            Py_XDECREF(obj);
-            (*currelem)++;
+            Py_XDECREF(subobj);
         }
+        PG_END_TRY();
     }
 }

diff --git a/src/pl/plpython/sql/plpython_types.sql b/src/pl/plpython/sql/plpython_types.sql
index 9702a10a72..0985a9cca2 100644
--- a/src/pl/plpython/sql/plpython_types.sql
+++ b/src/pl/plpython/sql/plpython_types.sql
@@ -328,11 +328,43 @@ $$ LANGUAGE plpython3u;

 SELECT * FROM test_type_conversion_array_mixed2();

-CREATE FUNCTION test_type_conversion_array_mixed3() RETURNS text[] AS $$
-return [[], 'a']
+
+-- check output of multi-dimensional arrays
+CREATE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [['a'], ['b'], ['c']]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], []]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], [1]]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();  -- fail
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[], 1]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();  -- fail
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [1, []]
+$$ LANGUAGE plpython3u;
+
+select test_type_conversion_md_array_out();  -- fail
+
+CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
+return [[1], [[]]]
 $$ LANGUAGE plpython3u;

-SELECT * FROM test_type_conversion_array_mixed3();
+select test_type_conversion_md_array_out();  -- fail


 CREATE FUNCTION test_type_conversion_mdarray_malformed() RETURNS int[] AS $$
@@ -341,6 +373,18 @@ $$ LANGUAGE plpython3u;

 SELECT * FROM test_type_conversion_mdarray_malformed();

+CREATE FUNCTION test_type_conversion_mdarray_malformed2() RETURNS text[] AS $$
+return [[1,2,3], "abc"]
+$$ LANGUAGE plpython3u;
+
+SELECT * FROM test_type_conversion_mdarray_malformed2();
+
+CREATE FUNCTION test_type_conversion_mdarray_malformed3() RETURNS text[] AS $$
+return ["abc", [1,2,3]]
+$$ LANGUAGE plpython3u;
+
+SELECT * FROM test_type_conversion_mdarray_malformed3();
+
 CREATE FUNCTION test_type_conversion_mdarray_toodeep() RETURNS int[] AS $$
 return [[[[[[[1]]]]]]]
 $$ LANGUAGE plpython3u;

Re: BUG #17912: Invalid memory access when converting plpython' array containing empty array

От
Alexander Lakhin
Дата:
30.04.2023 19:24, Tom Lane wrote:
> Here's a version that adopts plperl's logic, causing it to treat
> empty sub-lists as being zero-length dimensions.  Most of the new
> test cases are borrowed from plperl, too, and most of them act
> differently before and after the code change.  So I'm pretty
> hesitant to put this into stable branches.  OTOH, maybe it's not
> too late for v16?

Thanks for the patch!
I've tested the new implementation and found no issues with it — only
rectangular structures are accepted now. The code is straightforward and
very similar to plperl's, so I would not expect that it might bring new
anomalies, which couldn't be seen before.
Thus I don't think that adding it to current master (and possible follow-up
fixing) can take a significant amount of time out of v16+ schedule only.

> I noticed one inarguable bug here, too: PLySequence_ToArray_recurse
> leaks Python object refcounts after errors, because it has no
> PG_TRY to ensure that Py_XDECREF() gets done.  I'm not sure if
> it's worth doing something about that in the back branches, given
> the lack of complaints.

I continue watching the array handling bugs dancing Sirtaki too. Now it's
another asymmetry:
select '{{1},{{2}}}'::int[];
  {{{1}},{{2}}}
but:
select '{{{1}},{2}}'::int[];
  {}

Reproduced on REL_11_STABLE..master.

Best regards,
Alexander



Alexander Lakhin <exclusion@gmail.com> writes:
> 30.04.2023 19:24, Tom Lane wrote:
>> Here's a version that adopts plperl's logic, causing it to treat
>> empty sub-lists as being zero-length dimensions.  Most of the new
>> test cases are borrowed from plperl, too, and most of them act
>> differently before and after the code change.  So I'm pretty
>> hesitant to put this into stable branches.  OTOH, maybe it's not
>> too late for v16?

> Thanks for the patch!
> I've tested the new implementation and found no issues with it — only
> rectangular structures are accepted now. The code is straightforward and
> very similar to plperl's, so I would not expect that it might bring new
> anomalies, which couldn't be seen before.
> Thus I don't think that adding it to current master (and possible follow-up
> fixing) can take a significant amount of time out of v16+ schedule only.

After thinking about this some more, I'm inclined to go ahead and
apply this patch and indeed back-patch it.  As things stood before
commits 81eaaf65e et al, it was completely unsafe to use an empty
first-level sub-list in a plpython multi-dimensional result value at all.
The only valid use of such a thing would be like "return [[], []]",
that is, all the sub-lists have to be zero-length to satisfy
rectangularity.  So that would trigger the bug you originally reported
wherein a zero-length first sublist computes a wrong datum array
length leading to memory clobber.  Commit 81eaaf65e did the minimum
possible change to stop the memory clobber, but it left us in a
situation where the actual result behavior isn't very sane.  We
should not ship that behavior; we should make it do something sane,
namely return a zero-dimensional array for cases like this.

The argument against that is that zero-length sublists below the
first level managed not to crash, nor to fail, in some weird
cases like this one:

regression=# CREATE OR REPLACE FUNCTION test_type_conversion_md_array_out() RETURNS text[] AS $$
regression$# return [[1], [[]]]
regression$# $$ LANGUAGE plpython3u;
CREATE FUNCTION
regression=# select test_type_conversion_md_array_out();
 test_type_conversion_md_array_out
-----------------------------------
 {{1},{[]}}
(1 row)

which the patch would turn into an error case.  But it's pretty hard
to believe that anyone is depending on corner cases like that one
and yet managing not to trip over the crash hazard.  Moreover,
throwing an error for this is consistent with the change we made
in plperl at f47004add et al.  So I'm thinking let's apply this
patch and then just release-note all three patchsets as "tighten
checks for rectangularity of multi-dimensional arrays in plperl
and plpython".

            regards, tom lane



Alexander Lakhin <exclusion@gmail.com> writes:
> I continue watching the array handling bugs dancing Sirtaki too. Now it's
> another asymmetry:
> select '{{1},{{2}}}'::int[];
>   {{{1}},{{2}}}
> but:
> select '{{{1}},{2}}'::int[];
>   {}

For the sake of the archives --- this issue in the core code
is being tracked at

https://www.postgresql.org/message-id/2794005.1683042087%40sss.pgh.pa.us

I think all the reported issues in plperl and plpython are resolved now.

            regards, tom lane