Discussion: PL/Python adding support for multi-dimensional arrays


PL/Python adding support for multi-dimensional arrays

From
Alexey Grishchenko
Date:
Hi

The current implementation of PL/Python does not allow the use of multi-dimensional arrays, for either input or output parameters. This forces end users to introduce workarounds such as casting arrays to text before passing them to functions and parsing them afterwards, which is an error-prone approach.

This patch adds support for multi-dimensional arrays as both input and output parameters for PL/Python functions. The number of supported dimensions is limited by the Postgres MAXDIM macro, which defaults to 6. Both input and output multi-dimensional arrays must have fixed dimension sizes, i.e. a 2-D array must represent an MxN matrix, a 3-D array an MxNxK cube, etc.
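The following minimal Python sketch illustrates the "fixed dimension sizes" rule and the MAXDIM limit. It infers the dimensions of a nested list and rejects ragged input. It is not the patch's C code. The names infer_dims and _check_shape are hypothetical, and MAXDIM is hard-coded to the default of 6 mentioned above.

```python
MAXDIM = 6  # assumption: the default MAXDIM value mentioned above


def infer_dims(value):
    """Return the dimension sizes of a nested list, e.g. [2, 3] for a 2x3 matrix.

    Hypothetical helper: sizes are read along the first element at each
    level, then the whole value is checked to be rectangular.
    """
    dims = []
    probe = value
    while isinstance(probe, list):
        dims.append(len(probe))
        if len(dims) > MAXDIM:
            raise ValueError(f"more than {MAXDIM} array dimensions")
        if not probe:
            break
        probe = probe[0]
    _check_shape(value, dims, 0)
    return dims


def _check_shape(value, dims, depth):
    """Raise ValueError unless every sub-list at `depth` has length dims[depth]."""
    if depth == len(dims):
        if isinstance(value, list):
            raise ValueError("sub-arrays are nested deeper than the first element")
        return
    if not isinstance(value, list) or len(value) != dims[depth]:
        raise ValueError("sub-arrays must have matching dimensions")
    for item in value:
        _check_shape(item, dims, depth + 1)
```

For example, infer_dims([[1, 2, 3], [4, 5, 6]]) gives [2, 3]. By contrast, [[1, 2], [3]] is rejected because its rows differ in length.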

This patch does not support multi-dimensional arrays of composite types, because composite types in Python might be represented as sequences, and there is no obvious way to find out where the nested array stops and the composite type structure starts. For example, if we have a composite type of (int, text), we could try to return "[ [ [1,'a'], [2,'b'] ], [ [3,'c'], [4,'d'] ] ]", and it is hard to tell that the first two levels of lists are array dimensions while the third one represents the structure. Things get even more complex when you have arrays as members of the composite type. This is why I think this limitation is reasonable.

Given the function:
CREATE FUNCTION test_type_conversion_array_int4(x int4[]) RETURNS int4[] AS $$
plpy.info(x, type(x))
return x
$$ LANGUAGE plpythonu;

Before patch:
# SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
ERROR:  cannot convert multidimensional array to Python list
DETAIL:  PL/Python only supports one-dimensional arrays.
CONTEXT:  PL/Python function "test_type_conversion_array_int4"

After patch:
# SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
INFO:  ([[1, 2, 3], [4, 5, 6]], <type 'list'>)
 test_type_conversion_array_int4 
---------------------------------
 {{1,2,3},{4,5,6}}
(1 row)
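To make the correspondence in this example concrete, the sketch below renders a nested Python list in Postgres array-literal syntax. It reproduces the {{1,2,3},{4,5,6}} output above. It only illustrates the mapping and does not show how PL/Python actually constructs the value. The name to_pg_array_literal is hypothetical, and quoting and escaping of text elements are ignored.

```python
def to_pg_array_literal(value):
    """Render nested Python lists as a Postgres array literal.

    Simplified: None becomes NULL, other scalars use str(); no quoting or escaping.
    """
    if isinstance(value, list):
        return "{" + ",".join(to_pg_array_literal(v) for v in value) + "}"
    if value is None:
        return "NULL"
    return str(value)
```

to_pg_array_literal([[1, 2, 3], [4, 5, 6]]) returns '{{1,2,3},{4,5,6}}'.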

--
Best regards,
Alexey Grishchenko
Attachments

Re: PL/Python adding support for multi-dimensional arrays

From
Alexey Grishchenko
Date:
On Wed, Aug 3, 2016 at 12:49 PM, Alexey Grishchenko <agrishchenko@pivotal.io> wrote:
> [...]

Also, this patch incorporates the fix for https://www.postgresql.org/message-id/CAH38_tkwA5qgLV8zPN1OpPzhtkNKQb30n3xq-2NR9jUfv3qwHA%40mail.gmail.com, as both touch the same piece of code: array manipulation in PL/Python.

--
Best regards,
Alexey Grishchenko

Re: PL/Python adding support for multi-dimensional arrays

From
Pavel Stehule
Date:
Hi

2016-08-03 13:54 GMT+02:00 Alexey Grishchenko <agrishchenko@pivotal.io>:
> [...]


I am sending a review of this patch:

1. The implemented functionality is clearly beneficial: it allows passing multi-dimensional arrays, and passing bigger arrays is noticeably faster.
2. I was able to apply this patch cleanly, without any errors or warnings.
3. There are no errors or warnings.
4. All tests passed; I tested Python 2.7 and Python 3.5.
5. The code is well commented and clean.
6. No documentation is necessary for this new functionality.

7. I would like to see more regression tests for both directions (Python <-> Postgres) with more than two dimensions.

My only objection is the lack of regression tests; once that is fixed, this patch will be ready for committers.

Good work, Alexey

Thank you

Regards

Pavel
 

Re: PL/Python adding support for multi-dimensional arrays

From
Dave Cramer
Date:

On 10 August 2016 at 01:53, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> [...]



Pavel,

I will pick this up.

Re: PL/Python adding support for multi-dimensional arrays

From
Dave Cramer
Date:

On 18 September 2016 at 09:27, Dave Cramer <pg@fastcrypt.com> wrote:

> [...]


Pavel,

Please see the attached patch, which provides more test cases.

I just realized this patch contains the original patch as well. What is the protocol for sending in subsequent patches?

Attachments

Re: PL/Python adding support for multi-dimensional arrays

From
Pavel Stehule
Date:
Hi

2016-09-21 19:53 GMT+02:00 Dave Cramer <pg@fastcrypt.com>:

> [...]

> My only one objection is not enough regress tests - after fixing this patch will be ready for commiters.

Now there are enough tests, so I'll mark this patch as ready for committers.

I had to fix the tests: there was a lot of whitespace, and the expected output for Python 3 was missing.

Regards

Pavel


 



Attachments

Re: PL/Python adding support for multi-dimensional arrays

From
Heikki Linnakangas
Date:
On 09/22/2016 10:28 AM, Pavel Stehule wrote:
> Now, the tests are enough - so I'll mark this patch as ready for commiters.
>
> I had to fix tests - there was lot of white spaces, and the result for
> python3 was missing

Thanks Pavel!

This crashes with arrays with non-default lower bounds:

postgres=# SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}');
INFO:  ([1, 2, <NULL>], <type 'list'>)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 


I'd like to see some updates to the docs for this. The manual doesn't 
currently say anything about multi-dimensional arrays in pl/python, but 
it should've mentioned that they're not supported. Now that they are 
supported, it should say so, and briefly explain that a 
multi-dimensional array is mapped to a Python list of lists.

It seems we don't have any mention in the docs about arrays with 
non-default lower-bounds ATM. That's not this patch's fault, but it 
would be good to point out that the lower bounds are discarded when an 
array is passed to python.
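Here is a small Python sketch of what "discarding the lower bound" means. A one-dimensional literal such as '[2:4]={1,2,3}' has subscripts 2..4, and Postgres subscript s maps to Python index s - lower. The parser is hypothetical and deliberately simplified: plain integer literals only, no NULLs, no quoting. It does not show where the reported crash comes from.

```python
import re


def parse_bounded_int_array(literal):
    """Parse a 1-D int array literal with optional explicit bounds, e.g. '[2:4]={1,2,3}'.

    Returns (lower_bound, elements). Simplified: integers only, no NULLs, no quoting.
    """
    m = re.fullmatch(r"(?:\[(-?\d+):(-?\d+)\]=)?\{([^{}]*)\}", literal.strip())
    if m is None:
        raise ValueError(f"unsupported literal: {literal!r}")
    elements = [int(x) for x in m.group(3).split(",")] if m.group(3) else []
    if m.group(1) is None:
        return 1, elements  # Postgres arrays default to a lower bound of 1
    lower, upper = int(m.group(1)), int(m.group(2))
    if upper - lower + 1 != len(elements):
        raise ValueError("array bounds do not match the number of elements")
    return lower, elements


def to_python_list(lower, elements):
    """Convert to a Python list, discarding the lower bound.

    Iterates over the element count: subscript s maps to index s - lower.
    """
    return [elements[s - lower] for s in range(lower, lower + len(elements))]
```

Both '[2:4]={1,2,3}' and '{1,2,3}' end up as [1, 2, 3] on the Python side.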

I find the loop in PLyList_FromArray() quite difficult to understand. 
Are the comments there mixing up the "inner" and "outer" dimensions? I 
wonder whether it would be easier to read if it were written in a 
recursive style, rather than iteratively with stacks for the dimensions.
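A recursive formulation of the array-to-list direction could look like the following Python sketch. It consumes a flat, row-major element sequence, with one call per dimension that recurses into the next. The name nested_from_flat is hypothetical. This illustrates the style only and is not PLyList_FromArray() itself.

```python
def nested_from_flat(elements, dims):
    """Build nested Python lists from a flat, row-major element sequence.

    Each call of build() handles one dimension and recurses into the next.
    """
    if not dims:
        raise ValueError("need at least one dimension")
    expected = 1
    for d in dims:
        expected *= d
    if expected != len(elements):
        raise ValueError("element count does not match dimensions")
    pos = 0

    def build(dim):
        nonlocal pos
        if dim == len(dims) - 1:
            # Innermost dimension: take the next dims[dim] scalars.
            row = list(elements[pos:pos + dims[dim]])
            pos += dims[dim]
            return row
        return [build(dim + 1) for _ in range(dims[dim])]

    return build(0)
```

With dims [2, 3], the elements 1..6 become [[1, 2, 3], [4, 5, 6]].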

On 08/03/2016 02:49 PM, Alexey Grishchenko wrote:
> This patch does not support multi-dimensional arrays of composite types, as
> composite types in Python might be represented as iterators and there is no
> obvious way to find out when the nested array stops and composite type
> structure starts. For example, if we have a composite type of (int, text),
> we can try to return "[ [ [1,'a'], [2,'b'] ], [ [3,'c'], [4,'d'] ] ]", and
> it is hard to find out that the first two lists are lists, and the third
> one represents structure. Things are getting even more complex when you
> have arrays as members of composite type. This is why I think this
> limitation is reasonable.

How do we handle single-dimensional arrays of composite types at the 
moment? At a quick glance, it seems that the composite types are just 
treated like strings, when they're in an array. That's probably OK, but 
it means that there's nothing special about composite types in 
multi-dimensional arrays. In any case, we should mention that in the docs.

- Heikki



Re: PL/Python adding support for multi-dimensional arrays

From
Jim Nasby
Date:
On 9/23/16 2:42 AM, Heikki Linnakangas wrote:
> How do we handle single-dimensional arrays of composite types at the
> moment? At a quick glance, it seems that the composite types are just
> treated like strings, when they're in an array. That's probably OK, but
> it means that there's nothing special about composite types in
> multi-dimensional arrays. In any case, we should mention that in the docs.

That is how they're handled, but I'd really like to change that. I've 
held off because I don't know how to handle the backwards 
incompatibility that would introduce. (I've been wondering if we might 
add a facility to allow specifying default TRANSFORMs that should be 
used for specific data types in specific languages.)

The converse case (a composite with arrays) suffers the same problem 
(array is just treated as a string).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: PL/Python adding support for multi-dimensional arrays

From
Dave Cramer
Date:




> This crashes with arrays with non-default lower bounds:
>
> postgres=# SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}');
> INFO:  ([1, 2, <NULL>], <type 'list'>)
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.

The attached patch fixes this bug and adds a test for it.

> I'd like to see some updates to the docs for this. The manual doesn't currently say anything about multi-dimensional arrays in pl/python, but it should've mentioned that they're not supported. Now that it is supported, should mention that, and explain briefly that a multi-dimensional array is mapped to a python list of lists.

If the code passes, I'll fix the docs.

> It seems we don't have any mention in the docs about arrays with non-default lower-bounds ATM. That's not this patch's fault, but it would be good to point out that the lower bounds are discarded when an array is passed to python.
>
> I find the loop in PLyList_FromArray() quite difficult to understand. Are the comments there mixing up the "inner" and "outer" dimensions? I wonder if that would be easier to read, if it was written in a recursive-style, rather than iterative with stacks for the dimensions.

Yes, it is fairly convoluted.
 

Re: PL/Python adding support for multi-dimensional arrays

From
Dave Cramer
Date:

On 26 September 2016 at 14:52, Dave Cramer <pg@fastcrypt.com> wrote:




> [...]
 


Attachments

Re: PL/Python adding support for multi-dimensional arrays

From
Heikki Linnakangas
Date:
On 09/27/2016 02:04 PM, Dave Cramer wrote:
> On 26 September 2016 at 14:52, Dave Cramer <pg@fastcrypt.com> wrote:
>>> This crashes with arrays with non-default lower bounds:
>>>
>>> postgres=# SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}');
>>> INFO:  ([1, 2, <NULL>], <type 'list'>)
>>> server closed the connection unexpectedly
>>>         This probably means the server terminated abnormally
>>>         before or while processing the request.
>>>
>>> Attached patch fixes this bug, and adds a test for it.

I spent some more time massaging this:

* Changed the loops from iterative to recursive style. I think this
indeed is slightly easier to understand.

* Fixed another segfault, with too deeply nested lists:

CREATE or replace FUNCTION test_type_conversion_mdarray_toodeep()
RETURNS int[] AS $$
return [[[[[[[[[[[[[[[[[[1]]]]]]]]]]]]]]]]]]
$$ LANGUAGE plpythonu;

* Also, in PLySequence_ToArray(), we must check that the 'len' of the
array doesn't overflow.

* Fixed reference leak in the loop in PLySequence_ToArray() to count the
number of dimensions.
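The dimension-count and length checks described above can be sketched in Python as follows. count_dims mimics filling a fixed-size C buffer such as 'int dims[MAXDIM]', so the bound check must happen before each write. checked_total_len emulates a 32-bit 'len' that must not overflow when the dimension sizes are multiplied. Both names, and the 32-bit width, are assumptions for illustration. The actual checks in PLySequence_ToArray() may differ.

```python
MAXDIM = 6  # assumption: default MAXDIM
INT32_MAX = 2**31 - 1  # assumption: 'len' is a 32-bit signed int


def count_dims(value):
    """Record nested-list sizes in a fixed-size buffer, refusing too-deep input."""
    dims = [0] * MAXDIM  # fixed-size buffer, like 'int dims[MAXDIM]' in C
    ndim = 0
    probe = value
    while isinstance(probe, list):
        # Check the bound before writing; in C, skipping this check would
        # write past the end of the buffer for deeply nested input.
        if ndim >= MAXDIM:
            raise ValueError(f"number of array dimensions exceeds {MAXDIM}")
        dims[ndim] = len(probe)
        ndim += 1
        if not probe:
            break
        probe = probe[0]
    return dims[:ndim]


def checked_total_len(dims):
    """Multiply dimension sizes, raising instead of overflowing a 32-bit int."""
    total = 1
    for d in dims:
        if d < 0:
            raise ValueError("negative dimension size")
        # For d > 0: total * d <= INT32_MAX  <=>  total <= INT32_MAX // d
        if d != 0 and total > INT32_MAX // d:
            raise OverflowError("array size exceeds the maximum allowed")
        total *= d
    return total
```

Python integers never overflow, so the overflow is emulated explicitly. In C, the same comparison has to be done before the multiplication.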

>>> I'd like to see some updates to the docs for this. The manual doesn't
>>> currently say anything about multi-dimensional arrays in pl/python, but it
>>> should've mentioned that they're not supported. Now that it is supported,
>>> should mention that, and explain briefly that a multi-dimensional array is
>>> mapped to a python list of lists.
>>>
>> If the code passes I'll fix the docs

Please do, thanks!

- Heikki


Attachments

Re: PL/Python adding support for multi-dimensional arrays

From
Dave Cramer
Date:

On 27 September 2016 at 14:58, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> [...]

> Please do, thanks!


See attached.




Attachments

Re: PL/Python adding support for multi-dimensional arrays

From
Heikki Linnakangas
Date:
On 09/23/2016 10:27 PM, Jim Nasby wrote:
> On 9/23/16 2:42 AM, Heikki Linnakangas wrote:
>> How do we handle single-dimensional arrays of composite types at the
>> moment? At a quick glance, it seems that the composite types are just
>> treated like strings, when they're in an array. That's probably OK, but
>> it means that there's nothing special about composite types in
>> multi-dimensional arrays. In any case, we should mention that in the docs.
>
> That is how they're handled, but I'd really like to change that. I've
> held off because I don't know how to handle the backwards
> incompatibility that would introduce. (I've been wondering if we might
> add a facility to allow specifying default TRANSFORMs that should be
> used for specific data types in specific languages.)
>
> The converse case (a composite with arrays) suffers the same problem
> (array is just treated as a string).

I take that back; I don't know what I was talking about. Without this 
patch, an array of composite types can be returned using any of the 
three representations for the composite type explained in the docs: a 
string, a sequence, or a dictionary. So all of these work, and return 
the same value:

create table foo (a int4, b int4);

CREATE FUNCTION comp_array_string() RETURNS foo[] AS $$
return ["(1, 2)"]
$$ LANGUAGE plpythonu;

CREATE FUNCTION comp_array_sequence() RETURNS foo[] AS $$
return [[1, 2]]
$$ LANGUAGE plpythonu;

CREATE FUNCTION comp_array_dict() RETURNS foo[] AS $$
return [{"a": 1, "b": 2}]
$$ LANGUAGE plpythonu;
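The three representations can be normalized to the same value, as in this Python sketch for the example type foo (a int4, b int4). to_foo is a hypothetical helper. It handles only integer fields without NULLs or quoting, which is much simpler than real record input.

```python
FOO_FIELDS = ("a", "b")  # composite type foo (a int4, b int4) from the example


def to_foo(value):
    """Normalize a string, dict or sequence representation to an (a, b) tuple."""
    if isinstance(value, str):
        inner = value.strip()
        if not (inner.startswith("(") and inner.endswith(")")):
            raise ValueError(f"malformed record literal: {value!r}")
        fields = [int(part) for part in inner[1:-1].split(",")]
    elif isinstance(value, dict):
        fields = [value[name] for name in FOO_FIELDS]
    else:  # any other sequence, e.g. a list or a tuple
        fields = list(value)
    if len(fields) != len(FOO_FIELDS):
        raise ValueError("wrong number of fields for type foo")
    return tuple(fields)
```

to_foo("(1, 2)"), to_foo([1, 2]) and to_foo({"a": 1, "b": 2}) all return (1, 2).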

Jim, I was confused, but you agreed with me. Were you also confused, or 
am I missing something?

Now, back to multi-dimensional arrays. I can see that the sequence 
representation is problematic with arrays, because if you have a Python 
list of lists, like [[1, 2]], it's not immediately clear whether that's a 
one-dimensional array of tuples or a two-dimensional array of integers. 
Then again, we do have the type definitions available. So is it really 
ambiguous?
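One possible way to use the type definitions is sketched below in Python. The code descends through the nested sequences. If the declared element type is composite and the innermost value is a plain scalar, the innermost sequence level is treated as the record rather than as an array dimension. array_dims and the element_is_composite flag are hypothetical. The sketch is meant to show where such a rule works and where it breaks down.

```python
def array_dims(value, element_is_composite):
    """Infer array dimensions of a nested list using the declared element type."""
    sizes = []
    probe = value
    while isinstance(probe, (list, tuple)) and probe:
        sizes.append(len(probe))
        probe = probe[0]
    if isinstance(probe, (list, tuple)):
        sizes.append(0)  # reached an empty sequence
    elif element_is_composite and sizes and not isinstance(probe, (str, dict)):
        # The innermost sequence level was the record itself, not an array level.
        sizes.pop()
    return sizes
```

With element_is_composite=True, [[1, 2]] yields [1] (one record). With False it yields [1, 2]. The rule fails when the composite's first field is itself an array. For example, [[[1, 2], 3]] yields [1, 2] instead of [1], because the inner array is mistaken for an array level.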

The string and dict representations don't have that ambiguity at all, so 
I don't see why we wouldn't support those, at least.

- Heikki




Re: PL/Python adding support for multi-dimensional arrays

From
Jim Nasby
Date:
On 9/29/16 1:51 PM, Heikki Linnakangas wrote:
> Jim, I was confused, but you agreed with me. Were you also confused, or
> am I missing something?

I was confused by inputs:

CREATE FUNCTION repr(i foo[]) RETURNS text LANGUAGE plpythonu AS 
$$return repr(i)$$;
select repr(array[row(1,2)::foo, row(3,4)::foo]);
        repr
--------------------
 ['(1,2)', '(3,4)']
(1 row)

(in ipython...)

In [1]: i=['(1,2)', '(3,4)']

In [2]: type(i)
Out[2]: list

In [3]: type(i[0])
Out[3]: str

I wonder if your examples work only

> Now, back to multi-dimensional arrays. I can see that the Sequence
> representation is problematic, with arrays, because if you have a python
> list of lists, like [[1, 2]], it's not immediately clear if that's a
> one-dimensional array of tuples, or two-dimensional array of integers.
> Then again, we do have the type definitions available. So is it really
> ambiguous?

[[1,2]] is a list of lists...
In [4]: b=[[1,2]]

In [5]: type(b)
Out[5]: list

In [6]: type(b[0])
Out[6]: list

If you want a list of tuples...
In [7]: c=[(1,2)]

In [8]: type(c)
Out[8]: list

In [9]: type(c[0])
Out[9]: tuple



Re: PL/Python adding support for multi-dimensional arrays

From
Michael Paquier
Date:
On Sat, Oct 1, 2016 at 8:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 9/29/16 1:51 PM, Heikki Linnakangas wrote:
>>
>> Jim, I was confused, but you agreed with me. Were you also confused, or
>> am I missing something?
>
>
> I was confused by inputs:

I have marked the patch as returned with feedback. Or, Heikki, do you
plan to look at it more and commit it soon?
-- 
Michael



Re: PL/Python adding support for multi-dimensional arrays

From
Heikki Linnakangas
Date:
On 10/01/2016 02:45 AM, Jim Nasby wrote:
> On 9/29/16 1:51 PM, Heikki Linnakangas wrote:
>> Now, back to multi-dimensional arrays. I can see that the Sequence
>> representation is problematic, with arrays, because if you have a python
>> list of lists, like [[1, 2]], it's not immediately clear if that's a
>> one-dimensional array of tuples, or two-dimensional array of integers.
>> Then again, we do have the type definitions available. So is it really
>> ambiguous?
>
> [[1,2]] is a list of lists...
> In [4]: b=[[1,2]]
>
> In [5]: type(b)
> Out[5]: list
>
> In [6]: type(b[0])
> Out[6]: list
>
> If you want a list of tuples...
> In [7]: c=[(1,2)]
>
> In [8]: type(c)
> Out[8]: list
>
> In [9]: type(c[0])
> Out[9]: tuple

Hmm, so we would start to treat lists and tuples differently? A Python
list would be converted into an array, and a Python tuple would be
converted into a composite type. That does make a lot of sense. The only
problem is that it's not backwards-compatible. A PL/Python function that
returns an SQL array of rows by returning a Python list of lists would
start failing.

I think we should bite the bullet and do that anyway. As long as it's
clearly documented, and the error message you get contains a clear hint
on how to fix it, I don't think it would be too painful to adjust
existing applications.

We could continue to accept a Python list for a plain composite type, 
this would only affect arrays of composite types.
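The proposed rule (a list is an array dimension, anything else is an element) can be sketched as a walk over the nesting levels. This is an illustration under that assumption, not the patch's C code: `array_dims` is a hypothetical name, and `MAXDIM = 6` is the default limit mentioned in the original post. The sketch also enforces that post's fixed-size requirement (MxN, MxNxK, ...):

```python
MAXDIM = 6  # default PostgreSQL limit on array dimensions, per the original post


def array_dims(value):
    """Return the dimension sizes of a nested Python list.

    Only lists count as array dimensions; tuples, dicts, strings and scalars
    are treated as elements. Raises ValueError for ragged (non-rectangular)
    input, for lists mixed with elements at one depth, or for more than
    MAXDIM dimensions.
    """
    if not isinstance(value, list):
        raise TypeError("an array value must be a Python list")
    dims = []
    level = [value]
    while level:
        is_list = [isinstance(v, list) for v in level]
        if not any(is_list):
            break  # every item at this depth is an array element
        if not all(is_list):
            raise ValueError("lists and elements mixed at the same depth")
        sizes = {len(v) for v in level}
        if len(sizes) != 1:
            raise ValueError("sub-arrays have different sizes: %s" % sorted(sizes))
        dims.append(sizes.pop())
        if len(dims) > MAXDIM:
            raise ValueError("more than %d dimensions" % MAXDIM)
        level = [item for v in level for item in v]
    return dims


print(array_dims([[1, 2, 3], [4, 5, 6]]))                        # [2, 3]
print(array_dims([[("a", 1), ("b", 2)], [("c", 3), ("d", 4)]]))  # [2, 2]
```

Because tuples stop the walk, a 2x2 array of composites and a 2x2 array of integers are told apart purely by the Python types, without consulting the SQL type.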

I don't use PL/python much myself, so I don't feel qualified to make the 
call, though. Any 3rd opinions?

- Heikki




Re: PL/Python adding support for multi-dimensional arrays

From
Pavel Stehule
Date:


2016-10-10 12:31 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
> On 10/01/2016 02:45 AM, Jim Nasby wrote:
>> On 9/29/16 1:51 PM, Heikki Linnakangas wrote:
>>> Now, back to multi-dimensional arrays. I can see that the Sequence
>>> representation is problematic, with arrays, because if you have a python
>>> list of lists, like [[1, 2]], it's not immediately clear if that's a
>>> one-dimensional array of tuples, or two-dimensional array of integers.
>>> Then again, we do have the type definitions available. So is it really
>>> ambiguous?
>>
>> [[1,2]] is a list of lists...
>> In [4]: b=[[1,2]]
>>
>> In [5]: type(b)
>> Out[5]: list
>>
>> In [6]: type(b[0])
>> Out[6]: list
>>
>> If you want a list of tuples...
>> In [7]: c=[(1,2)]
>>
>> In [8]: type(c)
>> Out[8]: list
>>
>> In [9]: type(c[0])
>> Out[9]: tuple
>
> Hmm, so we would start to treat lists and tuples differently? A Python list would be converted into an array, and a Python tuple would be converted into a composite type. That does make a lot of sense. The only problem is that it's not backwards-compatible. A PL/Python function that returns an SQL array of rows by returning a Python list of lists would start failing.

Isn't it possible to make the decision at the last moment, at the PL/Postgres interface? The expected type should be known there.

Regards

Pavel

> I think we should bite the bullet and do that anyway. As long as it's clearly documented, and the error message you get contains a clear hint on how to fix it, I don't think it would be too painful to adjust existing applications.
>
> We could continue to accept a Python list for a plain composite type, this would only affect arrays of composite types.
>
> I don't use PL/python much myself, so I don't feel qualified to make the call, though. Any 3rd opinions?
>
> - Heikki


Re: PL/Python adding support for multi-dimensional arrays

From
Dave Cramer
Date:

On 10 October 2016 at 13:42, Pavel Stehule <pavel.stehule@gmail.com> wrote:


> 2016-10-10 12:31 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>> On 10/01/2016 02:45 AM, Jim Nasby wrote:
>>> On 9/29/16 1:51 PM, Heikki Linnakangas wrote:
>>>> Now, back to multi-dimensional arrays. I can see that the Sequence
>>>> representation is problematic, with arrays, because if you have a python
>>>> list of lists, like [[1, 2]], it's not immediately clear if that's a
>>>> one-dimensional array of tuples, or two-dimensional array of integers.
>>>> Then again, we do have the type definitions available. So is it really
>>>> ambiguous?
>>>
>>> [[1,2]] is a list of lists...
>>> In [4]: b=[[1,2]]
>>>
>>> In [5]: type(b)
>>> Out[5]: list
>>>
>>> In [6]: type(b[0])
>>> Out[6]: list
>>>
>>> If you want a list of tuples...
>>> In [7]: c=[(1,2)]
>>>
>>> In [8]: type(c)
>>> Out[8]: list
>>>
>>> In [9]: type(c[0])
>>> Out[9]: tuple
>>
>> Hmm, so we would start to treat lists and tuples differently? A Python list would be converted into an array, and a Python tuple would be converted into a composite type. That does make a lot of sense. The only problem is that it's not backwards-compatible. A PL/Python function that returns an SQL array of rows by returning a Python list of lists would start failing.
>
> Isn't it possible to make the decision at the last moment, at the PL/Postgres interface? The expected type should be known there.
>
> Regards
>
> Pavel
>
>> I think we should bite the bullet and do that anyway. As long as it's clearly documented, and the error message you get contains a clear hint on how to fix it, I don't think it would be too painful to adjust existing applications.
>>
>> We could continue to accept a Python list for a plain composite type, this would only affect arrays of composite types.
>>
>> I don't use PL/python much myself, so I don't feel qualified to make the call, though. Any 3rd opinions?

Can't you determine the correct output based on the function's output definition?

For instance, if the function output was an array type, then we would return the list as an array;
if the function output was a set of then we return tuples?
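Dave's type-directed idea does work in the simple case. Below is a hypothetical sketch, assuming the composite type has only scalar attributes (no arrays). Since no attribute can then hold a list, the deepest list level must be the rows, and every level above it is an array dimension. The name `rows_by_type` is made up, and the sketch reproduces the old list-as-row reading. It cannot handle a composite with an array attribute, which is the counterexample Heikki gives in his reply.

```python
def rows_by_type(value):
    """Hypothetical type-directed reading of a value returned for an array of
    a composite type whose attributes are all scalars (no array attributes).

    The deepest level of lists is taken to be the rows; every list level
    above it is an array dimension. Returns (dims, rows) with rows flattened
    in row-major order. Rectangularity checks are omitted for brevity.
    """
    if not isinstance(value, list):
        raise TypeError("an array value must be a Python list")
    dims = []
    level = [value]
    # Keep descending while the items one level further down are still lists.
    while level and all(
        isinstance(v, list) and all(isinstance(x, list) for x in v) for v in level
    ):
        dims.append(len(level[0]))
        level = [x for v in level for x in v]
    rows = [tuple(v) for v in level]
    return dims, rows


print(rows_by_type([["first", 1]]))
print(rows_by_type([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))
```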


Re: PL/Python adding support for multi-dimensional arrays

From
Heikki Linnakangas
Date:
On 10/10/2016 08:42 PM, Pavel Stehule wrote:
> 2016-10-10 12:31 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>
>> On 10/01/2016 02:45 AM, Jim Nasby wrote:
>>
>>> On 9/29/16 1:51 PM, Heikki Linnakangas wrote:
>>>
>>>> Now, back to multi-dimensional arrays. I can see that the Sequence
>>>> representation is problematic, with arrays, because if you have a python
>>>> list of lists, like [[1, 2]], it's not immediately clear if that's a
>>>> one-dimensional array of tuples, or two-dimensional array of integers.
>>>> Then again, we do have the type definitions available. So is it really
>>>> ambiguous?
>>>>
>>>
>>> [[1,2]] is a list of lists...
>>> In [4]: b=[[1,2]]
>>>
>>> In [5]: type(b)
>>> Out[5]: list
>>>
>>> In [6]: type(b[0])
>>> Out[6]: list
>>>
>>> If you want a list of tuples...
>>> In [7]: c=[(1,2)]
>>>
>>> In [8]: type(c)
>>> Out[8]: list
>>>
>>> In [9]: type(c[0])
>>> Out[9]: tuple
>>>
>>
>> Hmm, so we would start to treat lists and tuples differently? A Python
>> list would be converted into an array, and a Python tuple would be
>> converted into a composite type. That does make a lot of sense. The only
>> problem is that it's not backwards-compatible. A PL/Python function that
>> returns an SQL array of rows by returning a Python list of lists would
>> start failing.
>
> Isn't it possible to make the decision at the last moment, at the
> PL/Postgres interface? The expected type should be known there.

Unfortunately there are cases that are fundamentally ambiguous.

create type comptype as (intarray int[]);
create function array_return() returns comptype[] as $$
  return [[[[1]]]];
$$ language plpythonu;

What does the function return? It could be two-dimension array of 
comptype, with a single-dimension intarray, or a single-dimension 
comptype, with a two-dimension intarray.
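The ambiguity can be checked by counting list levels. This sketch assumes the old behaviour, in which a composite may be written as a list, so the single-attribute `comptype` consumes exactly one list level. `interpretations` and `list_depth` are hypothetical helpers. The four list levels split into outer `comptype` dimensions and inner `intarray` dimensions in exactly two valid ways, matching the two readings above:

```python
def list_depth(value):
    """Number of nested lists around the innermost element (first branch only)."""
    depth = 0
    while isinstance(value, list) and value:
        depth += 1
        value = value[0]
    return depth


def interpretations(depth, row_levels=1):
    """Enumerate (outer_dims, inner_dims) splits of `depth` list levels.

    outer_dims: dimensions of the comptype[] result (at least 1).
    row_levels: list levels consumed by the composite itself when it is
                written as a list (1 for the single-attribute comptype).
    inner_dims: dimensions of the intarray attribute (at least 1, since an
                int[] attribute must itself be given as a list).
    """
    return [
        (outer, depth - row_levels - outer)
        for outer in range(1, depth)
        if depth - row_levels - outer >= 1
    ]


value = [[[[1]]]]
print(interpretations(list_depth(value)))  # [(1, 2), (2, 1)]
```

Once composites must be tuples, the boundary becomes explicit: `[[([1],)]]` can only mean a two-dimensional `comptype` array whose `intarray` is `[1]`.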

We could resolve it for simpler cases, but not the general case. The 
simple cases would probably cover most things people do in practice. But 
if the distinction between a tuple and a list feels natural to Python 
programmers, I think it would be more clear in the long run to have 
people adjust their applications.

- Heikki




Re: PL/Python adding support for multi-dimensional arrays

From
Pavel Stehule
Date:


2016-10-11 7:49 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
> On 10/10/2016 08:42 PM, Pavel Stehule wrote:
>> 2016-10-10 12:31 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>>
>>> On 10/01/2016 02:45 AM, Jim Nasby wrote:
>>>
>>>> On 9/29/16 1:51 PM, Heikki Linnakangas wrote:
>>>>
>>>>> Now, back to multi-dimensional arrays. I can see that the Sequence
>>>>> representation is problematic, with arrays, because if you have a python
>>>>> list of lists, like [[1, 2]], it's not immediately clear if that's a
>>>>> one-dimensional array of tuples, or two-dimensional array of integers.
>>>>> Then again, we do have the type definitions available. So is it really
>>>>> ambiguous?
>>>>
>>>> [[1,2]] is a list of lists...
>>>> In [4]: b=[[1,2]]
>>>>
>>>> In [5]: type(b)
>>>> Out[5]: list
>>>>
>>>> In [6]: type(b[0])
>>>> Out[6]: list
>>>>
>>>> If you want a list of tuples...
>>>> In [7]: c=[(1,2)]
>>>>
>>>> In [8]: type(c)
>>>> Out[8]: list
>>>>
>>>> In [9]: type(c[0])
>>>> Out[9]: tuple
>>>
>>> Hmm, so we would start to treat lists and tuples differently? A Python
>>> list would be converted into an array, and a Python tuple would be
>>> converted into a composite type. That does make a lot of sense. The only
>>> problem is that it's not backwards-compatible. A PL/Python function that
>>> returns an SQL array of rows by returning a Python list of lists would
>>> start failing.
>>
>> Isn't it possible to make the decision at the last moment, at the
>> PL/Postgres interface? The expected type should be known there.
>
> Unfortunately there are cases that are fundamentally ambiguous.
>
> create type comptype as (intarray int[]);
> create function array_return() returns comptype[] as $$
>   return [[[[1]]]];
> $$ language plpythonu;
>
> What does the function return? It could be two-dimension array of comptype, with a single-dimension intarray, or a single-dimension comptype, with a two-dimension intarray.
>
> We could resolve it for simpler cases, but not the general case. The simple cases would probably cover most things people do in practice. But if the distinction between a tuple and a list feels natural to Python programmers, I think it would be more clear in the long run to have people adjust their applications.

I agree. The distinction is natural; it's our problem that we don't distinguish them strictly.

Regards

Pavel

> - Heikki


Re: PL/Python adding support for multi-dimensional arrays

From
Heikki Linnakangas
Date:
On 10/11/2016 08:56 AM, Pavel Stehule wrote:
> 2016-10-11 7:49 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>
>> Unfortunately there are cases that are fundamentally ambiguous.
>>
>> create type comptype as (intarray int[]);
>> create function array_return() returns comptype[] as $$
>>   return [[[[1]]]];
>> $$ language plpythonu;
>>
>> What does the function return? It could be two-dimension array of
>> comptype, with a single-dimension intarray, or a single-dimension comptype,
>> with a two-dimension intarray.
>>
>> We could resolve it for simpler cases, but not the general case. The
>> simple cases would probably cover most things people do in practice. But if
>> the distinction between a tuple and a list feels natural to Python
>> programmers, I think it would be more clear in the long run to have people
>> adjust their applications.
>
> I agree. The distinction is natural; it's our problem that we don't
> distinguish them strictly.

Ok, let's do that then. Here is a patch set that does that. The first is
the main patch. The second patch adds some code to give a hint, if you
do the thing whose behavior changed. That code isn't very pretty,
but I think a good error message is absolutely required, if we are to
make this change. Does anyone have better suggestions on how to catch
the common cases of that?
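Here is one possible shape for such a check. It is purely a hypothetical sketch, not the code in the attached patch. After converting a value for an array-of-composite result has already failed, it looks for a list that directly holds exactly as many non-container items as the composite has attributes, which is what an old-style row written as a list looks like. The function name and the `natts` parameter (number of composite attributes) are assumptions for illustration.

```python
def suspected_list_rows(value, natts):
    """Hypothetical heuristic, meant to run only after converting `value` for
    an array-of-composite result has already failed.

    Returns True if some list inside `value` directly holds `natts` items,
    none of which is a list, tuple or dict: the shape of a row written as a
    Python list under the old behaviour, which now deserves a hint.
    """
    pending = [value]
    while pending:
        current = pending.pop()
        if not isinstance(current, list):
            continue  # tuples, dicts, strings and scalars are not searched
        if len(current) == natts and not any(
            isinstance(x, (list, tuple, dict)) for x in current
        ):
            return True
        pending.extend(current)
    return False


# Old style, as in the regression test: each row is a list -> give the hint.
print(suspected_list_rows([["first", 1]], natts=2))
# New style: rows are tuples -> no hint.
print(suspected_list_rows([("first", 1)], natts=2))
```

Running it only on the error path keeps false positives (for example, a genuine array of two string record literals) from ever affecting a successful conversion.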

Please review. Are the docs and the error messages now clear enough on
this? We'll need a mention in the release notes too, when it's time for
that.

- Heikki


Attachments

Re: PL/Python adding support for multi-dimensional arrays

From
Jim Nasby
Date:
On 10/14/16 3:53 AM, Heikki Linnakangas wrote:
> Composite types in arrays must now be returned as
> Python tuples, not lists, to resolve the ambiguity. I.e. "[(col1, col2),
> (col1, col2)]".

Shouldn't dicts be allowed as well? I'm not sure they would 
automatically be considered as tuples (unlike something that extends 
tuples, such as namedtuples).



Re: PL/Python adding support for multi-dimensional arrays

From
Heikki Linnakangas
Date:

On 14 October 2016 19:18:01 EEST, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
>On 10/14/16 3:53 AM, Heikki Linnakangas wrote:
>> Composite types in arrays must now be returned as
>> Python tuples, not lists, to resolve the ambiguity. I.e. "[(col1, col2),
>> (col1, col2)]".
>
>Shouldn't dicts be allowed as well? 

Ah yes, dicts are also allowed, as before. And strings. The only change is that a list is interpreted as an array
dimension, instead of a composite type.
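The resulting rule for a single composite element inside an array can be sketched as follows. This is a hypothetical Python 3 illustration, not PL/Python's implementation: `as_composite`, `COLUMNS` and `Row` are made-up names for a two-column type. Tuples, dicts and strings are accepted. Namedtuples are also accepted in this sketch because they subclass tuple. A list is rejected because, inside an array, it would be read as an array dimension.

```python
from collections import namedtuple

COLUMNS = ("col1", "col2")         # hypothetical two-column composite type
Row = namedtuple("Row", COLUMNS)   # namedtuple instances are tuples


def as_composite(value, columns=COLUMNS):
    """Normalise one composite element of an array (hypothetical sketch).

    Returns a tuple in column order, or a string unchanged (a record literal
    such as "(a,1)" would be left for PostgreSQL's record input to parse).
    """
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        return tuple(value[name] for name in columns)
    if isinstance(value, tuple):
        if len(value) != len(columns):
            raise ValueError("expected %d attributes, got %d" % (len(columns), len(value)))
        return value
    if isinstance(value, list):
        raise TypeError("a list inside an array is an array dimension, not a composite")
    raise TypeError("unsupported composite representation: %s" % type(value).__name__)


print(as_composite(("a", 1)))
print(as_composite({"col2": 1, "col1": "a"}))
print(as_composite(Row("a", 1)))
print(as_composite("(a,1)"))
```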
 

- Heikki




Re: PL/Python adding support for multi-dimensional arrays

From
Pavel Stehule
Date:
Hi

2016-10-14 10:53 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
> On 10/11/2016 08:56 AM, Pavel Stehule wrote:
>> 2016-10-11 7:49 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>>
>>> Unfortunately there are cases that are fundamentally ambiguous.
>>>
>>> create type comptype as (intarray int[]);
>>> create function array_return() returns comptype[] as $$
>>>   return [[[[1]]]];
>>> $$ language plpythonu;
>>>
>>> What does the function return? It could be two-dimension array of
>>> comptype, with a single-dimension intarray, or a single-dimension comptype,
>>> with a two-dimension intarray.
>>>
>>> We could resolve it for simpler cases, but not the general case. The
>>> simple cases would probably cover most things people do in practice. But if
>>> the distinction between a tuple and a list feels natural to Python
>>> programmers, I think it would be more clear in the long run to have people
>>> adjust their applications.
>>
>> I agree. The distinction is natural; it's our problem that we don't
>> distinguish them strictly.
>
> Ok, let's do that then. Here is a patch set that does that. The first is the main patch. The second patch adds some code to give a hint, if you do the thing whose behavior changed. That code isn't very pretty, but I think a good error message is absolutely required, if we are to make this change. Does anyone have better suggestions on how to catch the common cases of that?
>
> Please review. Are the docs and the error messages now clear enough on this? We'll need a mention in the release notes too, when it's time for that.

The error message is clear.

I tested the patches, and the regression test is broken (its expected output was not updated).

+ -- Starting with PostgreSQL 10, a composite type in an array cannot be represented as
+ -- a Python list, because it's ambiguous with multi-dimensional arrays. So this
+ -- throws an error now. The error should contain a useful hint on the issue.
+ CREATE FUNCTION composite_type_as_list()  RETURNS type_record[] AS $$
+   return [['first', 1]];
+ $$ LANGUAGE plpythonu;
+ SELECT * FROM composite_type_as_list();
+ ERROR:  malformed record literal: "first"
+ DETAIL:  Missing left parenthesis.
+ HINT:  To return a composite type in an array, return the composite type as a Python tuple, e.g. "[('foo')]"
+ CONTEXT:  while creating return value
+ PL/Python function "composite_type_as_list"
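A Python detail is relevant to the example in the hint above. Parentheses alone do not create a tuple; the comma does. So `('foo')` is just the string `'foo'`, and a one-element tuple is written `('foo',)`. The minimal plain-Python sketch below illustrates this. The first function body mirrors the regression test's return value, and the `composite_type_as_tuple` name is hypothetical:

```python
def composite_type_as_list():
    # Old style: the row is a list, so under the new rule this is read as a
    # 2-D array whose elements are 'first' and 1, which then fail to convert.
    return [["first", 1]]


def composite_type_as_tuple():
    # New style: the inner tuple is one composite value with two attributes.
    return [("first", 1)]


print(type(("foo")).__name__)   # str: the parentheses only group
print(type(("foo",)).__name__)  # tuple: the trailing comma makes the tuple
```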

I tested with Python 3.5 and 2.7, and there are no other issues.

There are no new tests for multidimensional arrays of composites; there is only a new negative test.

Regards

Pavel
 

> - Heikki


Re: PL/Python adding support for multi-dimensional arrays

From
Heikki Linnakangas
Date:
On 10/24/2016 10:33 PM, Pavel Stehule wrote:
> Hi
>
> 2016-10-14 10:53 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>
>> Please review. Are the docs and the error messages now clear enough on
>> this? We'll need a mention in the release notes too, when it's time for
>> that.
>
> The error message is clear.

Ok, great!

> I tested the patches, and the regression test is broken (its expected output was not updated).

Ah, fixed.

> There are no new tests for multidimensional arrays of composites; there is
> only a new negative test.

Added one.

Thanks for the review! Committed, with those little fixes, and some 
little last-minute comment tweaks.

- Heikki




Re: PL/Python adding support for multi-dimensional arrays

From
Pavel Stehule
Date:


2016-10-26 10:03 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
> On 10/24/2016 10:33 PM, Pavel Stehule wrote:
>> Hi
>>
>> 2016-10-14 10:53 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:
>>
>>> Please review. Are the docs and the error messages now clear enough on
>>> this? We'll need a mention in the release notes too, when it's time for
>>> that.
>>
>> The error message is clear.
>
> Ok, great!
>
>> I tested the patches, and the regression test is broken (its expected output was not updated).
>
> Ah, fixed.
>
>> There are no new tests for multidimensional arrays of composites; there is
>> only a new negative test.
>
> Added one.
>
> Thanks for the review! Committed, with those little fixes, and some little last-minute comment tweaks.

Thank you very much!

Nice feature.

Regards

Pavel
 

> - Heikki