Обсуждение: array functions - request for opinions (was Re: [PATCHES] array support patch phase 1 patch)

Поиск
Список
Период
Сортировка
The discussion below has been taking place on the PATCHES list. I'm 
forwarding it to HACKERS to get a wider audience. We're looking for 
opinions on (at a minimum):

1) Which of these functions should exist *and* be included in user   documentation
2) Which of these functions are not worth having in the backend at all
3) Specifically WRT array_accum(), is there merit in having a userland   function that aggregates row values into
arrays,for use in custom   aggregates.
 

Note that all of these functions have already been implemented and are 
either in CVS (all except str_to_array and array_to_str) or in a 
submitted patch. Some are required to implement the SQL 99 standard 
syntax, but some are not. Specifically, array_subscript, array_assign, 
and singleton_array are not needed to implement other functionality and 
would be the primary candidates for removal.

Thanks,

Joe

-------- Original Message --------
Peter Eisentraut wrote:
> Joe Conway writes:
>>I personally don't understand why we should hide them from users. If I
>>prefer to use array_append(var1, var2) rather than (var1 || var2),
>>what's the problem? Its a matter of taste as to which is better.
> 
> The problem is that this approach leads to bloat without bound.  Maybe
> tomorrow someone prefers append_array(var1, var2) or var1 + var2.  The
> standard says it's var1 || var2, there is no technical or substantial
> aesthetical argument against it, so that's what we should use.

I can see your point. But is there any circumstance where the function
will work and the standard syntax won't? Until recently, it was not
possible to assign individual array elements in PL/pgSQL (I think Tom
fixed this). Are there any more examples?

Let's go down the list:

array_accum(anyarray, anyelement)  -- no standard syntax; more on this later
array_append(anyarray, anyelement)  -- can be easily done with "||" operator; is there any case where "||"     won't
workthat array_append() will? If not, don't document as a     user function.
 
array_assign(anyarray, integer, anyelement)  -- can be easily done with "var[n] = X" syntax; is there any case
where"var[n] = X" won't work that array_assign() will? If not,     don't document as a user function.
 
array_cat(anyarray, anyarray)  -- see array_append
array_dims(anyarray)  -- no standard syntax; should be documented.
array_lower(anyarray, integer)  -- no standard syntax; should be documented.
array_prepend(anyelement, anyarray)  -- see array_append
array_subscript(anyarray, integer)  -- can be easily done with "var[n]" syntax; is there any case     where "var[n]"
won'twork that array_subscript() will? If not,     don't document as a user function.
 
array_to_str(anyarray, text)  -- no standard syntax; should be documented.
array_upper(anyarray, integer)  -- no standard syntax; should be documented.
singleton_array(anyelement)  -- can be easily done with "array[x]" or "'{x}'" syntax; is there any     case where one
ofthese won't work that singleton_array() will? If     not, don't document as a user function.
 
str_to_array(text, text)  -- no standard syntax; should be documented.

BTW, should this discussion be on HACKERS or even GENERAL in order to
get a wider audience of opinion?

>>And in any case, array_accum() is intended to be used for building
>>custom aggregates, so that needs to be documented.
> Can you give an example of an aggregate or a class of aggregates where
> this would be useful?
> 

There are many discussions on the lists over the past few years in which
people have requested this kind of functionality. There is even a
contrib/intagg that does this for integers only. I don't think there is
any question that there is a demand for the feature. Some examples:

http://fts.postgresql.org/db/msg.html?mid=1021530
http://fts.postgresql.org/db/msg.html?mid=1096592
http://fts.postgresql.org/db/msg.html?mid=1031700
http://fts.postgresql.org/db/msg.html?mid=1353047
http://fts.postgresql.org/db/msg.html?mid=1063738
http://fts.postgresql.org/db/msg.html?mid=1050837
http://fts.postgresql.org/db/msg.html?mid=1066349

Some people would like to simply aggregate rows of data into arrays for
analysis, some want to write custom final functions to post-process the
resulting array (e.g. median). With array_accum() and some of the other
new functions (array_lower & array_upper specifically come to mind),
people have a relatively easy way to create there own custom aggregates
using PL/pgSQL.

With PL/R, I can use array_accum to create an aggregate for just about
any summary statistic that I'm interested in. And while, yes, there is
already an implementation of array_accum in PL/R, it isn't reasonable to
require anyone who wants to use it to install libR also. And, yes, there
are undoubtedly other ways to create those aggregates, but none as
simple to use.

Again, I'd suggest that if we want to debate this much further, we
should move the discussion to the GENERAL and SQL lists so that users
can have a chance to chime in.

Joe






Re: array functions - request for opinions (was Re: [PATCHES] array

От
Peter Eisentraut
Дата:
Joe Conway writes:

> >>And in any case, array_accum() is intended to be used for building
> >>custom aggregates, so that needs to be documented.
> > Can you give an example of an aggregate or a class of aggregates where
> > this would be useful?

> There are many discussions on the lists over the past few years in which
> people have requested this kind of functionality. There is even a
> contrib/intagg that does this for integers only. I don't think there is
> any question that there is a demand for the feature. Some examples:

These applications should use an empty array as initial value.  Then the
normal array concatenation can be used as transition function.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: array functions - request for opinions (was Re: [PATCHES]

От
Joe Conway
Дата:
Peter Eisentraut wrote:
> Joe Conway writes:
>>There are many discussions on the lists over the past few years in which
>>people have requested this kind of functionality. There is even a
>>contrib/intagg that does this for integers only. I don't think there is
>>any question that there is a demand for the feature. Some examples:
> 
> These applications should use an empty array as initial value.  Then the
> normal array concatenation can be used as transition function.

How can you create an aggregate using an operator as a transition function?

=# CREATE AGGREGATE myagg
-# (
(#   BASETYPE = text,
(#   SFUNC = '||',
(#   STYPE = text,
(#   INITCOND = ''
(# );
ERROR:  AggregateCreate: function ||(text, text) does not exist

Also (I suppose you could argue this should be fixed, but anyway...), 
you can't currently add elements to an empty array.

regression=# create table t(f float8[]);
CREATE TABLE
regression=# insert into t values('{}');
INSERT 1865486 1
regression=# update t set f[1] = 1;
ERROR:  Invalid array subscripts
regression=# select array_dims(f) from t; array_dims
------------

(1 row)

Joe



Re: array functions - request for opinions (was Re: [PATCHES]

От
Andreas Pflug
Дата:
Joe Conway wrote:

> How can you create an aggregate using an operator as a transition 
> function?
>
> =# CREATE AGGREGATE myagg
> -# (
> (#   BASETYPE = text,
> (#   SFUNC = '||',
> (#   STYPE = text,
> (#   INITCOND = ''
> (# );
> ERROR:  AggregateCreate: function ||(text, text) does not exist 

Use the function underlying the operator, in this case textcat.

Regards,
Andreas




Re: array functions - request for opinions (was Re: [PATCHES]

От
Joe Conway
Дата:
Andreas Pflug wrote:
> Joe Conway wrote:
> 
>> How can you create an aggregate using an operator as a transition 
>> function?
>>
>> =# CREATE AGGREGATE myagg
>> -# (
>> (#   BASETYPE = text,
>> (#   SFUNC = '||',
>> (#   STYPE = text,
>> (#   INITCOND = ''
>> (# );
>> ERROR:  AggregateCreate: function ||(text, text) does not exist 
> 
> 
> Use the function underlying the operator, in this case textcat.
> 

Right, but Peter's desire is to *not document* the underlying function 
(this was just an example, we're really discussing the new array 
functions array_accum and/or array_append), and my point is that it 
needs to be documented if we expect people to use it.

Joe




Re: array functions - request for opinions (was Re: [PATCHES] array

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> How can you create an aggregate using an operator as a transition function?

You'd have to reference the underlying function.  I think what Peter is
really asking is whether the implementation function for || will serve
for this purpose, rather than making an additional one.

> Also (I suppose you could argue this should be fixed, but anyway...), 
> you can't currently add elements to an empty array.

That particular manifestation is not relevant; I think you mean this
one:

regression=# select '{}'::int4[] || 2;
ERROR:  Arrays greater than one-dimension are not supported

In both cases I think it's mostly a matter of defining a defensible
behavior, which is probably not that hard, but no one's yet gotten
annoyed enough to try to fix it.
        regards, tom lane


Re: array functions - request for opinions (was Re: [PATCHES]

От
Joe Conway
Дата:
Tom Lane wrote:
> You'd have to reference the underlying function.

That's what I thought, but I thought maybe I was missing something.

> I think what Peter is really asking is whether the implementation
> function for || will serve for this purpose, rather than making an
> additional one.

But, as I said in my other post a moment ago, that then implies (at 
least to me) that the underlying function needs to be documented as an 
"end user" function, rather than an "internal" one.

> That particular manifestation is not relevant; I think you mean this 
> one:
> 
> regression=# select '{}'::int4[] || 2; ERROR:  Arrays greater than
> one-dimension are not supported

I just wanted to show that the issue pre-existed the new functions. I 
think they are both caused by the fact that an empty array has *no* 
dimensions (clearly the error message in the new array functions could 
be better).

> In both cases I think it's mostly a matter of defining a defensible 
> behavior, which is probably not that hard, but no one's yet gotten 
> annoyed enough to try to fix it.

It is probably easy enough to work around in the array concatenation 
functions. Would it be defensible to say that ('{}'::int4[] || 2) should 
produce ('{2}'::int4[]), i.e. a one-dimensional integer array with one 
element?

Joe



Re: array functions - request for opinions (was Re: [PATCHES] array

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> In both cases I think it's mostly a matter of defining a defensible 
>> behavior, which is probably not that hard, but no one's yet gotten 
>> annoyed enough to try to fix it.

> It is probably easy enough to work around in the array concatenation 
> functions. Would it be defensible to say that ('{}'::int4[] || 2) should 
> produce ('{2}'::int4[]), i.e. a one-dimensional integer array with one 
> element?

You would also have to assume that the subscript lower bound is one,
which doesn't bother me but is an additional bit of state that has to
appear out of nowhere.  (In the assignment case you don't have to assume
that, since the subscript tells you what to do.)
        regards, tom lane


Re: array functions - request for opinions (was Re: [PATCHES]

От
Joe Conway
Дата:
Tom Lane wrote:
> You would also have to assume that the subscript lower bound is one,
> which doesn't bother me but is an additional bit of state that has to
> appear out of nowhere.  (In the assignment case you don't have to assume
> that, since the subscript tells you what to do.)

I've gotten this working for array concatenation and assignment.

Examples:

-- empty array concatenated with any element, return one element,
-- one-dimensional array, with lower bound set to 1
regression=# select '{}'::int4[] || 1; ?column?
---------- {1}
(1 row)

regression=# select 0 || '{}'::int4[]; ?column?
---------- {0}
(1 row)

regression=# select array_dims(0 || '{}'::int4[]); array_dims
------------ [1:1]
(1 row)

-- empty array concatenated with any non-empty, return the non-empty one
regression=# select '{}'::int4[] || array[[[1,2],[3,4]]];    ?column?
----------------- {{{1,2},{3,4}}}
(1 row)

-- concatenate two empty arrays, return empty array
regression=# select '{}'::int4[] || '{}'::int4[]; ?column?
---------- {}
(1 row)

-- assignment to empty array: determine number
-- of dimensions and array subscripts based on those
-- given in the assignment statement
regression=# create table t(f float8[]);
CREATE TABLE
regression=# insert into t values('{}');
INSERT 2011035 1
regression=# update t set f[-2:2] = array[1,2,3,4,5];
UPDATE 1
regression=# select * from t;      f
------------- {1,2,3,4,5}
(1 row)

regression=# select array_dims(f) from t; array_dims
------------ [-2:2]
(1 row)


One question, should this work to create an empty array:

regression=# select array[];
ERROR:  parser: parse error at or near "]" at character 14

or by analogy to '{}'::int4[]

regression=# select array[]::int4[];
ERROR:  parser: parse error at or near "]" at character 14

Or is the current '{}'::int4[] syntax all we want/need?

Joe



Re: array functions - request for opinions (was Re: [PATCHES] array

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> One question, should this work to create an empty array:

> regression=# select array[];
> ERROR:  parser: parse error at or near "]" at character 14

Only if you can specify what type it has.  This seems to get back
to our off-list discussion about what to do with array of unknown.
        regards, tom lane


Re: array functions - request for opinions (was Re: [PATCHES]

От
Joe Conway
Дата:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>regression=# select array[];
>>ERROR:  parser: parse error at or near "]" at character 14
> 
> Only if you can specify what type it has.  This seems to get back
> to our off-list discussion about what to do with array of unknown.
> 

Yeah, without an unknown[] data type, I think a special case hack would 
be needed even for the array[]::int4[] example, because the coercion to 
int4[] happens too late.

I guess '{}'::arraytype[] will have to do for now.

Joe