Обсуждение: Cannot create an aggregate function with variadic parameters andenabled for parallel execution

Поиск
Список
Период
Сортировка

Cannot create an aggregate function with variadic parameters andenabled for parallel execution

От
Alexey Bashtanov
Дата:
When trying to improve my extension Argm to work with parallel queries I 
found that
CREATE AGGREGATE wants serial/deserial/combine functions
to have variadic any parameter if the aggregate function itself is 
variadic any.
If I add the requested parameter, CREATE AGGREGATE still cannot find the 
function.

Having an analog of [m]finalfunc_extra for serial/deserial/combine functions
may be a good idea, but it's not yet there. So the attached patch makes 
postgres
skip the check for the parallel support functions.

I would appreciate if someone could have a look at the patch.
The problem is probably actual for 9.6+, so back-patch would be nice if 
there's no objections.

Should one be curious how to reproduce the problem, please clone 
https://github.com/bashtanov/argm/tree/parallel
and try make && make install && make installcheck. This fails without 
the patch, but works with it.
The error message is "function argmax_combine(internal, internal) must 
accept VARIADIC ANY to be used in this aggregate".
(Just in case: this "parallel" branch is a WIP, please use "master" 
branch if you need something stable).

Best regards,
   Alexey

Вложения
An easier way to reproduce the problem:

create aggregate weird_concat (foo variadic "any") (
     sfunc = concat_ws,
     stype = text,
     combinefunc = textcat
);

Hope it helps.

Best regards,
   Alexey


On 14/05/18 00:44, Alexey Bashtanov wrote:
> When trying to improve my extension Argm to work with parallel queries 
> I found that
> CREATE AGGREGATE wants serial/deserial/combine functions
> to have variadic any parameter if the aggregate function itself is 
> variadic any.
> If I add the requested parameter, CREATE AGGREGATE still cannot find 
> the function.
>
> Having an analog of [m]finalfunc_extra for serial/deserial/combine 
> functions
> may be a good idea, but it's not yet there. So the attached patch 
> makes postgres
> skip the check for the parallel support functions.
>
> I would appreciate if someone could have a look at the patch.
> The problem is probably actual for 9.6+, so back-patch would be nice 
> if there's no objections.
>
> Should one be curious how to reproduce the problem, please clone 
> https://github.com/bashtanov/argm/tree/parallel
> and try make && make install && make installcheck. This fails without 
> the patch, but works with it.
> The error message is "function argmax_combine(internal, internal) must 
> accept VARIADIC ANY to be used in this aggregate".
> (Just in case: this "parallel" branch is a WIP, please use "master" 
> branch if you need something stable).
>
> Best regards,
>   Alexey



Alexey Bashtanov <bashtanov@imap.cc> writes:
> An easier way to reproduce the problem:
> create aggregate weird_concat (foo variadic "any") (
>      sfunc = concat_ws,
>      stype = text,
>      combinefunc = textcat
> );

Thanks for the self-contained test case.  I concur this is broken.

The question we need to consider before fixing this, I think, is whether
we need to do anything immediately about the situation for polymorphic
aggregates.  If we just push your fix as-is, we'll be creating a
backwards compatibility constraint, so we had better get it right.

I believe that in fact it's okay, or at least we can define it as okay.
There are two components of the problem: can the support functions be
declared legally, and do they need additional type info at runtime
to do their work?

For the combine function, it seems fine to just say that the function
signature is alway "combine(stype, stype) returns stype" even in
polymorphic cases.  "combine(anyarray, anyarray) returns anyarray"
is a perfectly legal function declaration, so there's no problem at
that end.  At runtime, the combine function could inspect its argument
to find out what the array element type is.  In cases that are like
"combine(internal, internal) returns internal", the function declaration
is still acceptable.  The representation of the internal-type state
value would need to contain enough information for the combine function
to do its work without additional type lookups --- but that would likely
need to be true anyway, so it does not seem like a big penalty.

For the serial/deserial functions, the function signatures are
prespecified and don't depend on the aggregate argument types, so
there's no declaration issue.  Again, the state value would need
to be sufficiently self-contained for the functions to not need
additional info to figure out how to serialize it --- but again,
I do not see much problem there.

So I think we're okay with solving this as per your patch, though
it needs some more comments IMO.  Will push.

BTW, I noticed while looking at this that this error message in
nodeAgg.c seems to be exactly backwards:

        /*
         * Ensure that a combine function to combine INTERNAL states is not
         * strict. This should have been checked during CREATE AGGREGATE, but
         * the strict property could have been changed since then.
         */
        if (pertrans->transfn.fn_strict && aggtranstype == INTERNALOID)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
                     errmsg("combine function for aggregate %u must be declared as STRICT",
                            aggref->aggfnoid)));
    }

Should be "must *not* be declared STRICT", no?  I wonder also why this
message isn't worded exactly like the corresponding one in pg_aggregate.c,
which says

                     errmsg("combine function with transition type %s must not be declared STRICT",
                            format_type_be(aggTransType))));

It doesn't really seem worth making translators expend effort on a variant
wording that has clearly never once been seen by a user (or they'd have
reported this as a bug).

            regards, tom lane



On 15/05/18 17:52, Tom Lane wrote:
>
> For the combine function, it seems fine to just say that the function
> signature is alway "combine(stype, stype) returns stype" even in
> polymorphic cases.  "combine(anyarray, anyarray) returns anyarray"
> is a perfectly legal function declaration, so there's no problem at
> that end.  At runtime, the combine function could inspect its argument
> to find out what the array element type is.  In cases that are like
> "combine(internal, internal) returns internal", the function declaration
> is still acceptable.  The representation of the internal-type state
> value would need to contain enough information for the combine function
> to do its work without additional type lookups --- but that would likely
> need to be true anyway, so it does not seem like a big penalty.
>
> For the serial/deserial functions, the function signatures are
> prespecified and don't depend on the aggregate argument types, so
> there's no declaration issue.  Again, the state value would need
> to be sufficiently self-contained for the functions to not need
> additional info to figure out how to serialize it --- but again,
> I do not see much problem there.

Sounds reasonable, aggregate arguments type info is not necessary for
combine/de-/serialize declarations, and for execution it could be
passed in the state variable.


> So I think we're okay with solving this as per your patch, though
> it needs some more comments IMO.  Will push.

Do you mean it needs to be discussed more, or it needs more
inline code comments?

>
> BTW, I noticed while looking at this that this error message in
> nodeAgg.c seems to be exactly backwards:
>
>          /*
>           * Ensure that a combine function to combine INTERNAL states is not
>           * strict. This should have been checked during CREATE AGGREGATE, but
>           * the strict property could have been changed since then.
>           */
>          if (pertrans->transfn.fn_strict && aggtranstype == INTERNALOID)
>              ereport(ERROR,
>                      (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
>                       errmsg("combine function for aggregate %u must be declared as STRICT",
>                              aggref->aggfnoid)));
>      }
>
> Should be "must *not* be declared STRICT", no?

Agree, this must have been a typo.