Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype
Дата
Msg-id 27247.1466185504@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype  (Robert Haas <robertmhaas@gmail.com>)
Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I don't mind if you feel the need to refactor this.

> I'm not sure yet.  What I think we need to work out first is exactly
> how polymorphic parallel aggregates ought to identify which concrete
> data types they are using.  There were well-defined rules before,
> but how do we adapt those to two levels of aggregate evaluation?

After reviewing things a bit, it seems like the first design-level issue
there is what about polymorphic combine functions.  So far as the type
system is concerned, there's no issue: the declared signature is always
going to be "combine(foo, foo) returns foo" and it doesn't matter whether
foo is an ordinary type, a polymorphic one, or INTERNAL.  The other
question is whether the function itself knows what it's operating on in
the current invocation.  For regular polymorphic types this seems no
different from any other usage.  If the transtype is INTERNAL, then the
type system provides no help; but we could reasonably assume that the
internal representation itself contains as much information as the combine
func needs.  We could add extra arguments like the "finalfunc_extra"
option does for the finalfunc, but I don't really see a need --- that hack
is mainly to satisfy the type system that the finalfunc's signature is
sane, not to tell the finalfunc something it has no other way to find out.
So I think we're probably OK as far as the combine function is concerned.

The situation is much direr as far as serialize/deserialize are concerned.
These basically don't work at all for polymorphic transtypes: if you try
to declare "deserialize(bytea) returns anyarray", the type system won't
let you.  Perhaps that's not an issue because you shouldn't really need
serialize/deserialize for anything except INTERNAL transtype.  However,
there's a far bigger problem which is that "deserialize(bytea) returns
internal" is a blatant security hole, which I see you ignored the defense
against in opr_sanity.sql.  The claim in the comment there that it's okay
if we check for aggregate context is a joke --- or haven't you heard that
we allow users to create aggregates?  That's not nearly good enough to
prevent unsafe usage of such functions.  Not to mention that CREATE
FUNCTION won't allow creation of such functions, so extensions are locked
out of using this feature.

This has to be redesigned or else reverted entirely.  I'm not going to
take no for an answer.

A possible solution is to give deserialize an extra dummy argument, along
the lines of "deserialize(bytea, internal) returns internal", thereby
ensuring it can't be called in any non-system-originated contexts.  This
is still rather dangerous if the other argument is variable, as somebody
might be able to abuse an internal-taking function by naming it as the
deserialize function for a maliciously-designed aggregate.  What I'm
inclined to do to lock it down further is to drop the "serialtype"
argument to CREATE AGGREGATE, which seems rather pointless (what else
would you ever use besides bytea?).  Instead, insist that
serialize/deserialize apply *only* when the transtype is INTERNAL, and
their signatures are exactly "serialize(internal) returns bytea" and
"deserialize(bytea, internal) returns internal", never anything else.

A different way of locking things down, which might be cleaner in the
long run, is to invent a new pseudo-type for the sole purpose of being
the serialization type, that is we'd insist on the signatures being
"serialize(internal) returns serialized_internal" and
"deserialize(serialized_internal) returns internal", where
serialized_internal has the representation properties of bytea but is
usable for no other purpose than this.  Not sure it's worth the trouble
though.

Anyway, setting aside the security angle, it doesn't seem like there is
anything wrong with the high-level design for polymorphic cases.  I'm now
thinking the problem I saw is just a garden variety implementation bug
(or bugs).  I'm still not very happy with the confusion around aggtype
though, not least because I suspect it contributed directly to this bug.
I also feel that both the code comments and the user-facing documentation
for this feature are well below project standards, eg where's the
discussion in section 35.10?
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: 10.0
Следующее
От: Aleksey Demakov
Дата:
Сообщение: Re: Experimental dynamic memory allocation of postgresql shared memory