Re: [PATCHES] array_accum aggregate
From: Tom Lane
Subject: Re: [PATCHES] array_accum aggregate
Msg-id: 20294.1160693122@sss.pgh.pa.us
In reply to: Re: [PATCHES] array_accum aggregate (Stephen Frost <sfrost@snowman.net>)
Responses: Re: [PATCHES] array_accum aggregate
List: pgsql-hackers
Stephen Frost <sfrost@snowman.net> writes:
> Another alternative would be to provide a separate area for each
> aggregate to put any other information it needs.

I'm not convinced that that's necessary --- the cases we have at hand suggest that the transition function is perfectly capable of doing the storage management it wants. The problem is how to declare to CREATE AGGREGATE that we're using a transition function of this kind rather than the "stupid" functions it expects.

When the function is doing its own storage management, we'd really rather that nodeAgg.c stayed out of the way and didn't try to do any datum copying at all; having it copy a placeholder bytea or anyarray or whatever is really a waste of cycles, not to mention obscuring what is going on. If nodeAgg just provided a pass-by-value Datum, which the transition function could use to store a pointer to storage it's handling, things would be a lot cleaner.

After a little bit of thought I'm tempted to propose that we handle this by inventing a new pseudotype called something like "aggregate_state", which'd be declared in the catalogs as pass-by-value, thereby suppressing useless copying activity in nodeAgg.c. You'd declare the aggregate as having stype = aggregate_state, and the transition function would have signature

    sfunc(aggregate_state, ... aggregate-input-type(s) ...) returns aggregate_state

and the final function of course

    ffunc(aggregate_state) returns aggregate-result-type

aggregate_state would have no other uses in the system, and its input and output functions would raise an error, so type safety is assured --- there would be no way to call either the sfunc or ffunc "manually", except by passing a NULL value, which should be safe because that's what they'd expect as the aggregate initial condition.

One advantage of doing it this way is that the planner could be taught to recognize aggregates with stype = aggregate_state specially, and make allowance for the fact that they'll use more workspace than meets the eye. If we don't have something like this then the planner is likely to try to use hash aggregation in scenarios where it'd be absolutely fatal to do so. I'm not sure whether we'd want to completely forbid hash aggregation when any stype = aggregate_state is present, but for sure we want to assume that there's some pretty large amount of per-aggregate state we don't know about.

        regards, tom lane
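For concreteness, a minimal sketch of the declarations being proposed, assuming the hypothetical aggregate_state pseudotype and illustrative function names array_accum_transfn / array_accum_finalfn (none of these exist at this point, so this is not runnable as written):

    -- Sketch only: aggregate_state is the proposed pass-by-value pseudotype,
    -- and the C functions are placeholders for a transition/final pair that
    -- manage their own storage and keep just a pointer in the Datum.

    CREATE FUNCTION array_accum_transfn(aggregate_state, anyelement)
        RETURNS aggregate_state
        AS 'MODULE_PATHNAME' LANGUAGE C;

    CREATE FUNCTION array_accum_finalfn(aggregate_state)
        RETURNS anyarray
        AS 'MODULE_PATHNAME' LANGUAGE C;

    CREATE AGGREGATE array_accum (anyelement) (
        sfunc     = array_accum_transfn,
        stype     = aggregate_state,
        finalfunc = array_accum_finalfn
        -- no initcond: the transition function sees a NULL state on its first call
    );

The point of the shape above is that stype names only an opaque pass-by-value placeholder, so nodeAgg.c hands the same Datum (really a pointer the transition function manages) back on each call without copying, and the planner can treat any aggregate declared this way as carrying an unknown amount of extra per-aggregate state.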