Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
От | Pavel Stehule |
---|---|
Тема | Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) |
Дата | |
Msg-id | 162867790908131005j54312e26h5fb12525f64b2c50@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) (Олег Царев <zabivator@gmail.com>) |
Список | pgsql-hackers |
2009/8/13 Олег Царев <zabivator@gmail.com>: > 2009/8/13 Hitoshi Harada <umi.tanuki@gmail.com>: >> 2009/8/8 Alvaro Herrera <alvherre@commandprompt.com>: >>> Олег Царев escribió: >>>> Hello all! >>>> If no one objecte (all agree, in other say) i continue work on patch - >>>> particulary, i want support second strategy (tuple store instead of >>>> hash-table) for save order of source (more cheap solution in case with >>>> grouping sets + order by), investigate and brainstorm another >>>> optimisation, writing regression tests and technical documentation. >>>> But I need some time for complete my investigation internals of >>>> PostgreSQL, particulary CTE. >>> >>> Where are we on this patch? Is it moving forward? >>> >> >> It seems to me that the patch goes backward. >> >> I looked trough the gsets-0.6.diff for about an hour, and found it is >> now only a syntax sugar that builds multiple GROUP BY queries based on >> CTE functionality. There's no executor modification. >> >> If I remember correctly, the original patch touched executor parts. >> I'd buy if the GROUPING SETS touches executor but I don't if this is >> only syntax sugar, because you can write it as the same by yourself >> without GROUPING SETS syntax. The motivation we push this forward is >> performance that cannot be made by rewriting query, I guess. >> >> Because GROUP BY we have today is a subset of GROUPING SETS by >> definition, I suppose we'll refactor nodeAgg.c so that it is allowed >> to take multiple group definitions. And we must support both of >> HashAgg and GroupAgg. For HashAgg, it is easier in any case as the >> earlier patch does. For GroupAgg, it is a bit complicated since we >> sort by different key sets. >> >> When we want GROUPING SET(a, b), at first we sort by a and aggregate >> then sort by b and aggregate. This is the same as: >> >> select a, null, count(*) from x group by a >> union all >> select null, b, count(*) from x group by b >> >> so nothing better than query rewriting unless we invent something new. >> >> But in case of sub total and grand total like ROLLUP query, GroupAgg >> can do it by one-time scan by having multiple life cycle PerGroup >> state. >> >> Anyway, before going ahead we need to find rough sketch of how to >> implement this feature. Only syntax sugar is acceptable? Or internal >> executor support is necessary? >> >> >> Regards, >> >> >> -- >> Hitoshi Harada >> > > All rights, exclude >> Because GROUP BY we have today is a subset of GROUPING SETS by >> definition, I suppose we'll refactor nodeAgg.c so that it is allowed >> to take multiple group definitions. And we must support both of >> HashAgg and GroupAgg. For HashAgg, it is easier in any case as the >> earlier patch does. For GroupAgg, it is a bit complicated since we >> sort by different key sets. > because group by it's optimized version of grouping sets. > Of course, we can extend the current definition of group by, but we > regress perfomance of it. > Some questions for you: > > How calcualte aggregation on ROLLUP on single pass? > Stupid way - store different buffer of aggregations for every group, > and accumulate every record on group for every calculator. When a > group has changed, return key of this group to output set with NULL > for fields not contains in this group, and restart current buffer of > aggregation. > Better way - add operation "merge aggregations", and calculate one > buffer on every group, when group has cnahged - merge this "main > buffer" to other, and return some intermediate result. > I don't thing, so this is possible for all operations. Don't forgot. People can to implement own aggregates. example: weighted average regards Pavel Stehule > I think, support this of grouping operation isn't simple, and > different implementation of ROLLUP it's better. > > Regards, Tsarev Oleg >
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Hitoshi HaradaДата:
Сообщение: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)