Work plan: aggregate(DISTINCT ...)
От | Tom Lane |
---|---|
Тема | Work plan: aggregate(DISTINCT ...) |
Дата | |
Msg-id | 26844.945026147@sss.pgh.pa.us обсуждение исходный текст |
Список | pgsql-hackers |
As I mentioned in passing a day or two ago, I've figured out how to support aggregates whose input is flagged DISTINCT without too much pain. Basically it can all be done inside nodeAgg.c, once we teach the parser to put the DISTINCT flag bit into Aggref querytree nodes. (a) If DISTINCT is not specified for a particular aggregate, then nodeAgg.c runs the aggregate's transition function(s) as each input tuple is presented, same as now. (b) If DISTINCT is specified, then nodeAgg.c evaluates the aggregate's input expression at each input tuple, and passes the resulting datum into a sort operation that it's started. (Now that tuplesort.c has a fairly clean object-based interface, it will be easy to start up a separate sort operation for each DISTINCT aggregate.) (c) At the end of the input table (or row group), nodeAgg.c does this for each DISTINCT aggregate: * finish the pending sort operation; * scan the sort output, drop adjacent duplicate values(the code for this can be borrowed from nodeUnique), and run the aggregate's transition function(s) for each remainingvalue. Finally, the aggregate result values can be computed for all the aggregates (both DISTINCT and regular), and then the output tuple can be formed. This is looking like a day's work at most, and considering how often it gets asked for, I think it's well worth doing. A limitation of this approach is that an explicit sort of the aggregate input values will always be done, even when the input is or could be delivered in the right order anyway. It is certainly *necessary* that nodeAgg.c be able to do internal sorts on-the-fly, in order to cope with multiple DISTINCT aggregates, egSELECT COUNT(DISTINCT foo), AVG(DISTINCT bar) FROM table; since there is no way to scan the table in an order that's sorted for both simultaneously. But in simpler cases it might be a win if the optimizer generated a plan that delivered the data in the right order and nodeAgg.c could be told to skip the internal sort for a DISTINCT aggregate. I'm not going to worry about that now, but it's a possible future improvement. regards, tom lane
В списке pgsql-hackers по дате отправления: