Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3
От | Greg Stark |
---|---|
Тема | Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3 |
Дата | |
Msg-id | 87n0l1ocep.fsf@stark.dyndns.tv обсуждение исходный текст |
Ответ на | Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3 (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-general |
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > There is a simple case that isn't being handled. A straight DISTINCT is > > exactly equivalent to a GROUP BY on all the columns listed. Right now it's > > still doing a Sort+Unique. > > True. The DISTINCT logic is so intertwined with ORDER BY that I > couldn't see an easy way to consider a hash-based alternative in the > planner. I think it would require querytree changes propagating all the > way back to the parser to make this feasible at all, and then you'd > still need to think about how to make an honest cost-estimate comparison > of the alternatives (preferably without planning the whole query twice). > Anyone want to dig into it? So my thinking is that DISTINCT and DISTINCT ON are really just special cases of GROUP BY. instead of having separate code paths for them I would suggest rewriting them to GROUP BY, using a "first()" aggregate function for the DISTINCT ON case. Then let the normal logic handle it. The only disadvantage is that if there is an index it's possible to implement DISTINCT ON more efficiently than arbitrary aggregate functions. Is that the case currently for the Unique node? If we have a way of marking arbitrary aggregate functions as only requiring the first or last record then we could notice that the only aggregate in rewritten DISTINCT ON queries is first() which only requires the first record of each value and optimize the index scan. As a bonus this would solve the min()/max() issue as well. -- greg
В списке pgsql-general по дате отправления: