Speeding aggregates by splitting the work
From | Martijn van Oosterhout |
---|---|
Subject | Speeding aggregates by splitting the work |
Date | |
Msg-id | 20011004134911.A25410@svana.org |
List | pgsql-general |
I currently have a query that calculates a sum of a field, grouping by
some fields in other tables, like this:

select a.aid, b.bid, c.cid, sum(d.amount)
from a, b, c, d
where a.aid = b.aid
  and b.bid = c.bid
  and c.cid = d.cid
group by a.aid, b.bid, c.cid;

Those ids are all primary keys of their respective tables. A has about
20 entries, B about 2,000, C about 3,000 and D about 2 million.

What happens is that a huge join is done, then sorted, then aggregated.
The sort takes a huge amount of memory and slows everything down.

My idea is to have the planner first do the equivalent of:

select d.cid, sum(d.amount)
from d
group by d.cid;

which will be about 3,000 rows, and *then* do the join with A, B and C
to produce the final result. As a bonus (at least under the current
arrangement), the intermediate result would be sorted by d.cid, lending
itself to a merge join.

This works because (as far as I can see) sum(x) == sum(sum(x)). Is this
feasible, or am I missing some corner cases?

Generalising is harder: count(x) == sum(count(x)), and avg obviously
doesn't decompose at all on its own. However, avg = sum/count, both of
which can be simplified according to the earlier rules.

Naturally you could handcraft this (see the sketch below), but it would
possibly be nicer for the planner to do it. Basically, it works because
by sorting earlier the rows are narrower, so not as much memory is
required to do the sort.

If my logic is wrong, let me know. When I get back home I'll check
whether the amount of required memory is taken into account for a sort.

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.
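To make the hand-crafted version concrete, here is one way it could be
written. This is a sketch only: the subquery alias "ds" and the output
names "total", "s" and "n" are illustrative, not from the original
query; everything else follows the tables and columns in the message. D
collapses to roughly 3,000 pre-aggregated rows before any join happens:

-- Aggregate the 2-million-row table first, then join the small result.
select a.aid, b.bid, c.cid, ds.total
from a, b, c,
     (select cid, sum(amount) as total
        from d
       group by cid) as ds
where a.aid = b.aid
  and b.bid = c.bid
  and c.cid = ds.cid;

Because aid, bid and cid are primary keys, each pre-aggregated row
matches at most one row in C, B and A, so no group is split or
duplicated by the joins and no outer GROUP BY is needed. The avg =
sum/count decomposition can be handcrafted the same way (again a sketch
under the same assumptions):

-- Rebuild avg from its decomposable parts. Cast s to numeric first if
-- amount is an integer type, to avoid integer division.
select a.aid, b.bid, c.cid,
       ds.s / ds.n as avg_amount
from a, b, c,
     (select cid, sum(amount) as s, count(amount) as n
        from d
       group by cid) as ds
where a.aid = b.aid
  and b.bid = c.bid
  and c.cid = ds.cid;

In the general case, where a join can fan a group out over several rows,
the outer query would have to re-aggregate, e.g. sum(ds.s) / sum(ds.n)
with its own GROUP BY, which is exactly the sum(x) == sum(sum(x))
identity the post relies on.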