Avoiding double-counting in aggregates with more than one join?
От | Paul Jungwirth |
---|---|
Тема | Avoiding double-counting in aggregates with more than one join? |
Дата | |
Msg-id | 86b9ec78-925c-1935-bc9d-6bad4ceb1f40@illuminatedcomputing.com обсуждение исходный текст |
Ответы |
Re: Avoiding double-counting in aggregates with more than
one join?
|
Список | pgsql-general |
Hi All, I've noticed in the past that doing aggregates while joining to more than one table can sometimes give you unintended results. For example, suppose I have three tables: products, sales, and resupplies. In sales I track what I sell, and in resupplies I track my own purchases to increase inventory. Both have a foreign key to products. Now I want to run a report showing the total dollars sold for each product versus the total dollars spent for each product. I could try this: SELECT p.id, SUM(s.price * s.qty) AS total_sold, SUM(r.price * r.qty) AS total_spent FROM products p LEFT OUTER JOIN sales s ON s.product_id = p.id LEFT OUTER JOIN resupplies r ON r.product_id = p.id GROUP BY p.id ; That seems pretty safe, but actually I get bad answers, for example if some product has this data: sales ----- sold 1 @ $2/ea resupplies ---------- bought 1 @ $1/eq bought 2 @ $1/ea Then pre-grouping I have this: p.id | s.qty | s.price | r.qty | r.price -----+-------+---------+-------+-------- 1 | 1 | $2 | 1 | $1 1 | 1 | $2 | 2 | $1 You can see the problem is that I'm going to double-count my sales. What I really want is this: p.id | s.qty | s.price | r.qty | r.price -----+-------+---------+-------+-------- 1 | 1 | $2 | 1 | $1 1 | | | 2 | $1 In the past I've always handled these situations by aggregating each table separately and only then joining things together: WITH s AS ( SELECT product_id, SUM(price * qty) AS total_sold FROM sales GROUP BY product_id) s ), r AS ( SELECT product_id, SUM(price * qty) AS total_spent FROM resupplies GROUP BY product_id) r ) SELECT p.id, COALESCE(s.total_sold, 0), COALESCE(r.total_spent, 0) FROM products p LEFT OUTER JOIN s ON s.product_id = p.id LEFT OUTER JOIN r ON r.product_id = p.id ; Since I've guaranteed that each CTE includes at most one row per product, this is safe from double-counting errors. But is there a better way? My approach feels verbose and harder to read. Also you have to type COALESCE a lot. :-) Is there some different way of doing things I haven't thought of yet? Also I wonder about the performance merging all these subqueries together. Would the final merging be any faster if I had an ORDER BY in each CTE? It seems like this pattern comes up a lot; what have others done about it? Thanks, Paul
В списке pgsql-general по дате отправления: