Re: Eager aggregation, take 3
От | Richard Guo |
---|---|
Тема | Re: Eager aggregation, take 3 |
Дата | |
Msg-id | CAMbWs4-2cVfBk1HNGtqV1QFo2yKnzdxLy0BAqQaJHBt+8+kspw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Eager aggregation, take 3 (Richard Guo <guofenglinux@gmail.com>) |
Ответы |
Re: Eager aggregation, take 3
|
Список | pgsql-hackers |
I've run TPC-DS again to compare planning times with and without eager aggregation. Out of 99 queries, only one query (query 64) shows a noticeable increase in planning time. This query performs inner joins across 38 tables. This is a very large search space. (I'm talking about the standard join search method, not the GEQO.) If my math doesn't fail me, the maximum number of different join orders when joining n tables is: Catalan(n − 1) x n!. For n = 38, this number is astronomically large. In practice, query 64 joins 19 tables twice (due to a CTE), which still results in about 3.4E28 different join orders. Of course, in practice, with the help of join_collapse_limit and other heuristics, the effective search space is reduced a lot, but even then, it remains very large. Given this, I'm not too surprised that query 64 shows an increase in planning time when eager aggregation is applied -- exploring the best join order in such a space is inherently expensive. That said, I've identified a few performance hotspots that can be optimized to help reduce planning time: 1) the exprs_known_equal() call in get_expression_sortgroupref(), which is used to check if a given expression is known equal to a grouping expression due to ECs. We can optimize this by storing the EC of each grouping expression, and then get_expression_sortgroupref() would only need to search the relevant EC, rather than scanning all of them. 2) the estimate_num_groups() call in create_rel_agg_info(). We can optimize this by avoiding unnecessary calls to estimate_num_groups() where possible. Attached is an updated version of the patch with these optimizations applied. With this patch, the planning times for query 64, with and without eager aggregation, are: -- with eager aggregation Planning Time: 9432.042 ms -- without eager aggregation Planning Time: 7196.999 ms I think the increase in planning time is acceptable given the large search space involved, though I may be biased. - Richard
Вложения
В списке pgsql-hackers по дате отправления: