Re: Eager aggregation, take 3
От | Richard Guo |
---|---|
Тема | Re: Eager aggregation, take 3 |
Дата | |
Msg-id | CAMbWs49bL2ZMSc0W4G8=R7bjaa-vO6grucEOFYLZFUZE7+nzrQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Eager aggregation, take 3 (Richard Guo <guofenglinux@gmail.com>) |
Ответы |
Re: Eager aggregation, take 3
|
Список | pgsql-hackers |
On Thu, Jul 24, 2025 at 12:21 PM Richard Guo <guofenglinux@gmail.com> wrote: > This patch no longer applies; here's a rebased version. Nothing > essential has changed. Based on some off-list testing by Matheus (CC'ed), several TPC-DS queries that used to apply eager aggregation no longer do, which suggests that the v18 patch is too strict about when eager aggregation can be used. I looked into query 4 and query 11, and found two reasons why they no longer apply eager aggregation with v18. * The has_internal_aggtranstype() check. To avoid potential memory blowout risks from large partial aggregation values, v18 avoids applying eager aggregation if any aggregate uses an INTERNAL transition type, as this typically indicates a large internal data structure (as in string_agg or array_agg). However, this also excludes aggregates like avg(numeric) and sum(numeric), which are actually safe to use with eager aggregation. What we really want to exclude are aggregate functions that can produce large transition values by accumulating or concatenating input rows. So I'm wondering if we could instead check the transfn_oid directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and F_STRING_AGG_TRANSFN. We don't need to worry about json_agg, jsonb_agg, or xmlagg, since they don't support partial aggregation anyway. * The EAGER_AGG_MIN_GROUP_SIZE threshold This threshold defines the minimum average group size required to consider applying eager aggregation. It was previously set to 2, but in v18 it was increased to 20 to be cautious about planning overhead. This change was a snap decision though, without any profiling or data to back it. Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum needed to consider eager aggregation for them. The resulting plans show nice performance improvements without any measurable increase in planning time. So, I'm inclined to lower the threshold to 10 for now. (Wondering whether we should make this threshold a GUC, so users can adjust it based on their needs.) With these two changes, here are the planning and execution time for queries 4 and 11 (scale factor 1) on my snail-paced machine, with and without eager aggregation. query 4: -- without eager aggregation Planning Time: 6.765 ms Execution Time: 34941.713 ms -- with eager aggregation Planning Time: 6.674 ms Execution Time: 13994.183 ms query 11: -- without eager aggregation Planning Time: 3.757 ms Execution Time: 20888.076 ms -- with eager aggregation Planning Time: 3.747 ms Execution Time: 7449.522 ms Any comments on these two changes? Thanks Richard
В списке pgsql-hackers по дате отправления: