Re: Memory-Bounded Hash Aggregation
От | Jeff Davis |
---|---|
Тема | Re: Memory-Bounded Hash Aggregation |
Дата | |
Msg-id | ee3784f2d54a37a8c93719704f57a8aa50a98d77.camel@j-davis.com обсуждение исходный текст |
Ответ на | Re: Memory-Bounded Hash Aggregation (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Список | pgsql-hackers |
On Wed, 2020-02-19 at 20:16 +0100, Tomas Vondra wrote: > 5) Assert(nbuckets > 0); ... > 6) Another thing that occurred to me was what happens to grouping > sets, > which we can't spill to disk. So I did this: ... > which fails with segfault at execution time: The biggest problem was that my grouping sets test was not testing multiple hash tables spilling, so a couple bugs crept in. I fixed them, thank you. To fix the tests, I also had to fix the GUCs and the way the planner uses them with my patch. In master, grouping sets are planned by generating a path that tries to do as many grouping sets with hashing as possible (limited by work_mem). But with my patch, the notion of fitting hash tables in work_mem is not necessarily important. If we ignore work_mem during path generation entirely (and only consider it during costing and execution), it will change quite a few plans and undermine the concept of mixed aggregates entirely. That may be a good thing to do eventually as a simplification, but for now it seems like too much, so I am preserving the notion of trying to fit hash tables in work_mem to create mixed aggregates. But that created the testing problem: I need a reliable way to get grouping sets with several hash tables in memory that are all spilling, but the planner is trying to avoid exactly that. So, I am introducing a new GUC called enable_groupingsets_hash_disk (better name suggestions welcome), defaulting it to "off" (and turned on during the test). Additionally, I removed the other GUCs I introduced in earlier versions of this patch. They were basically designed around the idea to revert back to the previous hash aggregation behavior if desired (by setting enable_hashagg_spill=false and hashagg_mem_overflow=true). That makes some sense, but that was already covered pretty well by existing GUCs. If you want to use HashAgg without spilling, just set work_mem higher; and if you want to avoid the planner from choosing HashAgg at all, you set enable_hashagg=false. So I just got rid of enable_hashagg_spill and hashagg_mem_overflow. I didn't forget about your explain-related suggestions. I'll address them in the next patch. Regards, Jeff Davis
Вложения
В списке pgsql-hackers по дате отправления: