Re: Memory-Bounded Hash Aggregation

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Memory-Bounded Hash Aggregation
Дата
Msg-id ee3784f2d54a37a8c93719704f57a8aa50a98d77.camel@j-davis.com
обсуждение исходный текст
Ответ на Re: Memory-Bounded Hash Aggregation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Wed, 2020-02-19 at 20:16 +0100, Tomas Vondra wrote:
> 5) Assert(nbuckets > 0);

...

> 6) Another thing that occurred to me was what happens to grouping
> sets,
> which we can't spill to disk. So I did this:

...

> which fails with segfault at execution time:

The biggest problem was that my grouping sets test was not testing
multiple hash tables spilling, so a couple bugs crept in. I fixed them,
thank you.

To fix the tests, I also had to fix the GUCs and the way the planner
uses them with my patch. In master, grouping sets are planned by
generating a path that tries to do as many grouping sets with hashing
as possible (limited by work_mem). But with my patch, the notion of
fitting hash tables in work_mem is not necessarily important. If we
ignore work_mem during path generation entirely (and only consider it
during costing and execution), it will change quite a few plans and
undermine the concept of mixed aggregates entirely. That may be a good
thing to do eventually as a simplification, but for now it seems like
too much, so I am preserving the notion of trying to fit hash tables in
work_mem to create mixed aggregates.

But that created the testing problem: I need a reliable way to get
grouping sets with several hash tables in memory that are all spilling,
but the planner is trying to avoid exactly that. So, I am introducing a
new GUC called enable_groupingsets_hash_disk (better name suggestions
welcome), defaulting it to "off" (and turned on during the test).

Additionally, I removed the other GUCs I introduced in earlier versions
of this patch. They were basically designed around the idea to revert
back to the previous hash aggregation behavior if desired (by setting
enable_hashagg_spill=false and hashagg_mem_overflow=true). That makes
some sense, but that was already covered pretty well by existing GUCs.
If you want to use HashAgg without spilling, just set work_mem higher;
and if you want to avoid the planner from choosing HashAgg at all, you
set enable_hashagg=false. So I just got rid of enable_hashagg_spill and
hashagg_mem_overflow.

I didn't forget about your explain-related suggestions. I'll address
them in the next patch.

Regards,
    Jeff Davis




Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: Memory-Bounded Hash Aggregation
Следующее
От: Michail Nikolaev
Дата:
Сообщение: Re: BUG #16108: Colorization to the output of command-line hasunproperly behaviors at Windows platform