Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
From: Tomas Vondra
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date:
Msg-id: 53E27895.1010505@fuzzy.cz
In reply to: 9.5: Better memory accounting, towards memory-bounded HashAgg (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On 2.8.2014 22:40, Jeff Davis wrote:
> Attached is a patch that explicitly tracks allocated memory (the blocks,
> not the chunks) for each memory context, as well as its children.
>
> This is a prerequisite for memory-bounded HashAgg, which I intend to
> submit for the next CF.

Anyway, I'm really looking forward to the memory-bounded hashagg, and
I'm willing to spend some time testing it.

> Hashjoin tracks the tuple sizes that it adds to the hash table, which
> is a good estimate for Hashjoin.

Actually, even for HashJoin the estimate is pretty bad, and varies a lot
depending on the tuple width. With "narrow" tuples (e.g. ~40B of data),
which is actually quite a common case, you easily get ~100% palloc
overhead. I managed to address that by using a custom allocator. See this:

https://commitfest.postgresql.org/action/patch_view?id=1503

I wonder whether something like that would be possible for the hashagg?
That would make the memory accounting accurate with 0% overhead (because
it's not messing with the memory context at all), but only for the one
node (but maybe that's OK?).

> But I don't think it's as easy for Hashagg, for which we need to track
> transition values, etc. (also, for HashAgg, I expect that the overhead
> will be more significant than for Hashjoin). If we track the space used
> by the memory contexts directly, it's easier and more accurate.

I don't think that's comparable - I can easily think of cases leading to
extreme palloc overhead with HashAgg (think of an aggregate implementing
COUNT DISTINCT - that effectively needs to store all the values, and
with short values the palloc overhead will be terrible).

Actually, I was looking at HashAgg (it's a somewhat natural direction
after messing with Hash Join), and my plan was to use a similar dense
allocation approach. The trickiest part would probably be making this
available from the custom aggregates.

regards
Tomas