Re: HashAgg degenerate case
From: Andres Freund
Subject: Re: HashAgg degenerate case
Date:
Msg-id: 3btw7dgj56k5samfzyrxnqdoiocbiycp44osza3iahzn3jqdro@ecxairlbdzqx
In reply to: Re: HashAgg degenerate case (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-bugs
Hi,

On 2024-11-08 12:40:39 -0800, Jeff Davis wrote:
> On Fri, 2024-11-08 at 10:48 -0800, Jeff Davis wrote:
> > I can think of two approaches to solve it:
>
> Another thought: the point of spilling is to avoid using additional
> memory by adding a group.
>
> If the bucket array is 89.8% full, adding a new group doesn't require
> new buckets to be allocated, so we only have the firstTuple and
> pergroup data to worry about. If that causes the memory limit to be
> exceeded, it's the perfect time to switch to spill mode.
>
> But if the bucket array is 89.9% full, then adding a new group will
> cause the bucket array to double. If that causes the memory limit to
> be exceeded, then we can switch to spill mode, but it's wasteful to
> do so because (a) we won't be using most of those new buckets; (b)
> the new buckets will crowd out space for subsequent batches and even
> fewer of the buckets will be used; and (c) the memory accounting can
> be off by quite a bit.
>
> What if we have a check where, if the metacxt is using more than 40%
> of the memory, and if adding a new group would reach the
> grow_threshold, then enter spill mode immediately?

That assumes the bucket array is a significant portion of the memory
usage - but that's not a given, right? We also might need to grow to
keep the number of conflict chains at a manageable level.

> To make this work, I think we either need to use a tuplehash_lookup()
> followed by a tuplehash_insert() (two lookups for each new group), or
> we would need a new API into simplehash like
> tuplehash_insert_without_growing() that would return NULL instead of
> growing.

The former seems entirely non-viable on performance grounds. The
latter might be OK, but I'm wary of it on code complexity grounds.

What about simply checking, at spill time, whether the bucket array
was a significant portion of the memory usage, and recreating the
hashtable instead of resetting it IFF it was?

> This approach might not be backportable, but it might be a good
> approach for 18+.

I doubt any change around this ought to be backpatched. The likelihood
of regressions seems too high.

Greetings,

Andres Freund
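For concreteness, here is a minimal sketch of the recreate-instead-of-reset
idea, following the existing names in nodeAgg.c (hash_metacxt, hashcontext,
perhash, num_hashes, MemoryContextMemAllocated(), ResetTupleHashTable());
the 50% threshold, the call site, and the destroy_hash_tables() helper are
assumptions for illustration, not a worked-out patch:

/*
 * Sketch only: at spill time, check whether the hash table metadata
 * (chiefly the bucket arrays) accounts for most of the hash memory.
 * If so, destroy and recreate the tables at their initial size instead
 * of merely resetting them, so later batches aren't crowded out by
 * buckets the previous batch doubled into.
 */
static void
hash_agg_reset_or_rebuild(AggState *aggstate)
{
	Size		meta_mem;
	Size		hashkey_mem;

	/* memory allocated for the hash tables themselves, buckets included */
	meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);

	/* memory allocated for group keys and per-group transition states */
	hashkey_mem = MemoryContextMemAllocated(
		aggstate->hashcontext->ecxt_per_tuple_memory, true);

	if (meta_mem > (meta_mem + hashkey_mem) / 2)
	{
		/*
		 * Bucket arrays dominate.  destroy_hash_tables() is a
		 * hypothetical helper that frees the tables so that
		 * build_hash_tables() creates fresh, initially-sized ones.
		 */
		destroy_hash_tables(aggstate);
		build_hash_tables(aggstate);
	}
	else
	{
		/* cheap path: empty each table but keep its bucket array */
		for (int setno = 0; setno < aggstate->num_hashes; setno++)
			ResetTupleHashTable(aggstate->perhash[setno].hashtable);
	}
}

Compared with the grow_threshold check quoted above, a check of this shape
leaves the per-tuple insert fast path untouched and only pays the rebuild
cost when a spill actually happens.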