Re: Memory-Bounded Hash Aggregation
From | Jeff Davis |
---|---|
Subject | Re: Memory-Bounded Hash Aggregation |
Date | |
Msg-id | e5566f7def33a9e9fdff337cca32d07155d7b635.camel@j-davis.com |
In response to | Re: Memory-Bounded Hash Aggregation (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Memory-Bounded Hash Aggregation
Re: Memory-Bounded Hash Aggregation |
List | pgsql-hackers |
On Wed, 2020-01-08 at 12:38 +0200, Heikki Linnakangas wrote:
> This makes the assumption that all Aggrefs or GroupingFuncs are at the
> top of the TargetEntry. That's not true, e.g.:
>
> select 0+sum(a) from foo group by b;
>
> I think find_aggregated_cols() and find_unaggregated_cols() should be
> merged into one function that scans the targetlist once, and returns two
> Bitmapsets. They're always used together, anyway.

I cut the projection out for now, because there's some work in that area in another thread[1]. If that work doesn't pan out, I can reintroduce the projection logic to this one.

New patch attached.

It now uses logtape.c (thanks Adam for prototyping this work) instead of buffile.c. This gives better control over the number of files and the memory consumed for buffers, and reduces waste. It requires two changes to logtape.c though:

* add API to extend the number of tapes
* lazily allocate buffers for reading (buffers for writing were already allocated lazily) so that the total number of buffers needed at any time is bounded

Unfortunately, I'm seeing some bad behavior (at least in some cases) with logtape.c, where it's spending a lot of time qsorting the list of free blocks. Adam, did you also see this during your perf tests? It seems to be worst with lower work_mem settings and a large number of input groups (perhaps there are just too many small tapes?).

It also has some pretty major refactoring that hopefully makes it simpler to understand and reason about, and hopefully I didn't introduce too many bugs/regressions.

A list of other changes:

* added test that involves rescan
* tweaked some details and tunables so that I think memory usage tracking and reporting (EXPLAIN ANALYZE) is better, especially for smaller work_mem
* simplified quite a few function signatures

Regards,
Jeff Davis

[1] https://postgr.es/m/CAAKRu_Yj=Q_ZxiGX+pgstNWMbUJApEJX-imvAEwryCk5SLUebg@mail.gmail.com
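
For readers following the quoted suggestion, the idea of scanning the target list once and returning two column sets could be sketched roughly as below. This is a toy illustration only, not the code in the patch: `ToyExpr`, `ColSets`, `walk_expr`, `find_cols` and `example_cols` are hypothetical names, a plain 64-bit mask stands in for a Bitmapset, and the hand-rolled recursion stands in for PostgreSQL's tree walker. The example target list `0+sum(a), b` is a variation on the quoted query, chosen so that both sets end up non-empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy expression node: a hypothetical stand-in for PostgreSQL's Node tree. */
typedef enum { T_CONST, T_VAR, T_OP, T_AGG } ToyKind;

typedef struct ToyExpr
{
    ToyKind     kind;
    int         attno;          /* column number, used only for T_VAR */
    const struct ToyExpr *left; /* T_OP: left operand; T_AGG: argument */
    const struct ToyExpr *right;/* T_OP: right operand */
} ToyExpr;

/* Two column sets produced by one scan (bit N set means column N is used). */
typedef struct ColSets
{
    uint64_t    aggregated;     /* columns referenced inside an aggregate */
    uint64_t    unaggregated;   /* columns referenced outside any aggregate */
} ColSets;

/* Recurse through one expression, remembering whether we are inside an aggregate. */
static void
walk_expr(const ToyExpr *e, int in_agg, ColSets *out)
{
    if (e == NULL)
        return;

    switch (e->kind)
    {
        case T_CONST:
            break;
        case T_VAR:
            assert(e->attno >= 0 && e->attno < 64);
            if (in_agg)
                out->aggregated |= UINT64_C(1) << e->attno;
            else
                out->unaggregated |= UINT64_C(1) << e->attno;
            break;
        case T_AGG:
            /* The aggregate may sit anywhere in the tree, not only at the top. */
            walk_expr(e->left, 1, out);
            break;
        case T_OP:
            walk_expr(e->left, in_agg, out);
            walk_expr(e->right, in_agg, out);
            break;
    }
}

/* Scan the whole target list once and return both sets together. */
ColSets
find_cols(const ToyExpr *const *tlist, size_t n)
{
    ColSets     out = {0, 0};

    for (size_t i = 0; i < n; i++)
        walk_expr(tlist[i], 0, &out);
    return out;
}

/* Target list "0+sum(a), b", with a as column 1 and b as column 2. */
ColSets
example_cols(void)
{
    ToyExpr     a = {T_VAR, 1, NULL, NULL};
    ToyExpr     b = {T_VAR, 2, NULL, NULL};
    ToyExpr     zero = {T_CONST, 0, NULL, NULL};
    ToyExpr     sum_a = {T_AGG, 0, &a, NULL};
    ToyExpr     plus = {T_OP, 0, &zero, &sum_a};
    const ToyExpr *tlist[] = {&plus, &b};

    return find_cols(tlist, 2);
}
```

Because the walker carries the "inside an aggregate" flag down the tree, `sum(a)` nested under `0+...` is still classified correctly, which is the case the quoted text says the earlier code missed.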