Re: Spilling hashed SetOps and aggregates to disk
From: Tomas Vondra
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date:
Msg-id: 8f16eb97-09b0-e96b-ff78-d9677677909d@2ndquadrant.com
In reply to: Re: Spilling hashed SetOps and aggregates to disk (David Rowley <david.rowley@2ndquadrant.com>)
Responses: RE: Re: Spilling hashed SetOps and aggregates to disk
List: pgsql-hackers
On 06/05/2018 03:17 PM, David Rowley wrote:
> On 6 June 2018 at 01:09, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-06-06 01:06:39 +1200, David Rowley wrote:
>>> My concern is that only accounting memory for the group and not the
>>> state is only solving half the problem. It might be fine for
>>> aggregates that don't stray far from their aggtransspace, but for the
>>> other ones, we could still see OOM.
>>
>>> If solving the problem completely is too hard, then a half fix (maybe
>>> 3/4) is better than nothing, but if we can get a design for a full fix
>>> before too much work is done, then isn't that better?
>>
>> I don't think we actually disagree. I was really primarily talking
>> about the case where we can't really do better because we don't have
>> serialization support. I mean we could just rescan from scratch, using
>> a groupagg, but that obviously sucks.
>
> I don't think we do. To take yours to the 100% solution might just
> take adding the memory accounting to palloc that Jeff proposed a few
> years ago and use that accounting to decide when we should switch
> method.
>
> However, I don't quite fully recall how the patch accounted for
> memory consumed by sub-contexts and if getting the entire
> consumption required recursively looking at subcontexts. If that's
> the case then checking the consumption would likely cost too much if
> it was done after each tuple was aggregated.

Yeah, a simple wrapper would not really fix the issue, because
allocating memory in a subcontext would not update the accounting
information. So we'd be oblivious of possibly large amounts of memory.
I don't quite see how this is related to (not) having serialization
support, though; it's more about not knowing we already hit the memory
limit.

That being said, I don't think array_agg actually has the issue, at
least since b419865a814abbca12bdd6eef6a3d5ed67f432e1 (it does behave
differently in aggregate and non-aggregate contexts, IIRC).

I don't know how common this issue is, though. I don't think any
built-in aggregates create additional contexts, but I may be wrong.
But we have this damn extensibility thing, where users can write
their own aggregates ...

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
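[Editor's illustration] A minimal sketch of the recursive walk over child contexts that the quoted concern refers to, written against the MemoryContextData tree fields (parent/firstchild/nextchild) that exist in PostgreSQL; the per-context accessor context_space_used() is a hypothetical stand-in, not an existing PostgreSQL function:

```c
#include "postgres.h"
#include "utils/memutils.h"

/*
 * Hypothetical sketch, not PostgreSQL source: sum the space attributed to
 * a memory context and all of its descendants by walking the context tree.
 * context_space_used() is an assumed accessor used only for illustration;
 * PostgreSQL exposes per-context totals only through each context's stats
 * callback.  Performing a walk like this after every aggregated tuple is
 * what would make the per-tuple memory check too expensive.
 */
static Size
context_space_recursive(MemoryContext context)
{
	MemoryContext child;
	Size		total = context_space_used(context);	/* assumed accessor */

	for (child = context->firstchild; child != NULL; child = child->nextchild)
		total += context_space_recursive(child);

	return total;
}
```

The sketch also shows why a simple palloc wrapper alone is insufficient: an allocation is charged only to the context it is made in, so unless the count is propagated up to parent contexts at allocation time, memory consumed in sub-contexts stays invisible without a walk like this.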