Re: [PATCH] Incremental sort (was: PoC: Partial sort)
От | James Coleman |
---|---|
Тема | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Дата | |
Msg-id | CAAaqYe-i7D+D+oih1AZ5QyyAMSWKjzNT7+eBz-RMr9i18zG96g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
(Tomas Vondra <tomas.vondra@2ndquadrant.com>)
|
Список | pgsql-hackers |
On Tue, Jun 25, 2019 at 4:32 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Tue, Jun 25, 2019 at 12:13:01PM -0700, Peter Geoghegan wrote: > >On Tue, Jun 25, 2019 at 11:03 AM James Coleman <jtc331@gmail.com> wrote: > >> No, I haven't confirmed that it's called less frequently, and I'd be > >> extremely surprised if it were given the diff doesn't suggest any > >> changes to that at all. > > > >I must have misunderstood, then. I thought that you were suggesting > >that that might have happened. > > > >> If you think it's important enough to do so, I can instrument it to > >> confirm, but I was mostly wanting to know if there were any other > >> plausible explanations, and I think you've provided one: there *are* > >> changes in the patch to memory contexts in tuplesort.c, so if memory > >> fragmentation is a real concern this patch could definitely notice > >> changes in that regard. > > > >Sounds like it's probably fragmentation. That's generally hard to measure. > > > > I'm not sure I'm really conviced this explains the difference, because > the changes in tuplesort.c are actually fairly small - we do split the > tuplesort context into two, but vast majority of the stuff is allocated > in one of the contexts (essentially just the tuplesort state gets moved > to a new context). I wouldn't expect this to have such strong impact on > locality/fragmentation. OTOH it is as you noted heavily dependent on data...so it's hard to say if it's a real win or not. > But maybe it does - in that case it seems it might be worthwile to do it > separately, irrespectedly of the incremental sort patch. I wonder if > perf would show that as cache hits/misses, or something? > > It shouldn't be that difficult to separate this change into a separate > patch, and benchmark it on it's own, though. I don't know enough about perf to say, but unless this ends up being a sticking point for the patch I'll probably avoid it for now because there are too many other things to worry about in the patch. > FWIW while looking at the tuplesort.c changes, I've noticed some > inaccurate comments in tuplesort_free. Firstly, the top-level comment > says: > > /* > * tuplesort_free > * > * Internal routine for freeing resources of tuplesort. > */ > > without mentioning which resources it actually releases, so it kinda > suggests it releases everything. But that's not true - AFAICS it only > releases the per-sort resources. IMO this is a poor function name, and > people will easily keep resources longer than they think - we should > rename it to something like tuplesort_free_batch(). > > And then at the end tuplesort_free() does this: > > /* > * Free the per-sort memory context, thereby releasing all working memory, > * including the Tuplesortstate struct itself. > */ > MemoryContextReset(state->sortcontext); > > But that's clearly not true, because the tuplesortstate is allocated in > the maincontext, not sortcontext. > > In general, the comments seem to be a bit confused by what 'sort' means. > Sometimes it means the whole sort operation, sometimes it means one of > the batches, etc. And the fact that the per-batch context is called > sortcontext does not really improve the situation. There are also quite a few misleading or out of date comments in nodeIncrementalSort.c as well. I'm currently working on the hybrid approach I mentioned earlier, but once the patch proper looks like we're coming close to addressing the performance concerns/costing I'll look at doing a pass through the comments to clean them up. Unrelated: if you or someone else you know that's more familiar with the parallel code, I'd be interested in their looking at the patch at some point, because I have a suspicion it might not be operating in parallel ever (either that or I don't know how to trigger it), but I'm not really familiar with that stuff at all currently. :) James Coleman
В списке pgsql-hackers по дате отправления: