Re: Parallel tuplesort (for parallel B-Tree index creation)
От | Heikki Linnakangas |
---|---|
Тема | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Дата | |
Msg-id | 286647c2-761a-0e8b-7db9-ea508c2316ba@iki.fi обсуждение исходный текст |
Ответ на | Re: Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@heroku.com>) |
Ответы |
Re: Parallel tuplesort (for parallel B-Tree index creation)
|
Список | pgsql-hackers |
On 08/16/2016 03:33 AM, Peter Geoghegan wrote: > I attach a patch that changes how we maintain the heap invariant > during tuplesort merging. I already mentioned this over on the > "Parallel tuplesort, partitioning, merging, and the future" thread. As > noted already on that thread, this patch makes merging clustered > numeric input about 2.1x faster overall in one case, which is > particularly useful in the context of a serial final/leader merge > during a parallel CREATE INDEX. Even *random* non-C-collated text > input is made significantly faster. This work is totally orthogonal to > parallelism, though; it's just very timely, given our discussion of > the merge bottleneck on this thread. Nice! > The patch makes tuplesort merging shift down and displace the root > tuple with the tape's next preread tuple, rather than compacting and > then inserting into the heap anew. This approach to maintaining the > heap as tuples are returned to caller will always produce fewer > comparisons overall. The new approach is also simpler. We were already > shifting down to compact the heap within the misleadingly named [2] > function tuplesort_heap_siftup() -- why not instead just use the > caller tuple (the tuple that we currently go on to insert) when > initially shifting down (not the heap's preexisting last tuple, which > is guaranteed to go straight to the leaf level again)? That way, we > don't need to enlarge the heap at all through insertion, shifting up, > etc. We're done, and are *guaranteed* to have performed less work > (fewer comparisons and swaps) than with the existing approach (this is > the reason for my optimism about getting this stuff out of the way > early). Makes sense. > This new approach is more or less the *conventional* way to maintain > the heap invariant when returning elements from a heap during k-way > merging. Our existing approach is convoluted; merging was presumably > only coded that way because the generic functions > tuplesort_heap_siftup() and tuplesort_heap_insert() happened to be > available. Perhaps the problem was masked by unrelated bottlenecks > that existed at the time, too. Yeah, this seems like a very obvious optimization. Is there a standard name for this technique in the literature? I'm OK with "displace", or perhaps just "replace" or "siftup+insert", but if there's a standard name for this, let's use that. - Heikki
В списке pgsql-hackers по дате отправления: