Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
От | Peter Geoghegan |
---|---|
Тема | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) |
Дата | |
Msg-id | CAH2-WzkU2xK2dpZ7N8-A1MvuUTTUvhqkfnA+eUtwNwCtgyCJgw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Thomas Munro <thomas.munro@enterprisedb.com>) |
Список | pgsql-hackers |
On Fri, Feb 3, 2017 at 5:04 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I applied your v2 patch on top of > 7ac4a389a7dbddaa8b19deb228f0a988e79c5795^ to avoid a conflict. It > still had a couple of harmless conflicts that I was able to deal with > (not code, just some header stuff moving around). You must mean my V7 patch. FWIW, I've resolved the conflicts with 7ac4a389a7dbddaa8b19deb228f0a988e79c5795 in my own private branch, and have worked through some of the open items that you raised. > See full results from all permutations attached, but I wanted to > highlight the measurements from 'textwide', 'u', 'nonu' which show > interesting 'asc' numbers (data already sorted). The 'mem' column is > maintenance_work_mem in megabytes. The 'w = 0' column shows the time > in seconds for parallel_workers = 0. The other 'w = N' columns show > times with higher parallel_workers settings, represented as speed-up > relative to the 'w = 0' time. The thing to keep in mind about testing presorted cases in tuplesort in general is that we have this weird precheck for presorted input in our qsort. This is something added by us to the original Bentley & McIlroy algorithm in 2006. I am very skeptical of this addition, in general. It tends to have the effect of highly distorting how effective most optimizations are for presorted cases, which comes up again and again. It only works when the input is *perfectly* presorted, but can throw away an enormous amount of work when the last tuple of input is out of order -- that will throw away all work before that point (not so bad when you think your main cost is comparisons rather than memory accesses, but that isn't the case). Your baseline case can either be made unrealistically fast due to the fact that you get a perfectly sympathetic case for this optimization, or unrealistically slow (very CPU bound) due to the fact that you have that one last tuple out of place. I once said that this last tuple can act like a discarded banana skin. There is nothing wrong with the idea of exploiting presortedness, and to some extent the original algorithm does that (by using insertion sort), but an optimization along the lines of Timsort's "galloping mode" (which is what this modification of ours attempts) requires non-trivial bookkeeping to do right. -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: