Re: Minor performance improvement in transition to external sort
От | Jeremy Harris |
---|---|
Тема | Re: Minor performance improvement in transition to external sort |
Дата | |
Msg-id | 52F7B6C2.2000705@wizmail.org обсуждение исходный текст |
Ответ на | Re: Minor performance improvement in transition to external sort (Jeff Janes <jeff.janes@gmail.com>) |
Ответы |
Re: Minor performance improvement in transition to external
sort
|
Список | pgsql-hackers |
On 06/02/14 18:21, Jeff Janes wrote: > Did you try sorting already-sorted, reverse > sorted, or pipe-organ shaped data sets? We will also need to test it on > strings. I usually use md5(random()::text) to generate strings for such > purposes, at least for a first pass. Attached is version 2 of the patch, which fixes the performance on constant-input. Summary (low numbers better, vs 858ec11): Random ints: 83% compares, level on time. Sorted ints: level compares, 70% time. Reverse-sorted ints: 10% compares, 15% time (!) Constant ints: level compares, 75% time Pipe-organ ints: 80% compares, 107% time Random text: 83% compares, 106% time > Another optimization that is possible is to do only one comparison at each > level (between the two children) all the way down the heap, and then > compare the parent to the chain of smallest children in reverse order. > This can substantially reduce the number of comparisons on average, > because most tuples sift most of the way down the heap (because most of the > tuples are at the bottom of the heap). Robert submitted a patch to do this > in the main siftdown routine (which for some reason is > named tuplesort_heap_siftup, rather than siftdown), and I found it a > substantial improvement but it never got pushed forward. I think Robert > was concerned it might make some obscure cases worse rather than better, > and no one put it through enough serious testing to overcome that concern. I've done an implementation of this (also attached: siftdown_bounce). Summary: Random ints: 72% compares, 104% time. Sorted ints: 200% compares, 270% time. (ouch) Reverse-sorted ints: 7% compares, 12% time Constant ints: 150% compares, 275% time Pipe-organ ints: 93% compares, 115% time Random text: 72% compares, 91% time - I don't like the performance on already-sorted input. Thinking into this, a sorted array is already a well-formed heap so optimal behaviour is a single compare per item (lacking the global knowledge). This "bounce" method will always do a full walk down the tree and back, hence the poor behaviour. Unless the planner can tell the sort when sorted input is possible I don't think we dare use this despite the benefit on random input. The tests were run with an instrumented variant of each patch, counting compares and time for just the heapify portion of the inittapes() routine. Test script (jgh.sql and results spreadsheet) also attached. -- Cheers, Jeremy
Вложения
В списке pgsql-hackers по дате отправления: