Re: Minor performance improvement in transition to external sort
От | Jeremy Harris |
---|---|
Тема | Re: Minor performance improvement in transition to external sort |
Дата | |
Msg-id | 52F40CE9.1070509@wizmail.org обсуждение исходный текст |
Ответ на | Re: Minor performance improvement in transition to external sort (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On 06/02/14 14:22, Robert Haas wrote: > On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris <jgh@wizmail.org> wrote: >> On 05/02/14 21:56, Robert Haas wrote: >>> On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris <jgh@wizmail.org> wrote: >>>> The attached patch replaces the existing siftup method for heapify with >>>> a siftdown method. Tested with random integers it does 18% fewer >>>> compares and takes 10% less time for the heapify, over the work_mem >>>> range 1024 to 1048576. >>>> >>>> Both algorithms appear to be O(n) (contradicting Wikipedia's claim >>>> that a siftup heapify is O(n log n)). >>> >>> >>> I think Wikipedia is right. Inserting an a tuple into a heap is O(lg >>> n); doing that n times is therefore O(n lg n). Your patch likewise >>> implements an outer loop which runs O(n) times and an inner loop which >>> runs at most O(lg n) times, so a naive analysis of that algorithm also >>> comes out to O(n lg n). >> >> The patch implements a siftdown. Measurements attached. > > Uh, this spreadsheet is useless. You haven't explained how these > numbers were generated, or what the columns in the spreadsheet > actually mean. Ideally, you should include enough information that > someone else can try to reproduce your results. > I'm sorry I wasn't clear. Test code was groups of the form: set work_mem to 1024; CREATE TABLE jgh (i integer); INSERT INTO jgh (i) SELECT 20000 * random() FROM generate_series(1, 20000); VACUUM ANALYZE jgh; explain analyze SELECT * FROM jgh ORDER BY i; set enable_siftdown_heapify to off; explain analyze SELECT * FROM jgh ORDER BY i; set enable_siftdown_heapify to on; DROP TABLE jgh; .. for the tabulated work_mem, and scaled values for the INSERT. Compares were counted for the heapify stage (only), and timings taken using pg_rusage_init() before and pg_rusage_show() after it. Times are in seconds. Baseline value for compares and time used 858ec11858. Sift-down compares and time are for the patch. "Baseline K cmps" is compares divided by work_mem (which should be a scaled version of compares-per-tuple being heapified) pre-patch. "Siftdown K cmps" is likewise with the patch. "K ratio cmps" is the ratio (patched compares)/(unpatched compares). Repeat for time measurements. -- Cheers, Jeremy
В списке pgsql-hackers по дате отправления: