Re: Memory usage during sorting
От | Peter Geoghegan |
---|---|
Тема | Re: Memory usage during sorting |
Дата | |
Msg-id | CAEYLb_UbhYiZGm4z9gXyOEDGWKO_-3VCoMAdk4dERDEvXk0m8A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Memory usage during sorting (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Memory usage during sorting
|
Список | pgsql-hackers |
On 13 April 2012 18:33, Robert Haas <robertmhaas@gmail.com> wrote: > We do use insertion sort for partitions of size < 7. I assume you are > referring to something else. I stand corrected. My confusion came from the removal of a later *switch* to insertion sort in a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649, which also added the pre-sorted check where you'd expect to see the insertion switch. Of course, the n < 7 insertion sort thing is right beside the added check. So another, redundant copy of insertion sort was removed, and not the one that almost every real-world quicksort implementation has. > Heap-sorting could also benefit from some optimization in this area. > Right now, if you heap-sort data that's already in order, it gets > progressively slower as you crank up work_mem, because the heap gets > deeper and so extraction and siftup get more expensive, for no real > benefit. We could do something like check, every time we use up our > available memory, whether the heap is already entirely in order. If > so, we could dump all but one tuple to the current run and the start > reading tuples again. Or maybe dump some percentage of the array, > though that might require a bit more bookkeeping. Or maybe there's a > smarter algorithm that would also cater to mostly-sorted data. Well, timsort is specifically designed to take advantage of pre-sorted data. It does appear to have a lot of traction, as wikipedia points out: "Timsort has been Python's standard sorting algorithm since version 2.3. It is now also used to sort arrays in Java SE 7,[2] and on the Android platform.[3] " Somebody has produced a BSD-licensed C implementation that draws heavily on the Python/Java one here: http://code.google.com/p/timsort/source/browse/trunk/timSort.c?spec=svn17&r=17 It looks like it has exactly the same interface as a c stdlib qsort. So it'd be fairly easy to produce a timsort_arg() based on this, if anyone cares to. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
В списке pgsql-hackers по дате отправления: