Re: Memory usage during sorting
From | Robert Haas |
---|---|
Subject | Re: Memory usage during sorting |
Date | |
Msg-id | CA+TgmoZQPff2iSoYUSuzyi_ZzHE83Sbft1FvUHm_ov71ga6DNQ@mail.gmail.com |
In reply to | Re: Memory usage during sorting (Tom Lane <tgl@sss.pgh.pa.us>) |
Replies | Re: Memory usage during sorting |
List | pgsql-hackers |
On Tue, Mar 20, 2012 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark <stark@mit.edu> wrote:
>>> Offhand I wonder if this is all because we don't have the O(n) heapify
>>> implemented.
>
>> I'm pretty sure that's not the problem. Even though our heapify is
>> not as efficient as it could be, it's plenty fast enough. I thought
>> about writing a patch to implement the better algorithm, but it seems
>> like a distraction at this point because the heapify step is such a
>> small contributor to overall sort time. What's taking all the time is
>> the repeated siftup operations as we pop things out of the heap.
>
> Right, but wouldn't getting rid of the run-number comparisons provide
> some marginal improvement in the speed of tuplesort_heap_siftup?

No. It does the opposite: it slows it down. This is a highly
surprising result, but it's quite repeatable: removing comparisons
makes it slower. As previously pontificated, I think this is probably
because the heap can fill up with next-run tuples that are cheap to
compare against, and that spares us having to do "real" comparisons
involving the actual datatype comparators.

> BTW, there's a link at the bottom of the wikipedia page to a very
> interesting ACM Queue article, which argues that the binary-tree
> data structure isn't terribly well suited to virtual memory because
> it touches random locations in succession. I'm not sure I believe
> his particular solution, but I'm wondering about B+ trees, ie more
> than 2 children per node.

I don't think virtual memory locality is the problem. I read
somewhere that a ternary heap is supposed to be about one-eighth
faster than a binary heap, but that's because picking the smallest of
three tuples requires two comparisons, whereas picking the smallest of
four tuples requires three comparisons, which is better because a
ternary heap has fewer levels to walk (about log3(n) rather than
log2(n)).
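The comparison-count argument in the last paragraph can be checked by counting comparisons directly. The sketch below is a hypothetical Python d-ary min-heap, not PostgreSQL's tuplesort.c code: `CountingDaryHeap` and `heapsort_comparisons` are illustrative names. It builds the heap with the standard bottom-up (Floyd) heapify, which is O(n), and pops with a hole-based sift-down. That sift-down is roughly what tuplesort_heap_siftup does despite its name. At each level it pays d comparisons to pick the smallest of the sifted item and its d children. It models neither run numbers, memory locality, nor the cost of real datatype comparators.

```python
import random


class CountingDaryHeap:
    """Hypothetical d-ary min-heap that counts every key comparison."""

    def __init__(self, items, d=2):
        self.d = d
        self.a = list(items)
        self.comparisons = 0
        # Floyd's bottom-up heapify: sift down each internal node, last one
        # first. The last internal node is the parent of index n - 1.
        for i in range((len(self.a) - 2) // d, -1, -1):
            self._sift_down(i)

    def _less(self, x, y):
        self.comparisons += 1
        return x < y

    def _sift_down(self, i):
        a, d, n = self.a, self.d, len(self.a)
        item = a[i]
        while True:
            first = d * i + 1          # children of i are first .. first + d - 1
            if first >= n:
                break                  # i is a leaf
            # Smallest child: (number of children - 1) comparisons.
            best = first
            for c in range(first + 1, min(first + d, n)):
                if self._less(a[c], a[best]):
                    best = c
            # One more comparison against the item being sifted, so a full
            # node costs d comparisons: smallest of d + 1 tuples.
            if not self._less(a[best], item):
                break
            a[i] = a[best]             # move the child up into the hole
            i = best
        a[i] = item

    def pop(self):
        a = self.a
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last                # refill the root, then sift it down
            self._sift_down(0)
        return top


def heapsort_comparisons(values, d):
    """Heapsort `values` with fan-out d; return the total comparison count."""
    h = CountingDaryHeap(values, d)
    out = [h.pop() for _ in range(len(values))]
    assert out == sorted(values)
    return h.comparisons


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(20_000)]
    for d in (2, 3, 4):
        print(f"d={d}: {heapsort_comparisons(data, d)} comparisons")
```

Under these assumptions a binary heap pays about 2·log2(n) comparisons per pop and a ternary heap about 3·log3(n) ≈ 1.89·log2(n). A 4-ary heap pays about 4·log4(n) = 2·log2(n), so it roughly gives back the gain. Wall-clock results in a real sort also depend on comparator cost and cache behavior, which this count ignores.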
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company