Re: No merge sort?
От | Ron Peacetree |
---|---|
Тема | Re: No merge sort? |
Дата | |
Msg-id | rBYja.11707$4P1.1013834@newsread2.prod.itd.earthlink.net обсуждение исходный текст |
Ответ на | No merge sort? (Taral <taral@taral.net>) |
Ответы |
Re: No merge sort?
Re: No merge sort? |
Список | pgsql-hackers |
AFAIK, there are only 3 general purpose internal sorting techniques that have O(n) behavior: 1= Insertion Sort for "almost sorted" lists. Since "almost sorted" is a fuzzy concept, let's make the first approximation definition that no more than n^(1/2) of the elements can be disordered. There are better definitions in the literature, but I don't remember them off the top of my head. 2= Sorts from the class of Address Calculation, Counting, or Distribution Sort. These need to be able to carry out something more complex than simply a comparison in order to achieve O(n), and therefore have high constants in their execution. For large enough n though, these are the performance kings. 3= Straight Radix Sort where you minimize the number of passes by using a base much greater than two for the the radix. Usually octal or hexidecimal. On a 32b or 64b system, this approach will let you sort in 2 passes. All of the above have potentially nasty trade-offs in comparision to the standard heavily tuned median-of-three quicksort used by the sort unix command. Nonetheless, I could see some value in providing all of these with PostgeSQL (including a decent port of the unix sort routine for the Win folks). I'll note in passing that quicksort's Achille's heel is that it's unstable (all of the rest are stable), which can be a problem in a DB. In the general case there's a few papers out there that state you can sort in O(n) if you can throw O(n^2) space at the problem. That implies to me that for DB's, we are not going to be able to use O(n) algorithms for most of our needs. As for external sorts, everything I've ever read says that some sort of merge technique is used: balanced, multiway, or polyphase. In all cases, I've seen comments to the effect that reading some part of the data into internal buffers, sorting them, and then merging them with already sorted data is the best practice. All of this seems to imply that instead of mergesort (which is stable), PostgreSQL might be better off with the 4 sorts I've listed plus a viciously efficient merge utility for combining partial results that do fit into memory into result files on disk that don't. Or am I crazy? Ron Peacetree
В списке pgsql-hackers по дате отправления: