Re: No merge sort?
От | Ron Peacetree |
---|---|
Тема | Re: No merge sort? |
Дата | |
Msg-id | 5Fpka.13367$ey1.1150154@newsread1.prod.itd.earthlink.net обсуждение исходный текст |
Ответ на | Re: No merge sort? (cbbrowne@cbbrowne.com) |
Список | pgsql-hackers |
<cbbrowne@cbbrowne.com> wrote in message news:20030407202001.1EC3C58E0D@cbbrowne.com... > Ron Peacetree wrote: > > AFAIK, there are only 3 general purpose internal sorting techniques > > that have O(n) behavior: > > Hum? NO "general purpose" sorting technique has O(n) behaviour. > > The theoretical best scenario, _in general_, is O(n log n). The O(log(n!)) bound is only for comparison based sorts operating on arbitarily disordered input. There are general techniques that can sort in O(n) time if we can "break" either assumption. In a DB, we often are dealing with data that is only slightly disordered, or we are have enough meta knowledge that we can use more powerful ordering operators than simple comparisons, or both. > Insertion sort is expected to provide O(n^2) behaviour, and radix-like > schemes get arbitrarily memory hungry and have bad pathological results. > None of these comments is accurate. The sources for the following discussion are A= Vol 3 of Knuth 2nd ed, (ISBN 0-201-89685-0) B= Sedgewick's Algorithms in C, (0-201-51425-7) C= Handbook of Algorithms and Data Structures 2nd ed by Gonnet and Baeza-Yates. (0-201-41607-7) 1= Insertion sort is O(n) for "almost sorted" input. p103 of Sedgewick. There's also discussion on this in Knuth. 2= Distribution Sort and it's "cousins" which use more powerful ordering operators than simply comparisons are a= reasonably general, and b= O(n). Look in all three references. 3= A proper implementation of straight Radix sort using 8b or 16b at a time a= is NOT pathological, and b= is O(n). Sedgewick is the most blunt about it on p142-143, but Knuth discusses this as well. All of the above are stable, which quicksort is not. There are no "pessimal" inputs for any of the above that will force worst case behavior. For quicksort there are (they are =very= unlikely however). In real world terms, if you can use any of these approaches you should be able to internally sort your data between 2X and 3X faster. Unfortunately, most of us do not have the luxury of working with Memory Resident DB's. In the real world, disk access is an important part of our DB sorting efforts, and that changes things. Very fast internal sorting routines combined with multidisk merging algorithms that minimize overall disk I/O while maximizing the sequential nature of that I/O are the best we can do overall in such a situation.
В списке pgsql-hackers по дате отправления: