Re: Why do we still perform a check for pre-sorted input within qsort variants?
От | Dann Corbit |
---|---|
Тема | Re: Why do we still perform a check for pre-sorted input within qsort variants? |
Дата | |
Msg-id | 87F42982BF2B434F831FCEF4C45FC33E5BD36AE2@EXCHANGE.corporate.connx.com обсуждение исходный текст |
Ответ на | Re: Why do we still perform a check for pre-sorted input within qsort variants? (Greg Stark <stark@mit.edu>) |
Список | pgsql-hackers |
-----Original Message----- From: gsstark@gmail.com [mailto:gsstark@gmail.com] On Behalf Of Greg Stark Sent: Saturday, March 09, 2013 5:16 PM To: Dann Corbit Cc: Bruce Momjian; Peter Geoghegan; Robert Haas; Tom Lane; PG Hackers Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants? On Sat, Mar 9, 2013 at 10:22 PM, Dann Corbit <DCorbit@connx.com> wrote: > Yes, you are right. I knew of a median of medians technique for pivot selection and I mistook that for the median of mediansmedian selection algorithm (which it definitely isn't). > I was not aware of a true linear time selection of the median algorithm {which is what median of medians accomplishes). The fastest median selection algorithm that I was aware of was quickselect, which is only linear on average. > I think that you analysis is correct, at any rate. Hm, I was using the terminology differently than the Wikipedia page. I was referring to the recursive median of 5s used asthe pivot selection as "median of medians". And I still called Quicksort or Quickselect using that pivot Quicksort or Quickselectwith that specific pivot choice algorithm. When using that pivot choice Quicksort is O(n*log(n)) and Quickselect (Median of Medians on Wikipedia) is O(n). But the constantfactor becomes larger than if the pivot choice algorithm is O(1). I suppose it's more interesting in the case ofQuickselect since there's no other alternative algorithms that could be used that have better constant factors whereasfor sorting we have other options. I wonder if it makes sense to use timsort as the fallback for quicksort if the partition sizes are skewed. Timsort is specificallydesigned to handle presorted inputs well. On the other hand it is itself a hybrid sort so it might be gettingoverly complex to make it part of a hybrid algorithm. -- >> My opinion (and it is no better than anyone else's) is that for a database you have to be very defensive in programming. Since database systems can contain any possible distribution (and over time they likely will encounter almost every possibility)it is important to prevent unusual inputs from causing disaster. The reason we added introspection to the sort here is that we already had quicksort with ordered partition check along witha median of three sample from three medians of three (not to mention the standard recursion of the smallest partitionfirst and switch to insertion at small enough partition size). Sure enough, there was a customer who had a datadistribution that caused bad behavior. We have customers with mainframe data where a single table can be 24 GB (I know this offhand, there are sure to be some thatare larger), so a bad behavior on a sort *will* be noticed. Sorry about the oddball quoting. I will look at fixing it so that my posts are not so hard to grok. <<
В списке pgsql-hackers по дате отправления: