Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
От | Robert Haas |
---|---|
Тема | Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) |
Дата | |
Msg-id | CA+Tgmoa8_u=HQjpWvfrBdqDRCidyZsF05RC24ogtvPxWHuhCWQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) (Peter Geoghegan <peter@2ndquadrant.com>) |
Ответы |
Re: Timsort performance, quicksort (was: Re: Memory usage
during sorting)
|
Список | pgsql-hackers |
On Tue, Apr 24, 2012 at 12:51 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 24 April 2012 16:17, Robert Haas <robertmhaas@gmail.com> wrote: >> If they are in sorted order with an empty string >> appended onto the end, it takes about 25 seconds. > > That's exactly what I'd have expected, but was surprised to have not > found with my own test. Perhaps it was same kind of fluke (i.e. a > re-creatable one - I'm quite confident that my benchmark was not > methodologically flawed, at least in execution). Oh, I read your results as showing something quite similar. >> Based on that, I'm inclined to propose rejiggering things so that the >> presorted-input check runs only at the top level, and not during any >> recursive steps. The theory is that it won't cost much because if the >> data is randomly ordered we won't do many comparisons before falling >> out, and that seems to be true. But the only point of doing it in the >> recursive steps is to win when the data is partially ordered, and we >> actually seem to be losing big-time in that case, perhaps because when >> the data is partially ordered, the presort check will frequently to >> run through a significant part of the array - wasting cycles - but >> fall out before it actually gets to the end. > > That makes sense to me, but obviously more data is needed here. What more data do you think is needed? I've been suspicious of that code since the first time I looked at it, and I'm now fairly well convinced that it's full of suckitude. Honestly, I'm not sure I could manage to contrive a case where that code wins if I set out to do so. >> Of course, even if we did that, it might not be as good as your >> timsort numbers, but that doesn't mean we shouldn't do it... > > The frustrating thing about my timsort numbers, as I mentioned in > reply to Dimitri (He modified the subject a bit, so that might appear > to be a different thread to you), is that they appear to be almost > consistently winning when you consider the number of comparisons, but > in fact lose when you measure the duration of execution or TPS or > whatever. I expected a certain amount of this, but not enough to > entirely derail the case for replacing quicksort with timsort when > sorting a single key of text, which is obviously the really compelling > case for optimisation here. This situation is only going to be made > "worse" by the work you've done on SortSupport for text, ... That is quite baffling. Have you profiled it at all? > ...which, > incidentally, I agree is worthwhile. Cool, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: