Re: some aspects of our qsort might not be ideal

Поиск
Список
Период
Сортировка
От John Naylor
Тема Re: some aspects of our qsort might not be ideal
Дата
Msg-id CAFBsxsHWituORSTKvvAL7bmDmQ-qy580Mnq=7BFrPY7-0qg_NQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: some aspects of our qsort might not be ideal  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: some aspects of our qsort might not be ideal  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Fri, Jun 3, 2022 at 10:44 AM Peter Geoghegan <pg@bowt.ie> wrote:

> What about dual-pivot quicksort, which is used in Java 7+? That is the
> defacto successor to Bentley & McIlroy. In fact, Jon Bentley himself
> collaborated with its author, and provided benchmarking input. The
> underlying philosophy is essentially the same as the original -- it
> is supposed to be an "industrial strength" quicksort, with various
> adversarial cases considered directly.

Here is a *rough* first pass at dual-pivot quicksort. I haven't
changed the regression tests to adjust for qsort being an unstable
sort, and there is some dead code. I looked to a couple JDKs for
examples of design decisions as a first approximation. It includes a
module with a few debugging printouts, run like so: select
debug_qsort_int(array[7,6,5,4,3,2,1]);

Pivot selection: A common way is to pick five candidates (here: mid,
+/- 1/7, +/- 2/7), sort them in place, then pick the 2nd and 4th of
them, or "tertile of five". If they are all evenly spaced, we can do
insertion sort.

Fall back to single-pivot: If any of the pivot candidates are equal,
JDK assumes there could be a large number of duplicates and falls back
to single-pivot, using the median of the five. I've done the same
here. Despite having two code paths in all of our template instances,
the VM binary is only about 5kb bigger, since MED3 is no longer built.

Performance testing: Not started yet. I'm thinking an apples-to-apples
comparison is not insightful enough, since the current insertion sort
threshold 7 is already a bit constrained for single-pivot, and would
be even more so for dual pivot, especially since 5 of the elements are
pre-sorted to choose the pivots. My plan is to retest the threshold on
HEAD using my latest tests, then use that as a starting point to test
thresholds with dual-pivot.

-- 
John Naylor
EDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: NAMEDATALEN increase because of non-latin languages
Следующее
От: Pavel Borisov
Дата:
Сообщение: Re: Custom tuplesorts for extensions