Re: some aspects of our qsort might not be ideal

Поиск

Список

Период

Сортировка

От	John Naylor
Тема	Re: some aspects of our qsort might not be ideal
Дата	5 июля 2022 г. 05:14:17
Msg-id	CAFBsxsH_JPjaDMZKsCrbA8OmCx-mYp12F29o4gaJB6us+NGGmQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: some aspects of our qsort might not be ideal (John Naylor <john.naylor@enterprisedb.com>)
Ответы	Re: some aspects of our qsort might not be ideal
Список	pgsql-hackers

Дерево обсуждения

I wrote:
> TODO: add results for queries.

Attached are three spreadsheets for queries, similar to the earlier
ones for microbenchmarks. There are three datatypes:

- Bigint: by selecting just the value, these can use Datum sorts
- Text: the same number as bigint, but in lpadded text form, followed
by 8 identical chararters, so 16 bytes.
- Text no-abbrev: also 16 bytes of text derived from bigint, but with
the 8 characters at the beginning to defeat abbreviation.

For the single-vs-dual comparison, this time I chose the insertion
sort threshold to be 10 for single pivot and 18 for dual-pivot. I see
a big speedup in descending values here as well, completing in 20-50%
less time. Everything else is the same, slightly faster, or slightly
slower, at least on this machine (Comet Lake, gcc 12.1). The slightly
slower cases are ones with duplicates. It would make sense that the
duplicate-detection heuristic lags behind where we would ideally be
using single-pivot, but the difference is small. In any case, there
doesn't seem to be a compelling benefit from dual-pivot for most
cases.

I suspect the branches in the sorttuple comparators for null handling
are negating the benefits from better d-cache and TLB behavior. In
that case, a useful thing to try would be to pre-partition the
sorttuple array between null and non-null values as it's being
populated, then sort each of those in turn. (The null portion would
only need sorting if there was more than one sort key.) That would
automatically take care of nulls-first vs nulls-last upfront, save a
bunch of binary space used for branches (worth doing on its own), and
the "isnull" member would not be needed anymore. That wouldn't save
data space because of alignment, but the compiler wouldn't need to
generate a load/store instruction for it. gcc does, but clang and msvc
just treat the 24-byte struct as 3 8-byte integers:

https://godbolt.org/z/d8j1EvroP

-- 
John Naylor
EDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: some aspects of our qsort might not be ideal

Вложения