Re: Using quicksort for every external sort run

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Using quicksort for every external sort run
Дата
Msg-id c96c501b-d394-40c1-86c5-e6ac86922f21@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
Hi,

I've finally managed to do some benchmarks on the patches. I haven't
really studied the details of the patch, so I simply collected a bunch
of queries relying on sorting - various forms of SELECT and a few CREATE
INDEX commands). It's likely some of the queries can't really benefit
from the patch - those should not be positively or negatively affected,
though.

I've executed the queries on a few basic synthetic data sets with
different cardinality

   1) unique data
   2) hight cardinality (rows/100)
   3) low cardinality (rows/1000)

initial ordering

   1) random
   2) sorted
   3) almost sorted

and different data types

   1) int
   2) numeric
   3) text

Tables with and without additional data (padding) were created.

So there are quite a few combinations. Attached is a shell script I've
used for testing, and also results for 1M and 10M rows on two different
machines (one with i5-2500k CPU, the other one with Xeon E5450).

Each query was executed 5x for each work_mem value (between 8MB and
1GB), and then a median of the runs was computed and that's what's on
the "comparison". This compares a414d96ad2b without (master) and with
the patches applied (patched). The last set of columns is simply a
"speedup" where "<1.0" means the patched code is faster, while >1.0
means it's slower. Values below 0.9 or 1.1 are using green or red
background, to make the most significant improvements or regressions
clearly visible.

For the smaller data set (1M rows), things works pretty fine. There are
pretty much no red cells (so no significant regressions), but quite a
few green ones (with duration reduced by up to 50%). There are some
results in the 1.0-1.05 range, but considering how short the queries
are, I don't think this is a problem. Overall the total duration was
reduced by ~20%, which is nice.

For the 10M data sets, total speedup is also almost ~20%, and the
speedups for most queries are also very nice (often ~50%). But the
number of regressions is considerably higher - there's a small number of
queries that got significantly slower for multiple data sets,
particularly for smaller work_mem values.

For example these two queries got almost 2x as slow for some data sets:

SELECT a FROM numeric_test UNION SELECT a FROM numeric_test_padding
SELECT a FROM text_test UNION SELECT a FROM text_test_padding

I assume the slowdown is related to the batching (as it's only happening
for low work_mem values), so perhaps there's an internal heuristics that
we could tune?

I also find it quite interesting that on the i5 machine the CREATE INDEX
commands are pretty much not impacted, while on the Xeon machine there's
an obvious significant improvement.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Yury Zhuravlev
Дата:
Сообщение: Re: NOT EXIST for PREPARE
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: problem with precendence order in JSONB merge operator