Re: Using quicksort for every external sort run
От | Tomas Vondra |
---|---|
Тема | Re: Using quicksort for every external sort run |
Дата | |
Msg-id | c96c501b-d394-40c1-86c5-e6ac86922f21@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>) |
Ответы |
Re: Using quicksort for every external sort run
Re: Using quicksort for every external sort run |
Список | pgsql-hackers |
Hi, I've finally managed to do some benchmarks on the patches. I haven't really studied the details of the patch, so I simply collected a bunch of queries relying on sorting - various forms of SELECT and a few CREATE INDEX commands). It's likely some of the queries can't really benefit from the patch - those should not be positively or negatively affected, though. I've executed the queries on a few basic synthetic data sets with different cardinality 1) unique data 2) hight cardinality (rows/100) 3) low cardinality (rows/1000) initial ordering 1) random 2) sorted 3) almost sorted and different data types 1) int 2) numeric 3) text Tables with and without additional data (padding) were created. So there are quite a few combinations. Attached is a shell script I've used for testing, and also results for 1M and 10M rows on two different machines (one with i5-2500k CPU, the other one with Xeon E5450). Each query was executed 5x for each work_mem value (between 8MB and 1GB), and then a median of the runs was computed and that's what's on the "comparison". This compares a414d96ad2b without (master) and with the patches applied (patched). The last set of columns is simply a "speedup" where "<1.0" means the patched code is faster, while >1.0 means it's slower. Values below 0.9 or 1.1 are using green or red background, to make the most significant improvements or regressions clearly visible. For the smaller data set (1M rows), things works pretty fine. There are pretty much no red cells (so no significant regressions), but quite a few green ones (with duration reduced by up to 50%). There are some results in the 1.0-1.05 range, but considering how short the queries are, I don't think this is a problem. Overall the total duration was reduced by ~20%, which is nice. For the 10M data sets, total speedup is also almost ~20%, and the speedups for most queries are also very nice (often ~50%). But the number of regressions is considerably higher - there's a small number of queries that got significantly slower for multiple data sets, particularly for smaller work_mem values. For example these two queries got almost 2x as slow for some data sets: SELECT a FROM numeric_test UNION SELECT a FROM numeric_test_padding SELECT a FROM text_test UNION SELECT a FROM text_test_padding I assume the slowdown is related to the batching (as it's only happening for low work_mem values), so perhaps there's an internal heuristics that we could tune? I also find it quite interesting that on the i5 machine the CREATE INDEX commands are pretty much not impacted, while on the Xeon machine there's an obvious significant improvement. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: