Do we need to rethink how to parallelize regression tests to speedup CLOBBER_CACHE_ALWAYS?
From | David Rowley
---|---
Subject | Do we need to rethink how to parallelize regression tests to speedup CLOBBER_CACHE_ALWAYS?
Date | |
Msg-id | CAApHDvp2d_pV8uneX+oLzxPUk1g+Hq_Mvx_quzhtuWt3MGDibA@mail.gmail.com
Responses | Re: Do we need to rethink how to parallelize regression tests to speedup CLOBBER_CACHE_ALWAYS?
List | pgsql-hackers
Right now Tom is doing a bit of work to try and improve the performance of regression test runs with CLOBBER_CACHE_ALWAYS. I'm on board with making this go faster too. I did a CLOBBER_CACHE_ALWAYS run today and it took my machine almost 7 hours to complete. I occasionally checked top -c and was a bit disappointed that, for the majority of the time, just a single backend was busy. The reason for this is that most groups have some test that takes much longer to run than the others, and I often just caught the run once it had finished all the faster tests and was stuck on the slow one.

I did a bit of analysis into the runtimes and found that:

1. Without parallelism, the total run-time of all tests was 12.29 hours.
2. The run took 6.45 hours. (I took the max time from each group and summed those maxes.)

That means the average number of backends utilized was about 1.9.

I wondered if there might be a better way to handle how parallel tests work in pg_regress. We have many parallel groups that have reached 20 tests, and we often just create another parallel group because of the rule that a group must not exceed 20 tests. In many cases, we could get busy running another test instead of sitting around idle. Right now we start one backend for each test in a parallel group, then wait for the final backend to complete before running the next group. Is there a particular reason for it to work that way? Why can we not just have a much larger parallel group, lump in all of the tests that have no special need not to be run concurrently (or concurrently with some other test in particular), and run all of those with up to N workers? Once a worker completes, give it another test to process until there are none left. We could still limit the total concurrency with --max-connections=20. I don't think we'd need to make any code changes to make this idea work.
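As a rough sketch of the two scheduling strategies (in Python rather than pg_regress's C, with made-up per-test timings purely for illustration), the current behaviour waits for the slowest test in each group, while the proposed shared work queue hands each idle worker the next test:

```python
import heapq

def group_schedule(groups):
    """Current pg_regress behaviour: each parallel group runs to
    completion, so a group's wall time is its slowest test."""
    return sum(max(g) for g in groups)

def pool_schedule(tests, workers):
    """Proposed behaviour: one big queue; whichever worker frees up
    first takes the next test until none remain."""
    finish = [0.0] * workers            # per-worker busy-until times
    heapq.heapify(finish)
    for t in tests:
        start = heapq.heappop(finish)   # earliest-free worker takes it
        heapq.heappush(finish, start + t)
    return max(finish)

# Hypothetical timings (hours): one slow test dominates each group.
groups = [[3.0, 0.2, 0.1], [1.5, 0.3, 0.2], [2.0, 0.4, 0.1]]
tests = [t for g in groups for t in g]

print(group_schedule(groups))                          # 6.5
print(pool_schedule(sorted(tests, reverse=True), 4))   # 3.0
```

With these toy numbers the work queue halves the wall time even though the total work is identical, because no worker sits idle waiting for a group's straggler.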
I did the maths on that, and if it worked that way, and assuming none of the parallel tests mind being run at the same time as any other parallel test, then the theoretical run-time comes down to 3.75 hours with 8 workers, or 4.11 hours with 4 workers. The primary reason it does not become much faster is the "privileges" test taking 3 hours. If I calculate assuming 128 workers, the time only drops to 3.46 hours. With that many workers, there are enough of them to start the slow privileges test on a worker that's not done anything else yet, so the 3.46 hours is just the time for the privileges test plus the time to do the serial tests, one by one.

For the above, I didn't do anything to change the order of the tests to start the long-running ones first, but if I do that, I can get the time down to 3.46 hours with just 4 workers. That's 1.86x faster than my run.

I've attached a text file with the method I used to calculate each of the numbers above, and I've also attached the results with timings from my CLOBBER_CACHE_ALWAYS run for anyone who'd like to check my maths.

If I split the "privileges" test into 2 even parts, then 8 workers would run the tests in 1.95 hours, which is 3.2x faster than my run.

David
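The reason extra workers stop helping can be captured in a simple lower bound: no schedule can finish faster than the longer of (a) the single longest test and (b) the total work divided evenly across all workers. A minimal sketch, using hypothetical round figures modeled on the post (not the actual timing data from the attachments):

```python
def lower_bound(total_hours, longest_hours, workers):
    """Best possible wall time for any schedule: bounded below by the
    longest single test and by perfectly balanced total work."""
    return max(longest_hours, total_hours / workers)

# 12 hours of total work, dominated by a 3-hour "privileges"-style test:
print(lower_bound(12.0, 3.0, 8))   # 3.0 -- the slow test is the floor
# Split that test into two 1.5-hour halves and the floor drops:
print(lower_bound(12.0, 1.5, 8))   # 1.5
```

This is why splitting "privileges" pays off more than adding workers: once worker count passes total/longest, the dominant test alone sets the wall time.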
Attachments