Pg_upgrade faster, again!
From | Bruce Momjian |
---|---|
Subject | Pg_upgrade faster, again! |
Date | |
Msg-id | 20121222231320.GA30566@momjian.us |
List | pgsql-hackers |
I promised to research allowing parallel execution of schema dump/restore, so I have developed the attached patch, with dramatic results (times in seconds):

    tables      git    patch
      1000    22.29    18.30
      2000    30.75    19.67
      4000    46.33    22.31
      8000    81.09    29.27
     16000   145.43    40.12
     32000   309.39    64.85
     64000   754.62   108.76

These performance results are best-case because the test was run with the databases all the same size and their number equal to the number of server cores. (Test script attached.)

This uses fork/processes on Unix, and threads on Windows. I need someone to check my use of waitpid() on Unix, and I need compile and run testing on Windows; a rough sketch of the fork/waitpid dispatch loop appears below.

It basically adds a --jobs option, like the one pg_restore uses, to run multiple schema dumps/restores in parallel. I patterned this after the --jobs code in pg_restore's pg_backup_archiver.c.

However, I found the pg_restore Windows code awkward because it puts everything in one struct array that has gaps for dead children. Because WaitForMultipleObjects() requires an array of thread handles with no gaps, the pg_restore code must build a temporary array for every call to WaitForMultipleObjects(). Instead, I created a separate array just for thread handles (rather than putting them in the same struct) and swapped entries into dead child slots to avoid gaps --- this allows the thread handle array to be passed directly to WaitForMultipleObjects(). A sketch of this slot-swapping approach also appears below.

Do people like this approach? Should we do the same in pg_restore? I expect us to be doing more parallelism in other areas, so I would like to have a consistent approach.

The only other optimization I can think of is to do parallel file copy per tablespace (in non-link mode).

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
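Here is a minimal sketch of the fork/waitpid dispatch loop described above, assuming a hypothetical run_task() helper and hard-coded task/job counts; the real patch forks a pg_dump/pg_restore invocation per database rather than a toy function:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical stand-in for dumping/restoring one database's schema. */
    static void
    run_task(int taskno)
    {
        printf("child %d: processing task %d\n", (int) getpid(), taskno);
    }

    int
    main(void)
    {
        const int   njobs = 4;      /* value of --jobs */
        const int   ntasks = 10;    /* e.g. number of databases */
        int         running = 0;

        for (int i = 0; i < ntasks; i++)
        {
            int         status;
            pid_t       pid;

            /* at the job limit, reap one child before forking another */
            if (running == njobs)
            {
                if (waitpid(-1, &status, 0) < 0 ||
                    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
                    exit(1);        /* wait failed or child failed */
                running--;
            }

            pid = fork();
            if (pid < 0)
                exit(1);
            if (pid == 0)
            {
                run_task(i);        /* child does one task, then exits */
                exit(0);
            }
            running++;
        }

        /* reap the remaining children */
        while (running > 0)
        {
            if (waitpid(-1, NULL, 0) < 0)
                exit(1);
            running--;
        }
        return 0;
    }

Invocation would mirror pg_restore's option, i.e. something like: pg_upgrade -b OLDBIN -B NEWBIN -d OLDDATA -D NEWDATA --jobs 4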
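And a sketch of the gap-free Windows handle array, with illustrative names rather than the patch's; the point is that swapping the last live handle into a dead child's slot keeps the array dense, so it can be handed straight to WaitForMultipleObjects() with no temporary copy:

    #include <windows.h>
    #include <process.h>            /* _beginthreadex */

    static HANDLE   thread_handles[MAXIMUM_WAIT_OBJECTS];
    static int      n_running = 0;

    /* Start one worker thread; its handle goes at the end of the array. */
    static void
    spawn_thread(unsigned (__stdcall *func) (void *), void *arg)
    {
        thread_handles[n_running++] =
            (HANDLE) _beginthreadex(NULL, 0, func, arg, 0, NULL);
    }

    /* Block until any one thread exits, then close the gap by swapping. */
    static void
    reap_one_thread(void)
    {
        DWORD   ret = WaitForMultipleObjects(n_running, thread_handles,
                                             FALSE /* wait for any one */,
                                             INFINITE);
        int     slot = (int) (ret - WAIT_OBJECT_0);

        CloseHandle(thread_handles[slot]);
        /* move the last live handle into the vacated slot */
        thread_handles[slot] = thread_handles[n_running - 1];
        n_running--;
    }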
Attachments