Re: Progress reporting for pg_verify_checksums
От | Michael Paquier |
---|---|
Тема | Re: Progress reporting for pg_verify_checksums |
Дата | |
Msg-id | 20190401061041.GA1685@paquier.xyz обсуждение исходный текст |
Ответ на | Re: Progress reporting for pg_verify_checksums (Michael Banck <michael.banck@credativ.de>) |
Ответы |
Re: Progress reporting for pg_verify_checksums
|
Список | pgsql-hackers |
On Sat, Mar 30, 2019 at 09:07:41PM +0100, Michael Banck wrote: > The way you've now changed this is that there's a function call into > progress_report() for every block that's being read, even if there is no > progress reporting requested. That looks like a pretty severe > performance problem so I suggest to at least stick with checking > showprogress before calling progress_report() and not the other way > round. > > So my vote is in favour of only calling progress_report() every once in > a while - I saw quite a speedup (or removal of slowdown) due to this in > my tests, this was not just some unwarranted microoptimization. Do you have some runtime numbers? If all the pages are in the OS cache you should be able to see more impact. I have been testing with a pgbench database at scale 300 and --check, and perf is showing me that progress_report counts for 0.10% with or without the option of the profile while pg_checksum_page() counts for 36%. Switching the switch in or out of it makes the performance change lost in the noise. I'd rather keep the check for showprogress within progress_report() so as extra callers don't miss bypassing the report if --progress is not enabled, still we are talking about only one caller now so the attached does it the other way around with an assertion. >> + fprintf(stderr, "\r"); > > I think the isatty() check that was in our versions of the patch is > useful here, did you check how this looks when piping the output to a > file? Oops, fixed. The output was strange. > This hunk is from the performance optimization of calling > progress_report only every 1024 blocks and could be removed if we stick > with calling it for every block anyway (but see above). Indeed, removed this one. >> -static void >> -scan_directory(const char *basedir, const char *subdir) >> +/* >> + * Scan the given directory for items which can be checksummed and >> + * operate on each one of them. If "sizeonly" is true, the size of >> + * all the items which have checksums is computed and returned back > > "computed" is maybe a bit strong a word here, how about "added up"? "computed" sounds fine to me. -- Michael
Вложения
В списке pgsql-hackers по дате отправления: