Re: pgbench randomness initialization
From | Andres Freund |
---|---|
Subject | Re: pgbench randomness initialization |
Date | |
Msg-id | 20160407131526.2342k5etkj6c4g2e@alap3.anarazel.de |
In response to | Re: pgbench randomness initialization (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: pgbench randomness initialization |
List | pgsql-hackers |
On 2016-04-07 08:58:16 -0400, Robert Haas wrote:
> On Thu, Apr 7, 2016 at 5:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > I think that it depends on what you want, which may vary:
> >
> > (1) "exactly" reproducible runs, but one run may hit a particular
> > steady state not representative of what happens in general.
> >
> > (2) runs which really vary from one to the next, so as
> > to have an idea about how much it may vary, what is the
> > performance stability.
> >
> > Currently pgbench focusses on (2), which may or may not be fine depending on
> > what you are doing. From a personal point of view I think that (2) is more
> > significant to collect performance data, even if the results are more
> > unstable: that simply reflects reality and its intrinsic variations, so I'm
> > fine with that as the default.
> >
> > Now for those interested in (1) for some reason, I would suggest to rely on a
> > PGBENCH_RANDOM_SEED environment variable or --random-seed option which could
> > be used to have an oxymoronic "deterministic randomness", if desired.
> > I do not think that it should be the default, though.
>
> I agree entirely. If performance is erratic, that's actually
> something you want to discover during benchmarking. If different
> pgbench runs (of non-trivial length) are producing substantially
> different results, then that's really a problem we need to fix, not
> just adjust pgbench to cover it up.

It's not about "covering it up"; it's about actually being able to take
action based on benchmark results, and about practically being able to
run benchmarks. The argument above means essentially that we need to run
a significant number of pgbench runs for *anything*, because running
them 3-5 times before/after just isn't meaningful enough. It means that
you can't separate OS-caused from pgbench-order-caused performance
differences.

I agree that it's a horrid problem that, on large machines, we can get
half the throughput depending on the ordering. But without running
queries in the same order before/after a patch there's no way to
validate whether $patch caused the problem. And no way to reliably
trigger problematic scenarios.

I also agree that it's important to be able to vary workloads. But if
you do so, you should do so in the same order, both pre/post a patch.

Afaics the prime use of pgbench is validation of the performance effects
of patches; therefore it should be usable for that, and it's not.

Greetings,

Andres Freund
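
A minimal sketch of the "deterministic randomness" idea discussed above, written in Python rather than pgbench's C. It assumes the PGBENCH_RANDOM_SEED environment variable suggested in the quoted message; `make_rng` and `account_ids` are hypothetical helpers for illustration only, not pgbench code. With the same seed, the before-patch and after-patch runs draw the same ids in the same order; without one, a time-based seed is chosen and reported so the run could be replayed.

```python
import os
import random
import time


def make_rng(env=os.environ):
    """Return (rng, seed).

    Uses PGBENCH_RANDOM_SEED (an assumed variable name taken from the
    thread) when it is set; otherwise falls back to a time-based seed.
    """
    raw = env.get("PGBENCH_RANDOM_SEED")
    seed = int(raw) if raw is not None else time.time_ns()
    return random.Random(seed), seed


def account_ids(rng, count, naccounts=100_000):
    """Draw `count` account ids uniformly from 1..naccounts."""
    return [rng.randint(1, naccounts) for _ in range(count)]


if __name__ == "__main__":
    # Same seed before and after a patch: identical sequence of ids.
    before, _ = make_rng({"PGBENCH_RANDOM_SEED": "42"})
    after, _ = make_rng({"PGBENCH_RANDOM_SEED": "42"})
    print(account_ids(before, 5) == account_ids(after, 5))  # prints True

    # No seed set: the run picks its own seed and reports it for replay.
    rng, seed = make_rng({})
    print("seed:", seed)
```

Reporting the seed even when none was requested keeps varied runs as the default while still allowing any particular run to be reproduced afterwards.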