Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
От | David Geier |
---|---|
Тема | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? |
Дата | |
Msg-id | a9b2649b-36a0-2e69-e1fd-779e5faf4b7d@gmail.com обсуждение исходный текст |
Ответ на | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
|
Список | pgsql-hackers |
Hi, On 1/23/23 18:41, Andres Freund wrote: > If we add it, it probably shouldn't depend on TIMING, but on > SUMMARY. Regression test queries showing EXPLAIN ANALYZE output all do > something like > EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > > the SUMMARY OFF gets rid of the "top-level" "Planning Time" and "Execution > Time", whereas the TIMING OFF gets rid of the per-node timing. Those are > separate options because per-node timing is problematic performance-wise > (right now), but whole-query timing rarely is. Makes sense. I wasn't aware of SUMMARY. Let's keep this option in mind, in case we'll revisit exposing the clock source in the future. > Another, independent, thing worth thinking about: I think we might want to > expose both rdtsc and rdtscp. For something like > InstrStartNode()/InstrStopNode(), avoiding the "one-way barrier" of rdtscp is > quite important to avoid changing the query performance. But for measuring > whole-query time, we likely want to measure the actual time. > > It probably won't matter hugely for the whole query time - the out of order > window of modern CPUs is large, but not *that* large - but I don't think we > can generally assume that. That's what I thought as well. I added INSTR_TIME_SET_CURRENT_FAST() and for now call that variant from InstrStartNode(), InstrEndNode() and pg_test_timing. To do so in InstrEndNode(), I removed INSTR_TIME_SET_CURRENT_LAZY(). Otherwise, two variants of that macro would be needed. INSTR_TIME_SET_CURRENT_LAZY() was only used in a single place and the code is more readable that way. INSTR_TIME_SET_CURRENT() is called from a bunch of places. I still have to go through all of them and see which should be changed to call the _FAST() variant. Attached is v7 of the patch: - Rebased on latest master (most importantly on top of the int64 instr_time commits). - Includes two commits from Andres which introduce INSTR_TIME_SET_SECONDS(), INSTR_TIME_IS_LT() and WIP to report pg_test_timing output in nanoseconds. - Converts ticks to nanoseconds only with integer math, while accounting for overflow. - Supports RDTSCP via INSTR_TIME_SET_CURRENT() and introduced INSTR_TIME_SET_CURRENT_FAST() which uses RDTSC. I haven't gotten to the following: - Looking through all calls to INSTR_TIME_SET_CURRENT() and check if they should be replaced by INSTR_TIME_SET_CURRENT_FAST(). - Reviewing Andres commits. Potentially improving on pg_test_timing's output. - Looking at enabling RDTSC on more platforms. Is there a minimum set of platforms we would like support for? Windows should be easy. That would also allow to unify the code a little more. - Add more documentation and do more testing around the calls to CPUID. - Profiling and optimizing the code. A quick test showed about 10% improvement over master with TIMING ON vs TIMING OFF, when using the test-case from Andres' e-mail that started this thread. I hope I'll find time to work on these points during the next days. -- David Geier (ServiceNow)
Вложения
В списке pgsql-hackers по дате отправления: