Discussion: Use CLOCK_MONOTONIC_COARSE for instr_time when available
Dear PostgreSQL Hackers,
This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE when available. CLOCK_MONOTONIC_COARSE is a lower-resolution (about 4 ms) but faster clock, so using it reduces the overhead of frequent timestamp retrievals. This change is expected to improve performance, especially in scenarios with frequent timing operations.
Key Changes:
• CLOCK_MONOTONIC_COARSE is used when available, offering faster performance with slightly reduced precision.
• For macOS, CLOCK_MONOTONIC_RAW remains the preferred choice due to its higher resolution.
• CLOCK_MONOTONIC is used as a fallback when neither of the above options is available.
Performance Improvements:
In testing with a workload that performs a COUNT(*) operation on a table containing 100 million rows, we observed a noticeable performance improvement after applying this patch.
SQL to Reproduce:
-- Create table and insert 10 million rows
CREATE TABLE t1(a int);
INSERT INTO t1
SELECT * FROM generate_series(1, 10000000);
-- Disable parallel query
SET max_parallel_workers_per_gather = 0;
SET max_parallel_workers = 0;
-- Run the query and check execution time
EXPLAIN ANALYZE SELECT COUNT(*) FROM t1;
SELECT COUNT(*) FROM t1;
Before the Patch:
• EXPLAIN ANALYZE Execution Time: 4914 ms
• Perf Results:
• 33.97% of time spent in [vdso] __vdso_clock_gettime
• 5.28% in heapgettup_pagemode
• 4.44% in InstrStopNode
After the Patch:
• EXPLAIN ANALYZE Execution Time: 3114 ms (down from 4914 ms)
• Perf Results:
• 12.45% of time spent in ExecInterpExpr
• 9.18% in [vdso] __vdso_clock_gettime
• 6.92% in ExecScan
• Reduced usage of clock_gettime, leading to more efficient execution.
The execution time of EXPLAIN ANALYZE SELECT COUNT(*) FROM t1; after the patch is much closer to the actual time of SELECT COUNT(*) FROM t1;, which means the overhead of timing operations has been significantly reduced.
For the tested query, this change reduces EXPLAIN ANALYZE execution time by roughly 37% (from 4914 ms to 3114 ms).
Patch Details:
From 91d61b8c9a60f0e19b73e03c1a0e46d2dc64573d Mon Sep 17 00:00:00 2001
From: Jianghua Yang <yjhjstz@gmail.com>
Date: Thu, 27 Mar 2025 01:58:58 +0800
Subject: [PATCH] Use CLOCK_MONOTONIC_COARSE for instr_time when available
This patch modifies `instr_time.h` to prefer `CLOCK_MONOTONIC_COARSE`
when available. `CLOCK_MONOTONIC_COARSE` provides a lower resolution
but faster alternative for timing operations, which can reduce the
overhead of frequent timestamp retrievals.
On macOS, `CLOCK_MONOTONIC_RAW` remains the preferred choice when
available, as it provides high-resolution timestamps. Otherwise,
`CLOCK_MONOTONIC` is used as a fallback.
Author: Jianghua Yang
---
 src/include/portability/instr_time.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
I believe this change will result in better performance for many PostgreSQL users, especially those with high-frequency timing operations. I look forward to your feedback on this patch.
Best regards,
Jianghua Yang
Correction: the reproduction script should insert 100 million rows, not 10 million.
-- Create table and insert 100 million rows
CREATE TABLE t1(a int);
INSERT INTO t1 SELECT * FROM generate_series(1, 100000000);
-- Disable parallel query
SET max_parallel_workers_per_gather = 0;
SET max_parallel_workers = 0;
-- Run the query and check execution time
EXPLAIN ANALYZE SELECT COUNT(*) FROM t1;
On Wed, Mar 26, 2025 at 11:14:47AM -0700, 杨江华 wrote:
> This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> when available. By using CLOCK_MONOTONIC_COARSE, we can leverage a lower
> resolution(4ms) but faster alternative for timing operations, which reduces
> the overhead of frequent timestamp retrievals. This change is expected to
> provide performance improvements, especially in scenarios with frequent
> timing operations.
>
> *Key Changes:*
>
> • *CLOCK_MONOTONIC_COARSE* is used when available, offering faster
> performance with slightly reduced precision.
>
> • For macOS, *CLOCK_MONOTONIC_RAW* remains the preferred choice due to its
> higher resolution.
>
> • *CLOCK_MONOTONIC* is used as a fallback when neither of the above options
> is available.
-#if defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)
+#ifdef CLOCK_MONOTONIC_COARSE
+#define PG_INSTR_CLOCK CLOCK_MONOTONIC_COARSE
+#elif defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)
Why would we want to make this the default? CLOCK_MONOTONIC_COARSE
could show benefits in some code paths. Now, it can also have a
precision of a few milliseconds, and we have a bunch of code paths
that rely on clock_gettime() to be more precise than that so it could
lead to random decisions. You could make that configurable with a
GUC, but it would mean plastering some decision-making in instr_time.h
based on such a GUC, which would likely be annoying performance-wise.
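The precision concern can be checked on a given machine by asking the kernel what resolution it advertises for each clock. The following is a minimal sketch; clock_resolution_ns() is a hypothetical helper name, and the value returned for CLOCK_MONOTONIC_COARSE depends on the kernel's tick configuration, so no particular number is assumed here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/*
 * Hypothetical helper: return the resolution the system advertises for
 * the given clock, in nanoseconds, or -1 if the clock is not supported.
 */
static long long
clock_resolution_ns(clockid_t id)
{
    struct timespec res;

    if (clock_getres(id, &res) != 0)
        return -1;
    return (long long) res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Comparing clock_resolution_ns(CLOCK_MONOTONIC) with clock_resolution_ns(CLOCK_MONOTONIC_COARSE), where the latter is defined, shows how much granularity would be given up on that system.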
We are at the end of the v18 development cycle, so it is going to get
some time before you get any review. Good to see that you are
tracking this patch in the commit fest:
https://commitfest.postgresql.org/patch/5669/
--
Michael
杨江华 <yjhjstz@gmail.com> writes:
> This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> when available.
As far as I know, our usage of instr_time really needs the highest
resolution available, because we are usually trying to measure pretty
short intervals. You say that this patch reduces execution time,
and I imagine that's true ... but I wonder if it doesn't do so at
the cost of totally destroying the reliability of the output numbers.
regards, tom lane
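One way to probe the short-interval concern is to take back-to-back clock readings and count how often two consecutive readings are identical, since an interval measured between them would come out as exactly zero. This is only a sketch under stated assumptions: count_zero_deltas() is a hypothetical helper, not PostgreSQL code, and the counts it produces vary by machine and kernel.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/*
 * Hypothetical helper: take `samples` consecutive readings of clock `id`
 * and return how many of them are identical to the reading before.
 * A high count means short intervals are invisible to this clock.
 */
static int
count_zero_deltas(clockid_t id, int samples)
{
    struct timespec prev;
    struct timespec cur;
    int         zeros = 0;

    clock_gettime(id, &prev);
    for (int i = 0; i < samples; i++)
    {
        clock_gettime(id, &cur);
        if (cur.tv_sec == prev.tv_sec && cur.tv_nsec == prev.tv_nsec)
            zeros++;
        prev = cur;
    }
    return zeros;
}
```

Running it once with CLOCK_MONOTONIC and once with CLOCK_MONOTONIC_COARSE (where defined) gives a rough picture of how many per-tuple timing deltas each clock would fail to register.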
Hi,
> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals. You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers.
I agree, so this patch only affects explain analyze.
1. This change to CLOCK_MONOTONIC_COARSE only affects EXPLAIN ANALYZE and does not impact other modules.
The patch introduces optional support for CLOCK_MONOTONIC_COARSE specifically within the INSTR_TIME instrumentation framework. The modifications are guarded by the compile-time macro USE_CLOCK_MONOTONIC_COARSE and are used only when gathering timing data for performance instrumentation. Because INSTR_TIME is mainly used in EXPLAIN ANALYZE, and there are no changes to runtime or planner logic, only diagnostic output is affected; core execution paths and other modules are untouched.
2. With this modification, EXPLAIN ANALYZE produces timing results that are closer to real-world wall-clock time, making performance debugging more accurate.
By using CLOCK_MONOTONIC_COARSE, which has lower overhead compared to CLOCK_MONOTONIC, the patch improves the efficiency of timing collection in EXPLAIN ANALYZE. While it may slightly reduce precision, the resulting measurements more closely reflect actual elapsed time observed by users, especially in performance-sensitive environments. This makes EXPLAIN ANALYZE outputs more reliable and helpful for developers diagnosing query performance bottlenecks.
--- original version (before the patch)
explain analyze select count(*) from t1;
Thu 27 Mar 2025 01:31:20 AM CST (every 1s)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1852876.63..1852876.64 rows=1 width=8) (actual time=4914.037..4914.038 rows=1 loops=1)
-> Seq Scan on t1 (cost=0.00..1570796.90 rows=112831890 width=0) (actual time=0.039..3072.303 rows=100000000 loops=1)
Planning Time: 0.132 ms
Execution Time: 4914.072 ms
(4 rows)
Time: 4914.676 ms (00:04.915)
--- with the patch applied
postgres=# explain analyze select count(*) from t1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1692478.40..1692478.41 rows=1 width=8) (actual time=3116.164..3116.164 rows=1 loops=1)
-> Seq Scan on t1 (cost=0.00..1442478.32 rows=100000032 width=0) (actual time=0.000..2416.127 rows=100000000 loops=1)
Planning Time: 0.000 ms
Execution Time: 3116.164 ms
(4 rows)
Time: 3114.059 ms (00:03.114)
postgres=# select count(*) from t1;
count
-----------
100000000
(1 row)
Time: 2086.130 ms (00:02.086)
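The three timings above can be turned into overhead figures with simple arithmetic. The sketch below uses hypothetical helper names and mixes the server-side "Execution Time" values with the client-side psql timing of the plain SELECT, so the results are only approximate.

```c
/*
 * Hypothetical helper: instrumentation overhead as a percentage of the
 * uninstrumented runtime.
 */
static double
overhead_pct(double instrumented_ms, double plain_ms)
{
    return (instrumented_ms - plain_ms) / plain_ms * 100.0;
}

/* Hypothetical helper: relative reduction from `before_ms` to `after_ms`. */
static double
reduction_pct(double before_ms, double after_ms)
{
    return (before_ms - after_ms) / before_ms * 100.0;
}

/*
 * With the numbers reported in this message:
 *   overhead_pct(4914.072, 2086.130)  is about 135.6 (unpatched)
 *   overhead_pct(3116.164, 2086.130)  is about  49.4 (patched)
 *   reduction_pct(4914.072, 3116.164) is about  36.6
 */
```

These figures describe a single run on one machine; they say nothing about how stable the measurements are across runs.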
Hi,
On 2025-03-26 23:09:42 -0400, Tom Lane wrote:
> 杨江华 <yjhjstz@gmail.com> writes:
> > This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> > when available.
>
> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals. You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers.
The reason, on x86, the timestamp querying has a somewhat high overhead is
that the "accurate" "read the tsc" instruction serves as a barrier for
out-of-order execution. With modern highly out-of-order execution that means
we'll wait for all scheduled instructions to finish before determining the
current time, multiple times for each tuple. That of course slows things down
substantially.
There's a patch to use the version of rdtsc that does *not* have barrier
semantics:
https://postgr.es/m/CAP53PkzO2KpscD-tgFW_V-4WS%2BvkniH4-B00eM-e0bsBF-xUxg%40mail.gmail.com
Greetings,
Andres Freund
Jianghua Yang <yjhjstz@gmail.com> writes:
> By using CLOCK_MONOTONIC_COARSE, which has lower overhead compared to
> CLOCK_MONOTONIC, the patch improves the efficiency of timing
> collection in EXPLAIN ANALYZE. While it may slightly reduce precision,
> the resulting measurements more closely reflect actual elapsed time
> observed by users, especially in performance-sensitive environments.
> This makes EXPLAIN ANALYZE outputs more reliable and helpful for
> developers diagnosing query performance bottlenecks.
Well, this is exactly the thing that everybody is worried about.
You're asserting on the basis of precisely zero evidence that
this will be an improvement; most of the rest of us expect that
it will destroy the reliability of the measurements. "It's faster"
is totally insufficient as a reason to accept this change.
(As a wise man once said, "I can make my program arbitrarily fast
if it doesn't have to give the right answer.")
If you want to have any chance at all that this gets committed,
you need to provide some evidence backing your claim that the
results are still sufficiently reliable. The single test case
that you presented isn't impressive. For one thing, it shows
nontrivial change of the amount of time spent in the seqscan
node vs. overall (62% vs 77%). Which ratio is more reflective
of reality? How stable are the numbers across multiple runs?
What will happen on other platforms besides your own machine?
BTW, I'm also unimpressed by the changes to limit this to EXPLAIN
ANALYZE. There's no reason to think that our other uses of
instr_time.h are more sensitive to accuracy concerns than EXPLAIN
ANALYZE is; if anything they are less so, since we don't accumulate
a lot of very-tiny deltas anywhere else. So if this is good enough
for EXPLAIN ANALYZE it should be fine for everything.
regards, tom lane
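The 62% and 77% figures in this message can be reproduced from the "actual time" values in the two plans posted earlier in the thread: the Seq Scan node's end time divided by the Aggregate node's end time. A minimal sketch, with node_share_pct() as a hypothetical helper name:

```c
/*
 * Hypothetical helper: share of a parent node's elapsed time that is
 * attributed to one child node, as a percentage.
 */
static double
node_share_pct(double child_ms, double parent_ms)
{
    return child_ms / parent_ms * 100.0;
}

/*
 * From the posted plans:
 *   unpatched: node_share_pct(3072.303, 4914.038) is about 62.5
 *   patched:   node_share_pct(2416.127, 3116.164) is about 77.5
 */
```

The arithmetic only restates the discrepancy; it cannot say which of the two ratios is closer to the true split of work between the nodes.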
... BTW, another resource worth looking at is src/bin/pg_test_timing/
which we just improved a few days ago [1]. What I see on two different
Linux-on-Intel boxes is that the loop time that that reports is 16 ns
and change, and the clock readings appear accurate to full nanosecond
precision. Changing instr_time.h to use CLOCK_MONOTONIC_COARSE, the
loop time drops to a bit over 5 ns, which would certainly be a nice
win if it were cost-free. But the clock precision degrades to 1 ms.
It is really hard to believe that giving up a factor of a million
in clock precision is going to be an acceptable tradeoff for saving
~10 ns per clock reading. Maybe with a lot of fancy statistical
arm-waving, and an assumption that people always look at averages
over long query runs, you could make a case that this change isn't
going to result in a disaster. But EXPLAIN's results are surely
going to become garbage-in-garbage-out for any query that doesn't
run for (at least) hundreds of milliseconds.
regards, tom lane
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0b096e379e6f9bd49d38020d880a7da337e570ad
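A rough stand-alone approximation of the per-reading "loop time" mentioned here can be obtained by calling clock_gettime() in a tight loop and dividing the elapsed time by the iteration count. This is a simplified sketch and not how pg_test_timing is implemented; ns_per_reading() is a hypothetical helper, and the numbers it yields depend on the hardware, kernel and compiler flags.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/*
 * Hypothetical helper: average cost in nanoseconds of one clock_gettime()
 * call on clock `id`. The loop itself is always timed with CLOCK_MONOTONIC
 * so that a coarse clock under test does not distort the measurement.
 */
static double
ns_per_reading(clockid_t id, long iterations)
{
    struct timespec start;
    struct timespec end;
    struct timespec scratch;
    double      elapsed_ns;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        clock_gettime(id, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed_ns = (double) (end.tv_sec - start.tv_sec) * 1e9 +
        (double) (end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double) iterations;
}
```

Calling it for CLOCK_MONOTONIC and, where defined, CLOCK_MONOTONIC_COARSE gives the cost side of the tradeoff; the precision side still has to be checked separately.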
Hi,
On 2025-07-16 18:24:33 -0400, Tom Lane wrote:
> ... BTW, another resource worth looking at is src/bin/pg_test_timing/
> which we just improved a few days ago [1]. What I see on two different
> Linux-on-Intel boxes is that the loop time that that reports is 16 ns
> and change, and the clock readings appear accurate to full nanosecond
> precision. Changing instr_time.h to use CLOCK_MONOTONIC_COARSE, the
> loop time drops to a bit over 5 ns, which would certainly be a nice
> win if it were cost-free. But the clock precision degrades to 1 ms.
FWIW, switching to using rdtscp for timestamp acquisition substantially reduces
timing overhead, albeit not quite as low as 5 ns, without losing any meaningful
precision.
The patch from [1] needs to be rebased, unfortunately.
Separately, the amount of work we're now doing for each loop iteration in
test_timing() has got to start having some effect, no?
Greetings,
Andres
[1] https://postgr.es/m/CAP53PkzO2KpscD-tgFW_V-4WS%2BvkniH4-B00eM-e0bsBF-xUxg%40mail.gmail.com