Discussion: Use CLOCK_MONOTONIC_COARSE for instr_time when available


Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: 杨江华

Dear PostgreSQL Hackers,

This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE when available. By using CLOCK_MONOTONIC_COARSE, we can leverage a lower-resolution (4 ms) but faster clock for timing operations, which reduces the overhead of frequent timestamp retrievals. This change is expected to improve performance, especially in workloads that perform frequent timing operations.


Key Changes:

CLOCK_MONOTONIC_COARSE is used when available, offering faster performance with slightly reduced precision.

For macOS, CLOCK_MONOTONIC_RAW remains the preferred choice due to its higher resolution.

CLOCK_MONOTONIC is used as a fallback when neither of the above options is available.
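The preference order listed above can be sketched as a preprocessor chain. This is a minimal, standalone approximation, not the patch itself: PG_INSTR_CLOCK follows the macro name visible in the diff quoted later in the thread, `__APPLE__` stands in for PostgreSQL's own `__darwin__` test so the fragment compiles outside the source tree, and `instr_now_ns()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/*
 * Clock preference as proposed in the first version of the patch.
 * __APPLE__ replaces PostgreSQL's __darwin__ check (assumption, for a
 * standalone build).
 */
#if defined(CLOCK_MONOTONIC_COARSE)
#define PG_INSTR_CLOCK CLOCK_MONOTONIC_COARSE  /* Linux: cheap, tick resolution */
#elif defined(__APPLE__) && defined(CLOCK_MONOTONIC_RAW)
#define PG_INSTR_CLOCK CLOCK_MONOTONIC_RAW     /* macOS: high resolution */
#else
#define PG_INSTR_CLOCK CLOCK_MONOTONIC         /* portable fallback */
#endif

/* Hypothetical helper: read the selected clock as nanoseconds. */
static long long instr_now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(PG_INSTR_CLOCK, &ts);

    assert(rc == 0);
    (void) rc;
    return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Because CLOCK_MONOTONIC_COARSE is tested first, any Linux build would pick it by default, which is exactly the point reviewers object to below.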


Performance Improvements:


In testing with a workload that performs a COUNT(*) operation on a table containing 100 million rows, we observed a noticeable performance improvement after applying this patch.


SQL to Reproduce:

-- Create table and insert 10 million rows
CREATE TABLE t1(a int);
INSERT INTO t1
SELECT * FROM generate_series(1, 10000000);

-- Disable parallel query
SET max_parallel_workers_per_gather = 0;
SET max_parallel_workers = 0;

-- Run the query and check execution time
EXPLAIN ANALYZE SELECT COUNT(*) FROM t1;
SELECT COUNT(*) FROM t1;

Before the Patch:

EXPLAIN ANALYZE Execution Time: 4914 ms

Perf Results:

33.97% of time spent in [vdso] __vdso_clock_gettime

5.28% in heapgettup_pagemode

4.44% in InstrStopNode


After the Patch:

EXPLAIN ANALYZE Execution Time: 3114 ms (down from 4914 ms)

Perf Results:

12.45% of time spent in ExecInterpExpr

9.18% in [vdso] __vdso_clock_gettime

6.92% in ExecScan

The share of time spent in clock_gettime drops from 33.97% to 9.18%, leading to more efficient execution.


After the patch, the execution time reported by EXPLAIN ANALYZE SELECT COUNT(*) FROM t1 is much closer to the actual run time of SELECT COUNT(*) FROM t1, which means the overhead of timing operations has been significantly reduced.


This change reduces the execution time of the tested query by about 37% (from 4914 ms to 3114 ms).
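The reported figures can be checked with simple arithmetic. The sketch below uses hypothetical helper names; 2086 ms is the plain SELECT COUNT(*) time the author reports later in the thread for the patched build.

```c
#include <assert.h>

/* Hypothetical helper: percentage reduction going from `before` to `after`. */
static double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Hypothetical helper: EXPLAIN ANALYZE overhead relative to the bare query. */
static double pct_overhead(double instrumented, double plain)
{
    assert(plain > 0.0);
    return 100.0 * (instrumented - plain) / plain;
}
```

With these numbers, `pct_reduction(4914.0, 3114.0)` is about 36.6%, while `pct_overhead(3114.059, 2086.130)` is about 49%: even with the coarse clock, EXPLAIN ANALYZE still ran roughly half again as long as the uninstrumented query.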


Patch Details:

From 91d61b8c9a60f0e19b73e03c1a0e46d2dc64573d Mon Sep 17 00:00:00 2001
From: Jianghua Yang <yjhjstz@gmail.com>
Date: Thu, 27 Mar 2025 01:58:58 +0800
Subject: [PATCH] Use CLOCK_MONOTONIC_COARSE for instr_time when available

This patch modifies `instr_time.h` to prefer `CLOCK_MONOTONIC_COARSE`
when available. `CLOCK_MONOTONIC_COARSE` provides a lower resolution
but faster alternative for timing operations, which can reduce the
overhead of frequent timestamp retrievals.

On macOS, `CLOCK_MONOTONIC_RAW` remains the preferred choice when
available, as it provides high-resolution timestamps. Otherwise,
`CLOCK_MONOTONIC` is used as a fallback.

Author: Jianghua Yang
---
 src/include/portability/instr_time.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

I believe this change will result in better performance for many PostgreSQL users, especially those with high-frequency timing operations. I look forward to your feedback on this patch.


Best regards,

Jianghua Yang


Attachments

Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: 杨江华

Correction: the test inserts 100 million rows, not 10 million. The corrected script is:

-- Create table and insert 100 million rows
CREATE TABLE t1(a int);
INSERT INTO t1 SELECT * FROM generate_series(1, 100000000);

-- Disable parallel query
SET max_parallel_workers_per_gather = 0;
SET max_parallel_workers = 0;

-- Run the query and check execution time
EXPLAIN ANALYZE SELECT COUNT(*) FROM t1;




Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: Michael Paquier
On Wed, Mar 26, 2025 at 11:14:47AM -0700, 杨江华 wrote:
> This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> when available. By using CLOCK_MONOTONIC_COARSE, we can leverage a lower
> resolution(4ms) but faster alternative for timing operations, which reduces
> the overhead of frequent timestamp retrievals. This change is expected to
> provide performance improvements, especially in scenarios with frequent
> timing operations.
>
> *Key Changes:*
>
> • *CLOCK_MONOTONIC_COARSE* is used when available, offering faster
> performance with slightly reduced precision.
>
> • For macOS, *CLOCK_MONOTONIC_RAW* remains the preferred choice due to its
> higher resolution.
>
> • *CLOCK_MONOTONIC* is used as a fallback when neither of the above options
> is available.

-#if defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)
+#ifdef CLOCK_MONOTONIC_COARSE
+#define PG_INSTR_CLOCK CLOCK_MONOTONIC_COARSE
+#elif defined(__darwin__) && defined(CLOCK_MONOTONIC_RAW)

Why would we want to make this the default?  CLOCK_MONOTONIC_COARSE
could show benefits in some code paths.  Now, it can also have a
precision of a few milliseconds, and we have a bunch of code paths
that rely on clock_gettime() to be more precise than that so it could
lead to random decisions.  You could make that configurable with a
GUC, but it would mean plastering some decision-making in instr_time.h
based on such a GUC, which would likely be annoying performance-wise.
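The precision concern can be checked directly with clock_getres(). On Linux, the coarse clocks advance once per kernel tick, so their reported resolution depends on the kernel's CONFIG_HZ (for example 4 ms at HZ=250, 1 ms at HZ=1000), while CLOCK_MONOTONIC typically reports 1 ns. A minimal sketch; `clock_res_ns()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: resolution of `clk` in nanoseconds as reported
 * by clock_getres(), or -1 if the clock is not supported.
 */
static long long clock_res_ns(clockid_t clk)
{
    struct timespec res;

    if (clock_getres(clk, &res) != 0)
        return -1;
    return (long long) res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Calling `clock_res_ns(CLOCK_MONOTONIC_COARSE)` on the target machine shows how coarse the timestamps would be before any benchmark is run.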

We are at the end of the v18 development cycle, so it is going to get
some time before you get any review.  Good to see that you are
tracking this patch in the commit fest:
https://commitfest.postgresql.org/patch/5669/
--
Michael

Вложения

Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: Jianghua Yang
It makes sense, but we can distinguish the code paths that need `CLOCK_MONOTONIC`.

I have now added a configure option, `--with-clock-monotonic-coarse`.

Attachments

Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: Tom Lane
杨江华 <yjhjstz@gmail.com> writes:
> This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> when available.

As far as I know, our usage of instr_time really needs the highest
resolution available, because we are usually trying to measure pretty
short intervals.  You say that this patch reduces execution time,
and I imagine that's true ... but I wonder if it doesn't do so at
the cost of totally destroying the reliability of the output numbers.

            regards, tom lane



Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: wenhui qiu
Hi,

> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals.  You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers. 
I strongly agree. It seems like focusing on the small stuff while missing the big picture.

Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: Andres Freund
Hi,

On 2025-03-26 23:09:42 -0400, Tom Lane wrote:
> 杨江华 <yjhjstz@gmail.com> writes:
> > This patch modifies the instr_time.h header to prefer CLOCK_MONOTONIC_COARSE
> > when available.
> 
> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals.  You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers.

The reason, on x86, the timestamp querying has a somewhat high overhead is
that the "accurate" "read the tsc" instruction serves as a barrier for
out-of-order execution. With modern highly out-of-order execution that means
we'll wait for all scheduled instructions to finish before determining the
current time, multiple times for each tuple.  That of course slows things down
substantially.

There's a patch to use the version of rdtsc that does *not* have barrier
semantics:
https://postgr.es/m/CAP53PkzO2KpscD-tgFW_V-4WS%2BvkniH4-B00eM-e0bsBF-xUxg%40mail.gmail.com

Greetings,

Andres Freund
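The distinction Andres draws can be sketched with the GCC/Clang x86 intrinsics. As far as we understand, the Linux vDSO uses an ordered TSC read (LFENCE followed by RDTSC, or RDTSCP, depending on the CPU); plain RDTSC has no such ordering. The function names are hypothetical, and converting ticks to nanoseconds (which needs the TSC frequency) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Plain RDTSC: no ordering, may be executed before earlier instructions finish. */
static inline uint64_t tsc_unordered(void)
{
    return __rdtsc();
}

/* LFENCE then RDTSC: the read waits for earlier instructions to complete. */
static inline uint64_t tsc_ordered(void)
{
    _mm_lfence();
    return __rdtsc();
}

/*
 * RDTSCP: also waits for earlier instructions, and returns IA32_TSC_AUX
 * through `aux` (on Linux this encodes the CPU number).
 */
static inline uint64_t tsc_rdtscp(unsigned int *aux)
{
    return __rdtscp(aux);
}
#endif
```

The per-tuple cost Andres describes comes from the ordering: an ordered read stalls until all in-flight work retires, and EXPLAIN ANALYZE takes several such readings per tuple.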



Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: Jianghua Yang

I agree, so the patch now affects only EXPLAIN ANALYZE.

1. This change to CLOCK_MONOTONIC_COARSE only affects EXPLAIN ANALYZE and does not impact other modules.

The patch introduces optional support for CLOCK_MONOTONIC_COARSE specifically within the INSTR_TIME instrumentation framework. The modifications are guarded by the compile-time macro USE_CLOCK_MONOTONIC_COARSE and apply only when gathering timing data for performance instrumentation. Given that INSTR_TIME is mainly used in EXPLAIN ANALYZE, and there are no changes to runtime or planner logic, this patch ensures that only diagnostic output is affected, leaving core execution paths and other modules untouched.
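An opt-in version of the clock selection might look like the sketch below. It assumes, as described above, that the build defines USE_CLOCK_MONOTONIC_COARSE only when requested (for example by the proposed --with-clock-monotonic-coarse switch); `instr_clock_is_coarse()` is a hypothetical helper, and `__APPLE__` again stands in for PostgreSQL's `__darwin__`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/*
 * Coarse clock only when explicitly requested at build time; otherwise
 * the existing precise choice is kept.
 */
#if defined(USE_CLOCK_MONOTONIC_COARSE) && defined(CLOCK_MONOTONIC_COARSE)
#define PG_INSTR_CLOCK CLOCK_MONOTONIC_COARSE
#elif defined(__APPLE__) && defined(CLOCK_MONOTONIC_RAW)
#define PG_INSTR_CLOCK CLOCK_MONOTONIC_RAW
#else
#define PG_INSTR_CLOCK CLOCK_MONOTONIC
#endif

/* Hypothetical helper: nonzero when instrumentation uses the coarse clock. */
static int instr_clock_is_coarse(void)
{
#if defined(CLOCK_MONOTONIC_COARSE)
    return PG_INSTR_CLOCK == CLOCK_MONOTONIC_COARSE;
#else
    return 0;
#endif
}
```

Note that gating on a build option does not by itself restrict the change to EXPLAIN ANALYZE: every user of PG_INSTR_CLOCK would see the coarse clock, which is the point Tom raises in his reply.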


2. With this modification, EXPLAIN ANALYZE produces timing results that are closer to real-world wall-clock time, making performance debugging more accurate.


By using CLOCK_MONOTONIC_COARSE, which has lower overhead compared to CLOCK_MONOTONIC, the patch improves the efficiency of timing collection in EXPLAIN ANALYZE. While it may slightly reduce precision, the resulting measurements more closely reflect actual elapsed time observed by users, especially in performance-sensitive environments. This makes EXPLAIN ANALYZE outputs more reliable and helpful for developers diagnosing query performance bottlenecks.

--- Original version

explain analyze select count(*) from t1;
                                        Thu 27 Mar 2025 01:31:20 AM CST (every 1s)

                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1852876.63..1852876.64 rows=1 width=8) (actual time=4914.037..4914.038 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..1570796.90 rows=112831890 width=0) (actual time=0.039..3072.303 rows=100000000 loops=1)
 Planning Time: 0.132 ms
 Execution Time: 4914.072 ms
(4 rows)

Time: 4914.676 ms (00:04.915)


--- With the patch applied

postgres=# explain analyze select count(*) from t1;
                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1692478.40..1692478.41 rows=1 width=8) (actual time=3116.164..3116.164 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..1442478.32 rows=100000032 width=0) (actual time=0.000..2416.127 rows=100000000 loops=1)
 Planning Time: 0.000 ms
 Execution Time: 3116.164 ms
(4 rows)

Time: 3114.059 ms (00:03.114)
postgres=# select count(*) from t1;
   count  
-----------
 100000000
(1 row)

Time: 2086.130 ms (00:02.086)


Attachments

Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: Tom Lane
Jianghua Yang <yjhjstz@gmail.com> writes:
> By using CLOCK_MONOTONIC_COARSE, which has lower overhead compared to
> CLOCK_MONOTONIC, the patch improves the efficiency of timing
> collection in EXPLAIN
> ANALYZE. While it may slightly reduce precision, the resulting measurements
> more closely reflect actual elapsed time observed by users, especially in
> performance-sensitive environments. This makes EXPLAIN ANALYZE outputs more
> reliable and helpful for developers diagnosing query performance
> bottlenecks.

Well, this is exactly the thing that everybody is worried about.
You're asserting on the basis of precisely zero evidence that this
will be an improvement; most of the rest of us expect that it will
destroy the reliability of the measurements.  "It's faster" is
totally insufficient as a reason to accept this change.  (As a
wise man once said, "I can make my program arbitrarily fast if
it doesn't have to give the right answer.")

If you want to have any chance at all that this gets committed,
you need to provide some evidence backing your claim that the
results are still sufficiently reliable.  The single test case
that you presented isn't impressive.  For one thing, it shows
nontrivial change of the amount of time spent in the seqscan
node vs. overall (62% vs 77%).  Which ratio is more reflective
of reality?  How stable are the numbers across multiple runs?
What will happen on other platforms besides your own machine?

BTW, I'm also unimpressed by the changes to limit this to
EXPLAIN ANALYZE.  There's no reason to think that our other uses of
instr_time.h are more sensitive to accuracy concerns than EXPLAIN
ANALYZE is; if anything they are less so, since we don't accumulate
a lot of very-tiny deltas anywhere else.  So if this is good enough
for EXPLAIN ANALYZE it should be fine for everything.

            regards, tom lane
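Tom's point about accumulating many very tiny deltas can be illustrated with a small deterministic simulation. It assumes a 1 ms clock tick, 30 ns of work per tuple repeating every 97 ns, and that each per-tuple interval is measured as the difference of two truncated clock readings and summed into a node total. All parameters and names are hypothetical; the cost of reading the clock is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate a true timestamp to the coarse clock's tick. */
static int64_t coarse(int64_t t, int64_t tick_ns)
{
    return (t / tick_ns) * tick_ns;
}

/*
 * Hypothetical simulation: tuple i is processed during
 * [base + i*period_ns, base + i*period_ns + busy_ns). Each interval is
 * timed with the coarse clock and the deltas are summed, as a per-node
 * total would be.
 */
static int64_t measured_total_ns(int64_t ntuples, int64_t base,
                                 int64_t period_ns, int64_t busy_ns,
                                 int64_t tick_ns)
{
    int64_t total = 0;

    for (int64_t i = 0; i < ntuples; i++)
    {
        int64_t start = base + i * period_ns;

        total += coarse(start + busy_ns, tick_ns) - coarse(start, tick_ns);
    }
    return total;
}
```

For 1,000 tuples (30 µs of real work), the reported total is either 0 or a full 1 ms depending only on where the tick boundary falls. For 10 million tuples the total comes out close to the true 0.3 s, but only on average, because tick boundaries happen to land at varying offsets within the per-tuple intervals.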



Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: Tom Lane
... BTW, another resource worth looking at is src/bin/pg_test_timing/
which we just improved a few days ago [1].  What I see on two different
Linux-on-Intel boxes is that the loop time that that reports is 16 ns
and change, and the clock readings appear accurate to full nanosecond
precision.  Changing instr_time.h to use CLOCK_MONOTONIC_COARSE, the
loop time drops to a bit over 5 ns, which would certainly be a nice
win if it were cost-free.  But the clock precision degrades to 1 ms.

It is really hard to believe that giving up a factor of a million
in clock precision is going to be an acceptable tradeoff for saving
~10 ns per clock reading.  Maybe with a lot of fancy statistical
arm-waving, and an assumption that people always look at averages
over long query runs, you could make a case that this change isn't
going to result in a disaster.  But EXPLAIN's results are surely
going to become garbage-in-garbage-out for any query that doesn't
run for (at least) hundreds of milliseconds.

            regards, tom lane

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0b096e379e6f9bd49d38020d880a7da337e570ad
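The kind of comparison Tom describes can be approximated with a tight loop of clock reads. This is a rough sketch in the spirit of pg_test_timing, not its actual code: `ns_per_read()` is a hypothetical helper, the loop's own bookkeeping is included in the per-read figure (so it is an upper bound), and the smallest nonzero step between consecutive reads is reported as a crude indication of effective resolution.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

static long long ts_ns(const struct timespec *ts)
{
    return (long long) ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: average nanoseconds per clock_gettime(clk) call
 * over `iterations` back-to-back reads, timed with CLOCK_MONOTONIC.
 * Stores the smallest nonzero step seen in *min_step (-1 if the clock
 * never advanced during the loop).
 */
static double ns_per_read(clockid_t clk, long iterations, long long *min_step)
{
    struct timespec t0, t1, prev, cur;
    long long best = -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    clock_gettime(clk, &prev);
    for (long i = 0; i < iterations; i++)
    {
        clock_gettime(clk, &cur);
        long long step = ts_ns(&cur) - ts_ns(&prev);

        if (step > 0 && (best < 0 || step < best))
            best = step;
        prev = cur;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (min_step)
        *min_step = best;
    return (double) (ts_ns(&t1) - ts_ns(&t0)) / iterations;
}
```

Running it for CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE shows both sides of the tradeoff at once: a lower per-read cost for the coarse clock, and a minimum step on the order of a kernel tick instead of nanoseconds.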



Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

From: Andres Freund
Hi,

On 2025-07-16 18:24:33 -0400, Tom Lane wrote:
> ... BTW, another resource worth looking at is src/bin/pg_test_timing/
> which we just improved a few days ago [1].  What I see on two different
> Linux-on-Intel boxes is that the loop time that that reports is 16 ns
> and change, and the clock readings appear accurate to full nanosecond
> precision.  Changing instr_time.h to use CLOCK_MONOTONIC_COARSE, the
> loop time drops to a bit over 5 ns, which would certainly be a nice
> win if it were cost-free.  But the clock precision degrades to 1 ms.

FWIW, switching to using rdtscp for timestamp acquisition substantially reduces
timing overhead, albeit not quite as low as 5ns, without losing any
meaningful precision.  The patch from [1] needs to be rebased, unfortunately.

Separately, the amount of work we're now doing for each loop iteration in
test_timing() must be starting to have some effect, no?

Greetings,

Andres

[1] https://postgr.es/m/CAP53PkzO2KpscD-tgFW_V-4WS%2BvkniH4-B00eM-e0bsBF-xUxg%40mail.gmail.com