Обсуждение: Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation
Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation
От
Jhonathan Cruz
Дата:
Hello,
I would like to report two query execution behaviors that may indicate
performance issues or regressions. I am not certain whether these are
known or expected behaviors, so I would appreciate guidance.
----------------------------------------------------------------------
1) DISTINCT ON combined with ORDER BY
----------------------------------------------------------------------
I observed cases where queries using DISTINCT ON together with ORDER BY
produce execution plans with explicit sorting steps, even when the ordering
requirements are clearly defined.
Minimal reproduction:
CREATE TABLE t (
id bigint,
grp int,
created_at timestamp
);
INSERT INTO t
SELECT
g,
g % 100,
now() - (g || ' seconds')::interval
FROM generate_series(1, 500000) g;
ANALYZE t;
Query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (grp)
grp, id, created_at
FROM t
ORDER BY grp, created_at DESC;
Example EXPLAIN ANALYZE output:
Unique (cost=45231.12..47731.12 rows=100 width=24)
(actual time=182.413..198.721 rows=100 loops=1)
Buffers: shared hit=12845
-> Sort (cost=45231.12..46481.12 rows=500000 width=24)
(actual time=182.410..191.256 rows=500000 loops=1)
Sort Key: grp, created_at DESC
Sort Method: quicksort Memory: 51200kB
Buffers: shared hit=12845
-> Seq Scan on t (cost=0.00..11231.00 rows=500000 width=24)
(actual time=0.012..38.117 rows=500000 loops=1)
Buffers: shared hit=12845
Planning Time: 0.213 ms
Execution Time: 202.981 ms
Observed behavior:
- Explicit Sort node before Unique
- Full sort over all rows
- Execution time higher than expected
Question:
- Is this planner behavior expected for DISTINCT ON queries, or could
this case potentially avoid a full sort?
----------------------------------------------------------------------
2) Possible JIT performance regression with aggregation and light filtering
----------------------------------------------------------------------
On PostgreSQL 16.x, I observed that queries performing aggregation with
light filtering may run slower with JIT enabled compared to JIT disabled.
Query:
SET jit = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT grp, count(*)
FROM t
WHERE id > 100
GROUP BY grp;
Example EXPLAIN ANALYZE output (JIT enabled):
HashAggregate (cost=12431.22..12441.22 rows=100 width=12)
(actual time=156.234..156.941 rows=100 loops=1)
Group Key: grp
Buffers: shared hit=12483
-> Seq Scan on t (cost=0.00..11231.00 rows=499900 width=4)
(actual time=0.014..52.731 rows=499900 loops=1)
Filter: (id > 100)
Rows Removed by Filter: 100
Buffers: shared hit=12483
JIT:
Functions: 5
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 2.312 ms, Inlining 18.942 ms,
Optimization 94.117 ms, Emission 22.884 ms, Total 138.255 ms
Planning Time: 0.184 ms
Execution Time: 159.214 ms
For comparison, with JIT disabled:
SET jit = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT grp, count(*)
FROM t
WHERE id > 100
GROUP BY grp;
Example EXPLAIN ANALYZE output (JIT disabled):
HashAggregate (cost=12431.22..12441.22 rows=100 width=12)
(actual time=41.312..42.117 rows=100 loops=1)
Group Key: grp
Buffers: shared hit=12483
-> Seq Scan on t (cost=0.00..11231.00 rows=499900 width=4)
(actual time=0.011..35.912 rows=499900 loops=1)
Filter: (id > 100)
Rows Removed by Filter: 100
Buffers: shared hit=12483
Planning Time: 0.121 ms
Execution Time: 44.287 ms
Observed behavior:
- JIT compilation time dominates query execution
- Query runs significantly faster with JIT disabled
- This differs from behavior observed on PostgreSQL 14.x for similar queries
Environment:
- PostgreSQL version: 16.x
- OS: <fill if needed>
- CPU: <fill if needed>
- jit: on/off
- shared_buffers: <fill if needed>
I am unsure whether these behaviors are expected, regressions, or already
tracked issues, and would appreciate any feedback or direction.
Regards,
Jhonathan Cruz
I would like to report two query execution behaviors that may indicate
performance issues or regressions. I am not certain whether these are
known or expected behaviors, so I would appreciate guidance.
----------------------------------------------------------------------
1) DISTINCT ON combined with ORDER BY
----------------------------------------------------------------------
I observed cases where queries using DISTINCT ON together with ORDER BY
produce execution plans with explicit sorting steps, even when the ordering
requirements are clearly defined.
Minimal reproduction:
CREATE TABLE t (
id bigint,
grp int,
created_at timestamp
);
INSERT INTO t
SELECT
g,
g % 100,
now() - (g || ' seconds')::interval
FROM generate_series(1, 500000) g;
ANALYZE t;
Query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (grp)
grp, id, created_at
FROM t
ORDER BY grp, created_at DESC;
Example EXPLAIN ANALYZE output:
Unique (cost=45231.12..47731.12 rows=100 width=24)
(actual time=182.413..198.721 rows=100 loops=1)
Buffers: shared hit=12845
-> Sort (cost=45231.12..46481.12 rows=500000 width=24)
(actual time=182.410..191.256 rows=500000 loops=1)
Sort Key: grp, created_at DESC
Sort Method: quicksort Memory: 51200kB
Buffers: shared hit=12845
-> Seq Scan on t (cost=0.00..11231.00 rows=500000 width=24)
(actual time=0.012..38.117 rows=500000 loops=1)
Buffers: shared hit=12845
Planning Time: 0.213 ms
Execution Time: 202.981 ms
Observed behavior:
- Explicit Sort node before Unique
- Full sort over all rows
- Execution time higher than expected
Question:
- Is this planner behavior expected for DISTINCT ON queries, or could
this case potentially avoid a full sort?
----------------------------------------------------------------------
2) Possible JIT performance regression with aggregation and light filtering
----------------------------------------------------------------------
On PostgreSQL 16.x, I observed that queries performing aggregation with
light filtering may run slower with JIT enabled compared to JIT disabled.
Query:
SET jit = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT grp, count(*)
FROM t
WHERE id > 100
GROUP BY grp;
Example EXPLAIN ANALYZE output (JIT enabled):
HashAggregate (cost=12431.22..12441.22 rows=100 width=12)
(actual time=156.234..156.941 rows=100 loops=1)
Group Key: grp
Buffers: shared hit=12483
-> Seq Scan on t (cost=0.00..11231.00 rows=499900 width=4)
(actual time=0.014..52.731 rows=499900 loops=1)
Filter: (id > 100)
Rows Removed by Filter: 100
Buffers: shared hit=12483
JIT:
Functions: 5
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 2.312 ms, Inlining 18.942 ms,
Optimization 94.117 ms, Emission 22.884 ms, Total 138.255 ms
Planning Time: 0.184 ms
Execution Time: 159.214 ms
For comparison, with JIT disabled:
SET jit = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT grp, count(*)
FROM t
WHERE id > 100
GROUP BY grp;
Example EXPLAIN ANALYZE output (JIT disabled):
HashAggregate (cost=12431.22..12441.22 rows=100 width=12)
(actual time=41.312..42.117 rows=100 loops=1)
Group Key: grp
Buffers: shared hit=12483
-> Seq Scan on t (cost=0.00..11231.00 rows=499900 width=4)
(actual time=0.011..35.912 rows=499900 loops=1)
Filter: (id > 100)
Rows Removed by Filter: 100
Buffers: shared hit=12483
Planning Time: 0.121 ms
Execution Time: 44.287 ms
Observed behavior:
- JIT compilation time dominates query execution
- Query runs significantly faster with JIT disabled
- This differs from behavior observed on PostgreSQL 14.x for similar queries
Environment:
- PostgreSQL version: 16.x
- OS: <fill if needed>
- CPU: <fill if needed>
- jit: on/off
- shared_buffers: <fill if needed>
I am unsure whether these behaviors are expected, regressions, or already
tracked issues, and would appreciate any feedback or direction.
Regards,
Jhonathan Cruz
Re: Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation
От
Laurenz Albe
Дата:
On Tue, 2025-12-16 at 21:25 -0300, Jhonathan Cruz wrote: > I would like to report two query execution behaviors that may indicate > performance issues or regressions. I am not certain whether these are > known or expected behaviors, so I would appreciate guidance. > > ---------------------------------------------------------------------- > 1) DISTINCT ON combined with ORDER BY > ---------------------------------------------------------------------- > > I observed cases where queries using DISTINCT ON together with ORDER BY > produce execution plans with explicit sorting steps, even when the ordering > requirements are clearly defined. > > Query: > > EXPLAIN (ANALYZE, BUFFERS) > SELECT DISTINCT ON (grp) > grp, id, created_at > FROM t > ORDER BY grp, created_at DESC; > > Example EXPLAIN ANALYZE output: > > Unique (cost=45231.12..47731.12 rows=100 width=24) > (actual time=182.413..198.721 rows=100 loops=1) > Buffers: shared hit=12845 > -> Sort (cost=45231.12..46481.12 rows=500000 width=24) > (actual time=182.410..191.256 rows=500000 loops=1) > Sort Key: grp, created_at DESC > Sort Method: quicksort Memory: 51200kB > Buffers: shared hit=12845 > -> Seq Scan on t (cost=0.00..11231.00 rows=500000 width=24) > (actual time=0.012..38.117 rows=500000 loops=1) > Buffers: shared hit=12845 > Planning Time: 0.213 ms > Execution Time: 202.981 ms > > Question: > - Is this planner behavior expected for DISTINCT ON queries, or could > this case potentially avoid a full sort? That is normal and as expected. If you need all result rows, you have to sort all the rows to find the unique results. If you put a LIMIT clause on the query, PostgreSQL could perform a top-N sort. DISTINCT is easy to write, but sometimes the most expensive part of a query. I don't think that can be avoided. > ---------------------------------------------------------------------- > 2) Possible JIT performance regression with aggregation and light filtering > ---------------------------------------------------------------------- > > On PostgreSQL 16.x, I observed that queries performing aggregation with > light filtering may run slower with JIT enabled compared to JIT disabled. > > Query: > > SET jit = on; > > EXPLAIN (ANALYZE, BUFFERS) > SELECT grp, count(*) > FROM t > WHERE id > 100 > GROUP BY grp; > > Example EXPLAIN ANALYZE output (JIT enabled): > > HashAggregate (cost=12431.22..12441.22 rows=100 width=12) > (actual time=156.234..156.941 rows=100 loops=1) > Group Key: grp > Buffers: shared hit=12483 > -> Seq Scan on t (cost=0.00..11231.00 rows=499900 width=4) > (actual time=0.014..52.731 rows=499900 loops=1) > Filter: (id > 100) > Rows Removed by Filter: 100 > Buffers: shared hit=12483 > JIT: > Functions: 5 > Options: Inlining true, Optimization true, Expressions true, Deforming true > Timing: Generation 2.312 ms, Inlining 18.942 ms, > Optimization 94.117 ms, Emission 22.884 ms, Total 138.255 ms > Planning Time: 0.184 ms > Execution Time: 159.214 ms Now this is more like a bug. JIT still has its problems. I think not the least of the problems is that PostgreSQL cannot predict how long the compilation will take. At the risk of drawing fire, I have now come to recommend turning JIT off unless you have an analytical workload. Yours, Laurenz Albe