Re: Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation
| От | Laurenz Albe |
|---|---|
| Тема | Re: Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation |
| Дата | |
| Msg-id | 90df511ca8ff12c52e05b7ed3500b1233ef93def.camel@cybertec.at обсуждение исходный текст |
| Ответ на | Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation (Jhonathan Cruz <jhonathancruz48@gmail.com>) |
| Список | pgsql-bugs |
On Tue, 2025-12-16 at 21:25 -0300, Jhonathan Cruz wrote: > I would like to report two query execution behaviors that may indicate > performance issues or regressions. I am not certain whether these are > known or expected behaviors, so I would appreciate guidance. > > ---------------------------------------------------------------------- > 1) DISTINCT ON combined with ORDER BY > ---------------------------------------------------------------------- > > I observed cases where queries using DISTINCT ON together with ORDER BY > produce execution plans with explicit sorting steps, even when the ordering > requirements are clearly defined. > > Query: > > EXPLAIN (ANALYZE, BUFFERS) > SELECT DISTINCT ON (grp) > grp, id, created_at > FROM t > ORDER BY grp, created_at DESC; > > Example EXPLAIN ANALYZE output: > > Unique (cost=45231.12..47731.12 rows=100 width=24) > (actual time=182.413..198.721 rows=100 loops=1) > Buffers: shared hit=12845 > -> Sort (cost=45231.12..46481.12 rows=500000 width=24) > (actual time=182.410..191.256 rows=500000 loops=1) > Sort Key: grp, created_at DESC > Sort Method: quicksort Memory: 51200kB > Buffers: shared hit=12845 > -> Seq Scan on t (cost=0.00..11231.00 rows=500000 width=24) > (actual time=0.012..38.117 rows=500000 loops=1) > Buffers: shared hit=12845 > Planning Time: 0.213 ms > Execution Time: 202.981 ms > > Question: > - Is this planner behavior expected for DISTINCT ON queries, or could > this case potentially avoid a full sort? That is normal and as expected. If you need all result rows, you have to sort all the rows to find the unique results. If you put a LIMIT clause on the query, PostgreSQL could perform a top-N sort. DISTINCT is easy to write, but sometimes the most expensive part of a query. I don't think that can be avoided. > ---------------------------------------------------------------------- > 2) Possible JIT performance regression with aggregation and light filtering > ---------------------------------------------------------------------- > > On PostgreSQL 16.x, I observed that queries performing aggregation with > light filtering may run slower with JIT enabled compared to JIT disabled. > > Query: > > SET jit = on; > > EXPLAIN (ANALYZE, BUFFERS) > SELECT grp, count(*) > FROM t > WHERE id > 100 > GROUP BY grp; > > Example EXPLAIN ANALYZE output (JIT enabled): > > HashAggregate (cost=12431.22..12441.22 rows=100 width=12) > (actual time=156.234..156.941 rows=100 loops=1) > Group Key: grp > Buffers: shared hit=12483 > -> Seq Scan on t (cost=0.00..11231.00 rows=499900 width=4) > (actual time=0.014..52.731 rows=499900 loops=1) > Filter: (id > 100) > Rows Removed by Filter: 100 > Buffers: shared hit=12483 > JIT: > Functions: 5 > Options: Inlining true, Optimization true, Expressions true, Deforming true > Timing: Generation 2.312 ms, Inlining 18.942 ms, > Optimization 94.117 ms, Emission 22.884 ms, Total 138.255 ms > Planning Time: 0.184 ms > Execution Time: 159.214 ms Now this is more like a bug. JIT still has its problems. I think not the least of the problems is that PostgreSQL cannot predict how long the compilation will take. At the risk of drawing fire, I have now come to recommend turning JIT off unless you have an analytical workload. Yours, Laurenz Albe
В списке pgsql-bugs по дате отправления: