Обсуждение: Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation

Поиск
Список
Период
Сортировка

Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation

От
Jhonathan Cruz
Дата:
Hello,

I would like to report two query execution behaviors that may indicate
performance issues or regressions. I am not certain whether these are
known or expected behaviors, so I would appreciate guidance.

----------------------------------------------------------------------
1) DISTINCT ON combined with ORDER BY
----------------------------------------------------------------------

I observed cases where queries using DISTINCT ON together with ORDER BY
produce execution plans with explicit sorting steps, even when the ordering
requirements are clearly defined.

Minimal reproduction:

CREATE TABLE t (
    id bigint,
    grp int,
    created_at timestamp
);

INSERT INTO t
SELECT
    g,
    g % 100,
    now() - (g || ' seconds')::interval
FROM generate_series(1, 500000) g;

ANALYZE t;

Query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (grp)
       grp, id, created_at
FROM t
ORDER BY grp, created_at DESC;

Example EXPLAIN ANALYZE output:

Unique  (cost=45231.12..47731.12 rows=100 width=24)
        (actual time=182.413..198.721 rows=100 loops=1)
  Buffers: shared hit=12845
  ->  Sort  (cost=45231.12..46481.12 rows=500000 width=24)
            (actual time=182.410..191.256 rows=500000 loops=1)
        Sort Key: grp, created_at DESC
        Sort Method: quicksort  Memory: 51200kB
        Buffers: shared hit=12845
        ->  Seq Scan on t  (cost=0.00..11231.00 rows=500000 width=24)
              (actual time=0.012..38.117 rows=500000 loops=1)
              Buffers: shared hit=12845
Planning Time: 0.213 ms
Execution Time: 202.981 ms

Observed behavior:
- Explicit Sort node before Unique
- Full sort over all rows
- Execution time higher than expected

Question:
- Is this planner behavior expected for DISTINCT ON queries, or could
  this case potentially avoid a full sort?

----------------------------------------------------------------------
2) Possible JIT performance regression with aggregation and light filtering
----------------------------------------------------------------------

On PostgreSQL 16.x, I observed that queries performing aggregation with
light filtering may run slower with JIT enabled compared to JIT disabled.

Query:

SET jit = on;

EXPLAIN (ANALYZE, BUFFERS)
SELECT grp, count(*)
FROM t
WHERE id > 100
GROUP BY grp;

Example EXPLAIN ANALYZE output (JIT enabled):

HashAggregate  (cost=12431.22..12441.22 rows=100 width=12)
               (actual time=156.234..156.941 rows=100 loops=1)
  Group Key: grp
  Buffers: shared hit=12483
  ->  Seq Scan on t  (cost=0.00..11231.00 rows=499900 width=4)
        (actual time=0.014..52.731 rows=499900 loops=1)
        Filter: (id > 100)
        Rows Removed by Filter: 100
        Buffers: shared hit=12483
JIT:
  Functions: 5
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 2.312 ms, Inlining 18.942 ms,
          Optimization 94.117 ms, Emission 22.884 ms, Total 138.255 ms
Planning Time: 0.184 ms
Execution Time: 159.214 ms

For comparison, with JIT disabled:

SET jit = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT grp, count(*)
FROM t
WHERE id > 100
GROUP BY grp;

Example EXPLAIN ANALYZE output (JIT disabled):

HashAggregate  (cost=12431.22..12441.22 rows=100 width=12)
               (actual time=41.312..42.117 rows=100 loops=1)
  Group Key: grp
  Buffers: shared hit=12483
  ->  Seq Scan on t  (cost=0.00..11231.00 rows=499900 width=4)
        (actual time=0.011..35.912 rows=499900 loops=1)
        Filter: (id > 100)
        Rows Removed by Filter: 100
        Buffers: shared hit=12483
Planning Time: 0.121 ms
Execution Time: 44.287 ms

Observed behavior:
- JIT compilation time dominates query execution
- Query runs significantly faster with JIT disabled
- This differs from behavior observed on PostgreSQL 14.x for similar queries

Environment:
- PostgreSQL version: 16.x
- OS: <fill if needed>
- CPU: <fill if needed>
- jit: on/off
- shared_buffers: <fill if needed>

I am unsure whether these behaviors are expected, regressions, or already
tracked issues, and would appreciate any feedback or direction.

Regards,
Jhonathan Cruz

Re: Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation

От
Laurenz Albe
Дата:
On Tue, 2025-12-16 at 21:25 -0300, Jhonathan Cruz wrote:
> I would like to report two query execution behaviors that may indicate
> performance issues or regressions. I am not certain whether these are
> known or expected behaviors, so I would appreciate guidance.
>
> ----------------------------------------------------------------------
> 1) DISTINCT ON combined with ORDER BY
> ----------------------------------------------------------------------
>
> I observed cases where queries using DISTINCT ON together with ORDER BY
> produce execution plans with explicit sorting steps, even when the ordering
> requirements are clearly defined.
>
> Query:
>
> EXPLAIN (ANALYZE, BUFFERS)
> SELECT DISTINCT ON (grp)
>        grp, id, created_at
> FROM t
> ORDER BY grp, created_at DESC;
>
> Example EXPLAIN ANALYZE output:
>
> Unique  (cost=45231.12..47731.12 rows=100 width=24)
>         (actual time=182.413..198.721 rows=100 loops=1)
>   Buffers: shared hit=12845
>   ->  Sort  (cost=45231.12..46481.12 rows=500000 width=24)
>             (actual time=182.410..191.256 rows=500000 loops=1)
>         Sort Key: grp, created_at DESC
>         Sort Method: quicksort  Memory: 51200kB
>         Buffers: shared hit=12845
>         ->  Seq Scan on t  (cost=0.00..11231.00 rows=500000 width=24)
>               (actual time=0.012..38.117 rows=500000 loops=1)
>               Buffers: shared hit=12845
> Planning Time: 0.213 ms
> Execution Time: 202.981 ms
>
> Question:
> - Is this planner behavior expected for DISTINCT ON queries, or could
>   this case potentially avoid a full sort?

That is normal and as expected.  If you need all result rows, you have
to sort all the rows to find the unique results.

If you put a LIMIT clause on the query, PostgreSQL could perform a
top-N sort.

DISTINCT is easy to write, but sometimes the most expensive part of a
query.  I don't think that can be avoided.

> ----------------------------------------------------------------------
> 2) Possible JIT performance regression with aggregation and light filtering
> ----------------------------------------------------------------------
>
> On PostgreSQL 16.x, I observed that queries performing aggregation with
> light filtering may run slower with JIT enabled compared to JIT disabled.
>
> Query:
>
> SET jit = on;
>
> EXPLAIN (ANALYZE, BUFFERS)
> SELECT grp, count(*)
> FROM t
> WHERE id > 100
> GROUP BY grp;
>
> Example EXPLAIN ANALYZE output (JIT enabled):
>
> HashAggregate  (cost=12431.22..12441.22 rows=100 width=12)
>                (actual time=156.234..156.941 rows=100 loops=1)
>   Group Key: grp
>   Buffers: shared hit=12483
>   ->  Seq Scan on t  (cost=0.00..11231.00 rows=499900 width=4)
>         (actual time=0.014..52.731 rows=499900 loops=1)
>         Filter: (id > 100)
>         Rows Removed by Filter: 100
>         Buffers: shared hit=12483
> JIT:
>   Functions: 5
>   Options: Inlining true, Optimization true, Expressions true, Deforming true
>   Timing: Generation 2.312 ms, Inlining 18.942 ms,
>           Optimization 94.117 ms, Emission 22.884 ms, Total 138.255 ms
> Planning Time: 0.184 ms
> Execution Time: 159.214 ms

Now this is more like a bug.  JIT still has its problems.  I think not the
least of the problems is that PostgreSQL cannot predict how long the
compilation will take.

At the risk of drawing fire, I have now come to recommend turning JIT off
unless you have an analytical workload.

Yours,
Laurenz Albe