Thread

  1. Re: Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation

    Laurenz Albe <laurenz.albe@cybertec.at> — 2025-12-17T15:14:11Z

    On Tue, 2025-12-16 at 21:25 -0300, Jhonathan Cruz wrote:
    > I would like to report two query execution behaviors that may indicate
    > performance issues or regressions. I am not certain whether these are
    > known or expected behaviors, so I would appreciate guidance.
    > 
    > ----------------------------------------------------------------------
    > 1) DISTINCT ON combined with ORDER BY
    > ----------------------------------------------------------------------
    > 
    > I observed cases where queries using DISTINCT ON together with ORDER BY
    > produce execution plans with explicit sorting steps, even when the ordering
    > requirements are clearly defined.
    > 
    > Query:
    > 
    > EXPLAIN (ANALYZE, BUFFERS)
    > SELECT DISTINCT ON (grp)
    >        grp, id, created_at
    > FROM t
    > ORDER BY grp, created_at DESC;
    > 
    > Example EXPLAIN ANALYZE output:
    > 
    > Unique  (cost=45231.12..47731.12 rows=100 width=24)
    >         (actual time=182.413..198.721 rows=100 loops=1)
    >   Buffers: shared hit=12845
    >   ->  Sort  (cost=45231.12..46481.12 rows=500000 width=24)
    >             (actual time=182.410..191.256 rows=500000 loops=1)
    >         Sort Key: grp, created_at DESC
    >         Sort Method: quicksort  Memory: 51200kB
    >         Buffers: shared hit=12845
    >         ->  Seq Scan on t  (cost=0.00..11231.00 rows=500000 width=24)
    >               (actual time=0.012..38.117 rows=500000 loops=1)
    >               Buffers: shared hit=12845
    > Planning Time: 0.213 ms
    > Execution Time: 202.981 ms
    > 
    > Question:
    > - Is this planner behavior expected for DISTINCT ON queries, or could
    >   this case potentially avoid a full sort?
    
    That is normal and as expected.  If you need all result rows, you have
    to sort all the rows to find the unique results.
    
    If you put a LIMIT clause on the query, PostgreSQL could perform a
    top-N sort.
    
    DISTINCT is easy to write, but sometimes the most expensive part of a
    query.  I don't think that can be avoided.
    
    > ----------------------------------------------------------------------
    > 2) Possible JIT performance regression with aggregation and light filtering
    > ----------------------------------------------------------------------
    > 
    > On PostgreSQL 16.x, I observed that queries performing aggregation with
    > light filtering may run slower with JIT enabled compared to JIT disabled.
    > 
    > Query:
    > 
    > SET jit = on;
    > 
    > EXPLAIN (ANALYZE, BUFFERS)
    > SELECT grp, count(*)
    > FROM t
    > WHERE id > 100
    > GROUP BY grp;
    > 
    > Example EXPLAIN ANALYZE output (JIT enabled):
    > 
    > HashAggregate  (cost=12431.22..12441.22 rows=100 width=12)
    >                (actual time=156.234..156.941 rows=100 loops=1)
    >   Group Key: grp
    >   Buffers: shared hit=12483
    >   ->  Seq Scan on t  (cost=0.00..11231.00 rows=499900 width=4)
    >         (actual time=0.014..52.731 rows=499900 loops=1)
    >         Filter: (id > 100)
    >         Rows Removed by Filter: 100
    >         Buffers: shared hit=12483
    > JIT:
    >   Functions: 5
    >   Options: Inlining true, Optimization true, Expressions true, Deforming true
    >   Timing: Generation 2.312 ms, Inlining 18.942 ms,
    >           Optimization 94.117 ms, Emission 22.884 ms, Total 138.255 ms
    > Planning Time: 0.184 ms
    > Execution Time: 159.214 ms
    
    Now this is more like a bug.  JIT still has its problems.  I think not the
    least of the problems is that PostgreSQL cannot predict how long the
    compilation will take.
    
    At the risk of drawing fire, I have now come to recommend turning JIT off
    unless you have an analytical workload.
    
    Yours,
    Laurenz Albe