Thread
-
Possible performance issues: DISTINCT ON + ORDER BY and JIT with aggregation
Jhonathan Cruz <jhonathancruz48@gmail.com> — 2025-12-17T00:25:08Z
Hello, I would like to report two query execution behaviors that may indicate performance issues or regressions. I am not certain whether these are known or expected behaviors, so I would appreciate guidance. ---------------------------------------------------------------------- 1) DISTINCT ON combined with ORDER BY ---------------------------------------------------------------------- I observed cases where queries using DISTINCT ON together with ORDER BY produce execution plans with explicit sorting steps, even when the ordering requirements are clearly defined. Minimal reproduction: CREATE TABLE t ( id bigint, grp int, created_at timestamp ); INSERT INTO t SELECT g, g % 100, now() - (g || ' seconds')::interval FROM generate_series(1, 500000) g; ANALYZE t; Query: EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT ON (grp) grp, id, created_at FROM t ORDER BY grp, created_at DESC; Example EXPLAIN ANALYZE output: Unique (cost=45231.12..47731.12 rows=100 width=24) (actual time=182.413..198.721 rows=100 loops=1) Buffers: shared hit=12845 -> Sort (cost=45231.12..46481.12 rows=500000 width=24) (actual time=182.410..191.256 rows=500000 loops=1) Sort Key: grp, created_at DESC Sort Method: quicksort Memory: 51200kB Buffers: shared hit=12845 -> Seq Scan on t (cost=0.00..11231.00 rows=500000 width=24) (actual time=0.012..38.117 rows=500000 loops=1) Buffers: shared hit=12845 Planning Time: 0.213 ms Execution Time: 202.981 ms Observed behavior: - Explicit Sort node before Unique - Full sort over all rows - Execution time higher than expected Question: - Is this planner behavior expected for DISTINCT ON queries, or could this case potentially avoid a full sort? ---------------------------------------------------------------------- 2) Possible JIT performance regression with aggregation and light filtering ---------------------------------------------------------------------- On PostgreSQL 16.x, I observed that queries performing aggregation with light filtering may run slower with JIT enabled compared to JIT disabled. Query: SET jit = on; EXPLAIN (ANALYZE, BUFFERS) SELECT grp, count(*) FROM t WHERE id > 100 GROUP BY grp; Example EXPLAIN ANALYZE output (JIT enabled): HashAggregate (cost=12431.22..12441.22 rows=100 width=12) (actual time=156.234..156.941 rows=100 loops=1) Group Key: grp Buffers: shared hit=12483 -> Seq Scan on t (cost=0.00..11231.00 rows=499900 width=4) (actual time=0.014..52.731 rows=499900 loops=1) Filter: (id > 100) Rows Removed by Filter: 100 Buffers: shared hit=12483 JIT: Functions: 5 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 2.312 ms, Inlining 18.942 ms, Optimization 94.117 ms, Emission 22.884 ms, Total 138.255 ms Planning Time: 0.184 ms Execution Time: 159.214 ms For comparison, with JIT disabled: SET jit = off; EXPLAIN (ANALYZE, BUFFERS) SELECT grp, count(*) FROM t WHERE id > 100 GROUP BY grp; Example EXPLAIN ANALYZE output (JIT disabled): HashAggregate (cost=12431.22..12441.22 rows=100 width=12) (actual time=41.312..42.117 rows=100 loops=1) Group Key: grp Buffers: shared hit=12483 -> Seq Scan on t (cost=0.00..11231.00 rows=499900 width=4) (actual time=0.011..35.912 rows=499900 loops=1) Filter: (id > 100) Rows Removed by Filter: 100 Buffers: shared hit=12483 Planning Time: 0.121 ms Execution Time: 44.287 ms Observed behavior: - JIT compilation time dominates query execution - Query runs significantly faster with JIT disabled - This differs from behavior observed on PostgreSQL 14.x for similar queries Environment: - PostgreSQL version: 16.x - OS: <fill if needed> - CPU: <fill if needed> - jit: on/off - shared_buffers: <fill if needed> I am unsure whether these behaviors are expected, regressions, or already tracked issues, and would appreciate any feedback or direction. Regards, Jhonathan Cruz