Thread
-
Re: Eager aggregation, take 3
Radim Marek <radim@boringsql.com> — 2026-05-29T15:55:01Z
Hey Richard, I might be out of my depth here - but while testing RegreSQL as correctness/performance harness on PostgreSQL it picked up a problem with the wrong-results case during eager aggregation. It reproduces on current HEAD (commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 now) with enable_eager_aggregate enabled. My testing environment - Linux aarch64, gcc 12 (Debian) - macOS arm64, Apple clang 21 (PostgreSQL 19devel on aarch64-apple-darwin25.5.0) == How to reproduce CREATE TEMP TABLE c(id int, country text); CREATE TEMP TABLE o(customer_id int); INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE'); INSERT INTO o VALUES (1),(3); -- only customers 1 and 3 have a row in o SELECT c.country, count(*) AS n FROM c WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id) GROUP BY c.country ORDER BY c.country; Expected results (everywhere except master) country | n ---------+--- DE | 2 US | 1 (2 rows) The actual result with enable_eager_aggregate = on (default) country | n ---------+--- DE | 0 US | 0 (2 rows) With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1), as it is on PG18. Query Plan QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Sort (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197 rows=2.00 loops=1) Sort Key: c.country Sort Method: quicksort Memory: 25kB Buffers: local hit=2 -> Finalize HashAggregate (cost=98.55..100.55 rows=200 width=40) (actual time=0.183..0.186 rows=2.00 loops=1) Group Key: c.country Batches: 1 Memory Usage: 32kB Buffers: local hit=2 -> Hash Anti Join (cost=52.75..95.37 rows=635 width=40) (actual time=0.177..0.179 rows=3.00 loops=1) Hash Cond: (c.id = o.customer_id) Buffers: local hit=2 -> Seq Scan on c (cost=0.00..22.70 rows=1270 width=36) (actual time=0.024..0.025 rows=5.00 loops=1) Buffers: local hit=1 -> Hash (cost=50.25..50.25 rows=200 width=12) (actual time=0.145..0.146 rows=2.00 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB Buffers: local hit=1 -> Partial HashAggregate (cost=48.25..50.25 rows=200 width=12) (actual time=0.122..0.123 rows=2.00 loops=1) Group Key: o.customer_id Batches: 1 Memory Usage: 32kB Buffers: local hit=1 -> Seq Scan on o (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.003 rows=2.00 loops=1) Buffers: local hit=1 Planning Time: 0.294 ms Execution Time: 0.255 ms (24 rows) If this is already known or in progress, apologies for the noise. --- Radim On Fri, 29 May 2026 at 17:25, Richard Guo <guofenglinux@gmail.com> wrote: > Hi All, > > Eager aggregation is a query optimization technique that partially > pushes a group-by past a join, and finalizes it once all the relations > are joined. Eager aggregation reduces the number of input rows to the > join and thus may result in a better overall plan. This technique is > thoroughly described in the 'Eager Aggregation and Lazy Aggregation' > paper [1]. > > Back in 2017, a patch set has been proposed by Antonin Houska to > implement eager aggregation in thread [2]. However, it was at last > withdrawn after entering the pattern of "please rebase thx" followed by > rebasing and getting no feedback until "please rebase again thx". A > second attempt in 2022 unfortunately fell into the same pattern about > one year ago and was eventually closed again [3]. > > That patch set has included most of the necessary concepts to implement > eager aggregation. However, as far as I can see, it has several weak > points that we need to address. It introduces invasive changes to some > core planner functions, such as make_join_rel(). And with such changes > join_is_legal() would be performed three times for the same proposed > join, which is not great. Another weak point is that the complexity of > join searching dramatically increases with the growing number of > relations to be joined. This occurs because when we generate partially > aggregated paths, each path of the input relation is considered as an > input path for the grouped paths. As a result, the number of grouped > paths we generate increases exponentially, leading to a significant > explosion in computational complexity. Other weak points include the > lack of support for outer joins and partitionwise joins. And during my > review of the code, I came across several bugs (planning error or crash) > that need to be addressed. > > I'd like to give it another take to implement eager aggregation, while > borrowing lots of concepts and many chunks of codes from the previous > patch set. Please see attached. I have chosen to use the term 'Eager > Aggregation' from the paper [1] instead of 'Aggregation push-down', to > differentiate the aggregation push-down technique in FDW. > > The patch has been split into small pieces to make the review easier. > > 0001 introduces the RelInfoList structure, which encapsulates both a > list and a hash table, so that we can leverage the hash table for faster > lookups not only for join relations but also for upper relations. With > eager aggregation, it is possible that we generate so many upper rels of > type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with > lookups. > > 0002 introduces the RelAggInfo structure to store information needed to > create grouped paths for base and join rels. It also revises the > RelInfoList related structures and functions so that they can be used > with RelAggInfos. > > 0003 checks if eager aggregation is applicable, and if so, collects > suitable aggregate expressions and grouping expressions in the query, > and records them in root->agg_clause_list and root->group_expr_list > respectively. > > 0004 implements the functions that check if eager aggregation is > applicable for a given relation, and if so, create RelAggInfo structure > for the relation, using the infos about aggregate expressions and > grouping expressions we collected earlier. In this patch, when we check > if a target expression can act as grouping expression, we need to check > if this expression can be known equal to other expressions due to ECs > that can act as grouping expressions. This patch leverages function > exprs_known_equal() to achieve that, after enhancing this function to > consider opfamily if provided. > > 0005 implements the functions that generate paths for grouped relations > by adding sorted and hashed partial aggregation paths on top of paths of > the plain base or join relations. For sorted partial aggregation paths, > we only consider any suitably-sorted input paths as well as sorting the > cheapest-total path. For hashed partial aggregation paths, we only > consider the cheapest-total path as input. By not considering other > paths we can reduce the number of grouping paths as much as possible > while still achieving reasonable results. > > 0006 builds grouped relations for each base relation if possible, and > generates aggregation paths for the grouped base relations. > > 0007 builds grouped relations for each just-processed join relation if > possible, and generates aggregation paths for the grouped join > relations. The changes made to make_join_rel() are relatively minor, > with the addition of a new function make_grouped_join_rel(), which finds > or creates a grouped relation for the just-processed joinrel, and > generates grouped paths by joining a grouped input relation with a > non-grouped input relation. > > The other way to generate grouped paths is by adding sorted and hashed > partial aggregation paths on top of paths of the joinrel. This occurs > in standard_join_search(), after we've run set_cheapest() for the > joinrel. The reason for performing this step after set_cheapest() is > that we need to know the joinrel's cheapest paths (see 0005). > > This patch also makes the grouped relation for the topmost join rel act > as the upper rel representing the result of partial aggregation, so that > we can add the final aggregation on top of that. Additionally, this > patch extends the functionality of eager aggregation to work with > partitionwise join and geqo. > > This patch also makes eager aggregation work with outer joins. With > outer join, the aggregate cannot be pushed down if any column referenced > by grouping expressions or aggregate functions is nullable by an outer > join above the relation to which we want to apply the partiall > aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we > can easily identify such situations and subsequently refrain from > pushing down the aggregates. > > Starting from this patch, you should be able to see plans with eager > aggregation. > > 0008 adds test cases for eager aggregation. > > 0009 adds a section in README that describes this feature (copied from > previous patch set, with minor tweaks). > > Thoughts and comments are welcome. > > [1] https://www.vldb.org/conf/1995/P345.PDF > [2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost > [3] > https://www.postgresql.org/message-id/flat/OS3PR01MB66609589B896FBDE190209F495EE9%40OS3PR01MB6660.jpnprd01.prod.outlook.com > > Thanks > Richard >