Thread

  1. Re: Eager aggregation, take 3

    Radim Marek <radim@boringsql.com> — 2026-05-29T15:55:01Z

    Hey Richard,
    
    I might be out of my depth here - but while testing RegreSQL as
    correctness/performance harness on PostgreSQL it picked up a problem with
    the wrong-results case during eager aggregation.
    
    It reproduces on current HEAD
    (commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 now)
    with enable_eager_aggregate enabled.
    
    My testing environment
      - Linux aarch64, gcc 12 (Debian)
      - macOS arm64, Apple clang 21
        (PostgreSQL 19devel on aarch64-apple-darwin25.5.0)
    
    == How to reproduce
    
      CREATE TEMP TABLE c(id int, country text);
      CREATE TEMP TABLE o(customer_id int);
      INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
      INSERT INTO o VALUES (1),(3);   -- only customers 1 and 3 have a row in o
    
      SELECT c.country, count(*) AS n
      FROM c
      WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
      GROUP BY c.country
      ORDER BY c.country;
    
    Expected results (everywhere except master)
    
     country | n
    ---------+---
     DE      | 2
     US      | 1
    (2 rows)
    
    The actual result with enable_eager_aggregate = on (default)
    
     country | n
    ---------+---
     DE      | 0
     US      | 0
    (2 rows)
    
    With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1),
    as it is on PG18.
    
    Query Plan
    
                                                                QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------
     Sort  (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197
    rows=2.00 loops=1)
       Sort Key: c.country
       Sort Method: quicksort  Memory: 25kB
       Buffers: local hit=2
       ->  Finalize HashAggregate  (cost=98.55..100.55 rows=200 width=40)
    (actual time=0.183..0.186 rows=2.00 loops=1)
             Group Key: c.country
             Batches: 1  Memory Usage: 32kB
             Buffers: local hit=2
             ->  Hash Anti Join  (cost=52.75..95.37 rows=635 width=40) (actual
    time=0.177..0.179 rows=3.00 loops=1)
                   Hash Cond: (c.id = o.customer_id)
                   Buffers: local hit=2
                   ->  Seq Scan on c  (cost=0.00..22.70 rows=1270 width=36)
    (actual time=0.024..0.025 rows=5.00 loops=1)
                         Buffers: local hit=1
                   ->  Hash  (cost=50.25..50.25 rows=200 width=12) (actual
    time=0.145..0.146 rows=2.00 loops=1)
                         Buckets: 1024  Batches: 1  Memory Usage: 9kB
                         Buffers: local hit=1
                         ->  Partial HashAggregate  (cost=48.25..50.25 rows=200
    width=12) (actual time=0.122..0.123 rows=2.00 loops=1)
                               Group Key: o.customer_id
                               Batches: 1  Memory Usage: 32kB
                               Buffers: local hit=1
                               ->  Seq Scan on o  (cost=0.00..35.50 rows=2550
    width=4) (actual time=0.002..0.003 rows=2.00 loops=1)
                                     Buffers: local hit=1
     Planning Time: 0.294 ms
     Execution Time: 0.255 ms
    (24 rows)
    
    If this is already known or in progress, apologies for the noise.
    
    ---
    
    Radim
    
    On Fri, 29 May 2026 at 17:25, Richard Guo <guofenglinux@gmail.com> wrote:
    
    > Hi All,
    >
    > Eager aggregation is a query optimization technique that partially
    > pushes a group-by past a join, and finalizes it once all the relations
    > are joined.  Eager aggregation reduces the number of input rows to the
    > join and thus may result in a better overall plan.  This technique is
    > thoroughly described in the 'Eager Aggregation and Lazy Aggregation'
    > paper [1].
    >
    > Back in 2017, a patch set has been proposed by Antonin Houska to
    > implement eager aggregation in thread [2].  However, it was at last
    > withdrawn after entering the pattern of "please rebase thx" followed by
    > rebasing and getting no feedback until "please rebase again thx".  A
    > second attempt in 2022 unfortunately fell into the same pattern about
    > one year ago and was eventually closed again [3].
    >
    > That patch set has included most of the necessary concepts to implement
    > eager aggregation.  However, as far as I can see, it has several weak
    > points that we need to address.  It introduces invasive changes to some
    > core planner functions, such as make_join_rel().  And with such changes
    > join_is_legal() would be performed three times for the same proposed
    > join, which is not great.  Another weak point is that the complexity of
    > join searching dramatically increases with the growing number of
    > relations to be joined.  This occurs because when we generate partially
    > aggregated paths, each path of the input relation is considered as an
    > input path for the grouped paths.  As a result, the number of grouped
    > paths we generate increases exponentially, leading to a significant
    > explosion in computational complexity.  Other weak points include the
    > lack of support for outer joins and partitionwise joins.  And during my
    > review of the code, I came across several bugs (planning error or crash)
    > that need to be addressed.
    >
    > I'd like to give it another take to implement eager aggregation, while
    > borrowing lots of concepts and many chunks of codes from the previous
    > patch set.  Please see attached.  I have chosen to use the term 'Eager
    > Aggregation' from the paper [1] instead of 'Aggregation push-down', to
    > differentiate the aggregation push-down technique in FDW.
    >
    > The patch has been split into small pieces to make the review easier.
    >
    > 0001 introduces the RelInfoList structure, which encapsulates both a
    > list and a hash table, so that we can leverage the hash table for faster
    > lookups not only for join relations but also for upper relations.  With
    > eager aggregation, it is possible that we generate so many upper rels of
    > type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with
    > lookups.
    >
    > 0002 introduces the RelAggInfo structure to store information needed to
    > create grouped paths for base and join rels.  It also revises the
    > RelInfoList related structures and functions so that they can be used
    > with RelAggInfos.
    >
    > 0003 checks if eager aggregation is applicable, and if so, collects
    > suitable aggregate expressions and grouping expressions in the query,
    > and records them in root->agg_clause_list and root->group_expr_list
    > respectively.
    >
    > 0004 implements the functions that check if eager aggregation is
    > applicable for a given relation, and if so, create RelAggInfo structure
    > for the relation, using the infos about aggregate expressions and
    > grouping expressions we collected earlier.  In this patch, when we check
    > if a target expression can act as grouping expression, we need to check
    > if this expression can be known equal to other expressions due to ECs
    > that can act as grouping expressions.  This patch leverages function
    > exprs_known_equal() to achieve that, after enhancing this function to
    > consider opfamily if provided.
    >
    > 0005 implements the functions that generate paths for grouped relations
    > by adding sorted and hashed partial aggregation paths on top of paths of
    > the plain base or join relations.  For sorted partial aggregation paths,
    > we only consider any suitably-sorted input paths as well as sorting the
    > cheapest-total path.  For hashed partial aggregation paths, we only
    > consider the cheapest-total path as input.  By not considering other
    > paths we can reduce the number of grouping paths as much as possible
    > while still achieving reasonable results.
    >
    > 0006 builds grouped relations for each base relation if possible, and
    > generates aggregation paths for the grouped base relations.
    >
    > 0007 builds grouped relations for each just-processed join relation if
    > possible, and generates aggregation paths for the grouped join
    > relations.  The changes made to make_join_rel() are relatively minor,
    > with the addition of a new function make_grouped_join_rel(), which finds
    > or creates a grouped relation for the just-processed joinrel, and
    > generates grouped paths by joining a grouped input relation with a
    > non-grouped input relation.
    >
    > The other way to generate grouped paths is by adding sorted and hashed
    > partial aggregation paths on top of paths of the joinrel.  This occurs
    > in standard_join_search(), after we've run set_cheapest() for the
    > joinrel.  The reason for performing this step after set_cheapest() is
    > that we need to know the joinrel's cheapest paths (see 0005).
    >
    > This patch also makes the grouped relation for the topmost join rel act
    > as the upper rel representing the result of partial aggregation, so that
    > we can add the final aggregation on top of that.  Additionally, this
    > patch extends the functionality of eager aggregation to work with
    > partitionwise join and geqo.
    >
    > This patch also makes eager aggregation work with outer joins.  With
    > outer join, the aggregate cannot be pushed down if any column referenced
    > by grouping expressions or aggregate functions is nullable by an outer
    > join above the relation to which we want to apply the partiall
    > aggregation.  Thanks to Tom's outer-join-aware-Var infrastructure, we
    > can easily identify such situations and subsequently refrain from
    > pushing down the aggregates.
    >
    > Starting from this patch, you should be able to see plans with eager
    > aggregation.
    >
    > 0008 adds test cases for eager aggregation.
    >
    > 0009 adds a section in README that describes this feature (copied from
    > previous patch set, with minor tweaks).
    >
    > Thoughts and comments are welcome.
    >
    > [1] https://www.vldb.org/conf/1995/P345.PDF
    > [2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost
    > [3]
    > https://www.postgresql.org/message-id/flat/OS3PR01MB66609589B896FBDE190209F495EE9%40OS3PR01MB6660.jpnprd01.prod.outlook.com
    >
    > Thanks
    > Richard
    >