Thread

  1. Re: Introduce Index Aggregate - new GROUP BY strategy

    David Rowley <dgrowleyml@gmail.com> — 2025-12-08T23:11:46Z

    On Tue, 9 Dec 2025 at 04:37, Sergey Soloviev
    <sergey.soloviev@tantorlabs.ru> wrote:
    > I would like to introduce new GROUP BY strategy, called Index Aggregate.
    
    > In a nutshell, we build B+tree index where GROUP BY attributes are index
    > keys and if memory limit reached we will build index for each batch and
    > spill it to the disk as sorted run performing final external merge.
    > Mean IndexAgg time is about 1.8 ms and 2 ms for hash + sort, so win is about 10%.
    >
    > Also, I have run TPC-H tests and 2 tests used Index Agg node (4 and 5) and this gave
    > near 5% gain in time.
    
    Interesting.
    
    Are you able to provide benchmarks with increasing numbers of groups,
    say 100 to 100 million, increasing in multiples of 10, with say 1GB
    work_mem, and to be fair, hash_mem_multiplier=1 with all 3 strategies.
    A binary search's performance characteristics will differ vastly from
    that of simplehash's hash lookup and linear probe type search. Binary
    searches become much less optimal when the array becomes large as
    there are many more opportunities for cache misses than with a linear
    probing hash table. I think you're going to have to demonstrate that
    the window where this is useful is big enough to warrant the extra
    code.
    
    Ideally, if you could show a graph and maybe name Hash Aggregate as
    the baseline and show that as 1 always, then run the same benchmark
    forcing a Sort -> Group Agg, and then also your Index Agg. Also,
    ideally, if you could provide scripts for this so people can easily
    run it themselves, to allow us to see how other hardware compares to
    yours.  Doing this may also help you move forward with your costing
    code for the planner, but the main thing to show is that there is a
    useful enough data size where this is useful.
    
    You might want to repeat the test a few times with different data
    types. Perhaps int or bigint, then also something varlena and maybe
    something byref, such as UUID. Also, you might want to avoid presorted
    data as I suspect it'll be hard to beat Sort -> Group Agg with
    presorted data. Not causing performance regressions for presorted data
    might be quite a tricky aspect of this patch.
    
    David