Thread

  1. Re: Introduce Index Aggregate - new GROUP BY strategy

    Andrei Lepikhov <lepihov@gmail.com> — 2025-12-26T15:20:38Z

    On 12/12/25 17:23, Sergey Soloviev wrote:
    > Also, cost calculation logic is adjusted a bit - it takes into account 
    > top-down index
    > traversal and final external merge cost is added only if spill expected.
    
    Hi,
    Here is my 'aerial' review:
    The patch proposes a new aggregation strategy that builds an in-memory 
    B+tree index for grouping. This combines incremental group formation 
    (like AGG_HASHED) with sorted output (like AGG_SORTED), which is 
    beneficial when the query requires both grouping and ordering on 
    (almost) the same columns.
    The key advantage is avoiding a separate sort step when the sorted 
    output is needed, at the cost of additional CPU overhead.
    
    My doubts:
    1. Can you benchmark the scenario where the optimiser mispredicts 
    numGroups. If the planner underestimates group cardinality, the btree 
    overhead could be much higher than expected. Does the approach degrade 
    gracefully?
    2. Consider splitting the hash_* → spill_* field renaming into a 
    separate preparatory commit to reduce the complexity of reviewing the 
    core logic changes.
    3. I notice AGG_INDEX requires both sortable AND hashable types. While I 
    understand this is for the hash-based spill partitioning, is this 
    limitation necessary? Could you use sort-based spilling (similar to 
    tuplesort's external merge) instead? This would allow AGG_INDEX to work 
    with sortable-only types (I can imagine a geometric type with B-tree 
    operators but no hash functions).
    
    The main question for me is: can you invent a robust cost model to set 
    smooth boundaries between all three types of grouping? Does it really 
    promise frequent benefits and avoid degradations? -  Remember, 
    increasing search space we increase planning time, which may be palpable 
    in cases with many groupings/grouping attributes - for example, an 
    APPEND over a partitioned table with pushed-down aggregate looks like a 
    trivial case.
    
    -- 
    regards, Andrei Lepikhov,
    pgEdge