Thread

  1. Re: Use merge-based matching for MCVs in eqjoinsel

    Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> — 2025-09-08T13:45:27Z

    On 08.09.2025 13:56, David Geier wrote:
    >> To evaluate it, I ran benchmarks on JOB with three variants:
    >>
    >> $ ./benchmark.sh master
    >> $ ./benchmark.sh merge
    >> $ ./benchmark.sh hash
    >>
    >> I compared total planning time across all 113 queries.
    > Was this running with optimizations? How did you extract the planning time?
    
    
    I save all query plans using EXPLAIN SUMMARY, then go through all the 
    plans, read the 'Planning Time' for each, and sum them up.
    
    > I would have expected the delta between the "merge" and "hash" variant
    > to be bigger, especially for default_statistics_target=10000. My small
    > test also showed that. Any idea why this is not showing in your results?
    
    
    So would I. With default_statistics_target = 10000 and the selectivity 
    in the JOB queries being close to zero, the difference should be 
    noticeable. I can only explain the previous results by cache-related 
    effects on my machine.
    
    I reran the benchmark on a clean cluster and collected the top slowest 
    JOB queries — now the effect is clearly visible.
    
    Merge (sum of all JOB queries)
    ==================
    default_statistics_target | Planner Speedup (×) | Planner Before (ms) | 
    Planner After (ms)
    --------------------------------------------------------------------------------
    100                       | *1.00*                | 1888.105            
    | 1879.431
    1000                      | *1.14*                | 2282.239            
    | 2009.114
    2500                      | *2.10*                | 5595.030            
    | 2668.530
    5000                      | *5.56*                | 18544.933           
    | 3333.252
    7500                      | *9.17*                | 37390.956           
    | 4076.390
    10000                     | *16.10*               | 69319.479           
    | 4306.417
    
    HashMap (sum of all JOB queries)
    ==================
    default_statistics_target | Planner Speedup (×) | Planner Before (ms) | 
    Planner After (ms)
    --------------------------------------------------------------------------------
    100                     | *1.03*                | 1888.105            | 
    1828.088
    1000                    | *1.18*                | 2282.239            | 
    1939.884
    2500                    | *2.64*                | 5595.030            | 
    2117.872
    5000                    | *7.80*                | 18544.933           | 
    2377.206
    7500                    | *13.80*               | 37390.956           | 
    2709.973
    10000                   | *23.32*               | 69319.479           | 
    2973.073
    
    Top 10 slowest JOB queries (default_statistics_target = 10000)
    Query | master (ms) | merge (ms) | Hash (ms)
    ------+-------------+------------+-----------
    29c   | 1904.586    | 144.135    | 100.473
    29b   | 1881.392    | 117.891    | 89.028
    29a   | 1868.805    | 112.242    | 83.913
    31c   | 1867.234    | 76.498     | 56.140
    30c   | 1646.630    | 88.494     | 62.549
    30b   | 1608.820    | 84.821     | 64.603
    31a   | 1573.964    | 75.978     | 56.140
    28a   | 1457.738    | 95.939     | 77.309
    28b   | 1455.052    | 99.383     | 73.065
    30a   | 1416.699    | 91.057     | 62.549
    
    
    BTW, the hashmap from your patch could also be applied to 
    eqjoinsel_semi() function.
    
    -- 
    Best regards,
    Ilia Evdokimov,
    Tantor Labs LLC,
    https://tantorlabs.com