Thread
-
Re: Use merge-based matching for MCVs in eqjoinsel
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> — 2025-09-08T13:45:27Z
On 08.09.2025 13:56, David Geier wrote: >> To evaluate it, I ran benchmarks on JOB with three variants: >> >> $ ./benchmark.sh master >> $ ./benchmark.sh merge >> $ ./benchmark.sh hash >> >> I compared total planning time across all 113 queries. > Was this running with optimizations? How did you extract the planning time? I save all query plans using EXPLAIN SUMMARY, then go through all the plans, read the 'Planning Time' for each, and sum them up. > I would have expected the delta between the "merge" and "hash" variant > to be bigger, especially for default_statistics_target=10000. My small > test also showed that. Any idea why this is not showing in your results? So would I. With default_statistics_target = 10000 and the selectivity in the JOB queries being close to zero, the difference should be noticeable. I can only explain the previous results by cache-related effects on my machine. I reran the benchmark on a clean cluster and collected the top slowest JOB queries — now the effect is clearly visible. Merge (sum of all JOB queries) ================== default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms) -------------------------------------------------------------------------------- 100 | *1.00* | 1888.105 | 1879.431 1000 | *1.14* | 2282.239 | 2009.114 2500 | *2.10* | 5595.030 | 2668.530 5000 | *5.56* | 18544.933 | 3333.252 7500 | *9.17* | 37390.956 | 4076.390 10000 | *16.10* | 69319.479 | 4306.417 HashMap (sum of all JOB queries) ================== default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms) -------------------------------------------------------------------------------- 100 | *1.03* | 1888.105 | 1828.088 1000 | *1.18* | 2282.239 | 1939.884 2500 | *2.64* | 5595.030 | 2117.872 5000 | *7.80* | 18544.933 | 2377.206 7500 | *13.80* | 37390.956 | 2709.973 10000 | *23.32* | 69319.479 | 2973.073 Top 10 slowest JOB queries (default_statistics_target = 10000) Query | master (ms) | merge (ms) | Hash (ms) ------+-------------+------------+----------- 29c | 1904.586 | 144.135 | 100.473 29b | 1881.392 | 117.891 | 89.028 29a | 1868.805 | 112.242 | 83.913 31c | 1867.234 | 76.498 | 56.140 30c | 1646.630 | 88.494 | 62.549 30b | 1608.820 | 84.821 | 64.603 31a | 1573.964 | 75.978 | 56.140 28a | 1457.738 | 95.939 | 77.309 28b | 1455.052 | 99.383 | 73.065 30a | 1416.699 | 91.057 | 62.549 BTW, the hashmap from your patch could also be applied to eqjoinsel_semi() function. -- Best regards, Ilia Evdokimov, Tantor Labs LLC, https://tantorlabs.com