Thread
-
Re: Add a greedy join search algorithm to handle large join problems
Chengpeng Yan <chengpeng_yan@outlook.com> — 2026-05-03T12:46:46Z
> On Feb 14, 2026, at 13:39, Chengpeng Yan <chengpeng_yan@outlook.com> wrote: > > My current next steps are: > > 1. Continue evaluating plan quality on more datasets/workloads. I’ve > already collected several candidate tests: some are JOB-based > variants, and others are synthetic workloads. Next, I plan to > consolidate these into a unified test set (with reproducible > setup/details), publish it, and run broader comparative evaluation. > > 2. Prototype a hybrid handoff approach: use greedy contraction first to > reduce the join graph, then let DP optimize the reduced problem. The > goal is a smoother transition around the threshold, avoiding abrupt > plan-shape changes from a hard optimizer switch. > > 3. Explore more join-ordering improvements incrementally, including > ideas from “Simplicity Done Right for Join Ordering” and related > work. Hi, I ran the current pure-GOO variants on JOB and JOB-Complex [1], 143 queries in total. JOB-Complex uses the same IMDB/JOB setting, but adds harder predicates and more challenging join patterns. The tested variants were: DP / GEQO / GOO(cost) / GOO(result_size) / GOO(selectivity) / GOO(combined) The benchmark uses the same setup as my previous JOB measurements. It prepares the IMDB-backed workloads, applies the same Postgres GUC settings used in the previous runs, runs one warmup pass and three measured repetitions for each query/variant pair, and reports the median of the measured repetitions. Each measured execution is collected with: EXPLAIN (ANALYZE, TIMING OFF, SUMMARY ON, FORMAT JSON, SETTINGS ON) Planning time and execution time are parsed separately. Planning time is still reported, but the main metric below is execution time, because it better reflects the quality of the chosen plan. I will attach v5-job-benchmark-result.xlsx with both planning-time and execution-time results, and v5-job-benchmark-execution-result.pdf as the execution-time PDF view. The color convention is the same as in the earlier tables. Timeouts use a 10-minute statement_timeout. The scripts, workload definitions, and reproduction details are in the benchmark repository [2]. I used a repository rather than a single script because the evaluation now includes multiple workloads, several algorithm variants, and enough setup details that keeping the workload definitions and run protocol together makes reproduction and review easier. As expected from the previous discussion, planning time is low for the GOO variants. GOO(combined) is about 5.2x faster than GEQO in planning time in this run. The execution-time picture is mixed. GOO(combined) has clear positives, but does not look robust enough to replace GEQO directly. Compared with GEQO over the 143 comparable queries: GOO(combined) faster by 20%-2x: 6 queries GOO(combined) faster by 2x-10x: 8 queries GOO(combined) faster by >10x: 7 queries GOO(combined) within +/-20% of GEQO: 82 queries GOO(combined) slower by 20%-2x: 10 queries GOO(combined) slower by 2x-10x: 26 queries GOO(combined) slower by >10x: 4 queries timeouts: 0 for GEQO, 0 for GOO(combined) On execution-time sum, GOO(combined) is better than GEQO: GEQO: 430459 ms GOO(combined): 292194 ms That is about 1.47x faster than GEQO on the workload sum. However, in the execution-time distribution, the result is not one-sided: there are meaningful wins and meaningful regressions. Some cases where GOO(combined) is much better than GEQO: job_complex q15: 229 ms vs 114942 ms, about 503x faster job_complex q18: 2456 ms vs 116014 ms, about 47x faster job 26c: 859 ms vs 25735 ms, about 30x faster job 26a: 359 ms vs 8107 ms, about 23x faster job_complex q23: 1121 ms vs 15711 ms, about 14x faster job_complex q05: 462 ms vs 5678 ms, about 12x faster Some cases where GOO(combined) is much worse than GEQO: job_complex q28: 56563 ms vs 272 ms, about 208x slower job_complex q20: 1918 ms vs 88 ms, about 22x slower job_complex q12: 70750 ms vs 3456 ms, about 20x slower job 28b: 1390 ms vs 103 ms, about 14x slower ## Selectivity One topic from the earlier discussion was whether selectivity should be ignored. Based on this run, I do not think selectivity is good enough as a primary greedy strategy. GOO(selectivity) has low planning time, but its execution-time behavior is poor: execution-time sum: about 16.46x DP and 4.76x GEQO timeouts: 6 cases >=2x GEQO: 110 Some examples: job 24b: 87569 ms vs GEQO 13.7 ms, about 6374x slower job 29a: 56080 ms vs GEQO 8.9 ms, about 6334x slower job 31b: 207334 ms vs GEQO 165 ms, about 1259x slower job_complex q03: 14345 ms vs GEQO 25 ms, about 574x slower job_complex q04: 15026 ms vs GEQO 26 ms, about 589x slower This does not mean that selectivity is meaningless, or that GOO itself is useless. The more precise issue is that a greedy join search commits much earlier than DP or GEQO. If an early estimate is wrong, that error is more likely to be amplified into the final join order. For selectivity specifically, the main problem seems to be that it is scale-free. It optimizes the relative reduction of one join pair, not the absolute size of the intermediate result. A join can have a very low estimated selectivity and still produce a large intermediate if the inputs are large. Result_size is also estimate-dependent, but it at least includes the estimated output size; cost includes Postgres's normal physical cost model. Selectivity throws away much of that scale information, which makes it a brittle primary greedy key on these workloads. In GOO(combined), the selectivity candidate was never selected as the final plan, so it did not improve the combined variant in this run. There were a few positive standalone selectivity cases, but they were rare compared with the tail risk. ## GOO(combined) GOO(combined) is still the best pure-GOO variant in this run. It reduces planning time and avoids several severe GEQO bad cases. At the same time, the remaining regressions suggest three limitations: 1. A greedy path is more sensitive to estimation errors and local-choice mistakes. A locally reasonable early join can be globally poor once the rest of the join graph is considered. 2. Sometimes none of the current greedy variants produces a good candidate. For example, in q20, 28b, 19a, and q03, all current GOO candidates are clearly worse than GEQO. In that case, improving only the final selector would not be enough. 3. Sometimes a good GOO candidate exists, but the final estimated-cost selector does not choose it. For example, in q28, q12, and 15a, result_size produced a much better plan, but the final cost-based choice selected a worse one. If we continue improving pure GOO, I think the two natural directions are: * increase the diversity of greedy strategies, so different queries have a better chance of getting at least one good candidate; * once such a good candidate exists, make the final selector more likely to pick it. In several cases a faster GOO candidate existed, but it had a higher estimated cost and was not selected. The CIDR 2021 paper "Simplicity Done Right for Join Ordering" is relevant to both ideas: its E-block suggests another enumeration/candidate-generation direction, and its U-block gives an upper-bound style signal that could be used as a risk-aware selector input. I think that is a good fit in principle, but it would not be a small extension to the current GOO patch. E-block would require a different enumeration path, and U-block would require extra metadata such as maximum value frequency for join attributes. For this patch series, I would prefer to keep the scope focused on plan search rather than also changing the estimation side. There is also a usability issue. The current pure-GOO variants do have some tuning surface: users can switch between cost and result_size, or use a combined mode. That is easy to try, but not necessarily easy to tune. It is often unclear which strategy should be preferred for a given query, and in some tail cases neither strategy produces a good plan. GOO(combined) with cost and result_size may already be a useful baseline or candidate generator, but the regressions above do not support making it the default GEQO replacement. One possible advantage over GEQO is that GOO may work better with pg_plan_advice, but I need to understand the details better before making a stronger claim. So my current conclusion is not that GOO has no value. It is fast to plan, and it can avoid some GEQO bad cases. The issue is that pure GOO appears to have structural robustness limits in some tail cases. Because of that, I plan to switch the next experiment to a direction closer to the earlier community discussion about making the transition from DP to heuristic search more gradual [3]. The idea is to give planning a fixed effort budget, use exact DP while the budget allows it, and complete the remaining search space with GOO when the DP quota is not enough. This seems smoother, easier to tune, and possibly easier to accept: increasing the effort should naturally allow either deeper DP or more greedy completions, instead of switching abruptly from DP to a pure greedy or GEQO-style method. I already have a demo implementation under test, and I will share the code and benchmark results separately once the numbers are ready. References: [1] JOB-Complex: https://github.com/DataManagementLab/JOB-Complex [2] Benchmark repository: https://github.com/Reminiscent/join-order-benchmark [3] Earlier discussion about replacing GEQO more gradually: https://www.postgresql.org/message-id/flat/CAFBsxsE3Sb889r-Xvun%2BiAa5Onjy71m8nzp6JSH387JJF-YmrA%40mail.gmail.com -- Best regards, Chengpeng Yan