Thread

  1. Re: Add a greedy join search algorithm to handle large join problems

    Chengpeng Yan <chengpeng_yan@outlook.com> — 2025-12-28T02:54:16Z

    > On Dec 28, 2025, at 01:00, Tomas Vondra <tomas@vondra.me> wrote:
    > 
    > You realize this does not actually change shared buffers size, right?
    > Because that requires a restart, not just pg_reload_conf. Are you sure
    > you're running with 4GB shared buffers?
    
    Good catch, and sorry for not mentioning that explicitly. I did restart
    the instance and verified that the new shared_buffers setting was in
    effect. I’ll make sure to call this out clearly in future descriptions.
    
    > I'm not sure SF=1 is sufficient for making any clear conclusions. No
    > matter what the shared_buffers value is, it's going to fit into RAM, the
    > runs are going to be fully cached.
    
    Thanks for the comment. I agree that TPC-H at SF=1 is too small to
    support strong conclusions. I mainly used SF=1 as a convenient starting
    point to quickly validate ideas and iterate on the implementation. I’ve
    already run experiments on JOB and plan to share those results next,
    where the behavior should be more representative and the conclusions
    more meaningful.
    
    > I don't quite see how we could make good decisions if the estimates are
    > this wrong. Garbage in, garbage out. AFAICS this affects both the
    > regular DP planning, which heavily relies on the costing, but also the
    > approaches on heuristics, which still rely on estimates in some way.
    > 
    > If the aggregate subquery is 1000x off, why should any of the following
    > decisions be right? The approaches may have different tolerance for
    > estimation errors - some may be "defensive" and handle misestimates
    > better, but the trade off is that it may pick slower plans when the
    > estimates are exact.
    > 
    > I don't think there's an "optimal" approach picking the best plan in
    > both cases. If there was, join ordering wouldn't be such a challenge.
    > 
    > I don't think Q20 is about greediness. Poor estimates are going to be an
    > issue for all approaches relying on them in some way.
    > 
    > The issue with Q7 seems pretty inherent to greedy algorithms. Picking
    > solutions that are optimal locally but not globally is the definition of
    > "greedy". I don't think it matters which exact metric is used, this flaw
    > is built-in. And I don't think it's solvable.
    
    I agree that all approaches are affected by estimation errors, but as
    you noted, they differ in how much error they can tolerate. In its
    current form, GOO is still quite fragile in this respect and can exhibit
    severe regressions when estimates are badly off. Based on additional
    experiments, I also agree that a single greedy rule inevitably has cases
    where it performs extremely poorly; as a result, my current direction is
    no longer to further tune the greedy criterion itself, but to improve
    the robustness of the existing greedy choices through additional
    mechanisms.
    
    Regarding Q20, I agree that this is not fundamentally a “greedy” issue.
    If the rowcount estimates were reasonably accurate, the extreme behavior
    observed there would likely be avoided. At the same time, when estimates
    are wrong by orders of magnitude, the current GOO approach does not
    handle this well, and improving robustness under such severe
    misestimation is exactly one of the main aspects I am currently working
    on.
    
    I have already run additional experiments in this direction, using
    result_size and cost as the base signals, and will share the results
    separately.
    
    > Sufficient for what? Sufficient to replace DP or to replace GEQO?
    > 
    > I don't think we'd want to replace DP with such greedy algorithm. With
    > accurate estimates and modest number of joins, we should be able to find
    > the best join (more or less).
    > 
    > But this thread is not about replacing DP, I see GOO more as a GEQO
    > replacement, right? Or did the goal change?
    
    The goal has always been to use GOO as a replacement for GEQO, not DP.
    
    > I'm not sure ignoring selectivity entirely is a good plan, though. Yes,
    > if the selectivity/estimate is significantly off, the plan may be poor.
    > But what exactly do you plan to use instead? Isn't a lot of the metrics
    > (output size, ...) derived from the selectivity/estimate anyway?
    > 
    > I agree the JOB benchmark may be a better fit for this. The TPC-H is
    > pretty simple, and even for TPC-DS the number of joins is not all that
    > high. Sorry if my initial feedback suggested these are the proper
    > benchmarks to evaluate this.
    > 
    > I suggest the evaluation should focus on cases where we expect GOO to
    > actually be used in practice - as a replacement for GEQO. Only when it
    > proves useful/acceptable for that use case (for sufficiently many joins
    > etc.), we can start comparing with DP to figure out if we need to adjust
    > the thresholds and so on.
    > 
    > Another suggestion is to also test with larger data sets. The problems
    > with SF=10 or SF=100 may be very different, even with TPC-H. Also,
    > consider testing with "cold" cases, as if there's nothing cached by
    > restarting the Postgres instance and drop page cache between runs.
    
    Thanks for the feedback.
    
    I’ve already run experiments on the JOB benchmark and will share the
    results shortly, including a comparison against GEQO, with DP used as a
    baseline. I agree that the primary goal at this stage is to show that
    the approach is good enough for the intended use case first, before
    looking into secondary aspects such as threshold tuning.
    
    Larger scale factors and cold-cache runs are also part of my upcoming
    evaluation plan, and I agree those are important dimensions to cover.
    
    Regarding selectivity, I agree the situation is not entirely clear-cut.
    As you noted, result_size is also derived from selectivity to some
    extent, and many bad cases share similar characteristics. However, in my
    initial TPC-H SF=1 experiments, selectivity-based greedy rules already
    showed many poor cases, and I expect this to become worse in more
    complex scenarios. For now, I therefore focused on cost and result_size:
    cost because it is a first-class concept in PostgreSQL, and result_size
    because it is commonly cited in the literature as a more robust signal
    for greedy join ordering. If you think selectivity is still an important
    signal to evaluate, I can certainly add further experiments for it, but
    based on the above I have not pursued additional selectivity-focused
    tests so far.
    > 
    > I realized the paper mentioned at the beginning of this thread is from
    > 1998. That doesn't make it wrong, but I was wondering if there are some
    > newer papers about join order search, with interesting
    > alternative/better approaches.
    > 
    > An interesting paper I found is this CIDR21 paper:
    > 
    > Simplicity Done Right for Join Ordering
    > Axel Hertzschuch, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
    > https://vldb.org/cidrdb/2021/simplicity-done-right-for-join-ordering.html
    > https://vldb.org/cidrdb/papers/2021/cidr2021_paper01.pdfl
    > 
    > Which does something similar to our approach, although it seems to be
    > more a replacement for the DP and not just for GEQO. It's meant to be
    > cheaper, but also more resilient to poor join orders, as it cares about
    > "upper bound" (~worst case) for the join orders.
    > 
    > It goes beyond the scope of GOO, as the paper also talks about sampling
    > to determine some of the estimates. But maybe it'd be useful (better
    > than GEQO) even without that?
    > 
    > I find the focus on the "worst case" (and only trusting baserel
    > estimates) interesting. I wonder if it might help with the common
    > nestloop problem, where we opt to believe a very optimistic low
    > cardinality estimate, only to end with a sequence of nestloops that
    > never complete.
    
    Thanks for the pointer. I’ve also been looking at more recent work on
    join ordering, and there are indeed several ideas that seem relevant to
    the current direction, including the paper you mentioned. These are
    definitely directions I plan to consider more seriously.
    
    My current plan is to first establish a reasonably solid baseline using
    relatively simple techniques—reducing extreme regressions and getting
    overall plan quality closer to DP. Once that is in place, I’d like to
    step back, summarize which recent papers and ideas seem most applicable,
    and then experiment with more advanced approaches incrementally,
    building on that foundation.
    
    Thanks for all the detailed feedback and the many useful insights
    throughout this discussion.
    
    --
    Best regards,
    Chengpeng Yan