Thread

  1. Re: Add a greedy join search algorithm to handle large join problems

    Chengpeng Yan <chengpeng_yan@outlook.com> — 2025-12-10T05:20:01Z

    Hi,
    
    > On Dec 10, 2025, at 07:30, Tomas Vondra <tomas@vondra.me> wrote:
    >
    > I looked at a couple more failing queries, and removing the aggregates
    > fixes them too. Maybe there are other issues/crashes, of course.
    
    Thanks a lot for pointing this out. I also noticed the same issue when
    testing TPC-H Q5. The root cause was that in the goo algorithm I forgot
    to handle the case of eager aggregation. This has been fixed in the v2
    patch (after the fix, the v2 version works correctly for all TPC-H
    queries). I will continue testing it on TPC-DS as well.
    
    Sorry that I didn’t push the related changes earlier. I was running more
    experiments on the greedy strategy, and I still needed some time to
    organize and analyze the results. During this process, I found that the
    current greedy strategy may lead to suboptimal plan quality in some
    cases. Because of that, I plan to first evaluate a few basic greedy
    heuristics on TPC-H to understand their behavior and limitations. If
    there are meaningful results or conclusions, I will share them for
    discussion.
    
    Based on some preliminary testing, I’m leaning toward keeping the greedy
    strategy simple and explainable, and focusing on the following
    indicators together with the planner’s cost estimates:
    * join cardinality (number of output rows)
    * selectivity (join_size / (left_size * right_size))
    * estimated result size in bytes(joinrel->reltarget->width * joinrel->rows)
    * cheapest total path cost (cheapest_total_path->total_cost)
    
    At the moment, I’m inclined to prioritize join cardinality with input
    size as a secondary factor, but I’d like to validate this step by step:
    first by testing these simple heuristics on TPC-H (as a relatively
    simple workload) and summarizing some initial conclusions there. After
    that, I plan to run more comprehensive experiments on more complex
    benchmarks such as JOB and TPC-DS and report the results.
    
    Do you have any thoughts or suggestions on this direction?
    
    Thanks again for your feedback and help.
    
    --
    Best regards,
    Chengpeng Yan