Thread

  1. Re: should we have a fast-path planning for OLTP starjoins?

    Tomas Vondra <tomas@vondra.me> — 2026-05-30T18:57:20Z

    
    On 5/30/26 20:18, Bruce Momjian wrote:
    > On Fri, May 29, 2026 at 12:11:32AM +0200, Tomas Vondra wrote:
    >> But re-reading the old thread, this doesn't seem to be why it got stuck.
    >> We already can identify dimensions joined on foreign keys, and that
    >> seems like a good start.
    >>
    >> IIRC the thing that worried me was that just sticking the joins at the
    >> end is pretty heavy-handed. It can easily end up making the plan worse,
    >> if one of the other joins increases the cardinality. Would that be
    >> common? Probably not, but it seems unnecessarily risky.
    > 
    > Right.
    > 
    >> Ideally, we'd do join that reduce cardinality first (with the regular DP
    >> join search), then join all the dimensions, and finally do all joins
    >> that expand cardinality (again, using the regular DP). But the earlier
    >> patches worked by adjusting the join tree in query_planner(), i.e. way
    >> before we get to calculate join cardinalities.
    > 
    > Yes, I remember discussing that.
    > 
    >> It works like this:
    >>
    >> 1) query_planner()
    >>
    >> Determine if the query includes a starjoin (or multiple), and remember
    >> the relids included in the starjoin cluster. Pick a "canonical" join
    >> order for each starjoin cluster we found (e.g. with dimensions in relid
    >> order).
    >>
    >> 2) standard_join_search()/join_search_one_level()
    >>
    >> When constructing the join rels (e.g. in make_join_rel or right before
    >> it's called), check that the new rel would violate the canonical order.
    >> If it would, refuse to create it, just like we do for various other join
    >> restrictions.
    > 
    > This is how you avoid the factorial explosion of plan options, right?
    >
    
    Yes, exactly.
    
    >> The new join restriction is that if the join result includes a subset of
    >> the starjoin cluster, then it has to include the fact + prefix of the
    >> list of dimensions (which is the canonical join order).
    > 
    > Sorry, I got lost here.  What is "prefix?"  I looked at the patch and
    > also could not understand it.
    
    Apologies, it may not be obvious from the code / comments (I'll try to
    improve that in the next version).
    
    Let's say we're joining "F" with dimensions D1, D2, D3. Then the
    starjoins_canonicalize() finds the cluster, and picks a canonical join
    order. Could be [F, D1, D2, D3] - in this order. Or whatever other
    permutation of the dimensions, it's all equal.
    
    Then starjoin_order_invalid() ensures that whatever join relation we
    produce, it only even contains a prefix of this list. So a join relation
    can contain [F], [F, D1], [F, D1, D2], [F, D1, D2, D3]. But it can't
    contain e.g. [F, D2], because that skips the D1 - it's not a prefix.
    
    The patch only applies this to relations from the cluster. There can be
    other relations in the join "in between" the dimensions - that does not
    make the join order "invalid".
    
    So for example there may be joins to non-dimensions A and B, and we will
    consider joins [F, A, D1, B, D2, D3] and so on as valid. The joins to A
    and B joins can increase/decrease cardinality, but thanks to this we
    should find the right place to join the dimensions.
    
    We could even make it a bit stricter, and require that all dimensions
    join "at once". I.e. after joining a dimension, only dimensions can be
    joined (until all dimensions are joined). So [F, D1, A, D2] would not be
    allowed. This would further reduce the number of join orders considered.
    
    > 
    >> Note: It should be possible to make the restriction even more strict, if
    >> needed (e.g. to effectively join all dimensions at once, with no other
    >> joins in between).
    > 
    > The patch is quite small.
    > 
    >> Attached are a couple charts from a test with 1-15 dimensions (scripts
    >> attached too). I was wondering how geqo affects this, so I tried with
    >> geqo=on/off, and with join_collapse_limit=1/8/16.
    >>
    >> With join_collapse_limit=1 there's no difference between any of the runs
    >> (master, patches with on/off). Here's an example of results:
    >>
    >>   dims   master(1)   master  sj/off    sj/on  master   sj/off   sj/on
    >>   -------------------------------------------------------------------
    >>   1          49485    48797   48966    49118     99%      99%     99%
    >>   3          26886    22003   21319    24322     82%      79%     90%
    >>   5          17759     7923    7634    15434     45%      43%     87%
    >>   7          13110     2122    2071    11290     16%      16%     86%
    >>   9          10390      462     445     8709      4%       4%     84%
    >>   11          7781       87      86     6488      1%       1%     83%
    >>   13          5948       14      14     5749      0%       0%     97%
    >>   15          5237        1       1     4227      0%       0%     81%
    > 
    > Impressive.
    > 
    
    Indeed. I like how it fits into the existing approach. It's a bit like
    having yet another "join order restriction".
    
    
    regards
    
    -- 
    Tomas Vondra