Thread

  1. Re: should we have a fast-path planning for OLTP starjoins?

    Bruce Momjian <bruce@momjian.us> — 2026-05-30T20:16:11Z

    On Sat, May 30, 2026 at 08:57:20PM +0200, Tomas Vondra wrote:
    > >> The new join restriction is that if the join result includes a subset of
    > >> the starjoin cluster, then it has to include the fact + prefix of the
    > >> list of dimensions (which is the canonical join order).
    > > 
    > > Sorry, I got lost here.  What is "prefix?"  I looked at the patch and
    > > also could not understand it.
    > 
    > Apologies, it may not be obvious from the code / comments (I'll try to
    > improve that in the next version).
    > 
    > Let's say we're joining "F" with dimensions D1, D2, D3. Then the
    > starjoins_canonicalize() finds the cluster, and picks a canonical join
    > order. Could be [F, D1, D2, D3] - in this order. Or whatever other
    > permutation of the dimensions, it's all equal.
    
    Uh, are D1, D2, D3 in relid order at this point?
    
    > Then starjoin_order_invalid() ensures that whatever join relation we
    > produce, it only even contains a prefix of this list. So a join relation
    > can contain [F], [F, D1], [F, D1, D2], [F, D1, D2, D3]. But it can't
    > contain e.g. [F, D2], because that skips the D1 - it's not a prefix.
    
    Okay, prefix like a multi-column index prefix of columns.
    
    > The patch only applies this to relations from the cluster. There can be
    > other relations in the join "in between" the dimensions - that does not
    > make the join order "invalid".
    > 
    > So for example there may be joins to non-dimensions A and B, and we will
    > consider joins [F, A, D1, B, D2, D3] and so on as valid. The joins to A
    > and B joins can increase/decrease cardinality, but thanks to this we
    > should find the right place to join the dimensions.
    
    Okay, so if D1, D2, and D3 are all "cluster" joins then aren't they are
    1:1, so why would you ever put something like B between them?  If B
    reduces columns, it would be before the cluster, and if it expands them
    it would be after cluster.  So if B is 1:1 too, in what cases might it
    be better to join B between dimension joins?
    
    > We could even make it a bit stricter, and require that all dimensions
    > join "at once". I.e. after joining a dimension, only dimensions can be
    > joined (until all dimensions are joined). So [F, D1, A, D2] would not be
    > allowed. This would further reduce the number of join orders considered.
    
    Right, I guess that is what I am asking above.
    
    > > Impressive.
    > > 
    > 
    > Indeed. I like how it fits into the existing approach. It's a bit like
    > having yet another "join order restriction".
    
    This would be a big feature improvement for OLAP workloads.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EDB                                      https://enterprisedb.com
    
      Do not let urgent matters crowd out time for investment in the future.