Re: should we have a fast-path planning for OLTP starjoins?

Bruce Momjian <bruce@momjian.us> — 2026-05-30T20:16:11Z
On Sat, May 30, 2026 at 08:57:20PM +0200, Tomas Vondra wrote:
> >> The new join restriction is that if the join result includes a subset of
> >> the starjoin cluster, then it has to include the fact + prefix of the
> >> list of dimensions (which is the canonical join order).
> > 
> > Sorry, I got lost here.  What is "prefix?"  I looked at the patch and
> > also could not understand it.
> 
> Apologies, it may not be obvious from the code / comments (I'll try to
> improve that in the next version).
> 
> Let's say we're joining "F" with dimensions D1, D2, D3. Then the
> starjoins_canonicalize() finds the cluster, and picks a canonical join
> order. Could be [F, D1, D2, D3] - in this order. Or whatever other
> permutation of the dimensions, it's all equal.

Uh, are D1, D2, D3 in relid order at this point?

> Then starjoin_order_invalid() ensures that whatever join relation we
> produce, it only ever contains a prefix of this list. So a join relation
> can contain [F], [F, D1], [F, D1, D2], [F, D1, D2, D3]. But it can't
> contain e.g. [F, D2], because that skips the D1 - it's not a prefix.

Okay, "prefix" as in the leading columns of a multi-column index.
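
To make the rule concrete, here is a minimal Python sketch of the prefix
check.  It is not the patch's C code: the function name and the list/set
representation are hypothetical, and it only models the rule as described
in the quoted text.

```python
def cluster_part_is_prefix(joinrel, canonical):
    """Check the prefix rule for one join relation.

    joinrel   -- set of relation names contained in the join result
    canonical -- canonical order of the cluster, fact table first
                 (hypothetical representation, not the patch's)
    """
    # Cluster members present in this join relation, in canonical order.
    members = [rel for rel in canonical if rel in joinrel]
    # Valid only if they are exactly the first len(members) entries.
    return members == canonical[:len(members)]


canonical = ["F", "D1", "D2", "D3"]
print(cluster_part_is_prefix({"F", "D1", "D2"}, canonical))  # True
print(cluster_part_is_prefix({"F", "D2"}, canonical))        # False, skips D1
```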

> The patch only applies this to relations from the cluster. There can be
> other relations in the join "in between" the dimensions - that does not
> make the join order "invalid".
> 
> So for example there may be joins to non-dimensions A and B, and we will
> consider joins [F, A, D1, B, D2, D3] and so on as valid. The joins to A
> and B can increase/decrease cardinality, but thanks to this we
> should find the right place to join the dimensions.

Okay, so if D1, D2, and D3 are all "cluster" joins, then aren't they
1:1, so why would you ever put something like B between them?  If B
reduces rows, it would be before the cluster, and if it expands them
it would be after the cluster.  So if B is 1:1 too, in what cases might it
be better to join B between dimension joins?
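
For reference, a rough Python sketch of how the rule as described treats
non-cluster relations such as A and B.  It is a toy model under stated
assumptions: left-deep join orders only, no join clauses, no costing, and
hypothetical names throughout.  It shows which orders stay legal, not
which placement of B is cheaper.

```python
from itertools import permutations


def order_is_valid(join_order, canonical):
    """Left-deep join order is valid if its cluster members appear in
    canonical order; non-cluster relations are simply ignored."""
    seen = [rel for rel in join_order if rel in canonical]
    return seen == canonical[:len(seen)]


canonical = ["F", "D1", "D2", "D3"]   # fact table first, then dimensions
rels = canonical + ["A", "B"]         # A and B are not in the cluster

print(order_is_valid(["F", "A", "D1", "B", "D2", "D3"], canonical))  # True
print(order_is_valid(["F", "A", "D2", "D1", "B", "D3"], canonical))  # False

# How many left-deep orders of all six relations survive the rule?
all_orders = list(permutations(rels))
valid = [p for p in all_orders if order_is_valid(p, canonical)]
print(len(valid), "of", len(all_orders))  # 30 of 720
```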

> We could even make it a bit stricter, and require that all dimensions
> join "at once". I.e. after joining a dimension, only dimensions can be
> joined (until all dimensions are joined). So [F, D1, A, D2] would not be
> allowed. This would further reduce the number of join orders considered.

Right, I guess that is what I am asking above.
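
A sketch of that stricter variant, again in Python and again only a toy
model (left-deep orders, no join clauses, hypothetical names): once the
first dimension is joined, nothing but dimensions may follow until all of
them are in.

```python
from itertools import permutations


def strict_order_is_valid(join_order, canonical):
    """Prefix rule plus "all dimensions join at once"."""
    dims = canonical[1:]              # canonical[0] is the fact table
    seen = [rel for rel in join_order if rel in canonical]
    if seen != canonical[:len(seen)]:
        return False                  # plain prefix rule violated
    positions = [i for i, rel in enumerate(join_order) if rel in dims]
    if not positions:
        return True                   # no dimension joined yet
    contiguous = positions[-1] - positions[0] == len(positions) - 1
    finished = len(positions) == len(dims)
    ends_in_block = positions[-1] == len(join_order) - 1
    # A partial order may stop inside the dimension block, but must not
    # leave it before every dimension has been joined.
    return contiguous and (finished or ends_in_block)


canonical = ["F", "D1", "D2", "D3"]
rels = canonical + ["A", "B"]

print(strict_order_is_valid(["F", "D1", "A", "D2"], canonical))  # False
print(strict_order_is_valid(["F", "A", "D1", "D2", "D3", "B"], canonical))  # True

all_orders = list(permutations(rels))
strict = [p for p in all_orders if strict_order_is_valid(p, canonical)]
print(len(strict), "of", len(all_orders))  # 12 of 720
```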

> > Impressive.
> > 
> 
> Indeed. I like how it fits into the existing approach. It's a bit like
> having yet another "join order restriction".

This would be a big feature improvement for OLAP workloads.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.