Re: should we have a fast-path planning for OLTP starjoins?
Tomas Vondra <tomas@vondra.me>
From: Tomas Vondra <tomas@vondra.me>
To: Bruce Momjian <bruce@momjian.us>
Cc: Robert Haas <robertmhaas@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>,
PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2026-05-30T18:57:20Z
Lists: pgsql-hackers
On 5/30/26 20:18, Bruce Momjian wrote: > On Fri, May 29, 2026 at 12:11:32AM +0200, Tomas Vondra wrote: >> But re-reading the old thread, this doesn't seem to be why it got stuck. >> We already can identify dimensions joined on foreign keys, and that >> seems like a good start. >> >> IIRC the thing that worried me was that just sticking the joins at the >> end is pretty heavy-handed. It can easily end up making the plan worse, >> if one of the other joins increases the cardinality. Would that be >> common? Probably not, but it seems unnecessarily risky. > > Right. > >> Ideally, we'd do join that reduce cardinality first (with the regular DP >> join search), then join all the dimensions, and finally do all joins >> that expand cardinality (again, using the regular DP). But the earlier >> patches worked by adjusting the join tree in query_planner(), i.e. way >> before we get to calculate join cardinalities. > > Yes, I remember discussing that. > >> It works like this: >> >> 1) query_planner() >> >> Determine if the query includes a starjoin (or multiple), and remember >> the relids included in the starjoin cluster. Pick a "canonical" join >> order for each starjoin cluster we found (e.g. with dimensions in relid >> order). >> >> 2) standard_join_search()/join_search_one_level() >> >> When constructing the join rels (e.g. in make_join_rel or right before >> it's called), check that the new rel would violate the canonical order. >> If it would, refuse to create it, just like we do for various other join >> restrictions. > > This is how you avoid the factorial explosion of plan options, right? > Yes, exactly. >> The new join restriction is that if the join result includes a subset of >> the starjoin cluster, then it has to include the fact + prefix of the >> list of dimensions (which is the canonical join order). > > Sorry, I got lost here. What is "prefix?" I looked at the patch and > also could not understand it. Apologies, it may not be obvious from the code / comments (I'll try to improve that in the next version). Let's say we're joining "F" with dimensions D1, D2, D3. Then the starjoins_canonicalize() finds the cluster, and picks a canonical join order. Could be [F, D1, D2, D3] - in this order. Or whatever other permutation of the dimensions, it's all equal. Then starjoin_order_invalid() ensures that whatever join relation we produce, it only even contains a prefix of this list. So a join relation can contain [F], [F, D1], [F, D1, D2], [F, D1, D2, D3]. But it can't contain e.g. [F, D2], because that skips the D1 - it's not a prefix. The patch only applies this to relations from the cluster. There can be other relations in the join "in between" the dimensions - that does not make the join order "invalid". So for example there may be joins to non-dimensions A and B, and we will consider joins [F, A, D1, B, D2, D3] and so on as valid. The joins to A and B joins can increase/decrease cardinality, but thanks to this we should find the right place to join the dimensions. We could even make it a bit stricter, and require that all dimensions join "at once". I.e. after joining a dimension, only dimensions can be joined (until all dimensions are joined). So [F, D1, A, D2] would not be allowed. This would further reduce the number of join orders considered. > >> Note: It should be possible to make the restriction even more strict, if >> needed (e.g. to effectively join all dimensions at once, with no other >> joins in between). > > The patch is quite small. > >> Attached are a couple charts from a test with 1-15 dimensions (scripts >> attached too). I was wondering how geqo affects this, so I tried with >> geqo=on/off, and with join_collapse_limit=1/8/16. >> >> With join_collapse_limit=1 there's no difference between any of the runs >> (master, patches with on/off). Here's an example of results: >> >> dims master(1) master sj/off sj/on master sj/off sj/on >> ------------------------------------------------------------------- >> 1 49485 48797 48966 49118 99% 99% 99% >> 3 26886 22003 21319 24322 82% 79% 90% >> 5 17759 7923 7634 15434 45% 43% 87% >> 7 13110 2122 2071 11290 16% 16% 86% >> 9 10390 462 445 8709 4% 4% 84% >> 11 7781 87 86 6488 1% 1% 83% >> 13 5948 14 14 5749 0% 0% 97% >> 15 5237 1 1 4227 0% 0% 81% > > Impressive. > Indeed. I like how it fits into the existing approach. It's a bit like having yet another "join order restriction". regards -- Tomas Vondra