Re: should we have a fast-path planning for OLTP starjoins?
Tomas Vondra <tomas@vondra.me>
From: Tomas Vondra <tomas@vondra.me>
To: Robert Haas <robertmhaas@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>
Cc: Bruce Momjian <bruce@momjian.us>,
PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2026-05-28T22:11:32Z
Lists: pgsql-hackers
Attachments
- v6-starjoin-fastpath-planning.patch (text/x-patch)
- starjoin-scripts.tgz (application/x-compressed-tar)
- throughput _ join_collapse_limit=1.png (image/png)
- throughput _ join_collapse_limit=8.png (image/png)
- throughput _ join_collapse_limit=16.png (image/png)
- throughput _ join_collapse_limit=8 vs. master_limit=1.png (image/png)
- throughput _ join_collapse_limit=16 vs. master_limit=1.png (image/png)
- starjoins.csv (text/csv)
Hi, I got into a couple discussions regarding this patch in Vancouver last week. Some of the questions were about possible overlap with the "foreign key joins" patch Joel is working on. That patch needs to prove some of the same questions about joins (e.g. which joins can't change cardinality). So it seems plausible some this join planning patch could leverage some of the information eventually introduced by that patch, and maybe apply the optimization in more cases. Not sure. But re-reading the old thread, this doesn't seem to be why it got stuck. We already can identify dimensions joined on foreign keys, and that seems like a good start. IIRC the thing that worried me was that just sticking the joins at the end is pretty heavy-handed. It can easily end up making the plan worse, if one of the other joins increases the cardinality. Would that be common? Probably not, but it seems unnecessarily risky. Ideally, we'd do join that reduce cardinality first (with the regular DP join search), then join all the dimensions, and finally do all joins that expand cardinality (again, using the regular DP). But the earlier patches worked by adjusting the join tree in query_planner(), i.e. way before we get to calculate join cardinalities. My plan (mentioned even during the Vancouver hallway track) was to invent a new type of jointree "entry" representing the dimensions, and modifying the join_search_one_level() et al to "expand" it whenever we decide to join it. So it'd get the join group, and replace it with a sequence of joins of all the dimensions. We'd try it in various places, and could pick the best plan. But it seemed pretty complex and invasive, so I kept postponing the work. I started hacking on it this week, but I also started thinking if there might be a better solution, that would fit into the current join search more naturally. And I think I actually found a good approach. It works like this: 1) query_planner() Determine if the query includes a starjoin (or multiple), and remember the relids included in the starjoin cluster. Pick a "canonical" join order for each starjoin cluster we found (e.g. with dimensions in relid order). 2) standard_join_search()/join_search_one_level() When constructing the join rels (e.g. in make_join_rel or right before it's called), check that the new rel would violate the canonical order. If it would, refuse to create it, just like we do for various other join restrictions. The new join restriction is that if the join result includes a subset of the starjoin cluster, then it has to include the fact + prefix of the list of dimensions (which is the canonical join order). Note: It should be possible to make the restriction even more strict, if needed (e.g. to effectively join all dimensions at once, with no other joins in between). The attached patch v6 does this. It's a bit of a PoC quality, but it should be good enough for testing and experiments (it does pass check-world for me, but some parts still need more care / cleanup). I also did a buch of tests to see how effective it is, and it seems pretty effective. It gives me ~80% throughput of join_collapse_limit=1, even with extreme number of dimensions (e.g. 16), with disabled geqo and join_collapse_limit=16 (so it's all in one join list). The current code drops to ~1% of join_collapse_limit=1, so this is about two orders of magnitude faster. Yes, this is in situations when the join does nothing during execution (the fact table is empty). So in practice the improvement will be smaller, but it can still be a substantial speedup, depending on how much other work it's doing. Attached are a couple charts from a test with 1-15 dimensions (scripts attached too). I was wondering how geqo affects this, so I tried with geqo=on/off, and with join_collapse_limit=1/8/16. With join_collapse_limit=1 there's no difference between any of the runs (master, patches with on/off). Here's an example of results: dims master(1) master sj/off sj/on master sj/off sj/on ------------------------------------------------------------------- 1 49485 48797 48966 49118 99% 99% 99% 3 26886 22003 21319 24322 82% 79% 90% 5 17759 7923 7634 15434 45% 43% 87% 7 13110 2122 2071 11290 16% 16% 86% 9 10390 462 445 8709 4% 4% 84% 11 7781 87 86 6488 1% 1% 83% 13 5948 14 14 5749 0% 0% 97% 15 5237 1 1 4227 0% 0% 81% The first column is for master with join_collapse_limit=1, which I take as the best possible throughtput, if we eliminate the join order search almost completely. The rest is with geqo=off and join_collapse_limit=16. So that's the 80% of possible throughput, pretty good IMO. Especially considering master is at 1 tps. There are a couple more interesting charts, and a CSV with raw results. A funny (but not entirely surprising) detail is that geqo=on helps a bit (from 1tps to ~90tps for 15 dimensions), but it's still far from the limit=1 throughput of ~5200 tps. The fastpath join order does better. I'm sure the code needs cleanups and fixes, but I like the approach way more than the original plan (inventing a new group join entry). Any comments regarding this alternative approach? regards -- Tomas Vondra