Making Vars outer-join aware
Tom Lane <tgl@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
To: pgsql-hackers@lists.postgresql.org
Date: 2022-07-01T16:42:27Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Re-allow INDEX_VAR as rt_index in ChangeVarNodes().
- fbf80421ead5 16.0 landed
-
Fix thinkos in have_unsafe_outer_join_ref; reduce to Assert check.
- f50f029c497d 16.0 landed
-
Invent "join domains" to replace the below_outer_join hack.
- 3bef56e11650 16.0 landed
-
Do assorted mop-up in the planner.
- b448f1c8d83f 16.0 landed
-
Make Vars be outer-join-aware.
- 2489d76c4906 16.0 landed
-
Invent "multibitmapsets", and use them to speed up antijoin detection.
- e9e26b5e7166 16.0 landed
-
Add basic regression tests for semi/antijoin recognition.
- 0043aa6b8597 16.0 landed
-
Improve performance of adjust_appendrel_attrs_multilevel.
- 2f17b57017e5 16.0 landed
-
Refactor addition of PlaceHolderVars to joinrel targetlists.
- afa0ec30bfd1 16.0 landed
-
Use an explicit state flag to control PlaceHolderInfo creation.
- b3ff6c742f6c 16.0 landed
-
Make PlaceHolderInfo lookup O(1).
- 6569ca43973b 16.0 landed
Attachments
- v1-0000-add-overview-documentation.patch (text/x-diff) patch v1-0000
- v1-0001-add-preliminary-infrastructure.patch (text/x-diff) patch v1-0001
- v1-0002-label-Var-nullability-in-parser.patch (text/x-diff) patch v1-0002
- v1-0003-cope-with-nullability-in-planner.patch (text/x-diff) patch v1-0003
[ Before settling into commitfest mode, I wanted to put out a snapshot of what I've been working on for the past few weeks. This is not anywhere near committable, but I think people might be interested in looking at it now anyway. ] We've had many discussions (eg [1][2]) about the need to treat outer joins more honestly in parsed queries, so that the planner's reasoning about things like equivalence classes can stand on a firmer foundation. The attached patch series makes a start at doing that, and carries the idea as far as a working system in which all Vars are labeled as to which outer joins can null them. I have not yet gotten to the fun part of fixing or ripping out all the higher-level planner logic that could now be simplified or removed entirely --- but as examples, I believe that reconsider_outer_join_clause no longer does anything useful, and a lot of the logic in deconstruct_jointree and distribute_qual_to_rels could be simplified, and we shouldn't need the notion of second-class-citizen EquivalenceClasses for "below outer join" cases. Another thing that could be built on this infrastructure, but I've not tackled it yet, is fixing the known semantic bugs in grouping sets [3][4]. What I have in mind there is to invent a dummy RTE representing the action of grouping, and use Vars that are marked as nullable by that RTE to represent possibly-nullable grouping-set expressions. The main thing here that differs from my previous ideas is that the nulling-rel labeling is placed directly on Vars or PlaceHolderVars, whereas I had been advocating to use some sort of wrapper node instead. After several failed attempts I decided that it was too complicated to separate the labeling from the Var itself. (I'll just mention one weak spot in that idea: the entire API concept of replace_rte_variables breaks down, because many of the callbacks using it need to manipulate nulling-rel labeling along the way, which they can only do if they see it on the Var they're passed.) Of course, the objection to doing it like this is that it bloats struct Var, which is a mighty common node type, even in cases where there's no outer join anywhere. However, on a 64-bit machine struct Var would widen from 40 to 48 bytes, which is basically free considering that palloc will round the allocation up to 64 bytes. There's a more valid consideration that the pg_node_tree representation of a Var will get longer; but really, if you're worried about that you should be designing a more compact storage representation for node trees. There's also reason to fear that the distributed cost of maintaining these extra Bitmapsets will pose a noticeable drag on parsing or planning speed. However, I see little point in doing performance measurements when we've not yet reaped any of the foreseeable planner improvements. Anyway, on to the patch series. I've broken it up a little bit for review, but note I'm not claiming that the intermediate states would compile or pass regression testing. 0000: Write some overview documentation in optimizer/README. This might be worth reading even if you lack time to look at the code. I went into some detail about Var semantics, and also added a discussion of PlaceHolderVars, which Robert has rightfully complained are badly underdocumented. (At one point I'd thought maybe we could get rid of PlaceHolderVars, but now I see them as complementary to this work --- indeed, arguably the reason for them to exist is so that a Var's nullingrels markers are not lost when replacing it with a pulled-up expression from a subquery.) The changes in the section about EquivalenceClasses are pretty rough and speculative, since I've not actually coded those changes yet. 0001: add infrastructure, namely add new fields to assorted data structures and update places like backend/nodes/*.c. This is mostly pretty boring, except for the commentary changes in *nodes.h. 0002: change the parser to correctly label emitted Vars with the sets of outer joins that can null them, according to the query text as-written. (That is, we don't account here for the possibility of join strength reduction or anything like that.) 0003: fix the planner to cope, including adjusting nullingrel labeling for join elimination, join strength reduction, join reordering, etc. This is still WIP to some extent. In particular note all the XXX comments in setrefs.c complaining about how we're not verifying that the nullingrel states agree when matching upper-level Vars to lower-level ones. This is partly due to setrefs.c not having readily-available info about which outer joins are applied at which plan nodes (should we expend the space to mark them?), and partly because I'm not sure that we can enforce 100% consistency there anyway. Because of the compromises involved in implementing outer-join identity 3 (see 0000), there will be cases where an upper Var that "should" have a nullingrel bit set will not. I don't know how to make a hole in the check that will allow those cases without rendering such checking mostly useless. Is there a way that we can do the identity-3 transformation without being squishy about the nullability state of Vars in the moved qual? I've not thought of one, but it's very annoying considering that the whole point of this patch series is to not be squishy about that. I guess the good news is that the squishiness only seems to be needed during final transformation of the plan, where all we are losing is the ability to detect bugs in earlier planner stages. All of the decisions that actually count seem to work fine without compromises. So far the patch series changes no regression test results, and I've not added any new tests either. The next steps will probably have visible effects in the form of improved plans for some test queries. Anyway, even though this is far from done, I'm pretty pleased with the results so far, so I thought I'd put it out for review by anyone who cares to take a look. I'll add it to the September CF in hopes that it might be more or less finished by then, and so that the cfbot will check it out. regards, tom lane [1] https://www.postgresql.org/message-id/7771.1576452845%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/flat/15848.1576515643%40sss.pgh.pa.us [3] https://www.postgresql.org/message-id/17071-24dc13fbfa29672d%40postgresql.org [4] https://www.postgresql.org/message-id/CAMbWs48AtQTQGk37MSyDk_EAgDO3Y0iA_LzvuvGQ2uO_Wh2muw%40mail.gmail.com