Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Re-allow INDEX_VAR as rt_index in ChangeVarNodes().

  2. Fix thinkos in have_unsafe_outer_join_ref; reduce to Assert check.

  3. Invent "join domains" to replace the below_outer_join hack.

  4. Do assorted mop-up in the planner.

  5. Make Vars be outer-join-aware.

  6. Invent "multibitmapsets", and use them to speed up antijoin detection.

  7. Add basic regression tests for semi/antijoin recognition.

  8. Improve performance of adjust_appendrel_attrs_multilevel.

  9. Refactor addition of PlaceHolderVars to joinrel targetlists.

  10. Use an explicit state flag to control PlaceHolderInfo creation.

  11. Make PlaceHolderInfo lookup O(1).

  1. Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-01T16:42:27Z

    [ Before settling into commitfest mode, I wanted to put out a snapshot
    of what I've been working on for the past few weeks.  This is not
    anywhere near committable, but I think people might be interested
    in looking at it now anyway. ]
    
    We've had many discussions (eg [1][2]) about the need to treat outer
    joins more honestly in parsed queries, so that the planner's reasoning
    about things like equivalence classes can stand on a firmer foundation.
    The attached patch series makes a start at doing that, and carries the
    idea as far as a working system in which all Vars are labeled as to
    which outer joins can null them.  I have not yet gotten to the fun part
    of fixing or ripping out all the higher-level planner logic that could
    now be simplified or removed entirely --- but as examples, I believe
    that reconsider_outer_join_clause no longer does anything useful, and
    a lot of the logic in deconstruct_jointree and distribute_qual_to_rels
    could be simplified, and we shouldn't need the notion of
    second-class-citizen EquivalenceClasses for "below outer join" cases.
    
    Another thing that could be built on this infrastructure, but I've
    not tackled it yet, is fixing the known semantic bugs in grouping sets
    [3][4].  What I have in mind there is to invent a dummy RTE representing
    the action of grouping, and use Vars that are marked as nullable by that
    RTE to represent possibly-nullable grouping-set expressions.
    
    The main thing here that differs from my previous ideas is that the
    nulling-rel labeling is placed directly on Vars or PlaceHolderVars,
    whereas I had been advocating to use some sort of wrapper node instead.
    After several failed attempts I decided that it was too complicated
    to separate the labeling from the Var itself.  (I'll just mention one
    weak spot in that idea: the entire API concept of replace_rte_variables
    breaks down, because many of the callbacks using it need to manipulate
    nulling-rel labeling along the way, which they can only do if they
    see it on the Var they're passed.)  Of course, the objection to doing it
    like this is that it bloats struct Var, which is a mighty common node
    type, even in cases where there's no outer join anywhere.  However, on
    a 64-bit machine struct Var would widen from 40 to 48 bytes, which is
    basically free considering that palloc will round the allocation up to
    64 bytes.  There's a more valid consideration that the pg_node_tree
    representation of a Var will get longer; but really, if you're worried
    about that you should be designing a more compact storage representation
    for node trees.  There's also reason to fear that the distributed cost
    of maintaining these extra Bitmapsets will pose a noticeable drag on
    parsing or planning speed.  However, I see little point in doing
    performance measurements when we've not yet reaped any of the
    foreseeable planner improvements.
    
    Anyway, on to the patch series.  I've broken it up a little bit
    for review, but note I'm not claiming that the intermediate states
    would compile or pass regression testing.
    
    0000: Write some overview documentation in optimizer/README.
    This might be worth reading even if you lack time to look at the code.
    I went into some detail about Var semantics, and also added a discussion
    of PlaceHolderVars, which Robert has rightfully complained are badly
    underdocumented.  (At one point I'd thought maybe we could get rid of
    PlaceHolderVars, but now I see them as complementary to this work ---
    indeed, arguably the reason for them to exist is so that a Var's
    nullingrels markers are not lost when replacing it with a pulled-up
    expression from a subquery.)  The changes in the section about
    EquivalenceClasses are pretty rough and speculative, since I've not
    actually coded those changes yet.
    
    0001: add infrastructure, namely add new fields to assorted data
    structures and update places like backend/nodes/*.c.  This is mostly
    pretty boring, except for the commentary changes in *nodes.h.
    
    0002: change the parser to correctly label emitted Vars with the
    sets of outer joins that can null them, according to the query text
    as-written.  (That is, we don't account here for the possibility
    of join strength reduction or anything like that.)
    
    0003: fix the planner to cope, including adjusting nullingrel labeling
    for join elimination, join strength reduction, join reordering, etc.
    This is still WIP to some extent.  In particular note all the XXX
    comments in setrefs.c complaining about how we're not verifying that the
    nullingrel states agree when matching upper-level Vars to lower-level
    ones.  This is partly due to setrefs.c not having readily-available info
    about which outer joins are applied at which plan nodes (should we
    expend the space to mark them?), and partly because I'm not sure
    that we can enforce 100% consistency there anyway.  Because of the
    compromises involved in implementing outer-join identity 3 (see 0000),
    there will be cases where an upper Var that "should" have a nullingrel
    bit set will not.  I don't know how to make a hole in the check that
    will allow those cases without rendering such checking mostly useless.
    
    Is there a way that we can do the identity-3 transformation without
    being squishy about the nullability state of Vars in the moved qual?
    I've not thought of one, but it's very annoying considering that the
    whole point of this patch series is to not be squishy about that.
    I guess the good news is that the squishiness only seems to be needed
    during final transformation of the plan, where all we are losing is
    the ability to detect bugs in earlier planner stages.  All of the
    decisions that actually count seem to work fine without compromises.
    
    So far the patch series changes no regression test results, and I've
    not added any new tests either.  The next steps will probably have
    visible effects in the form of improved plans for some test queries.
    
    Anyway, even though this is far from done, I'm pretty pleased with
    the results so far, so I thought I'd put it out for review by
    anyone who cares to take a look.  I'll add it to the September CF
    in hopes that it might be more or less finished by then, and so
    that the cfbot will check it out.
    
    			regards, tom lane
    
    [1] https://www.postgresql.org/message-id/7771.1576452845%40sss.pgh.pa.us
    [2] https://www.postgresql.org/message-id/flat/15848.1576515643%40sss.pgh.pa.us
    [3] https://www.postgresql.org/message-id/17071-24dc13fbfa29672d%40postgresql.org
    [4] https://www.postgresql.org/message-id/CAMbWs48AtQTQGk37MSyDk_EAgDO3Y0iA_LzvuvGQ2uO_Wh2muw%40mail.gmail.com
    
    
  2. Re: Making Vars outer-join aware

    Finnerty, Jim <jfinnert@amazon.com> — 2022-07-01T20:19:48Z

    Tom, two quick questions before attempting to read the patch:
    
        Given that views are represented in a parsed representation, does anything need to happen to the Vars inside a view when that view is outer-joined to?  
    
        If an outer join is converted to an inner join, must this information get propagated to all the affected Vars, potentially across query block levels?
    
    
    
  3. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-01T20:40:45Z

    "Finnerty, Jim" <jfinnert@amazon.com> writes:
    >     Given that views are represented in a parsed representation, does anything need to happen to the Vars inside a view when that view is outer-joined to?  
    
    No.  The markings only refer to what is in the same Query tree as the Var
    itself.
    
    Subquery flattening during planning does deal with this: if we pull up a
    subquery (possibly inserted from a view) that was underneath an outer
    join, the nullingrel marks on the upper-level Vars referring to subquery
    outputs will get merged into what is pulled up, either by unioning the
    varnullingrel bitmaps if what is pulled up is just a Var, or if what is
    pulled up isn't a Var, by wrapping it in a PlaceHolderVar that carries
    the old outer Var's markings.  We had essentially this same behavior
    with PlaceHolderVars before, but I think this way makes it a lot more
    principled and intelligible (and I suspect there are now cases where we
    manage to avoid inserting unnecessary PlaceHolderVars that the old code
    couldn't avoid).
    
    >     If an outer join is converted to an inner join, must this information get propagated to all the affected Vars, potentially across query block levels?
    
    Yes.  The code is there in the patch to run around and remove nullingrel
    bits from affected Vars.
    
    One thing that doesn't happen (and didn't before, so this is not a
    regression) is that if we strength-reduce a FULL JOIN USING to an outer
    or plain join, it'd be nice if the "COALESCE" hack we represent the
    merged USING column with could be replaced with the same lower-relation
    Var that the parser would have used if the join weren't FULL to begin
    with.  Without that, we're leaving optimization opportunities on the
    table.  I'm hesitant to try to do that though as long as the COALESCE
    structures look exactly like something a user could write.  It'd be
    safer if we used some bespoke node structure for this purpose ...
    but nobody's bothered to invent that.
    
    			regards, tom lane
    
    
    
    
  4. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-07-05T11:02:38Z

    On Sat, Jul 2, 2022 at 12:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    >
    > Anyway, even though this is far from done, I'm pretty pleased with
    > the results so far, so I thought I'd put it out for review by
    > anyone who cares to take a look.  I'll add it to the September CF
    > in hopes that it might be more or less finished by then, and so
    > that the cfbot will check it out.
    >
    
    Thanks for the work! I have a question about qual clause placement.
    
    For the query in the example
    
        SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)
    
    (foo() is not strict.) We want to avoid pushing foo(t2.z) down to the t2
    scan level. Previously we do that with check_outerjoin_delay() by
    scanning all the outer joins below and check if the qual references any
    nullable rels of the OJ, and if so include the OJ's rels into the qual.
    So as a result we'd get that foo(t2.z) is referencing t1 and t2, and
    we'd put the qual into the join lists of t1 and t2.
    
    Now there is the 'varnullingrels' marker in the t2.z, which is the LEFT
    JOIN below (with RTI 3). So we consider the qual is referencing RTE 2
    (which is t2) and RTE 3 (which is the OJ). Do we still need to include
    RTE 1, i.e. t1 into the qual's required relids? How should we do that?
    
    Thanks
    Richard
    
  5. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-05T14:24:00Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > For the query in the example
    
    >     SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)
    
    > (foo() is not strict.) We want to avoid pushing foo(t2.z) down to the t2
    > scan level. Previously we do that with check_outerjoin_delay() by
    > scanning all the outer joins below and check if the qual references any
    > nullable rels of the OJ, and if so include the OJ's rels into the qual.
    > So as a result we'd get that foo(t2.z) is referencing t1 and t2, and
    > we'd put the qual into the join lists of t1 and t2.
    
    > Now there is the 'varnullingrels' marker in the t2.z, which is the LEFT
    > JOIN below (with RTI 3). So we consider the qual is referencing RTE 2
    > (which is t2) and RTE 3 (which is the OJ). Do we still need to include
    > RTE 1, i.e. t1 into the qual's required relids? How should we do that?
    
    It seems likely to me that we could leave the qual's required_relids
    as just {2,3} and not have to bother ORing any additional bits into
    that.  However, in the case of a Var-free JOIN/ON clause it'd still
    be necessary to artificially add some relids to its initially empty
    relids.  Since I've not yet tried to rewrite distribute_qual_to_rels
    I'm not sure how the details will shake out.
    
    			regards, tom lane
    
    
    
    
  6. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-10T19:38:36Z

    Here's v2 of this patch series.  It's functionally identical to v1,
    but I've rebased it over the recent auto-node-support-generation
    changes, and also extracted a few separable bits in hopes of making
    the main planner patch smaller.  (It's still pretty durn large,
    unfortunately.)  Unlike the original submission, each step will
    compile on its own, though the intermediate states mostly don't
    pass all regression tests.
    
    			regards, tom lane
    
    
  7. Re: Making Vars outer-join aware

    Zhihong Yu <zyu@yugabyte.com> — 2022-07-10T21:04:41Z

    On Sun, Jul 10, 2022 at 12:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Here's v2 of this patch series.  It's functionally identical to v1,
    > but I've rebased it over the recent auto-node-support-generation
    > changes, and also extracted a few separable bits in hopes of making
    > the main planner patch smaller.  (It's still pretty durn large,
    > unfortunately.)  Unlike the original submission, each step will
    > compile on its own, though the intermediate states mostly don't
    > pass all regression tests.
    >
    >                         regards, tom lane
    >
    > Hi,
    For v2-0004-cope-with-nullability-in-planner.patch.
    In remove_unneeded_nulling_relids():
    
    +   if (removable_relids == NULL)
    
    Why is bms_is_empty() not used in the above check ?
    Earlier there is `if (bms_is_empty(old_nulling_relids))`
    
    +typedef struct reduce_outer_joins_partial_state
    
    Since there are already reduce_outer_joins_pass1_state
    and reduce_outer_joins_pass2_state, a comment
    above reduce_outer_joins_partial_state would help other people follow its
    purpose.
    
    +       if (j->rtindex)
    +       {
    +           if (j->jointype == JOIN_INNER)
    +           {
    +               if (include_inner_joins)
    +                   result = bms_add_member(result, j->rtindex);
    +           }
    +           else
    +           {
    +               if (include_outer_joins)
    
    Since there are other join types beside JOIN_INNER, should there be an
    assertion in the else block ? e.g. jointype wouldn't be JOIN_UNIQUE_INNER.
    
    Cheers
    
  8. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-10T23:51:28Z

    Zhihong Yu <zyu@yugabyte.com> writes:
    > In remove_unneeded_nulling_relids():
    
    > +   if (removable_relids == NULL)
    
    > Why is bms_is_empty() not used in the above check ?
    
    We initialized that to NULL just a few lines above, and then did
    nothing to it other than perhaps bms_add_member, so it's impossible
    for it to be empty-and-yet-not-NULL.
    
    > +typedef struct reduce_outer_joins_partial_state
    
    > Since there are already reduce_outer_joins_pass1_state
    > and reduce_outer_joins_pass2_state, a comment
    > above reduce_outer_joins_partial_state would help other people follow its
    > purpose.
    
    We generally document these sorts of structs in the using code,
    not at the struct declaration.
    
    > +       if (j->rtindex)
    > +       {
    > +           if (j->jointype == JOIN_INNER)
    > +           {
    > +               if (include_inner_joins)
    > +                   result = bms_add_member(result, j->rtindex);
    > +           }
    > +           else
    > +           {
    > +               if (include_outer_joins)
    
    > Since there are other join types beside JOIN_INNER, should there be an
    > assertion in the else block?
    
    Like what?  We don't particularly care what the join type is here,
    as long as it's not INNER.  In any case there is plenty of nearby
    code checking for unsupported join types.
    
    			regards, tom lane
    
    
    
    
  9. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-07-12T07:20:37Z

    On Mon, Jul 11, 2022 at 3:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Here's v2 of this patch series.  It's functionally identical to v1,
    > but I've rebased it over the recent auto-node-support-generation
    > changes, and also extracted a few separable bits in hopes of making
    > the main planner patch smaller.  (It's still pretty durn large,
    > unfortunately.)  Unlike the original submission, each step will
    > compile on its own, though the intermediate states mostly don't
    > pass all regression tests.
    
    
    Noticed a different behavior from previous regarding PlaceHolderVar.
    Take the query below as an example:
    
    select a.i, ss.jj from a left join (select b.i, b.j + 1 as jj from b) ss
    on a.i = ss.i;
    
    Previously the expression 'b.j + 1' would not be wrapped in a
    PlaceHolderVar, since it contains a Var of the subquery and meanwhile it
    does not contain any non-strict constructs. And now in the patch, we
    would insert a PlaceHolderVar for it, in order to have a place to store
    varnullingrels. So the plan for the above query now becomes:
    
    # explain (verbose, costs off) select a.i, ss.jj from a left join
    (select b.i, b.j + 1 as jj from b) ss on a.i = ss.i;
                QUERY PLAN
    ----------------------------------
     Hash Right Join
       Output: a.i, ((b.j + 1))
       Hash Cond: (b.i = a.i)
       ->  Seq Scan on public.b
             Output: b.i, (b.j + 1)
       ->  Hash
             Output: a.i
             ->  Seq Scan on public.a
                   Output: a.i
    (9 rows)
    
    Note that the evaluation of expression 'b.j + 1' now occurs below the
    outer join. Is this something we need to be concerned about?
    
    Thanks
    Richard
    
  10. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-12T13:37:41Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > Note that the evaluation of expression 'b.j + 1' now occurs below the
    > outer join. Is this something we need to be concerned about?
    
    It seems more formally correct to me, but perhaps somebody would
    complain about possibly-useless expression evals.  We could likely
    re-complicate the logic that inserts PHVs during pullup so that it
    looks for Vars it can apply the nullingrels to.  Maybe there's an
    opportunity to share code with flatten_join_alias_vars?
    
    			regards, tom lane
    
    
    
    
  11. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-07-13T07:48:41Z

    On Tue, Jul 12, 2022 at 9:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Richard Guo <guofenglinux@gmail.com> writes:
    > > Note that the evaluation of expression 'b.j + 1' now occurs below the
    > > outer join. Is this something we need to be concerned about?
    >
    > It seems more formally correct to me, but perhaps somebody would
    > complain about possibly-useless expression evals.  We could likely
    > re-complicate the logic that inserts PHVs during pullup so that it
    > looks for Vars it can apply the nullingrels to.  Maybe there's an
    > opportunity to share code with flatten_join_alias_vars?
    
    
    Yeah, maybe we can extend and leverage the codes in
    adjust_standard_join_alias_expression() to do that?
    
    But I'm not sure which is better, to evaluate the expression below or
    above the outer join. It seems to me that if the size of base rel is
    large and somehow the size of the joinrel is small, evaluation above the
    outer join would win. And in the opposite case evaluation below the
    outer join would be better.
    
    Thanks
    Richard
    
  12. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-13T14:09:23Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > But I'm not sure which is better, to evaluate the expression below or
    > above the outer join. It seems to me that if the size of base rel is
    > large and somehow the size of the joinrel is small, evaluation above the
    > outer join would win. And in the opposite case evaluation below the
    > outer join would be better.
    
    Reasonable question.  But I think for the purposes of this patch,
    it's better to keep the old behavior as much as we can.  People
    have probably relied on it while tuning queries.  (I'm not saying
    it has to be *exactly* bug-compatible, but simple cases like your
    example probably should work the same.)
    
    			regards, tom lane
    
    
    
    
  13. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-01T19:51:33Z

    Here's a rebase up to HEAD, mostly to placate the cfbot.
    I accounted for d8e34fa7a (s/all_baserels/all_query_rels/
    in those places) and made one tiny bug-fix change.
    Nothing substantive as yet.
    
    			regards, tom lane
    
    
  14. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-01T20:26:30Z

    Zhihong Yu <zyu@yugabyte.com> writes:
    > For v3-0003-label-Var-nullability-in-parser.patch :
    
    > +   if (rtindex > 0 && rtindex <= list_length(pstate->p_nullingrels))
    > +       relids = (Bitmapset *) list_nth(pstate->p_nullingrels, rtindex - 1);
    > +   else
    > +       relids = NULL;
    > +
    > +   /*
    > +    * Merge with any already-declared nulling rels.  (Typically there won't
    > +    * be any, but let's get it right if there are.)
    > +    */
    > +   if (relids != NULL)
    
    > It seems the last if block can be merged into the previous if block. That
    > way `relids = NULL` can be omitted.
    
    No, because the list entry we fetch could be NULL.
    
    			regards, tom lane
    
    
    
    
  15. Re: Making Vars outer-join aware

    Zhihong Yu <zyu@yugabyte.com> — 2022-08-01T20:28:37Z

    On Mon, Aug 1, 2022 at 12:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Here's a rebase up to HEAD, mostly to placate the cfbot.
    > I accounted for d8e34fa7a (s/all_baserels/all_query_rels/
    > in those places) and made one tiny bug-fix change.
    > Nothing substantive as yet.
    >
    >                         regards, tom lane
    >
    > Hi,
    For v3-0003-label-Var-nullability-in-parser.patch :
    
    +   if (rtindex > 0 && rtindex <= list_length(pstate->p_nullingrels))
    +       relids = (Bitmapset *) list_nth(pstate->p_nullingrels, rtindex - 1);
    +   else
    +       relids = NULL;
    +
    +   /*
    +    * Merge with any already-declared nulling rels.  (Typically there won't
    +    * be any, but let's get it right if there are.)
    +    */
    +   if (relids != NULL)
    
    It seems the last if block can be merged into the previous if block. That
    way `relids = NULL` can be omitted.
    
    Cheers
    
  16. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-08-15T08:48:23Z

    On Tue, Aug 2, 2022 at 3:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Here's a rebase up to HEAD, mostly to placate the cfbot.
    > I accounted for d8e34fa7a (s/all_baserels/all_query_rels/
    > in those places) and made one tiny bug-fix change.
    > Nothing substantive as yet.
    
    
    When we add required PlaceHolderVars to a join rel's targetlist, if the
    PHV can be computed in the nullable-side of the join, we would add the
    join's RT index to phnullingrels. This is correct as we know the PHV's
    value can be nulled at this join. But I'm wondering if it is necessary
    since we have already propagated any varnullingrels into the PHV when we
    apply pullup variable replacement in perform_pullup_replace_vars().
    
    On the other hand, the phnullingrels may contain RT indexes of outer
    joins above this join level. It seems not good to add such a PHV to the
    joinrel's targetlist. Below is an example:
    
    # explain (verbose, costs off) select a.i, ss.jj from a left join (b left
    join (select c.i, coalesce(c.j, 1) as jj from c) ss on b.i = ss.i) on true;
                           QUERY PLAN
    ---------------------------------------------------------
     Nested Loop Left Join
       Output: a.i, (COALESCE(c.j, 1))
       ->  Seq Scan on public.a
             Output: a.i, a.j
       ->  Materialize
             Output: (COALESCE(c.j, 1))
             ->  Hash Left Join
                   Output: (COALESCE(c.j, 1))
                   Hash Cond: (b.i = c.i)
                   ->  Seq Scan on public.b
                         Output: b.i, b.j
                   ->  Hash
                         Output: c.i, (COALESCE(c.j, 1))
                         ->  Seq Scan on public.c
                               Output: c.i, COALESCE(c.j, 1)
    (15 rows)
    
    In this query, for the joinrel {B, C}, the PHV in its targetlist has a
    phnullingrels that contains the join of {A} and {BC}, which is confusing
    because we have not reached that join level.
    
    I tried the changes below to illustrate the two issues. The assertion
    holds true during regression tests and the error pops up for the query
    above.
    
    --- a/src/backend/optimizer/util/placeholder.c
    +++ b/src/backend/optimizer/util/placeholder.c
    @@ -464,18 +464,20 @@ add_placeholders_to_joinrel(PlannerInfo *root,
    RelOptInfo *joinrel,
                           {
                                   if (sjinfo->jointype == JOIN_FULL &&
    sjinfo->ojrelid != 0)
                                   {
    -                                      /* PHV's value can be nulled at this
    join */
    -                                      phv->phnullingrels =
    bms_add_member(phv->phnullingrels,
    -
                                       sjinfo->ojrelid);
    +
     Assert(bms_is_member(sjinfo->ojrelid, phv->phnullingrels));
    +
    +                                      if
    (!bms_is_subset(phv->phnullingrels, joinrel->relids))
    +                                              elog(ERROR, "phnullingrels
    is not subset of joinrel's relids");
                                   }
                           }
                           else if (bms_is_subset(phinfo->ph_eval_at,
    inner_rel->relids))
                           {
                                   if (sjinfo->jointype != JOIN_INNER &&
    sjinfo->ojrelid != 0)
                                   {
    -                                      /* PHV's value can be nulled at this
    join */
    -                                      phv->phnullingrels =
    bms_add_member(phv->phnullingrels,
    -
                                       sjinfo->ojrelid);
    +
     Assert(bms_is_member(sjinfo->ojrelid, phv->phnullingrels));
    +
    +                                      if
    (!bms_is_subset(phv->phnullingrels, joinrel->relids))
    +                                              elog(ERROR, "phnullingrels
    is not subset of joinrel's relids");
                                   }
                           }
    
    
    If the two issues are indeed something we need to fix, maybe we can
    change add_placeholders_to_joinrel() to search the PHVs from
    outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
    if needed, just like what we do in build_joinrel_tlist(). The PHVs there
    should have the correct phnullingrels (at least the PHVs in base rels'
    targetlists have correctly set phnullingrels to NULL).
    
    Thanks
    Richard
    
  17. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-16T16:08:23Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > When we add required PlaceHolderVars to a join rel's targetlist, if the
    > PHV can be computed in the nullable-side of the join, we would add the
    > join's RT index to phnullingrels. This is correct as we know the PHV's
    > value can be nulled at this join. But I'm wondering if it is necessary
    > since we have already propagated any varnullingrels into the PHV when we
    > apply pullup variable replacement in perform_pullup_replace_vars().
    
    I'm not seeing the connection there?  Any nullingrels that are set
    during perform_pullup_replace_vars would refer to outer joins within the
    pulled-up subquery, whereas what you are talking about here is what
    happens when the PHV's value propagates up through outer joins of the
    parent query.  There's no overlap between those relid sets, or if there
    is we have some fault in the logic that constrains join order to ensure
    that there's a valid place to compute each PHV.
    
    > On the other hand, the phnullingrels may contain RT indexes of outer
    > joins above this join level. It seems not good to add such a PHV to the
    > joinrel's targetlist.
    
    Hmm, yeah, add_placeholders_to_joinrel is doing this wrong.  The
    intent was to match what build_joinrel_tlist does with plain Vars,
    but in that case we're adding the join's relid to what we find in
    varnullingrels in the input tlist.  Using the phnullingrels from
    the placeholder_list entry is wrong.  (I wonder whether a
    placeholder_list entry's phnullingrels is meaningful at all?
    Maybe it isn't.)  I think it might work to take the intersection
    of the join's relids with root->outer_join_rels.
    
    > If the two issues are indeed something we need to fix, maybe we can
    > change add_placeholders_to_joinrel() to search the PHVs from
    > outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
    > if needed, just like what we do in build_joinrel_tlist(). The PHVs there
    > should have the correct phnullingrels (at least the PHVs in base rels'
    > targetlists have correctly set phnullingrels to NULL).
    
    Yeah, maybe we should do something more invasive and make use of the
    input targetlists rather than doing this from scratch.  Not sure.
    I'm worried that doing it that way would increase the risk of getting
    different join tlist contents depending on which pair of input rels
    we happen to consider first.
    
    			regards, tom lane
    
    
    
    
  18. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-16T19:24:04Z

    I wrote:
    > Richard Guo <guofenglinux@gmail.com> writes:
    >> If the two issues are indeed something we need to fix, maybe we can
    >> change add_placeholders_to_joinrel() to search the PHVs from
    >> outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
    >> if needed, just like what we do in build_joinrel_tlist(). The PHVs there
    >> should have the correct phnullingrels (at least the PHVs in base rels'
    >> targetlists have correctly set phnullingrels to NULL).
    
    > Yeah, maybe we should do something more invasive and make use of the
    > input targetlists rather than doing this from scratch.  Not sure.
    > I'm worried that doing it that way would increase the risk of getting
    > different join tlist contents depending on which pair of input rels
    > we happen to consider first.
    
    After chewing on that for awhile, I've concluded that that is the way
    to go.  0001 attached is a standalone patch to rearrange the way that
    PHVs are added to joinrel targetlists.  It results in one cosmetic
    change in the regression tests, where the targetlist order for an
    intermediate join node changes.  I think that's fine; if anything,
    the new order is more sensible than the old because it matches the
    inputs' targetlist orders better.
    
    I believe the reason I didn't do it like this to start with is that
    I was concerned about the cost of searching the placeholder_list
    repeatedly.  With a lot of PHVs, build_joinrel_tlist becomes O(N^2)
    just from the repeated find_placeholder_info lookups.  We can fix
    that by adding an index array to go straight from phid to the
    PlaceHolderInfo.  While thinking about where to construct such
    an index array, I decided it'd be a good idea to have an explicit
    step to "freeze" the set of PlaceHolderInfos, at the start of
    deconstruct_jointree.  This allows getting rid of the create_new_ph
    flags for find_placeholder_info and add_vars_to_targetlist, which
    I've always feared were bugs waiting to happen: they require callers
    to have a very clear understanding of when they're invoked.  There
    might be some speed gain over existing code too, but I've not really
    tried to measure it.  I did drop a couple of hacks that were only
    meant to short-circuit find_placeholder_info calls; that function
    should now be cheap enough to not matter.
    
    Barring objections, I'd like to push these soon and then rebase
    the main outer-join-vars patch set over them.
    
    			regards, tom lane
    
    
  19. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-16T20:57:08Z

    I wrote:
    > ... We can fix
    > that by adding an index array to go straight from phid to the
    > PlaceHolderInfo.  While thinking about where to construct such
    > an index array, I decided it'd be a good idea to have an explicit
    > step to "freeze" the set of PlaceHolderInfos, at the start of
    > deconstruct_jointree.
    
    On further thought, it seems better to maintain the index array
    from the start, allowing complete replacement of the linear list
    searches.  We can add a separate bool flag to denote frozen-ness.
    This does have minor downsides:
    
    * Some fiddly code is needed to enlarge the index array at need.
    But it's not different from that for, say, simple_rel_array.
    
    * If we ever have a need to mutate the placeholder_list as a whole,
    we'd have to reconstruct the index array to point to the new
    objects.  We don't do that at present, except in one place in
    analyzejoins.c, which is easily fixed.  While the same argument
    could be raised against the v1 patch, it's not very likely that
    we'd be doing such mutation beyond the start of deconstruct_jointree.
    
    Hence, see v2 attached.
    
    			regards, tom lane
    
    
  20. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-08-17T09:25:53Z

    On Wed, Aug 17, 2022 at 4:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > On further thought, it seems better to maintain the index array
    > from the start, allowing complete replacement of the linear list
    > searches.  We can add a separate bool flag to denote frozen-ness.
    
    
    +1 for 0001 patch. Now we process plain Vars and PlaceHolderVars in a
    more consistent way when building joinrel's tlist. And this change would
    make it easier to build up phnullingrels for PHVs as we climb up the
    join tree.
    
    BTW, the comment just above the two calls to build_joinrel_tlist says:
    
     * Create a new tlist containing just the vars that need to be output from
    
    Here by 'vars' it means both plain Vars and PlaceHolderVars, right? If
    not we may need to adjust this comment to also include PlaceHolderVars.
    
    
    0002 patch looks good to me. Glad we can get rid of create_new_ph flag.
    A minor comment is that seems we can get rid of phid inside
    PlaceHolderInfo, since we do not do linear list searches any more. It's
    some duplicate to the phid inside PlaceHolderVar. Currently there are
    two places referencing PlaceHolderInfo->phid, remove_rel_from_query and
    find_placeholder_info. We can use PlaceHolderVar->phid instead in both
    the two places.
    
    Thanks
    Richard
    
  21. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-17T20:17:36Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > BTW, the comment just above the two calls to build_joinrel_tlist says:
    >  * Create a new tlist containing just the vars that need to be output from
    > Here by 'vars' it means both plain Vars and PlaceHolderVars, right? If
    > not we may need to adjust this comment to also include PlaceHolderVars.
    
    I think it did intend just Vars because that's all that
    build_joinrel_tlist did; but we really should have updated it when we
    invented PlaceHolderVars, and even more so now that build_joinrel_tlist
    adds PHVs too.  I changed the wording.
    
    > A minor comment is that seems we can get rid of phid inside
    > PlaceHolderInfo, since we do not do linear list searches any more. It's
    > some duplicate to the phid inside PlaceHolderVar. Currently there are
    > two places referencing PlaceHolderInfo->phid, remove_rel_from_query and
    > find_placeholder_info. We can use PlaceHolderVar->phid instead in both
    > the two places.
    
    Meh, I'm not excited about that.  I don't think that the phid field
    is only there to make the search loops faster; it's the basic
    identity of the PlaceHolderInfo.
    
    			regards, tom lane
    
    
    
    
  22. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-18T18:45:46Z

    Here's a rebase up to HEAD, mainly to get the cfbot back in sync
    as to what's the live patch.
    
    I went ahead and pushed improve-adjust_appendrel_attrs_multilevel.patch,
    as that seemed uncontroversial and independently useful.  So that's
    gone from this patchset.  I also cleaned up the mess with phnullingrels
    in PHVs created for join tlists, as we discussed.  No other interesting
    changes.
    
    			regards, tom lane
    
    
  23. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-20T22:52:11Z

    Progress report on this ...
    
    I've been trying to remove some of the cruftier aspects of
    EquivalenceClasses (which really is the main point of this whole
    effort), and gotten repeatedly blocked by the fact that the semantics
    are still a bit crufty thanks to the hacking associated with outer
    join identity 3.  I think I see a path forward though.
    
    To recap, the thing about identity 3 is that it says you can get
    equivalent results from
    
    	(A leftjoin B on (Pab)) leftjoin C on (Pb*c)
    
    	A leftjoin (B leftjoin C on (Pbc)) on (Pab)
    
    if Pbc is strict for B.  Unlike what it says in optimizer/README,
    I've written the first form as "Pb*c" to indicate that any B Vars
    appearing in that clause will be marked as possibly nulled by
    the A/B join.  This makes the problem apparent: we cannot use
    the same representation of Pbc for both join orders, because
    in the second variant B's Vars are not nulled by anything.
    We've been trying to get away with writing Pbc just one way,
    and that leads directly to the semantic squishiness I've been
    fighting.
    
    What I'm thinking we should do about this, once we detect that
    this identity is applicable, is to generate *both* forms of Pbc,
    either adding or removing the varnullingrels bits depending on
    which form we got from the parser.  Then, when we come to forming
    join paths, use the appropriate variant depending on which join
    order we're considering.  build_join_rel() already has the concept
    that the join restriction list varies depending on which input
    relations we're trying to join, so this doesn't require any
    fundamental restructuring -- only a way to identify which
    RestrictInfos to use or ignore for a particular join.  That will
    probably require some new field in RestrictInfo, but I'm not
    fussed about that because there are other fields we'll be able
    to remove as this work progresses.
    
    Similarly, generate_join_implied_equalities() will need to generate
    EquivalenceClass-derived join clauses with or without varnullingrels
    marks, as appropriate.  I'm not quite sure how to do that, but it
    feels like just a small matter of programming, not a fundamental
    problem with the model which is where things are right now.
    We'll only need this for ECs that include source clauses coming
    from a movable outer join clause (i.e., Pbc in identity 3).
    
    An interesting point is that I think we want to force movable
    outer joins into the second format for the purpose of generating
    ECs, that is we want to use Pbc not Pb*c as the EC source form.
    The reason for that is to allow generation of relation-scan-level
    clauses from an EC, particularly an EC that includes a constant.
    As an example, given
    
    	A leftjoin (B leftjoin C on (B.b = C.c)) on (A.a = B.b)
    	where A.a = constant
    
    we can decide unconditionally that A.a, B.b, C.c, and the constant all
    belong to the same equivalence class, and thereby generate relation
    scan restrictions A.a = constant, B.b = constant, and C.c = constant.
    If we start with the other join order, which will include "B.b* = C.c"
    (ie Pb*c) then we'd have two separate ECs: {A.a, B.b, constant} and
    {B.b*, C.c}.  So we'll fail to produce any scan restriction for C, or
    at least we can't do so in any principled way.
    
    Furthermore, if the joins are done in the second order then we don't
    need any additional join clauses -- both joins can act like "LEFT JOIN
    ON TRUE".  (Right now, we'll emit redundant B.b = C.c and A.a = B.b
    join clauses in addition to the scan-level clauses, which is
    inefficient.)  However, if we make use of identity 3 to do the
    joins in the other order, then we do need an extra join clause, like
    
    	(A leftjoin B on (true)) leftjoin C on (B.b* = C.c)
    
    (or maybe we could just emit "B.b* IS NOT NULL" for Pb*c?)
    Without any Pb*c join constraint we get wrong answers because
    nulling of B fails to propagate to C.
    
    So while there are lots of details to work out, it feels like
    this line of thought can lead to something where setrefs.c
    doesn't have to ignore varnullingrels mismatches (yay) and
    there is no squishiness in EquivalenceClass semantics.
    
    			regards, tom lane
    
    
    
    
  24. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-08-24T10:26:46Z

    On Sun, Aug 21, 2022 at 6:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > What I'm thinking we should do about this, once we detect that
    > this identity is applicable, is to generate *both* forms of Pbc,
    > either adding or removing the varnullingrels bits depending on
    > which form we got from the parser.  Then, when we come to forming
    > join paths, use the appropriate variant depending on which join
    > order we're considering.  build_join_rel() already has the concept
    > that the join restriction list varies depending on which input
    > relations we're trying to join, so this doesn't require any
    > fundamental restructuring -- only a way to identify which
    > RestrictInfos to use or ignore for a particular join.  That will
    > probably require some new field in RestrictInfo, but I'm not
    > fussed about that because there are other fields we'll be able
    > to remove as this work progresses.
    
    
    Do you mean we generate two RestrictInfos for Pbc in the case of
    identity 3, one with varnullingrels and one without varnullingrels, and
    choose the appropriate one when forming join paths? Do we need to also
    generate two SpecialJoinInfos for the B/C join in the first order, with
    and without the A/B join in its min_lefthand?
    
    Thanks
    Richard
    
  25. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-08-24T21:18:53Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > On Sun, Aug 21, 2022 at 6:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> What I'm thinking we should do about this, once we detect that
    >> this identity is applicable, is to generate *both* forms of Pbc,
    >> either adding or removing the varnullingrels bits depending on
    >> which form we got from the parser.
    
    > Do you mean we generate two RestrictInfos for Pbc in the case of
    > identity 3, one with varnullingrels and one without varnullingrels, and
    > choose the appropriate one when forming join paths?
    
    Right.
    
    > Do we need to also
    > generate two SpecialJoinInfos for the B/C join in the first order, with
    > and without the A/B join in its min_lefthand?
    
    No, the SpecialJoinInfos would stay as they are now.  It's already the
    case that the first join's min_righthand would contain only B, and
    the second one's min_righthand would contain only C.
    
    			regards, tom lane
    
    
    
    
  26. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-08-25T10:27:38Z

    On Thu, Aug 25, 2022 at 5:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Richard Guo <guofenglinux@gmail.com> writes:
    > > Do we need to also
    > > generate two SpecialJoinInfos for the B/C join in the first order, with
    > > and without the A/B join in its min_lefthand?
    >
    > No, the SpecialJoinInfos would stay as they are now.  It's already the
    > case that the first join's min_righthand would contain only B, and
    > the second one's min_righthand would contain only C.
    
    
    I'm not sure if I understand it correctly. If we are given the first
    order from the parser, the SpecialJoinInfo for the B/C join would have
    min_lefthand as containing both B and the A/B join. And this
    SpecialJoinInfo would make the B/C join be invalid, which is not what we
    want. Currently the patch resolves this by explicitly running
    remove_unneeded_nulling_relids, and the A/B join would be removed from
    B/C join's min_lefthand, if Pbc is strict for B.
    
    Do we still need this kind of fixup if we are to keep just one form of
    SpecialJoinInfo and two forms of RestrictInfos?
    
    Thanks
    Richard
    
  27. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-08-29T06:30:23Z

    On Fri, Aug 19, 2022 at 2:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Here's a rebase up to HEAD, mainly to get the cfbot back in sync
    > as to what's the live patch.
    
    
    Noticed another different behavior from previous. When we try to reduce
    JOIN_LEFT to JOIN_ANTI, we want to know if the join's own quals are
    strict for any Var that was forced null by higher qual levels. We do
    that by checking whether local_nonnullable_vars and forced_null_vars
    overlap. However, the same Var from local_nonnullable_vars and
    forced_null_vars may be labeled with different varnullingrels. If that
    is the case, currently we would fail to tell they actually overlap. As
    an example, consider 'b.i' in the query below
    
    # explain (costs off) select * from a left join b on a.i = b.i where b.i is
    null;
            QUERY PLAN
    ---------------------------
     Hash Left Join
       Hash Cond: (a.i = b.i)
       Filter: (b.i IS NULL)
       ->  Seq Scan on a
       ->  Hash
             ->  Seq Scan on b
    (6 rows)
    
    Thanks
    Richard
    
  28. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-08-30T03:21:42Z

    On Mon, Aug 29, 2022 at 2:30 PM Richard Guo <guofenglinux@gmail.com> wrote:
    
    >
    > On Fri, Aug 19, 2022 at 2:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    >> Here's a rebase up to HEAD, mainly to get the cfbot back in sync
    >> as to what's the live patch.
    >
    >
    > Noticed another different behavior from previous. When we try to reduce
    > JOIN_LEFT to JOIN_ANTI, we want to know if the join's own quals are
    > strict for any Var that was forced null by higher qual levels. We do
    > that by checking whether local_nonnullable_vars and forced_null_vars
    > overlap. However, the same Var from local_nonnullable_vars and
    > forced_null_vars may be labeled with different varnullingrels. If that
    > is the case, currently we would fail to tell they actually overlap.
    >
    
    I wonder why this is not noticed by regression tests. So I did some
    search and it seems we do not have any test cases covering the
    transformation we apply to reduce outer joins. I think maybe we should
    add such cases in regression tests.
    
    Thanks
    Richard
    
  29. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-11-01T00:19:02Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > I wonder why this is not noticed by regression tests. So I did some
    > search and it seems we do not have any test cases covering the
    > transformation we apply to reduce outer joins. I think maybe we should
    > add such cases in regression tests.
    
    Right, done at 0043aa6b8.  The actual fix is in 0010 below (it would
    have been earlier, except I'd forgotten about this issue).
    
    I've been working away at this patch series, and here is an up-to-date
    version.  I've mostly fixed the inability to check in setrefs.c that
    varnullingrels match up at different join levels, and I've found a
    solution that I feel reasonably happy about for variant join quals
    depending on application of outer-join identity 3.  There's certainly
    bits of this that could be done in other ways, but overall I'm pleased
    with the state of these patches.
    
    I think that the next step is to change things so that the "push
    a constant through outer-join quals" hacks are replaced by
    EquivalenceClass-based logic.  That turns out to be harder than
    I'd supposed initially, because labeling Vars with nullingrels
    fixes only part of the problem there.  The other part is that
    deductions we make from an outer-join qual can only be applied
    below the nullable side of that join --- we can't use them to
    remove rows from the non-nullable side.  I have an idea how to
    make that work, but it's not passing regression tests yet :-(.
    
    Anyway, there's much more to do, but here's what I've got today.
    
    			regards, tom lane
    
    
  30. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-11-05T21:53:31Z

    I wrote:
    > I've been working away at this patch series, and here is an up-to-date
    > version.
    
    This needs a rebase after ff8fa0bf7 and b0b72c64a.  I also re-ordered
    the patches so that the commit messages' claims about when regression
    tests start to pass are true again.  No interesting changes, though.
    
    			regards, tom lane
    
    
  31. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-11-10T10:13:54Z

    On Thu, Aug 25, 2022 at 6:27 PM Richard Guo <guofenglinux@gmail.com> wrote:
    
    > I'm not sure if I understand it correctly. If we are given the first
    > order from the parser, the SpecialJoinInfo for the B/C join would have
    > min_lefthand as containing both B and the A/B join. And this
    > SpecialJoinInfo would make the B/C join be invalid, which is not what we
    > want.
    >
    
    Now I see how this works from v6 patch.  Once we notice identity 3
    applies, we will remove the lower OJ's ojrelid from the min_lefthand or
    min_righthand so that the commutation is allowed.  So in this case, the
    A/B join would be removed from B/C join's min_lefthand when we build the
    SpecialJoinInfo for B/C join, and that makes the B/C join to be legal.
    
    BTW, inner_join_rels can contain base Relids and OJ Relids.  Maybe we
    can revise the comments a bit for it atop deconstruct_recurse and
    make_outerjoininfo.  The same for the comments of qualscope, ojscope and
    outerjoin_nonnullable atop distribute_qual_to_rels.
    
    The README mentions restriction_is_computable_at(), I think it should be
    clause_is_computable_at().
    
    Thanks
    Richard
    
  32. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-11-15T08:59:27Z

    On Sun, Nov 6, 2022 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > I wrote:
    > > I've been working away at this patch series, and here is an up-to-date
    > > version.
    >
    > This needs a rebase after ff8fa0bf7 and b0b72c64a.  I also re-ordered
    > the patches so that the commit messages' claims about when regression
    > tests start to pass are true again.  No interesting changes, though.
    
    
    I'm reviewing the part about multiple version clauses, and I find a case
    that may not work as expected.  I tried with some query as below
    
     (A leftjoin (B leftjoin C on (Pbc)) on (Pab)) left join D on (Pcd)
    
    Assume Pbc is strict for B and Pcd is strict for C.
    
    According to identity 3, we know one of its equivalent form is
    
     ((A leftjoin B on (Pab)) leftjoin C on (Pbc)) left join D on (Pcd)
    
    For outer join clause Pcd, we would generate two versions from the first
    form
    
        Version 1: C Vars with nullingrels as {A/B}
        Version 2: C Vars with nullingrels as {B/C, A/B}
    
    I understand version 2 is reasonable as the nullingrels from parser
    would be set as that.  But it seems version 1 is not applicable in
    either form.
    
    Looking at the two forms again, it seems the expected two versions for
    Pcd should be
    
        Version 1: C Vars with nullingrels as {B/C}
        Version 2: C Vars with nullingrels as {B/C, A/B}
    
    With this we may have another problem that the two versions are both
    applicable at the C/D join according to clause_is_computable_at(), in
    both forms.
    
    Another thing is I believe we have another equivalent form as
    
     (A left join B on (Pab)) left join (C left join D on (Pcd)) on (Pbc)
    
    Currently this form cannot be generated because of the issue discussed
    in [1].  But someday when we can do that, I think we should have a third
    version for Pcd
    
        Version 3: C Vars with empty nullingrels
    
    [1]
    https://www.postgresql.org/message-id/flat/CAMbWs4_8n5ANh_aX2PinRZ9V9mtBguhnRd4DOVt9msPgHmEMOQ%40mail.gmail.com
    
    Thanks
    Richard
    
  33. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-11-16T20:46:44Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > I'm reviewing the part about multiple version clauses, and I find a case
    > that may not work as expected.  I tried with some query as below
    >  (A leftjoin (B leftjoin C on (Pbc)) on (Pab)) left join D on (Pcd)
    > Assume Pbc is strict for B and Pcd is strict for C.
    > According to identity 3, we know one of its equivalent form is
    >  ((A leftjoin B on (Pab)) leftjoin C on (Pbc)) left join D on (Pcd)
    > For outer join clause Pcd, we would generate two versions from the first
    > form
    >     Version 1: C Vars with nullingrels as {A/B}
    >     Version 2: C Vars with nullingrels as {B/C, A/B}
    > I understand version 2 is reasonable as the nullingrels from parser
    > would be set as that.  But it seems version 1 is not applicable in
    > either form.
    
    Hmm.  Looking at the data structures generated for the first form,
    we have
    
    B/C join:
    
          {SPECIALJOININFO 
          :min_lefthand (b 2)
          :min_righthand (b 3)
          :syn_lefthand (b 2)
          :syn_righthand (b 3)
          :jointype 1 
          :ojrelid 4 
          :commute_above_l (b 7)
          :commute_above_r (b 5)
          :commute_below (b)
    
    A/B join:
    
          {SPECIALJOININFO 
          :min_lefthand (b 1)
          :min_righthand (b 2)
          :syn_lefthand (b 1)
          :syn_righthand (b 2 3 4)
          :jointype 1 
          :ojrelid 5 
          :commute_above_l (b)
          :commute_above_r (b)
          :commute_below (b 4)
    
    everything-to-D join:
    
          {SPECIALJOININFO 
          :min_lefthand (b 1 2 3 4 5)
          :min_righthand (b 6)
          :syn_lefthand (b 1 2 3 4 5)
          :syn_righthand (b 6)
          :jointype 1 
          :ojrelid 7 
          :commute_above_l (b)
          :commute_above_r (b)
          :commute_below (b 4)
    
    So we've marked the 4 and 7 joins as possibly commuting, but they
    cannot commute according to 7's min_lefthand set.  I don't think
    the extra clone condition is terribly harmful --- it's useless
    but shouldn't cause any problems.  However, if these joins should be
    able to commute then the min_lefthand marking is preventing us
    from considering legal join orders (and has been doing so all along,
    that's not new in this patch).  It looks to me like they should be
    able to commute (giving your third form), so this is a pre-existing
    planning deficiency.
    
    Without having looked too closely, I suspect this is coming from
    the delay_upper_joins/check_outerjoin_delay stuff in initsplan.c.
    That's a chunk of logic that I'd like to nuke altogether, and maybe
    we will be able to do so once this patchset is a bit further along.
    But I've not had time to look at it yet.
    
    I'm not entirely clear on whether the strange selection of clone
    clauses for this example is a bug in process_postponed_left_join_quals
    or if that function is just getting misled by the bogus min_lefthand
    value.
    
    > Looking at the two forms again, it seems the expected two versions for
    > Pcd should be
    >     Version 1: C Vars with nullingrels as {B/C}
    >     Version 2: C Vars with nullingrels as {B/C, A/B}
    > With this we may have another problem that the two versions are both
    > applicable at the C/D join according to clause_is_computable_at(), in
    > both forms.
    
    At least when I tried it just now, clause_is_computable_at correctly
    rejected the first version, because we've already computed A/B when
    we are trying to form the C/D join so we expect it to be listed in
    varnullingrels.
    
    			regards, tom lane
    
    
    
    
  34. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-11-16T22:02:41Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > BTW, inner_join_rels can contain base Relids and OJ Relids.  Maybe we
    > can revise the comments a bit for it atop deconstruct_recurse and
    > make_outerjoininfo.  The same for the comments of qualscope, ojscope and
    > outerjoin_nonnullable atop distribute_qual_to_rels.
    
    Yeah.  I had an XXX comment about whether or not it was okay to
    include OJs in inner_join_rels.  I took a second look and decided it's
    fine, so I removed the XXX and updated these comments.
    
    > The README mentions restriction_is_computable_at(), I think it should be
    > clause_is_computable_at().
    
    Right.  I think when I wrote that I was imagining that there'd be a
    wrapper function specifically concerned with RestrictInfos, but in the
    event it didn't seem useful.  There's only one place that uses this,
    namely subbuild_joinrel_restrictlist.
    
    The cfbot is about to start complaining that this patchset doesn't apply
    over e9e26b5e7, so here's a rebase.
    
    			regards, tom lane
    
    
  35. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-11-17T08:56:34Z

    On Thu, Nov 17, 2022 at 4:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > So we've marked the 4 and 7 joins as possibly commuting, but they
    > cannot commute according to 7's min_lefthand set.  I don't think
    > the extra clone condition is terribly harmful --- it's useless
    > but shouldn't cause any problems.  However, if these joins should be
    > able to commute then the min_lefthand marking is preventing us
    > from considering legal join orders (and has been doing so all along,
    > that's not new in this patch).  It looks to me like they should be
    > able to commute (giving your third form), so this is a pre-existing
    > planning deficiency.
    
    
    Yeah.  This is an issue that can also be seen on HEAD and is discussed
    in [1].  It happens because when building SpecialJoinInfo for 7, we find
    A/B join 5 is in our LHS, and our join condition (Pcd) uses 5's
    syn_righthand while is not strict for 5's min_righthand.  So we decide
    to preserve the ordering of 7 and 5, by adding 5's full syntactic relset
    to 7's min_lefthand.  As discussed in [1], maybe we should consider 5's
    min_righthand rather than syn_righthand when checking if Pcd uses 5's
    RHS.
    
    
    > > Looking at the two forms again, it seems the expected two versions for
    > > Pcd should be
    > >     Version 1: C Vars with nullingrels as {B/C}
    > >     Version 2: C Vars with nullingrels as {B/C, A/B}
    > > With this we may have another problem that the two versions are both
    > > applicable at the C/D join according to clause_is_computable_at(), in
    > > both forms.
    >
    > At least when I tried it just now, clause_is_computable_at correctly
    > rejected the first version, because we've already computed A/B when
    > we are trying to form the C/D join so we expect it to be listed in
    > varnullingrels.
    
    
    For the first version of Pcd, which is C Vars with nullingrels as {B/C}
    here, although at the C/D join level A/B join has been computed and
    meanwhile is not listed in varnullingrels, but since Pcd does not
    mention any nullable Vars in A/B's min_righthand, it seems to me this
    version cannot be rejected by clause_is_computable_at().  But maybe I'm
    missing something.
    
    [1]
    https://www.postgresql.org/message-id/flat/CAMbWs4_8n5ANh_aX2PinRZ9V9mtBguhnRd4DOVt9msPgHmEMOQ%40mail.gmail.com
    
    Thanks
    Richard
    
  36. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-23T18:20:40Z

    Here's a new edition of this patch series.
    
    I shoved some preliminary refactoring into the 0001 patch,
    notably splitting deconstruct_jointree into two passes.
    0002-0009 cover the same ground as they did before, though
    with some differences in detail.  0010-0012 are new work
    mostly aimed at removing kluges we no longer need.
    
    There are two big areas that I would still like to improve, but
    I think we've run out of time to address them in the v16 cycle:
    
    * It'd be nice to apply the regular EquivalenceClass deduction
    mechanisms to outer-join equalities, instead of the
    reconsider_outer_join_clauses kluge.  I've made several stabs at that
    without much success.  I think that the "join domain" framework added
    in 0012 is likely to provide a workable foundation, but some more
    effort is needed.
    
    * I really want to get rid of RestrictInfo.is_pushed_down and
    RINFO_IS_PUSHED_DOWN(), because those seem exceedingly awkward
    and squishy.  I've not gotten far with finding a better
    replacement there, either.
    
    Despite the work being unfinished, I feel that this has moved us a
    long way towards outer-join handling being less of a jury-rigged
    affair.
    
    			regards, tom lane
    
    
  37. Re: Making Vars outer-join aware

    Ted Yu <yuzhihong@gmail.com> — 2022-12-23T20:59:07Z

    On Fri, Dec 23, 2022 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Here's a new edition of this patch series.
    >
    > I shoved some preliminary refactoring into the 0001 patch,
    > notably splitting deconstruct_jointree into two passes.
    > 0002-0009 cover the same ground as they did before, though
    > with some differences in detail.  0010-0012 are new work
    > mostly aimed at removing kluges we no longer need.
    >
    > There are two big areas that I would still like to improve, but
    > I think we've run out of time to address them in the v16 cycle:
    >
    > * It'd be nice to apply the regular EquivalenceClass deduction
    > mechanisms to outer-join equalities, instead of the
    > reconsider_outer_join_clauses kluge.  I've made several stabs at that
    > without much success.  I think that the "join domain" framework added
    > in 0012 is likely to provide a workable foundation, but some more
    > effort is needed.
    >
    > * I really want to get rid of RestrictInfo.is_pushed_down and
    > RINFO_IS_PUSHED_DOWN(), because those seem exceedingly awkward
    > and squishy.  I've not gotten far with finding a better
    > replacement there, either.
    >
    > Despite the work being unfinished, I feel that this has moved us a
    > long way towards outer-join handling being less of a jury-rigged
    > affair.
    >
    >                         regards, tom lane
    >
    > Hi,
    For v8-0012-invent-join-domains.patch, in `distribute_qual_to_rels`, it
    seems that `pseudoconstant` and `root->hasPseudoConstantQuals` carry the
    same value.
    Can the variable `pseudoconstant` be omitted ?
    
    Cheers
    
  38. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-23T21:09:16Z

    Ted Yu <yuzhihong@gmail.com> writes:
    > For v8-0012-invent-join-domains.patch, in `distribute_qual_to_rels`, it
    > seems that `pseudoconstant` and `root->hasPseudoConstantQuals` carry the
    > same value.
    > Can the variable `pseudoconstant` be omitted ?
    
    Surely not.  'pseudoconstant' tells whether the current qual clause
    is pseudoconstant.  root->hasPseudoConstantQuals remembers whether
    we have found any pseudoconstant qual in the query.
    
    			regards, tom lane
    
    
    
    
  39. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-12-27T08:27:49Z

    On Sat, Dec 24, 2022 at 2:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > I shoved some preliminary refactoring into the 0001 patch,
    > notably splitting deconstruct_jointree into two passes.
    > 0002-0009 cover the same ground as they did before, though
    > with some differences in detail.  0010-0012 are new work
    > mostly aimed at removing kluges we no longer need.
    
    
    I'm looking at 0010-0012 and I really like the changes and removals
    there.  Thanks for the great work!
    
    For 0010, the change seems quite independent.  I think maybe we can
    apply it to HEAD directly.
    
    For 0011, I found that some clauses that were outerjoin_delayed and thus
    not equivalent before might be treated as being equivalent now.  For
    example
    
    explain (costs off)
    select * from a left join b on a.i = b.i where coalesce(b.j, 0) = 0 and
    coalesce(b.j, 0) = a.j;
                QUERY PLAN
    ----------------------------------
     Hash Right Join
       Hash Cond: (b.i = a.i)
       Filter: (COALESCE(b.j, 0) = 0)
       ->  Seq Scan on b
       ->  Hash
             ->  Seq Scan on a
                   Filter: (j = 0)
    (7 rows)
    
    This is different behavior from HEAD.  But I think it's an improvement.
    
    For 0012, I'm still trying to understand JoinDomain.  AFAIU all EC
    members of the same EC should have the same JoinDomain, because for
    constants we match EC members only within the same JoinDomain, and for
    Vars if they come from different join domains they will have different
    nullingrels and thus will not match.  So I wonder if we can have the
    JoinDomain kept in EquivalenceClass rather than in each
    EquivalenceMembers.
    
    Thanks
    Richard
    
  40. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-27T15:31:20Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > For 0012, I'm still trying to understand JoinDomain.  AFAIU all EC
    > members of the same EC should have the same JoinDomain, because for
    > constants we match EC members only within the same JoinDomain, and for
    > Vars if they come from different join domains they will have different
    > nullingrels and thus will not match.  So I wonder if we can have the
    > JoinDomain kept in EquivalenceClass rather than in each
    > EquivalenceMembers.
    
    Yeah, I tried to do it like that at first, and failed.  There is
    some sort of association between ECs and join domains, for sure,
    but exactly what it is seems to need more elucidation.
    
    The thing that I couldn't get around before is that if you have,
    say, a mergejoinable equality clause in an outer join:
    
        select ... from a left join b on a.x = b.y;
    
    that equality clause can only be associated with the join domain
    for B, because it certainly can't be enforced against A.  However,
    you'd still wish to be able to do a mergejoin using indexes on
    a.x and b.y, and this means that we have to understand the ordering
    induced by a PathKey based on this EC as applicable to A, even
    though that relation is not in the same join domain.  So there are
    situations where sort orderings apply across domain boundaries even
    though equalities don't.  We might have to split the notion of
    EquivalenceClass into two sorts of objects, and somewhere right
    about here is where I realized that this wasn't getting finished
    for v16 :-(.
    
    So the next pass at this is likely going to involve some more
    refactoring, and maybe we'll end up saying that an EquivalenceClass
    is tightly bound to a join domain or maybe we won't.  For the moment
    it seemed to work better to associate domains with only the const
    members of ECs.  (As written, the patch does fill em_jdomain even
    for non-const members, but that was just for simplicity.  I'd
    originally meant to make it NULL for non-const members, but that
    turned out to be a bit too tedious because the responsibility for
    marking a member as const or not is split among several places.)
    
    Another part of the motivation for doing it like that is that
    I've been thinking about having just a single common pool of
    EquivalenceMember objects, and turning EquivalenceClasses into
    bitmapsets of indexes into the shared EquivalenceMember list.
    This would support having ECs that share some member(s) without
    being exactly the same thing, which I think might be necessary
    to get to the point of treating outer-join clauses as creating
    EC equalities.
    
    BTW, I can't escape the suspicion that I've reinvented an idea
    that's already well known in the literature.  Has anyone seen
    something like this "join domain" concept before, and if so
    what was it called?
    
    			regards, tom lane
    
    
    
    
  41. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2022-12-28T08:49:23Z

    On Tue, Dec 27, 2022 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > The thing that I couldn't get around before is that if you have,
    > say, a mergejoinable equality clause in an outer join:
    >
    >     select ... from a left join b on a.x = b.y;
    >
    > that equality clause can only be associated with the join domain
    > for B, because it certainly can't be enforced against A.  However,
    > you'd still wish to be able to do a mergejoin using indexes on
    > a.x and b.y, and this means that we have to understand the ordering
    > induced by a PathKey based on this EC as applicable to A, even
    > though that relation is not in the same join domain.  So there are
    > situations where sort orderings apply across domain boundaries even
    > though equalities don't.  We might have to split the notion of
    > EquivalenceClass into two sorts of objects, and somewhere right
    > about here is where I realized that this wasn't getting finished
    > for v16 :-(.
    
    
    I think I see where the problem is.  And I can see currently in
    get_eclass_for_sort_expr we always use the top JoinDomain.  So although
    the equality clause 'a.x = b.y' belongs to JoinDomain {B}, we set up ECs
    for 'a.x' and 'b.y' that belong to the top JoinDomain {A, B, A/B}.
    
    But doing so would lead to a situation where the "same" Vars from
    different join domains might have the same varnullingrels and thus would
    match by equal().  As an example, consider
    
        select ... from a left join b on a.x = b.y where a.x = 1;
    
    As said we would set up EC for 'b.y' as belonging to the top JoinDomain.
    Then when reconsider_outer_join_clause generates the equality clause
    'b.y = 1', we figure out that the new clause belongs to JoinDomain {B}.
    Note that the two 'b.y' here belong to different join domains but they
    have the same varnullingrels (empty varnullingrels actually).  As a
    result, the equality 'b.y = 1' would be merged into the existing EC for
    'b.y', because the two 'b.y' matches by equal() and we do not check
    JoinDomain for non-const EC members.  So we would end up with an EC
    containing EC members of different join domains.
    
    And it seems this would make the following statement in README not hold
    any more.
    
        We don't have to worry about this for Vars (or expressions
        containing Vars), because references to the "same" column from
        different join domains will have different varnullingrels and thus
        won't be equal() anyway.
    
    Thanks
    Richard
    
  42. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-28T15:36:46Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > I think I see where the problem is.  And I can see currently in
    > get_eclass_for_sort_expr we always use the top JoinDomain.  So although
    > the equality clause 'a.x = b.y' belongs to JoinDomain {B}, we set up ECs
    > for 'a.x' and 'b.y' that belong to the top JoinDomain {A, B, A/B}.
    
    Yeah, that's a pretty squishy point, and likely wrong in detail.
    If we want to associate an EC with the sort order of an index on
    b.y, we almost certainly want that EC to belong to join domain {B}.
    I had tried to do that in an earlier iteration of 0012, by dint of
    adding a JoinDomain argument to get_eclass_for_sort_expr and then
    having build_index_pathkeys specify the lowest join domain containing
    the index's relation.  It did not work very well: it couldn't generate
    mergejoins on full-join clauses, IIRC.
    
    Maybe some variant on that plan can be made to fly, but I'm not at
    all clear on what needs to be adjusted.  Anyway, that's part of why
    I backed off on the notion of explicitly associating ECs with join
    domains.  As long as we only pay attention to the join domain in
    connection with constants, get_eclass_for_sort_expr can get away with
    just using the top join domain, because we'd never apply it to a
    constant unless perhaps somebody manages to ORDER BY or GROUP BY a
    constant, and in those cases the top domain really is the right one.
    (It's syntactically difficult to write such a thing anyway, but not
    impossible.)
    
    I think this is sort of an intermediate state, and hopefully a
    year from now we'll have a better idea of how to manage all that.
    What I mainly settled for doing in 0012 was getting rid of the
    "below outer join" concept for ECs, because having to identify
    a value for that had been giving me fits in several previous
    attempts at extending ECs to cover outer-join equalities.
    I think that the JoinDomain concept will offer a better-defined
    replacement.
    
    			regards, tom lane
    
    
    
    
  43. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-01-23T20:38:06Z

    The cfbot shows that this needs to be rebased over 8eba3e3f0.
    (Just code motion, no interesting changes.)
    
    Richard, are you planning to review this any more?  I'm getting
    a little antsy to get it committed.  For such a large patch,
    it's surprising it's had so few conflicts to date.
    
    			regards, tom lane
    
    
  44. Re: Making Vars outer-join aware

    Hans Buschmann <buschmann@nidsa.net> — 2023-01-24T10:11:00Z

    Hello Tom
    
    
    I just noticed your new efforts in this area.
    
    
    I wanted to recurr to my old thread [1] considering constant propagation of quals.
    
    
    You gave an elaborated explanation at that time, but my knowledge was/is not yet sufficient to reveil the technical details.
    
    
    In our application the described method is widespread used with much success (now at pg15.1 Fedora), but for unexperienced SQL authors this is not really obviously to choose (i.e. using the explicit constant xx_season=3 as qual). This always requires a "Macro" processor to compose the queries (in my case php) and a lot of programmer effort in the source code.
    
    
    I can't review/understand your patchset for the planner, but since it covers the same area, the beformentioned optimization could perhaps be addressed too.
    
    
    With respect of the nullability of these quals I immediately changed all of them to NOT NULL, which seems the most natural way when these quals are also used for partioning.
    
    
    [1]  https://www.postgresql.org/message-id/1571413123735.26467@nidsa.net
    
    
    Thanks for looking
    
    
    Hans Buschmann
    
    
  45. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-01-24T15:30:42Z

    Hans Buschmann <buschmann@nidsa.net> writes:
    > I just noticed your new efforts in this area.
    > I wanted to recurr to my old thread [1] considering constant propagation of quals.
    > [1]  https://www.postgresql.org/message-id/1571413123735.26467@nidsa.net
    
    Yeah, this patch series is not yet quite up to the point of improving
    that.  That area is indeed the very next thing I want to work on, and
    I did spend some effort on it last month, but I ran out of time to get
    it working.  Maybe we'll have something there for v17.
    
    			regards, tom lane
    
    
    
    
  46. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-01-24T19:31:27Z

    I wrote:
    > Hans Buschmann <buschmann@nidsa.net> writes:
    >> I just noticed your new efforts in this area.
    >> I wanted to recurr to my old thread [1] considering constant propagation of quals.
    >> [1]  https://www.postgresql.org/message-id/1571413123735.26467@nidsa.net
    
    > Yeah, this patch series is not yet quite up to the point of improving
    > that.  That area is indeed the very next thing I want to work on, and
    > I did spend some effort on it last month, but I ran out of time to get
    > it working.  Maybe we'll have something there for v17.
    
    BTW, to clarify what's going on there: what I want to do is allow
    the regular equivalence-class machinery to handle deductions from
    equality operators appearing in LEFT JOIN ON clauses (maybe full
    joins too, but I'd be satisfied if it works for one-sided outer
    joins).  I'd originally hoped that distinguishing pre-nulled from
    post-nulled variables would be enough to make that safe, but it's
    not.  Here's an example:
    
    	select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
    
    If we turn the generic equivclass.c logic loose on these clauses,
    it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
    the scan of t2, which is even better (since we might be able to turn
    that into an indexscan qual).  However, it will also try to apply
    t1.x = 1 at the scan of t1, and that's just wrong, because that
    will eliminate t1 rows that should come through with null extension.
    
    My current plan for making this work is to define
    EquivalenceClass-generated clauses as applying within "join domains",
    which are sets of inner-joined relations, and in the case of a one-sided
    outer join then the join itself belongs to the same join domain as its
    right-hand side --- but not to the join domain of its left-hand side.
    This would allow us to push EC clauses from an outer join's qual down
    into the RHS, but not into the LHS, and then anything leftover would
    still have to be applied at the join.  In this example we'd have to
    apply t1.x = t2.y or t1.x = 1, but not both, at the join.
    
    I got as far as inventing join domains, in the 0012 patch of this
    series, but I haven't quite finished puzzling out the clause application
    rules that would be needed for this scenario.  Ordinarily an EC
    containing a constant would be fully enforced at the scan level
    (i.e., apply t1.x = 1 and t2.y = 1 at scan level) and generate no
    additional clauses at join level; but that clearly doesn't work
    anymore when some of the scans are outside the join domain.
    I think that the no-constant case might need to be different too.
    I have some WIP code but nothing I can show.
    
    Also, this doesn't seem to help for full joins.  We can treat the
    two sides as each being their own join domains, but then the join's
    own ON clause doesn't belong to either one, since we can't throw
    away rows from either side on the basis of a restriction from ON.
    So it seems like we'll still need ad-hoc logic comparable to
    reconsider_full_join_clause, if we want to preserve that optimization.
    I'm only mildly discontented with that, but still discontented.
    
    			regards, tom lane
    
    
    
    
  47. Re: Making Vars outer-join aware

    David G. Johnston <david.g.johnston@gmail.com> — 2023-01-24T19:47:53Z

    On Tue, Jan 24, 2023 at 12:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > I wrote:
    > > Hans Buschmann <buschmann@nidsa.net> writes:
    > >> I just noticed your new efforts in this area.
    > >> I wanted to recurr to my old thread [1] considering constant
    > propagation of quals.
    > >> [1]
    > https://www.postgresql.org/message-id/1571413123735.26467@nidsa.net
    >
    > > Yeah, this patch series is not yet quite up to the point of improving
    > > that.  That area is indeed the very next thing I want to work on, and
    > > I did spend some effort on it last month, but I ran out of time to get
    > > it working.  Maybe we'll have something there for v17.
    >
    > BTW, to clarify what's going on there: what I want to do is allow
    > the regular equivalence-class machinery to handle deductions from
    > equality operators appearing in LEFT JOIN ON clauses (maybe full
    > joins too, but I'd be satisfied if it works for one-sided outer
    > joins).  I'd originally hoped that distinguishing pre-nulled from
    > post-nulled variables would be enough to make that safe, but it's
    > not.  Here's an example:
    >
    >         select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
    >
    > If we turn the generic equivclass.c logic loose on these clauses,
    > it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
    > the scan of t2, which is even better (since we might be able to turn
    > that into an indexscan qual).  However, it will also try to apply
    > t1.x = 1 at the scan of t1, and that's just wrong, because that
    > will eliminate t1 rows that should come through with null extension.
    >
    >
    Is there a particular comment or README where that last conclusion is
    explained so that it makes sense.  Intuitively, I would expect t1.x = 1 to
    be applied during the scan of t1 - it isn't like the output of the join is
    allowed to include t1 rows not matching that condition anyway.
    
    IOW, I thought the more verbose but equivalent syntax for that was:
    
    select ... from (select * from t1 as insub where insub.x = 1) as t1 left
    join t2 on (t1.x  = t2.y)
    
    Thanks!
    
    David J.
    
  48. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-01-24T20:25:42Z

    "David G. Johnston" <david.g.johnston@gmail.com> writes:
    > On Tue, Jan 24, 2023 at 12:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
    >> 
    >> If we turn the generic equivclass.c logic loose on these clauses,
    >> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
    >> the scan of t2, which is even better (since we might be able to turn
    >> that into an indexscan qual).  However, it will also try to apply
    >> t1.x = 1 at the scan of t1, and that's just wrong, because that
    >> will eliminate t1 rows that should come through with null extension.
    
    > Is there a particular comment or README where that last conclusion is
    > explained so that it makes sense.
    
    Hm?  It's a LEFT JOIN, so it must not eliminate any rows from t1.
    A row that doesn't have t1.x = 1 will appear in the output with
    null columns for t2 ... but it must still appear, so we cannot
    filter on t1.x = 1 in the scan of t1.
    
    			regards, tom lane
    
    
    
    
  49. Re: Making Vars outer-join aware

    David G. Johnston <david.g.johnston@gmail.com> — 2023-01-24T20:39:35Z

    On Tue, Jan 24, 2023 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > "David G. Johnston" <david.g.johnston@gmail.com> writes:
    > > On Tue, Jan 24, 2023 at 12:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > >> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
    > >>
    > >> If we turn the generic equivclass.c logic loose on these clauses,
    > >> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
    > >> the scan of t2, which is even better (since we might be able to turn
    > >> that into an indexscan qual).  However, it will also try to apply
    > >> t1.x = 1 at the scan of t1, and that's just wrong, because that
    > >> will eliminate t1 rows that should come through with null extension.
    >
    > > Is there a particular comment or README where that last conclusion is
    > > explained so that it makes sense.
    >
    > Hm?  It's a LEFT JOIN, so it must not eliminate any rows from t1.
    > A row that doesn't have t1.x = 1 will appear in the output with
    > null columns for t2 ... but it must still appear, so we cannot
    > filter on t1.x = 1 in the scan of t1.
    >
    >
    Ran some queries, figured it out.  Sorry for the noise.  I had turned the
    behavior of the RHS side appearing in the ON clause into a personal general
    rule then tried to apply it to the LHS (left join mental model) without
    working through the rules from first principles.
    
    David J.
    
  50. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2023-01-30T09:56:38Z

    On Tue, Jan 24, 2023 at 4:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Richard, are you planning to review this any more?  I'm getting
    > a little antsy to get it committed.  For such a large patch,
    > it's surprising it's had so few conflicts to date.
    
    
    Sorry for the delayed reply.  I don't have any more review comments at
    the moment, except a nitpicking one.
    
    In optimizer/README at line 729 there is a query as
    
        SELECT * FROM a
          LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = b.y)
        WHERE a.x = 1;
    
    I think it should be
    
        SELECT * FROM a
          LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = ss.y)
        WHERE a.x = 1;
    
    I have no objection to get it committed.
    
    Thanks
    Richard
    
  51. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-01-30T16:45:27Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > Sorry for the delayed reply.  I don't have any more review comments at
    > the moment, except a nitpicking one.
    
    > In optimizer/README at line 729 there is a query as
    
    >     SELECT * FROM a
    >       LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = b.y)
    >     WHERE a.x = 1;
    
    > I think it should be
    
    >     SELECT * FROM a
    >       LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = ss.y)
    >     WHERE a.x = 1;
    
    Oh, good catch, thanks.
    
    > I have no objection to get it committed.
    
    I'll push forward then.  Thanks for reviewing!
    
    			regards, tom lane
    
    
    
    
  52. Re: Making Vars outer-join aware

    Justin Pryzby <pryzby@telsasoft.com> — 2023-02-12T23:58:23Z

    On Mon, Jan 23, 2023 at 03:38:06PM -0500, Tom Lane wrote:
    > Richard, are you planning to review this any more?  I'm getting
    > a little antsy to get it committed.  For such a large patch,
    > it's surprising it's had so few conflicts to date.
    
    The patch broke this query:
    
    select from pg_inherits inner join information_schema.element_types
    right join (select from pg_constraint as sample_2) on true
    on false, lateral (select scope_catalog, inhdetachpending from pg_publication_namespace limit 3);
    ERROR:  could not devise a query plan for the given query
    
    
    
    
    
  53. Re: Making Vars outer-join aware

    Richard Guo <guofenglinux@gmail.com> — 2023-02-13T07:33:15Z

    On Mon, Feb 13, 2023 at 7:58 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
    
    > The patch broke this query:
    >
    > select from pg_inherits inner join information_schema.element_types
    > right join (select from pg_constraint as sample_2) on true
    > on false, lateral (select scope_catalog, inhdetachpending from
    > pg_publication_namespace limit 3);
    > ERROR:  could not devise a query plan for the given query
    
    
    Thanks for the report!  I've looked at it a little bit and traced down
    to function have_unsafe_outer_join_ref().  The comment there says
    
     * In practice, this test never finds a problem ...
     * ...
     * It still seems worth checking
     * as a backstop, but we don't go to a lot of trouble: just reject if the
     * unsatisfied part includes any outer-join relids at all.
    
    This seems not correct as showed by the counterexample.  ISTM that we
    need to do the check honestly as what the other comment says
    
     * If the parameterization is only partly satisfied by the outer rel,
     * the unsatisfied part can't include any outer-join relids that could
     * null rels of the satisfied part.
    
    The NOT_USED part of code is doing this check.  But I think we need a
    little tweak.  We should check the nullable side of related outer joins
    against the satisfied part, rather than inner_paramrels.  Maybe
    something like attached.
    
    However, this test seems to cost some cycles after the change.  So I
    wonder if it's worthwhile to perform it, considering that join order
    restrictions should be able to guarantee there is no problem here.
    
    BTW, here is a simplified query that can trigger this issue on HEAD.
    
    select * from t1 inner join t2 left join (select null as c from t3 left
    join t4 on true) as sub on true on true, lateral (select c, t1.a from t5
    offset 0 ) ss;
    
    Thanks
    Richard
    
  54. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-02-13T16:50:17Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > Thanks for the report!  I've looked at it a little bit and traced down
    > to function have_unsafe_outer_join_ref().  The comment there says
    >  * In practice, this test never finds a problem ...
    > This seems not correct as showed by the counterexample.
    
    Right.  I'd noticed that the inner loop of that function was not
    reached in our regression tests, and incorrectly concluded that it
    was not reachable --- but I failed to consider cases where the
    inner rel's parameterization depends on Vars from multiple places.
    
    > The NOT_USED part of code is doing this check.  But I think we need a
    > little tweak.  We should check the nullable side of related outer joins
    > against the satisfied part, rather than inner_paramrels.  Maybe
    > something like attached.
    
    Agreed.
    
    > However, this test seems to cost some cycles after the change.  So I
    > wonder if it's worthwhile to perform it, considering that join order
    > restrictions should be able to guarantee there is no problem here.
    
    Yeah, I think we should reduce it to an Assert check.  It shouldn't be
    worth the cycles to run in production, and that will also make it easier
    to notice in sqlsmith testing if anyone happens across another
    counterexample.
    
    Pushed that way.  Thanks for the report and the patch!
    
    			regards, tom lane
    
    
    
    
  55. Re: Making Vars outer-join aware

    Justin Pryzby <pryzby@telsasoft.com> — 2023-02-13T18:48:07Z

    On Mon, Feb 13, 2023 at 03:33:15PM +0800, Richard Guo wrote:
    > On Mon, Feb 13, 2023 at 7:58 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
    > > The patch broke this query:
    > >
    > > select from pg_inherits inner join information_schema.element_types
    > > right join (select from pg_constraint as sample_2) on true
    > > on false, lateral (select scope_catalog, inhdetachpending from
    > > pg_publication_namespace limit 3);
    > > ERROR:  could not devise a query plan for the given query
    
    > BTW, here is a simplified query that can trigger this issue on HEAD.
    > 
    > select * from t1 inner join t2 left join (select null as c from t3 left
    > join t4 on true) as sub on true on true, lateral (select c, t1.a from t5
    > offset 0 ) ss;
    
    It probably doesn't need to be said that the original query was reduced
    from sqlsmith...  But I mention that now to make it searchable.
    
    Thanks,
    -- 
    Justin
    
    
    
    
  56. Re: Making Vars outer-join aware

    Anton A. Melnikov <aamelnikov@inbox.ru> — 2023-05-04T07:22:36Z

    Hello!
    
    I'm having doubts about this fix but most likely i don't understand something.
    Could you help me to figure it out, please.
    
    The thing is that for custom scan nodes as readme says:
    "INDEX_VAR is abused to signify references to columns of a custom scan tuple type"
    But INDEX_VAR has a negative value, so it can not be used in varnullingrels bitmapset.
    And therefore this improvement seems will not work with custom scan nodes and some
    extensions that use such nodes.
    
    If i'm wrong in my doubts and bitmapset for varnullingrels is ok, may be add a check before
    adjust_relid_set() call like this:
    
    @@ -569,9 +569,10 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
                     {
                             if (var->varno == context->rt_index)
                                     var->varno = context->new_index;
    -                       var->varnullingrels = adjust_relid_set(var->varnullingrels,
    -                                                              context->rt_index,
    -                                                              context->new_index);
    +                       if (context->rt_index >= 0 && context->new_index >= 0)
    +                               var->varnullingrels = adjust_relid_set(var->varnullingrels,
    +                                                                      context->rt_index,
    +
    
    With the best wishes,
    
    -- 
    Anton A. Melnikov
    Postgres Professional: http://www.postgrespro.com
    The Russian Postgres Company
    
    
    
    
  57. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-05-04T12:22:04Z

    "Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
    > The thing is that for custom scan nodes as readme says:
    > "INDEX_VAR is abused to signify references to columns of a custom scan tuple type"
    > But INDEX_VAR has a negative value, so it can not be used in varnullingrels bitmapset.
    > And therefore this improvement seems will not work with custom scan nodes and some
    > extensions that use such nodes.
    
    Under what circumstances would you be trying to inject INDEX_VAR
    into a nullingrel set?  Only outer-join relids should ever appear there.
    AFAICS the change you propose would serve only to mask bugs.
    
    			regards, tom lane
    
    
    
    
  58. Re: Making Vars outer-join aware

    Anton A. Melnikov <aamelnikov@inbox.ru> — 2023-05-29T12:01:03Z

    Hello and sorry for the big delay in reply!
    
    On 04.05.2023 15:22, Tom Lane wrote:
    > AFAICS the change you propose would serve only to mask bugs.
    
    Yes, it's a bad decision.
    
    > Under what circumstances would you be trying to inject INDEX_VAR
    > into a nullingrel set?  Only outer-join relids should ever appear there.
    
    The thing is that i don't try to push INDEX_VAR into a nullingrel set at all,
    i just try to replace the existing rt_index that equals to INDEX_VAR in Var nodes with
    the defined positive indexes by using ChangeVarNodes_walker() function call. It checks
    if the nullingrel contains the existing rt_index and does it need to be updated too.
    It calls bms_is_member() for that, but the last immediately throws an error
    if the value to be checked is negative like INDEX_VAR.
    
    But we are not trying to corrupt the existing nullingrel with this bad index,
    so it doesn't seem like an serious error.
    And this index certainly cannot be a member of the Bitmapset.
    
    Therefore it also seems better and more logical to me in the case of an index that
    cannot possibly be a member of the Bitmapset, immediately return false.
    
    Here is a patch like that.
    
    With the best regards,
    
    -- 
    Anton A. Melnikov
    Postgres Professional: http://www.postgrespro.com
    The Russian Postgres Company
  59. Re: Making Vars outer-join aware

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-06-08T16:58:08Z

    [ back from PGCon ... ]
    
    "Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
    > On 04.05.2023 15:22, Tom Lane wrote:
    >> Under what circumstances would you be trying to inject INDEX_VAR
    >> into a nullingrel set?  Only outer-join relids should ever appear there.
    
    > The thing is that i don't try to push INDEX_VAR into a nullingrel set at all,
    > i just try to replace the existing rt_index that equals to INDEX_VAR in Var nodes with
    > the defined positive indexes by using ChangeVarNodes_walker() function call.
    
    Hmm.  That implies that you're changing plan data structures around after
    setrefs.c, which doesn't seem like a great design to me --- IMO that ought
    to happen in PlanCustomPath, which will still see the original varnos.
    However, it's probably not worth breaking existing code for this, so
    now I agree that ChangeVarNodes ought to (continue to) allow negative
    rt_index.
    
    > Therefore it also seems better and more logical to me in the case of an index that
    > cannot possibly be a member of the Bitmapset, immediately return false.
    > Here is a patch like that.
    
    I do not like the blast radius of this patch.  Yes, I know about that
    comment in bms_is_member --- I wrote it, if memory serves.  But it's
    stood like that for more than two decades, and I believe it's caught
    its share of mistakes.  This issue doesn't seem like a sufficient
    reason to change a globally-visible behavior.
    
    I think the right thing here is not either of your patches, but
    to tweak adjust_relid_set() to not fail on negative oldrelid.
    I'll go make it so.
    
    			regards, tom lane
    
    
    
    
  60. Re: Making Vars outer-join aware

    Anton A. Melnikov <aamelnikov@inbox.ru> — 2023-07-23T12:20:21Z

    On 08.06.2023 19:58, Tom Lane wrote:
    > I think the right thing here is not either of your patches, but
    > to tweak adjust_relid_set() to not fail on negative oldrelid.
    > I'll go make it so.
    
    Thanks! This fully solves the problem with ChangeVarNodes() that i wrote above.
    
    
    > Hmm.  That implies that you're changing plan data structures around after
    > setrefs.c, which doesn't seem like a great design to me --- IMO that ought
    > to happen in PlanCustomPath, which will still see the original varnos.
    
    My further searchers led to the fact that it is possible to immediately set the
    necessary varnos during custom_scan->scan.plan.targetlist creation and leave the
    сustom_scan->custom_scan_tlist = NIL rather than changing them later using ChangeVarNodes().
    This resulted in a noticeable code simplification.
    Thanks a lot for pointing on it!
    
    Sincerely yours,
    
    -- 
    Anton A. Melnikov
    Postgres Professional: http://www.postgrespro.com
    The Russian Postgres Company