Re: Making Vars outer-join aware

From: "David G. Johnston" <david.g.johnston@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Hans Buschmann <buschmann@nidsa.net>, Richard Guo <guofenglinux@gmail.com>, Pg Hackers <pgsql-hackers@lists.postgresql.org>, "Finnerty, Jim" <jfinnert@amazon.com>
Date: 2023-01-24T20:39:35Z
Lists: pgsql-hackers

Commits

  1. Re-allow INDEX_VAR as rt_index in ChangeVarNodes().

  2. Fix thinkos in have_unsafe_outer_join_ref; reduce to Assert check.

  3. Invent "join domains" to replace the below_outer_join hack.

  4. Do assorted mop-up in the planner.

  5. Make Vars be outer-join-aware.

  6. Invent "multibitmapsets", and use them to speed up antijoin detection.

  7. Add basic regression tests for semi/antijoin recognition.

  8. Improve performance of adjust_appendrel_attrs_multilevel.

  9. Refactor addition of PlaceHolderVars to joinrel targetlists.

  10. Use an explicit state flag to control PlaceHolderInfo creation.

  11. Make PlaceHolderInfo lookup O(1).

On Tue, Jan 24, 2023 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > On Tue, Jan 24, 2023 at 12:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
> >>
> >> If we turn the generic equivclass.c logic loose on these clauses,
> >> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
> >> the scan of t2, which is even better (since we might be able to turn
> >> that into an indexscan qual).  However, it will also try to apply
> >> t1.x = 1 at the scan of t1, and that's just wrong, because that
> >> will eliminate t1 rows that should come through with null extension.
>
> > Is there a particular comment or README where that last conclusion is
> > explained so that it makes sense?
>
> Hm?  It's a LEFT JOIN, so it must not eliminate any rows from t1.
> A row that doesn't have t1.x = 1 will appear in the output with
> null columns for t2 ... but it must still appear, so we cannot
> filter on t1.x = 1 in the scan of t1.
>
>
Ran some queries and figured it out.  Sorry for the noise.  I had turned the
behavior of an RHS column appearing in the ON clause into a personal general
rule, then tried to apply it to the LHS (left join mental model) without
working through the rules from first principles.
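
For anyone else who wants to see the asymmetry, here is a minimal sketch of
that kind of experiment.  It is not the set of queries actually run: the
two-row tables, the aliases a and b, and the function name run_demo are
made up for illustration, and it uses SQLite through Python's sqlite3
module rather than PostgreSQL, on the assumption that standard LEFT JOIN
semantics are all that matters here.  The filtered subqueries stand in for
applying a qual at the scan of one side.

```python
import sqlite3


def run_demo():
    """Compare the original left join with two hand-written 'pushed down' forms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t1 (x INTEGER);
        CREATE TABLE t2 (y INTEGER);
        INSERT INTO t1 VALUES (1), (2);
        INSERT INTO t2 VALUES (1), (2);
        """
    )

    # The query from the thread: both conditions live in the ON clause.
    on_rows = conn.execute(
        "SELECT t1.x, t2.y FROM t1 "
        "LEFT JOIN t2 ON (t1.x = t2.y AND t1.x = 1) "
        "ORDER BY t1.x"
    ).fetchall()

    # Filtering the RHS first (t2.y = 1 at the scan of t2) is safe:
    # unmatched t1 rows are still null-extended.
    rhs_filtered = conn.execute(
        "SELECT t1.x, b.y FROM t1 "
        "LEFT JOIN (SELECT y FROM t2 WHERE y = 1) AS b "
        "ON (t1.x = b.y AND t1.x = 1) "
        "ORDER BY t1.x"
    ).fetchall()

    # Filtering the LHS first (t1.x = 1 at the scan of t1) is wrong:
    # the t1 row with x = 2 disappears instead of being null-extended.
    lhs_filtered = conn.execute(
        "SELECT a.x, t2.y FROM (SELECT x FROM t1 WHERE x = 1) AS a "
        "LEFT JOIN t2 ON (a.x = t2.y) "
        "ORDER BY a.x"
    ).fetchall()

    conn.close()
    return on_rows, rhs_filtered, lhs_filtered


if __name__ == "__main__":
    for label, rows in zip(("ON clause", "RHS filtered", "LHS filtered"), run_demo()):
        print(label, rows)
```

The first two queries return the same rows, including x = 2 with a null y,
while the third loses that row.  That is the difference between a qual on
the nullable side and a qual on the preserved side of the join.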

David J.