Re: Making Vars outer-join aware

Richard Guo <guofenglinux@gmail.com>

From: Richard Guo <guofenglinux@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Pg Hackers <pgsql-hackers@lists.postgresql.org>, "Finnerty, Jim" <jfinnert@amazon.com>
Date: 2022-11-15T08:59:27Z
Lists: pgsql-hackers

Commits

  1. Re-allow INDEX_VAR as rt_index in ChangeVarNodes().

  2. Fix thinkos in have_unsafe_outer_join_ref; reduce to Assert check.

  3. Invent "join domains" to replace the below_outer_join hack.

  4. Do assorted mop-up in the planner.

  5. Make Vars be outer-join-aware.

  6. Invent "multibitmapsets", and use them to speed up antijoin detection.

  7. Add basic regression tests for semi/antijoin recognition.

  8. Improve performance of adjust_appendrel_attrs_multilevel.

  9. Refactor addition of PlaceHolderVars to joinrel targetlists.

  10. Use an explicit state flag to control PlaceHolderInfo creation.

  11. Make PlaceHolderInfo lookup O(1).

On Sun, Nov 6, 2022 at 5:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wrote:
> > I've been working away at this patch series, and here is an up-to-date
> > version.
>
> This needs a rebase after ff8fa0bf7 and b0b72c64a.  I also re-ordered
> the patches so that the commit messages' claims about when regression
> tests start to pass are true again.  No interesting changes, though.


I'm reviewing the part about multiple-version clauses, and I found a case
that may not work as expected.  I tried a query like the one below:

 (A leftjoin (B leftjoin C on (Pbc)) on (Pab)) left join D on (Pcd)

Assume Pbc is strict for B and Pcd is strict for C.

According to identity 3, one of its equivalent forms is

 ((A leftjoin B on (Pab)) leftjoin C on (Pbc)) left join D on (Pcd)
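To sanity-check this equivalence, here is a toy Python sketch. The data is hypothetical, and a naive nested-loop left join stands in for the executor. The sketch evaluates both forms and compares the results as multisets. It also tries a Pbc that is not strict for B, which shows why the strictness assumption matters.

```python
from collections import Counter


def eq(x, y):
    # SQL "=": a NULL input yields NULL, which a join qual treats as false.
    return x is not None and y is not None and x == y


def left_join(left_rows, right_rows, pred, right_cols):
    """Naive nested-loop LEFT JOIN over lists of dicts; NULL is None."""
    out = []
    for lrow in left_rows:
        matched = False
        for rrow in right_rows:
            row = {**lrow, **rrow}
            if pred(row):
                out.append(row)
                matched = True
        if not matched:
            # Null-extend the nullable (right-hand) side.
            out.append({**lrow, **{c: None for c in right_cols}})
    return out


# Hypothetical toy tables.
A = [{"a": 1}, {"a": 2}, {"a": 3}]
B = [{"b": 1}, {"b": 2}]
C = [{"c": 1}, {"c": 4}]
D = [{"d": 1}, {"d": 5}]


def p_ab(r):
    return eq(r["a"], r["b"])


def p_cd(r):
    return eq(r["c"], r["d"])


def strict_bc(r):
    # Strict for B: a NULL b can never satisfy it.
    return eq(r["b"], r["c"])


def lax_bc(r):
    # NOT strict for B: a NULL b satisfies it.
    return r["b"] is None or eq(r["b"], r["c"])


def as_bag(rows):
    return Counter((r["a"], r["b"], r["c"], r["d"]) for r in rows)


def both_forms(p_bc):
    # (A leftjoin (B leftjoin C on Pbc) on Pab) leftjoin D on Pcd
    bc = left_join(B, C, p_bc, ["c"])
    form1 = left_join(left_join(A, bc, p_ab, ["b", "c"]), D, p_cd, ["d"])
    # ((A leftjoin B on Pab) leftjoin C on Pbc) leftjoin D on Pcd
    ab = left_join(A, B, p_ab, ["b"])
    form2 = left_join(left_join(ab, C, p_bc, ["c"]), D, p_cd, ["d"])
    return as_bag(form1), as_bag(form2)


f1, f2 = both_forms(strict_bc)
print(f1 == f2)  # True: the two forms agree when Pbc is strict for B
g1, g2 = both_forms(lax_bc)
print(g1 == g2)  # False: with a non-strict Pbc the reordering changes the result
```

With the non-strict Pbc, the A row that finds no B match arrives at the C join in the second form with b NULL. It then matches every C row, which cannot happen in the first form.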

For the outer join clause Pcd, we would generate two versions from the
first form:

    Version 1: C Vars with nullingrels as {A/B}
    Version 2: C Vars with nullingrels as {B/C, A/B}

I understand version 2 is reasonable, since that is how the parser would
set the nullingrels.  But it seems version 1 is not applicable in
either form.

Looking at the two forms again, it seems the two expected versions for
Pcd should be:

    Version 1: C Vars with nullingrels as {B/C}
    Version 2: C Vars with nullingrels as {B/C, A/B}
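The reasoning about which nullingrels C's Vars carry at the C/D join can be illustrated with a small toy model in Python. This is only a sketch under my own assumptions, not the planner's data structures:

- Each outer join is labeled by a hypothetical name ("A/B", "B/C", "C/D").
- A Var of rel X is taken to be nulled by every outer join below the evaluation point whose nullable side contains X.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class LeftJoin:
    name: str       # illustrative label for this outer join, e.g. "A/B"
    left: "Node"    # non-nullable side
    right: "Node"   # nullable side


Node = Union[str, LeftJoin]  # a base rel is represented by its name


def base_rels(node: Node) -> set:
    if isinstance(node, str):
        return {node}
    return base_rels(node.left) | base_rels(node.right)


def nulling_joins(node: Node, rel: str) -> set:
    """Outer joins inside `node` whose nullable side contains `rel`."""
    if isinstance(node, str):
        return set()
    found = nulling_joins(node.left, rel) | nulling_joins(node.right, rel)
    if rel in base_rels(node.right):
        found.add(node.name)
    return found


def find_join(node: Node, name: str) -> Optional[LeftJoin]:
    if isinstance(node, str):
        return None
    if node.name == name:
        return node
    return find_join(node.left, name) or find_join(node.right, name)


def nullingrels_at(tree: Node, rel: str, join_name: str) -> set:
    """Toy nullingrels of `rel`'s Vars in the inputs of join `join_name`."""
    j = find_join(tree, join_name)
    assert j is not None, f"no join named {join_name}"
    return nulling_joins(j.left, rel) | nulling_joins(j.right, rel)


# (A leftjoin (B leftjoin C on Pbc) on Pab) leftjoin D on Pcd
form1 = LeftJoin("C/D", LeftJoin("A/B", "A", LeftJoin("B/C", "B", "C")), "D")
# ((A leftjoin B on Pab) leftjoin C on Pbc) leftjoin D on Pcd
form2 = LeftJoin("C/D", LeftJoin("B/C", LeftJoin("A/B", "A", "B"), "C"), "D")

print(sorted(nullingrels_at(form1, "C", "C/D")))  # ['A/B', 'B/C']
print(sorted(nullingrels_at(form2, "C", "C/D")))  # ['B/C']
```

In this model, {A/B} alone never comes out for either form, which matches the observation that version 1 as generated does not seem to fit.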

With this, however, we may have another problem: both versions are
applicable at the C/D join according to clause_is_computable_at(), in
both forms.
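To make the concern concrete, here is a hypothetical, much-simplified Python model of the kind of test clause_is_computable_at() performs. It is not the actual PostgreSQL code, and the minimal nullable sides below are assumed values. In this model, a clause version is accepted at a join if each outer join already performed there meets one of two conditions:

- it is listed in the version's nullingrels, or
- its minimal nullable side is not referenced by the clause.

```python
# Hypothetical simplification; not PostgreSQL's implementation.

# Assumed minimal nullable sides: because Pbc is strict for B, identity 3
# lets B/C commute with A/B, so A/B's minimal nullable side is just {B}.
MIN_RIGHTHAND = {"A/B": {"B"}, "B/C": {"C"}}


def computable_at(clause_rels, nullingrels, performed_ojs):
    """Toy check: reject a version that omits a performed outer join
    whose minimal nullable side the clause references."""
    for oj in performed_ojs:
        if oj in nullingrels:
            continue
        if clause_rels & MIN_RIGHTHAND[oj]:
            return False
    return True


pcd_rels = {"C", "D"}
below_cd = {"A/B", "B/C"}  # outer joins performed below C/D in both forms

version1 = {"B/C"}
version2 = {"B/C", "A/B"}
print(computable_at(pcd_rels, version1, below_cd))  # True
print(computable_at(pcd_rels, version2, below_cd))  # True
```

Since the set of outer joins performed below the C/D join is the same in both forms, a check of this shape cannot tell which version belongs to which form.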

Another thing: I believe there is yet another equivalent form:

 (A left join B on (Pab)) left join (C left join D on (Pcd)) on (Pbc)

Currently this form cannot be generated because of the issue discussed
in [1].  But if someday we can generate it, I think we should have a
third version for Pcd:

    Version 3: C Vars with empty nullingrels
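As a self-contained toy sketch in Python (outer joins as hypothetical (name, left, nullable side) tuples, all names illustrative), the third form evaluates Pcd below the B/C join. So in this model, C's Vars carry no nullingrels where Pcd is evaluated, and only gain B/C above that join.

```python
# Toy model: an outer join is (name, left_input, nullable_input);
# a base rel is a plain string.
def rels(node):
    return {node} if isinstance(node, str) else rels(node[1]) | rels(node[2])


def nullers(node, rel):
    """Names of outer joins inside `node` that can null `rel`'s Vars."""
    if isinstance(node, str):
        return set()
    name, left, right = node
    found = nullers(left, rel) | nullers(right, rel)
    if rel in rels(right):
        found.add(name)
    return found


# (A left join B on (Pab)) left join (C left join D on (Pcd)) on (Pbc)
ab = ("A/B", "A", "B")
cd = ("C/D", "C", "D")
form3 = ("B/C", ab, cd)

# Pcd is evaluated at the C/D join, so only that join's inputs matter.
_, cd_left, cd_right = cd
print(nullers(cd_left, "C") | nullers(cd_right, "C"))  # set()
print(sorted(nullers(form3, "C")))  # ['B/C'] once past the B/C join
```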

[1]
https://www.postgresql.org/message-id/flat/CAMbWs4_8n5ANh_aX2PinRZ9V9mtBguhnRd4DOVt9msPgHmEMOQ%40mail.gmail.com

Thanks
Richard