Re: Making Vars outer-join aware

Richard Guo <guofenglinux@gmail.com>

From: Richard Guo <guofenglinux@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Pg Hackers <pgsql-hackers@lists.postgresql.org>, "Finnerty, Jim" <jfinnert@amazon.com>
Date: 2022-08-15T08:48:23Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Re-allow INDEX_VAR as rt_index in ChangeVarNodes().

  2. Fix thinkos in have_unsafe_outer_join_ref; reduce to Assert check.

  3. Invent "join domains" to replace the below_outer_join hack.

  4. Do assorted mop-up in the planner.

  5. Make Vars be outer-join-aware.

  6. Invent "multibitmapsets", and use them to speed up antijoin detection.

  7. Add basic regression tests for semi/antijoin recognition.

  8. Improve performance of adjust_appendrel_attrs_multilevel.

  9. Refactor addition of PlaceHolderVars to joinrel targetlists.

  10. Use an explicit state flag to control PlaceHolderInfo creation.

  11. Make PlaceHolderInfo lookup O(1).

On Tue, Aug 2, 2022 at 3:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Here's a rebase up to HEAD, mostly to placate the cfbot.
> I accounted for d8e34fa7a (s/all_baserels/all_query_rels/
> in those places) and made one tiny bug-fix change.
> Nothing substantive as yet.


When we add required PlaceHolderVars to a join rel's targetlist, if the
PHV can be computed in the nullable-side of the join, we would add the
join's RT index to phnullingrels. This is correct as we know the PHV's
value can be nulled at this join. But I'm wondering if it is necessary
since we have already propagated any varnullingrels into the PHV when we
apply pullup variable replacement in perform_pullup_replace_vars().

On the other hand, the phnullingrels may contain RT indexes of outer
joins above this join level. It seems not good to add such a PHV to the
joinrel's targetlist. Below is an example:

# explain (verbose, costs off) select a.i, ss.jj from a left join (b left
join (select c.i, coalesce(c.j, 1) as jj from c) ss on b.i = ss.i) on true;
                       QUERY PLAN
---------------------------------------------------------
 Nested Loop Left Join
   Output: a.i, (COALESCE(c.j, 1))
   ->  Seq Scan on public.a
         Output: a.i, a.j
   ->  Materialize
         Output: (COALESCE(c.j, 1))
         ->  Hash Left Join
               Output: (COALESCE(c.j, 1))
               Hash Cond: (b.i = c.i)
               ->  Seq Scan on public.b
                     Output: b.i, b.j
               ->  Hash
                     Output: c.i, (COALESCE(c.j, 1))
                     ->  Seq Scan on public.c
                           Output: c.i, COALESCE(c.j, 1)
(15 rows)

In this query, for the joinrel {B, C}, the PHV in its targetlist has a
phnullingrels that contains the join of {A} and {BC}, which is confusing
because we have not reached that join level.

I tried the changes below to illustrate the two issues. The assertion
holds true during regression tests and the error pops up for the query
above.

--- a/src/backend/optimizer/util/placeholder.c
+++ b/src/backend/optimizer/util/placeholder.c
@@ -464,18 +464,20 @@ add_placeholders_to_joinrel(PlannerInfo *root,
RelOptInfo *joinrel,
                       {
                               if (sjinfo->jointype == JOIN_FULL &&
sjinfo->ojrelid != 0)
                               {
-                                      /* PHV's value can be nulled at this
join */
-                                      phv->phnullingrels =
bms_add_member(phv->phnullingrels,
-
                                   sjinfo->ojrelid);
+
 Assert(bms_is_member(sjinfo->ojrelid, phv->phnullingrels));
+
+                                      if
(!bms_is_subset(phv->phnullingrels, joinrel->relids))
+                                              elog(ERROR, "phnullingrels
is not subset of joinrel's relids");
                               }
                       }
                       else if (bms_is_subset(phinfo->ph_eval_at,
inner_rel->relids))
                       {
                               if (sjinfo->jointype != JOIN_INNER &&
sjinfo->ojrelid != 0)
                               {
-                                      /* PHV's value can be nulled at this
join */
-                                      phv->phnullingrels =
bms_add_member(phv->phnullingrels,
-
                                   sjinfo->ojrelid);
+
 Assert(bms_is_member(sjinfo->ojrelid, phv->phnullingrels));
+
+                                      if
(!bms_is_subset(phv->phnullingrels, joinrel->relids))
+                                              elog(ERROR, "phnullingrels
is not subset of joinrel's relids");
                               }
                       }


If the two issues are indeed something we need to fix, maybe we can
change add_placeholders_to_joinrel() to search the PHVs from
outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
if needed, just like what we do in build_joinrel_tlist(). The PHVs there
should have the correct phnullingrels (at least the PHVs in base rels'
targetlists have correctly set phnullingrels to NULL).

Thanks
Richard