Re: Making Vars outer-join aware

Tom Lane <tgl@sss.pgh.pa.us>

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Richard Guo <guofenglinux@gmail.com>
Cc: Pg Hackers <pgsql-hackers@lists.postgresql.org>, "Finnerty, Jim" <jfinnert@amazon.com>
Date: 2022-08-16T19:24:04Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Re-allow INDEX_VAR as rt_index in ChangeVarNodes().

  2. Fix thinkos in have_unsafe_outer_join_ref; reduce to Assert check.

  3. Invent "join domains" to replace the below_outer_join hack.

  4. Do assorted mop-up in the planner.

  5. Make Vars be outer-join-aware.

  6. Invent "multibitmapsets", and use them to speed up antijoin detection.

  7. Add basic regression tests for semi/antijoin recognition.

  8. Improve performance of adjust_appendrel_attrs_multilevel.

  9. Refactor addition of PlaceHolderVars to joinrel targetlists.

  10. Use an explicit state flag to control PlaceHolderInfo creation.

  11. Make PlaceHolderInfo lookup O(1).

Attachments

I wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
>> If the two issues are indeed something we need to fix, maybe we can
>> change add_placeholders_to_joinrel() to search the PHVs from
>> outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
>> if needed, just like what we do in build_joinrel_tlist(). The PHVs there
>> should have the correct phnullingrels (at least the PHVs in base rels'
>> targetlists have correctly set phnullingrels to NULL).

> Yeah, maybe we should do something more invasive and make use of the
> input targetlists rather than doing this from scratch.  Not sure.
> I'm worried that doing it that way would increase the risk of getting
> different join tlist contents depending on which pair of input rels
> we happen to consider first.

After chewing on that for awhile, I've concluded that that is the way
to go.  0001 attached is a standalone patch to rearrange the way that
PHVs are added to joinrel targetlists.  It results in one cosmetic
change in the regression tests, where the targetlist order for an
intermediate join node changes.  I think that's fine; if anything,
the new order is more sensible than the old because it matches the
inputs' targetlist orders better.

I believe the reason I didn't do it like this to start with is that
I was concerned about the cost of searching the placeholder_list
repeatedly.  With a lot of PHVs, build_joinrel_tlist becomes O(N^2)
just from the repeated find_placeholder_info lookups.  We can fix
that by adding an index array to go straight from phid to the
PlaceHolderInfo.  While thinking about where to construct such
an index array, I decided it'd be a good idea to have an explicit
step to "freeze" the set of PlaceHolderInfos, at the start of
deconstruct_jointree.  This allows getting rid of the create_new_ph
flags for find_placeholder_info and add_vars_to_targetlist, which
I've always feared were bugs waiting to happen: they require callers
to have a very clear understanding of when they're invoked.  There
might be some speed gain over existing code too, but I've not really
tried to measure it.  I did drop a couple of hacks that were only
meant to short-circuit find_placeholder_info calls; that function
should now be cheap enough to not matter.

Barring objections, I'd like to push these soon and then rebase
the main outer-join-vars patch set over them.

			regards, tom lane