Re: Memoize ANTI and SEMI JOIN inner

Alena Rybakina <a.rybakina@postgrespro.ru>

From: Alena Rybakina <a.rybakina@postgrespro.ru>
To: Andrei Lepikhov <lepihov@gmail.com>
Cc: David Rowley <dgrowleyml@gmail.com>, PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-03-31T02:33:49Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Enable use of Memoize for ANTI joins

Attachments

Hi!

On 21.03.2025 18:56, Andrei Lepikhov wrote:
> On 20/3/2025 07:02, David Rowley wrote:
>> On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> How can we be sure that semi or anti-join needs only one tuple? I think
>>> it would be enough to control the absence of join clauses and 
>>> filters in
>>> the join. Unfortunately, we only have such a guarantee in the plan
>>> creation stage (maybe even setrefs.c). So, it seems we need to 
>>> invent an
>>> approach like AlternativeSubplan.
>>
>> I suggest looking at what 9e215378d did.  You might be able to also
>> allow semi and anti-joins providing the cache keys cover the entire
>> join condition. I think this might be safe as Nested Loop will only
>> ask its inner subnode for the first match before skipping to the next
>> outer row and with anti-join, there's no reason to look for additional
>> rows after the first. Semi-join and unique joins do the same thing in
>> nodeNestloop.c. To save doing additional checks at run-time, the code
>> does:
> Thank you for the clue! I almost took the wrong direction.
> I have attached the new patch, which includes corrected comments for 
> better clarification of the changes, as well as some additional tests.
> I now feel much more confident about this version since I have 
> resolved that concern.
>

I reviewed your patch and made a couple of suggestions.

The first change is related to your comment (and the one before it). I 
fixed some grammar issues and simplified the wording to make it clearer 
and easier to understand.

The second change involves adding an Assert when generating the Memoize 
path. Based on the existing comment and the surrounding logic (shown 
below),
I believe it's worth asserting that both inner_unique and single_mode 
are not true at the same time — just as a safety check.
/*
* We may do this for SEMI or ANTI joins when they need only one tuple from
* the inner side to produce the result. Following if condition checks that
* rule.
*
* XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
* = true. Should we? See add_paths_to_joinrel()
*/
if(!extra->inner_unique&& (jointype== JOIN_SEMI||
jointype== JOIN_ANTI))
single_mode= true;

-- 
Regards,
Alena Rybakina
Postgres Professional