Re: Memoize ANTI and SEMI JOIN inner

Alena Rybakina <a.rybakina@postgrespro.ru>

From: Alena Rybakina <a.rybakina@postgrespro.ru>
To: Andrei Lepikhov <lepihov@gmail.com>
Cc: David Rowley <dgrowleyml@gmail.com>, PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-03-31T02:50:07Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Enable use of Memoize for ANTI joins

Attachments

I realized that I uploaded my diff file with a small mistake - sorry 
about that. I've corrected it with this message so your tests can pass 
in the CI.

On 31.03.2025 05:33, Alena Rybakina wrote:
>
> Hi!
>
> On 21.03.2025 18:56, Andrei Lepikhov wrote:
>> On 20/3/2025 07:02, David Rowley wrote:
>>> On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> 
>>> wrote:
>>>> How can we be sure that semi or anti-join needs only one tuple? I 
>>>> think
>>>> it would be enough to control the absence of join clauses and 
>>>> filters in
>>>> the join. Unfortunately, we only have such a guarantee in the plan
>>>> creation stage (maybe even setrefs.c). So, it seems we need to 
>>>> invent an
>>>> approach like AlternativeSubplan.
>>>
>>> I suggest looking at what 9e215378d did.  You might be able to also
>>> allow semi and anti-joins providing the cache keys cover the entire
>>> join condition. I think this might be safe as Nested Loop will only
>>> ask its inner subnode for the first match before skipping to the next
>>> outer row and with anti-join, there's no reason to look for additional
>>> rows after the first. Semi-join and unique joins do the same thing in
>>> nodeNestloop.c. To save doing additional checks at run-time, the code
>>> does:
>> Thank you for the clue! I almost took the wrong direction.
>> I have attached the new patch, which includes corrected comments for 
>> better clarification of the changes, as well as some additional tests.
>> I now feel much more confident about this version since I have 
>> resolved that concern.
>>
>
> I reviewed your patch and made a couple of suggestions.
>
> The first change is related to your comment (and the one before it). I 
> fixed some grammar issues and simplified the wording to make it 
> clearer and easier to understand.
>
> The second change involves adding an Assert when generating the 
> Memoize path. Based on the existing comment and the surrounding logic 
> (shown below),
> I believe it's worth asserting that both inner_unique and single_mode 
> are not true at the same time — just as a safety check.
> /*
> * We may do this for SEMI or ANTI joins when they need only one tuple from
> * the inner side to produce the result. Following if condition checks that
> * rule.
> *
> * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
> * = true. Should we? See add_paths_to_joinrel()
> */
> if(!extra->inner_unique&& (jointype== JOIN_SEMI||
> jointype== JOIN_ANTI))
> single_mode= true;

-- 
Regards,
Alena Rybakina
Postgres Professional