Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Enable use of Memoize for ANTI joins
- 0da29e4cb161 19 (unreleased) landed

Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-03-06T13:08:59Z

Hi,

In case of NestLoop with parameterised inner semi-join for each outer 
tuple requires only a single tuple from its inner relation to produce a 
result. It seems that the same principle applies to an anti-join. This 
approach could effectively allow the Memoize node to enhance the 
performance of pulled-up EXISTS and NOT EXISTS sublinks.

In attachment see a sketch of the feature. Since we are using single_row 
mode, adapting this method to cache semi-join inner results should not 
be extremely complex. However, I am unsure about potential corner cases 
and would appreciate any feedback or criticisms regarding this approach.

-- 
regards, Andrei Lepikhov

Re: Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-03-19T17:15:54Z

On 6/3/2025 14:08, Andrei Lepikhov wrote:
> Hi,
> 
> In case of NestLoop with parameterised inner semi-join for each outer 
> tuple requires only a single tuple from its inner relation to produce a 
> result. It seems that the same principle applies to an anti-join. This 
> approach could effectively allow the Memoize node to enhance the 
> performance of pulled-up EXISTS and NOT EXISTS sublinks.
> 
> In attachment see a sketch of the feature. Since we are using single_row 
> mode, adapting this method to cache semi-join inner results should not 
> be extremely complex. However, I am unsure about potential corner cases 
> and would appreciate any feedback or criticisms regarding this approach.
I found a corner case that breaks this approach: when NestLoop has a 
join clause or a filter, it may filter inner tuples, thus scanning more 
than only one time for each outer. You can see the reproduction script 
in the attachment. Each of the reproduction queries throws the error:
ERROR: cache entry already complete

How can we be sure that semi or anti-join needs only one tuple? I think 
it would be enough to control the absence of join clauses and filters in 
the join. Unfortunately, we only have such a guarantee in the plan 
creation stage (maybe even setrefs.c). So, it seems we need to invent an 
approach like AlternativeSubplan.

-- 
regards, Andrei Lepikhov

Re: Memoize ANTI and SEMI JOIN inner

David Rowley <dgrowleyml@gmail.com> — 2025-03-20T06:02:16Z

On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> wrote:
> How can we be sure that semi or anti-join needs only one tuple? I think
> it would be enough to control the absence of join clauses and filters in
> the join. Unfortunately, we only have such a guarantee in the plan
> creation stage (maybe even setrefs.c). So, it seems we need to invent an
> approach like AlternativeSubplan.

I suggest looking at what 9e215378d did.  You might be able to also
allow semi and anti-joins providing the cache keys cover the entire
join condition. I think this might be safe as Nested Loop will only
ask its inner subnode for the first match before skipping to the next
outer row and with anti-join, there's no reason to look for additional
rows after the first. Semi-join and unique joins do the same thing in
nodeNestloop.c. To save doing additional checks at run-time, the code
does:

nlstate->js.single_match = (node->join.inner_unique ||
node->join.jointype == JOIN_SEMI);

For making this work, I think the attached should be about the guts of
the code changes. I didn't look at the comments. Right now I can't
think of any reason why this can't be done, but some experimentation
might reveal some reason that it can't.

David

Re: Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-03-21T15:56:09Z

On 20/3/2025 07:02, David Rowley wrote:
> On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> How can we be sure that semi or anti-join needs only one tuple? I think
>> it would be enough to control the absence of join clauses and filters in
>> the join. Unfortunately, we only have such a guarantee in the plan
>> creation stage (maybe even setrefs.c). So, it seems we need to invent an
>> approach like AlternativeSubplan.
> 
> I suggest looking at what 9e215378d did.  You might be able to also
> allow semi and anti-joins providing the cache keys cover the entire
> join condition. I think this might be safe as Nested Loop will only
> ask its inner subnode for the first match before skipping to the next
> outer row and with anti-join, there's no reason to look for additional
> rows after the first. Semi-join and unique joins do the same thing in
> nodeNestloop.c. To save doing additional checks at run-time, the code
> does:
Thank you for the clue! I almost took the wrong direction.
I have attached the new patch, which includes corrected comments for 
better clarification of the changes, as well as some additional tests.
I now feel much more confident about this version since I have resolved 
that concern.

-- 
regards, Andrei Lepikhov

Re: Memoize ANTI and SEMI JOIN inner

Alena Rybakina <a.rybakina@postgrespro.ru> — 2025-03-31T02:33:49Z

Hi!

On 21.03.2025 18:56, Andrei Lepikhov wrote:
> On 20/3/2025 07:02, David Rowley wrote:
>> On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> How can we be sure that semi or anti-join needs only one tuple? I think
>>> it would be enough to control the absence of join clauses and 
>>> filters in
>>> the join. Unfortunately, we only have such a guarantee in the plan
>>> creation stage (maybe even setrefs.c). So, it seems we need to 
>>> invent an
>>> approach like AlternativeSubplan.
>>
>> I suggest looking at what 9e215378d did.  You might be able to also
>> allow semi and anti-joins providing the cache keys cover the entire
>> join condition. I think this might be safe as Nested Loop will only
>> ask its inner subnode for the first match before skipping to the next
>> outer row and with anti-join, there's no reason to look for additional
>> rows after the first. Semi-join and unique joins do the same thing in
>> nodeNestloop.c. To save doing additional checks at run-time, the code
>> does:
> Thank you for the clue! I almost took the wrong direction.
> I have attached the new patch, which includes corrected comments for 
> better clarification of the changes, as well as some additional tests.
> I now feel much more confident about this version since I have 
> resolved that concern.
>

I reviewed your patch and made a couple of suggestions.

The first change is related to your comment (and the one before it). I 
fixed some grammar issues and simplified the wording to make it clearer 
and easier to understand.

The second change involves adding an Assert when generating the Memoize 
path. Based on the existing comment and the surrounding logic (shown 
below),
I believe it's worth asserting that both inner_unique and single_mode 
are not true at the same time — just as a safety check.
/*
* We may do this for SEMI or ANTI joins when they need only one tuple from
* the inner side to produce the result. Following if condition checks that
* rule.
*
* XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
* = true. Should we? See add_paths_to_joinrel()
*/
if(!extra->inner_unique&& (jointype== JOIN_SEMI||
jointype== JOIN_ANTI))
single_mode= true;

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Memoize ANTI and SEMI JOIN inner

Alena Rybakina <a.rybakina@postgrespro.ru> — 2025-03-31T02:50:07Z

I realized that I uploaded my diff file with a small mistake - sorry 
about that. I've corrected it with this message so your tests can pass 
in the CI.

On 31.03.2025 05:33, Alena Rybakina wrote:
>
> Hi!
>
> On 21.03.2025 18:56, Andrei Lepikhov wrote:
>> On 20/3/2025 07:02, David Rowley wrote:
>>> On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> 
>>> wrote:
>>>> How can we be sure that semi or anti-join needs only one tuple? I 
>>>> think
>>>> it would be enough to control the absence of join clauses and 
>>>> filters in
>>>> the join. Unfortunately, we only have such a guarantee in the plan
>>>> creation stage (maybe even setrefs.c). So, it seems we need to 
>>>> invent an
>>>> approach like AlternativeSubplan.
>>>
>>> I suggest looking at what 9e215378d did.  You might be able to also
>>> allow semi and anti-joins providing the cache keys cover the entire
>>> join condition. I think this might be safe as Nested Loop will only
>>> ask its inner subnode for the first match before skipping to the next
>>> outer row and with anti-join, there's no reason to look for additional
>>> rows after the first. Semi-join and unique joins do the same thing in
>>> nodeNestloop.c. To save doing additional checks at run-time, the code
>>> does:
>> Thank you for the clue! I almost took the wrong direction.
>> I have attached the new patch, which includes corrected comments for 
>> better clarification of the changes, as well as some additional tests.
>> I now feel much more confident about this version since I have 
>> resolved that concern.
>>
>
> I reviewed your patch and made a couple of suggestions.
>
> The first change is related to your comment (and the one before it). I 
> fixed some grammar issues and simplified the wording to make it 
> clearer and easier to understand.
>
> The second change involves adding an Assert when generating the 
> Memoize path. Based on the existing comment and the surrounding logic 
> (shown below),
> I believe it's worth asserting that both inner_unique and single_mode 
> are not true at the same time — just as a safety check.
> /*
> * We may do this for SEMI or ANTI joins when they need only one tuple from
> * the inner side to produce the result. Following if condition checks that
> * rule.
> *
> * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
> * = true. Should we? See add_paths_to_joinrel()
> */
> if(!extra->inner_unique&& (jointype== JOIN_SEMI||
> jointype== JOIN_ANTI))
> single_mode= true;

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Memoize ANTI and SEMI JOIN inner

David Rowley <dgrowleyml@gmail.com> — 2025-03-31T03:04:18Z

On Mon, 31 Mar 2025 at 15:33, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> I believe it's worth asserting that both inner_unique and single_mode are not true at the same time — just as a safety check.

add_paths_to_joinrel() just chooses not to populate inner_unique for
SEMI and ANTI joins because, as of today's master, it's pretty
pointless to determine that because the executor will short-circuit
and skip to the next outer tuple for those join types anyway. I don't
follow why having both these flags set would cause trouble. It seems
perfectly legitimate that add_paths_to_joinrel() could choose to set
the inner_unique flag for these join types, and if it did, the Assert
you're proposing would fail for no good reason.

David

Re: Memoize ANTI and SEMI JOIN inner

Alena Rybakina <a.rybakina@postgrespro.ru> — 2025-03-31T03:21:43Z

On 31.03.2025 06:04, David Rowley wrote:
> On Mon, 31 Mar 2025 at 15:33, Alena Rybakina<a.rybakina@postgrespro.ru>  wrote:
>> I believe it's worth asserting that both inner_unique and single_mode are not true at the same time — just as a safety check.
> add_paths_to_joinrel() just chooses not to populate inner_unique for
> SEMI and ANTI joins because, as of today's master, it's pretty
> pointless to determine that because the executor will short-circuit
> and skip to the next outer tuple for those join types anyway. I don't
> follow why having both these flags set would cause trouble. It seems
> perfectly legitimate that add_paths_to_joinrel() could choose to set
> the inner_unique flag for these join types, and if it did, the Assert
> you're proposing would fail for no good reason.
>
I tend to agree with you that someone might set this flag to true for 
these join types in the future.

However, is it necessary to check that extra->inner_unique must be false 
for SEMI/ANTI joins here, or am I missing something? It looks a little 
confusing at this point.

if(!extra->inner_unique&& (jointype== JOIN_SEMI|| jointype== JOIN_ANTI))

     single_mode= true;

--

Regards,
Alena Rybakina
Postgres Professional

Re: Memoize ANTI and SEMI JOIN inner

David Rowley <dgrowleyml@gmail.com> — 2025-03-31T03:33:03Z

On Mon, 31 Mar 2025 at 16:21, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> However, is it necessary to check that extra->inner_unique must be false for SEMI/ANTI joins here, or am I missing something? It looks a little confusing at this point.

If it is necessary, I don't see the reason for it. It was me that
worked on unique joins and I see no reason why a SEMI or ANTI join
couldn't be marked as unique. The reason they're not today is that the
only point of the unique join optimisation is so that during
execution, the join nodes could skip to the next outer tuple after
matching the current outer to an inner.  If the join is unique, then
there are no more join partners to find for the current outer after
matching it up once. With SEMI and ANTI joins, we skip to the next
outer tuple after finding a match anyway, so there's no point in going
to the trouble of setting the inner_unique flag.

I can't say definitively that we won't find a reason in the future
that we should set inner_unique for SEMI/ANTI joins, so I don't follow
the need for the Assert.

Maybe you're seeing something that I'm not. What do you think will
break if both flags are true?

David

Re: Memoize ANTI and SEMI JOIN inner

Alena Rybakina <a.rybakina@postgrespro.ru> — 2025-03-31T06:34:54Z

On 31.03.2025 06:33, David Rowley wrote:
> On Mon, 31 Mar 2025 at 16:21, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>> However, is it necessary to check that extra->inner_unique must be false for SEMI/ANTI joins here, or am I missing something? It looks a little confusing at this point.
> If it is necessary, I don't see the reason for it. It was me that
> worked on unique joins and I see no reason why a SEMI or ANTI join
> couldn't be marked as unique. The reason they're not today is that the
> only point of the unique join optimisation is so that during
> execution, the join nodes could skip to the next outer tuple after
> matching the current outer to an inner.  If the join is unique, then
> there are no more join partners to find for the current outer after
> matching it up once. With SEMI and ANTI joins, we skip to the next
> outer tuple after finding a match anyway, so there's no point in going
> to the trouble of setting the inner_unique flag.
>
> I can't say definitively that we won't find a reason in the future
> that we should set inner_unique for SEMI/ANTI joins, so I don't follow
> the need for the Assert.
>
> Maybe you're seeing something that I'm not. What do you think will
> break if both flags are true?
>
Actually, I was mainly confused by the code itself - the check seemed to 
contradict the explanation. It looked like we were enforcing that
inner_unique must be false for SEMI/ANTI joins, even though it's not 
actually important for those join types.
That’s why I originally proposed either adding an Assert or removing 
this flag from check altogether, just to make it more explicit.

So, I agree with your explanation — by the definition of SEMI and ANTI 
joins, there's no need to set inner_unique, and also no need to assert 
against it.
These joins skip to the next outer tuple once they find a match (or fail 
to find one, in the case of ANTI).

I updated the diff, where I left changes only in the code comment.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-03-31T07:45:18Z

On 3/31/25 05:33, David Rowley wrote:
> I can't say definitively that we won't find a reason in the future
> that we should set inner_unique for SEMI/ANTI joins, so I don't follow
> the need for the Assert.
> 
> Maybe you're seeing something that I'm not. What do you think will
> break if both flags are true?
I considered targeting PG 19 and July Comitfest for this feature, but 
currently, I don't see any further development needed here. What are 
your thoughts, David? Does it make sense to commit this to PG 18?

I have no reason to rush the process, but this feature seems beneficial 
for practice. When the internal structure is a bushy join tree, 
producing even a single tuple can be costly. SQL Server's Spool node 
addresses this issue, and people sometimes experience confusion 
detecting degradation during migration with specific queries.

-- 
regards, Andrei Lepikhov

Re: Memoize ANTI and SEMI JOIN inner

Richard Guo <guofenglinux@gmail.com> — 2025-03-31T09:03:43Z

On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
> For making this work, I think the attached should be about the guts of
> the code changes. I didn't look at the comments. Right now I can't
> think of any reason why this can't be done, but some experimentation
> might reveal some reason that it can't.

I reviewed this patch and I have some concerns about the following
code:

    if (extra->inner_unique &&
        (inner_path->param_info == NULL ||
         bms_num_members(inner_path->param_info->ppi_serials) <
         list_length(extra->restrictlist)))
        return NULL;

I understand that this check is used to ensure that the entire join
condition is parameterized in the case of unique joins, so that we can
safely mark the cache entry as complete after reading the first tuple.
However, ppi_clauses includes join clauses available from all outer
rels, not just the current outer rel, while extra->restrictlist only
includes the restriction clauses for the current join.  This means the
check could pass even if a restriction clause isn't parameterized, as
long as another join clause, which doesn't belong to the current join,
is included in ppi_clauses.

Thanks
Richard

Re: Memoize ANTI and SEMI JOIN inner

David Rowley <dgrowleyml@gmail.com> — 2025-03-31T09:39:14Z

On Mon, 31 Mar 2025 at 22:03, Richard Guo <guofenglinux@gmail.com> wrote:
> I reviewed this patch and I have some concerns about the following
> code:
>
>     if (extra->inner_unique &&
>         (inner_path->param_info == NULL ||
>          bms_num_members(inner_path->param_info->ppi_serials) <
>          list_length(extra->restrictlist)))
>         return NULL;
>
> I understand that this check is used to ensure that the entire join
> condition is parameterized in the case of unique joins, so that we can
> safely mark the cache entry as complete after reading the first tuple.
> However, ppi_clauses includes join clauses available from all outer
> rels, not just the current outer rel, while extra->restrictlist only
> includes the restriction clauses for the current join.  This means the
> check could pass even if a restriction clause isn't parameterized, as
> long as another join clause, which doesn't belong to the current join,
> is included in ppi_clauses.

Shouldn't you be more concerned about master here than the patch given
that the code you pasted is from master?

David

Re: Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-03-31T09:46:08Z

On 3/31/25 11:03, Richard Guo wrote:
> On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> For making this work, I think the attached should be about the guts of
>> the code changes. I didn't look at the comments. Right now I can't
>> think of any reason why this can't be done, but some experimentation
>> might reveal some reason that it can't.
> 
> I reviewed this patch and I have some concerns about the following
> code:
> 
>      if (extra->inner_unique &&
>          (inner_path->param_info == NULL ||
>           bms_num_members(inner_path->param_info->ppi_serials) <
>           list_length(extra->restrictlist)))
>          return NULL;
> 
> I understand that this check is used to ensure that the entire join
> condition is parameterized in the case of unique joins, so that we can
> safely mark the cache entry as complete after reading the first tuple.
> However, ppi_clauses includes join clauses available from all outer
> rels, not just the current outer rel, while extra->restrictlist only
> includes the restriction clauses for the current join.  This means the
> check could pass even if a restriction clause isn't parameterized, as
> long as another join clause, which doesn't belong to the current join,
> is included in ppi_clauses.
Initially, I had the same concern. But if ppi_clauses contains a qual, 
it should refer to this join and, as a result, be in the 
extra->restrictlist, isn't it?
I thought about references to the other side of an OUTER JOIN, but if 
the subquery refers to LHS, it just not be transformed to the SEMI/ANTI 
join.
Anyway, if you provide an example or just a sketch, I will be happy to 
discover it.

-- 
regards, Andrei Lepikhov

Re: Memoize ANTI and SEMI JOIN inner

Richard Guo <guofenglinux@gmail.com> — 2025-03-31T09:50:04Z

On Mon, Mar 31, 2025 at 6:39 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Mon, 31 Mar 2025 at 22:03, Richard Guo <guofenglinux@gmail.com> wrote:
> > I reviewed this patch and I have some concerns about the following
> > code:
> >
> >     if (extra->inner_unique &&
> >         (inner_path->param_info == NULL ||
> >          bms_num_members(inner_path->param_info->ppi_serials) <
> >          list_length(extra->restrictlist)))
> >         return NULL;
> >
> > I understand that this check is used to ensure that the entire join
> > condition is parameterized in the case of unique joins, so that we can
> > safely mark the cache entry as complete after reading the first tuple.
> > However, ppi_clauses includes join clauses available from all outer
> > rels, not just the current outer rel, while extra->restrictlist only
> > includes the restriction clauses for the current join.  This means the
> > check could pass even if a restriction clause isn't parameterized, as
> > long as another join clause, which doesn't belong to the current join,
> > is included in ppi_clauses.
>
> Shouldn't you be more concerned about master here than the patch given
> that the code you pasted is from master?

Right.  This code is from the master branch, not the patch.  It caught
my attention while I was reviewing the patch and noticed it modifies
this code.

Thanks
Richard

Re: Memoize ANTI and SEMI JOIN inner

Richard Guo <guofenglinux@gmail.com> — 2025-03-31T10:18:37Z

On Mon, Mar 31, 2025 at 6:46 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 3/31/25 11:03, Richard Guo wrote:
> > I reviewed this patch and I have some concerns about the following
> > code:
> >
> >      if (extra->inner_unique &&
> >          (inner_path->param_info == NULL ||
> >           bms_num_members(inner_path->param_info->ppi_serials) <
> >           list_length(extra->restrictlist)))
> >          return NULL;
> >
> > I understand that this check is used to ensure that the entire join
> > condition is parameterized in the case of unique joins, so that we can
> > safely mark the cache entry as complete after reading the first tuple.
> > However, ppi_clauses includes join clauses available from all outer
> > rels, not just the current outer rel, while extra->restrictlist only
> > includes the restriction clauses for the current join.  This means the
> > check could pass even if a restriction clause isn't parameterized, as
> > long as another join clause, which doesn't belong to the current join,
> > is included in ppi_clauses.

> Initially, I had the same concern. But if ppi_clauses contains a qual,
> it should refer to this join and, as a result, be in the
> extra->restrictlist, isn't it?

Hmm, I don't think so.  As I mentioned upthread, ppi_clauses includes
join clauses available from all outer rels, not just the current one.
So a clause included in ppi_clauses is not necessarily included in
extra->restrictlist.  As an example, consider

create table t (a int, b int);

explain (costs off)
select * from t t1 join t t2 join
    lateral (select *, t1.a+t2.a as x from t t3 offset 0) t3
    on t2.a = t3.a
  on t1.b = t3.b;
                       QUERY PLAN
---------------------------------------------------------
 Nested Loop
   ->  Seq Scan on t t2
   ->  Nested Loop
         ->  Seq Scan on t t1
         ->  Subquery Scan on t3
               Filter: ((t2.a = t3.a) AND (t1.b = t3.b))
               ->  Seq Scan on t t3_1
(7 rows)

t3's ppi_clauses includes "t2.a = t3.a" and "t1.b = t3.b", while t1/t3
join's restrictlist only includes "t1.b = t3.b".

Thanks
Richard

Re: Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-03-31T10:33:12Z

On 3/31/25 12:18, Richard Guo wrote:
> On Mon, Mar 31, 2025 at 6:46 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>   Nested Loop
>     ->  Seq Scan on t t2
>     ->  Nested Loop
>           ->  Seq Scan on t t1
>           ->  Subquery Scan on t3
>                 Filter: ((t2.a = t3.a) AND (t1.b = t3.b))
>                 ->  Seq Scan on t t3_1
> (7 rows)
> 
> t3's ppi_clauses includes "t2.a = t3.a" and "t1.b = t3.b", while t1/t3
> join's restrictlist only includes "t1.b = t3.b".
I attempted to make your query ab it closer to our case:

SET enable_mergejoin = f;
SET enable_hashjoin = f;
SET enable_material = f;
CREATE INDEX ON t(a);

explain (costs off)
select * from t t1 join t t2
   on EXISTS (select *, t1.a+t2.a as x from t t3
     WHERE t2.a = t3.a AND t1.b = t3.b);

and I don't get the case. As I see, ANTI/SEMI join just transforms to 
the regular join and it is still not the case. May you be more specific?

-- 
regards, Andrei Lepikhov

Re: Memoize ANTI and SEMI JOIN inner

Richard Guo <guofenglinux@gmail.com> — 2025-04-01T07:18:58Z

On Mon, Mar 31, 2025 at 7:33 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> and I don't get the case. As I see, ANTI/SEMI join just transforms to
> the regular join and it is still not the case. May you be more specific?

Upthread, you said that a qual contained in ppi_clauses will also be
included in extra->restrictlist.  I provided this example to show that
this is not true.  And it seems to me that this discrepancy makes the
check I mentioned earlier not reliable in all cases.  As I explained
earlier, this check could pass even if a restriction clause isn't
parameterized, as long as another join clause, which doesn't belong to
the current join, is included in ppi_clauses.  This is not correct.

This isn't about your patch, but about the master code.  However, if
this code is incorrect, your patch will also behave incorrectly, since
you patch relies on and extends this check.

Thanks
Richard

Re: Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-04-04T13:49:56Z

On 4/1/25 09:18, Richard Guo wrote:
> On Mon, Mar 31, 2025 at 7:33 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> and I don't get the case. As I see, ANTI/SEMI join just transforms to
>> the regular join and it is still not the case. May you be more specific?
> 
> Upthread, you said that a qual contained in ppi_clauses will also be
> included in extra->restrictlist.  I provided this example to show that
> this is not true.  And it seems to me that this discrepancy makes the
> check I mentioned earlier not reliable in all cases.  As I explained
> earlier, this check could pass even if a restriction clause isn't
> parameterized, as long as another join clause, which doesn't belong to
> the current join, is included in ppi_clauses.  This is not correct.
Ok, keep aside the question of why we discuss it here.
After second thought, I think the current check is correct at the 
moment. Looking into the inner_unique proving machinery, I see that the 
inner may be decided to be unique if:
1. It has a unique index on joining columns - in the case of NestLoop, 
these clauses inevitably go down as parameters.
2. degenerate case like x=1 where x is unique column - not our case, 
because x=1 goes down
3. DISTINCT clause - in this case, we can't transform the sublink at 
early stages, and using the already planned subquery --> upper rel 
clause will not be pushed to the unique inner.

So, I see nothing criminal in this clause - it may be a little more 
specific to be sure we will be safe in case of improvements in the 
inner_unique proof logic or subquery parameterisation, but that's it. If 
my analysis is correct, we only need to change the comment.
As usual, it would be nice to see an example of incorrect behaviour if 
I'm wrong.

-- 
regards, Andrei Lepikhov

Re: Memoize ANTI and SEMI JOIN inner

Richard Guo <guofenglinux@gmail.com> — 2025-04-09T06:48:33Z

On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
> For making this work, I think the attached should be about the guts of
> the code changes. I didn't look at the comments. Right now I can't
> think of any reason why this can't be done, but some experimentation
> might reveal some reason that it can't.

I conducted some experiments, and I'm afraid it's not safe to consider
Memoize for semi or anti joins, unless the inner side is provably
unique.  As an example, please consider:

create table t (a int, b boolean);
insert into t select i%3, false from generate_series(1,100)i;
analyze t;

select * from t t1 where t1.a in
 (select a from t t2 where t2.b in
  (select t1.b from t t3 where t2.a > 1 offset 0));
ERROR:  cache entry already complete

With the proposed patch, this query results in an error.

The problem is that join clauses from the upper level may be moved to
the semi join.  For a given outer tuple, the first inner tuple that
satisfies the current parameters will mark the cache entry as complete
because singlerow is set to true.  However, if that inner tuple and
the current outer tuple don't satisfy the join clauses, the second
inner tuple that satisfies the parameters will complain that the cache
entry is already marked as complete.

If the inner side is provably unique, there will be no such problem,
as there won't be a second matching tuple.  OTOH, in this case, the
semi join will be reduced to an inner join by reduce_unique_semijoins.
Therefore, it doesn't make much sense to prove inner_unique for semi
joins in add_paths_to_joinrel.

Perhaps we could spend some planner cycles proving inner_unique for
anti joins, so that Memoize nodes can be considered for them?

Thanks
Richard

Re: Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-04-09T08:03:41Z

On 4/9/25 08:48, Richard Guo wrote:
> Perhaps we could spend some planner cycles proving inner_unique for
> anti joins, so that Memoize nodes can be considered for them?
Thanks for working on it!
It is an example I have eagerly wanted to find.
So, no rush this feature now, I add it to the July commitfest in hope we 
have enough time to resolve these concerns.

-- 
regards, Andrei Lepikhov

Re: Memoize ANTI and SEMI JOIN inner

David Rowley <dgrowleyml@gmail.com> — 2025-04-09T09:18:25Z

On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com> wrote:
>
> On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > For making this work, I think the attached should be about the guts of
> > the code changes. I didn't look at the comments. Right now I can't
> > think of any reason why this can't be done, but some experimentation
> > might reveal some reason that it can't.
>
> I conducted some experiments, and I'm afraid it's not safe to consider
> Memoize for semi or anti joins, unless the inner side is provably
> unique.  As an example, please consider:

Thanks for checking that. I was just looking at the spot you'd need to
adjust to prove the inner_unique for anti joins and found I'd written:

/*
* XXX it may be worth proving this to allow a Memoize to be
* considered for Nested Loop Semi/Anti Joins.
*/

Looks like I must have known that at one point in time...

> Perhaps we could spend some planner cycles proving inner_unique for
> anti joins, so that Memoize nodes can be considered for them?

Worth a try. It should be pretty easy to enable, as far as I can see.
It might just be a case of shuffling the cases around in the switch
statement in add_paths_to_joinrel().

David

Re: Memoize ANTI and SEMI JOIN inner

Richard Guo <guofenglinux@gmail.com> — 2025-04-10T10:36:14Z

On Wed, Apr 9, 2025 at 6:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com> wrote:
> > Perhaps we could spend some planner cycles proving inner_unique for
> > anti joins, so that Memoize nodes can be considered for them?

> Worth a try. It should be pretty easy to enable, as far as I can see.
> It might just be a case of shuffling the cases around in the switch
> statement in add_paths_to_joinrel().

Right.  Here is a patch for that.

Thanks
Richard

Re: Memoize ANTI and SEMI JOIN inner

Andres Freund <andres@anarazel.de> — 2025-05-13T13:51:42Z

Hi,

On 2025-04-10 19:36:14 +0900, Richard Guo wrote:
> On Wed, Apr 9, 2025 at 6:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com> wrote:
> > > Perhaps we could spend some planner cycles proving inner_unique for
> > > anti joins, so that Memoize nodes can be considered for them?
>
> > Worth a try. It should be pretty easy to enable, as far as I can see.
> > It might just be a case of shuffling the cases around in the switch
> > statement in add_paths_to_joinrel().
>
> Right.  Here is a patch for that.

This is failing on CI:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5636
https://api.cirrus-ci.com/v1/artifact/task/5411026402803712/testrun/build-32/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/memoize.out /tmp/cirrus-ci-build/build-32/testrun/regress/regress/results/memoize.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/memoize.out	2025-04-12 11:24:10.866868945 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/regress/regress/results/memoize.out	2025-04-12 11:32:44.454864476 +0000
@@ -525,7 +525,7 @@
                      ->  Unique (actual rows=2.67 loops=N)
                            ->  Sort (actual rows=67.33 loops=N)
                                  Sort Key: t2_1.a
-                                 Sort Method: quicksort  Memory: 27kB
+                                 Sort Method: quicksort  Memory: 18kB
                                  ->  Seq Scan on tab_anti t2_1 (actual rows=100.00 loops=N)
 (15 rows)


There shouldn't be Memory mentioned in tests added to the tree.

Greetings,

Andres

Re: Memoize ANTI and SEMI JOIN inner

Richard Guo <guofenglinux@gmail.com> — 2025-05-14T02:13:56Z

On Tue, May 13, 2025 at 10:51 PM Andres Freund <andres@anarazel.de> wrote:
> This is failing on CI:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5636
> https://api.cirrus-ci.com/v1/artifact/task/5411026402803712/testrun/build-32/testrun/regress/regress/regression.diffs
>
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/memoize.out /tmp/cirrus-ci-build/build-32/testrun/regress/regress/results/memoize.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/memoize.out  2025-04-12 11:24:10.866868945 +0000
> +++ /tmp/cirrus-ci-build/build-32/testrun/regress/regress/results/memoize.out   2025-04-12 11:32:44.454864476 +0000
> @@ -525,7 +525,7 @@
>                       ->  Unique (actual rows=2.67 loops=N)
>                             ->  Sort (actual rows=67.33 loops=N)
>                                   Sort Key: t2_1.a
> -                                 Sort Method: quicksort  Memory: 27kB
> +                                 Sort Method: quicksort  Memory: 18kB
>                                   ->  Seq Scan on tab_anti t2_1 (actual rows=100.00 loops=N)
>  (15 rows)
>
>
> There shouldn't be Memory mentioned in tests added to the tree.

Thanks for pointing this out.  You're right — Memory usage should be
hidden from the output of EXPLAIN ANALYZE.  I'm a bit surprised the
existing explain_memoize function doesn't already handle that; I
guess it's because the plans in memoize.sql don't currently include
any Sort nodes.  In any case, I've added a regexp_replace to filter
out 'Memory: \d+kB' in explain_memoize.

Thanks
Richard

Re: Memoize ANTI and SEMI JOIN inner

Richard Guo <guofenglinux@gmail.com> — 2025-07-02T02:08:01Z

On Thu, Apr 10, 2025 at 7:36 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Wed, Apr 9, 2025 at 6:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com> wrote:
> > > Perhaps we could spend some planner cycles proving inner_unique for
> > > anti joins, so that Memoize nodes can be considered for them?

> > Worth a try. It should be pretty easy to enable, as far as I can see.
> > It might just be a case of shuffling the cases around in the switch
> > statement in add_paths_to_joinrel().

> Right.  Here is a patch for that.

Here's a rebased version with no other changes.

David, Andrei, would you like to take another look?  I'm planning to
push it soon.

Thanks
Richard

Re: Memoize ANTI and SEMI JOIN inner

wenhui qiu <qiuwenhuifx@gmail.com> — 2025-07-02T03:48:11Z

HI

> - if (!extra->inner_unique && (jointype == JOIN_SEMI ||
> - jointype == JOIN_ANTI))
> + if ((jointype == JOIN_SEMI || jointype == JOIN_ANTI) &&
> + !extra->inner_unique)
To be nitpicky, this change is meant to align with the earlier comment
modifications to improve code readability, right? Everything else looks
good to me."

Thanks

On Wed, Jul 2, 2025 at 10:08 AM Richard Guo <guofenglinux@gmail.com> wrote:

> On Thu, Apr 10, 2025 at 7:36 PM Richard Guo <guofenglinux@gmail.com>
> wrote:
> > On Wed, Apr 9, 2025 at 6:18 PM David Rowley <dgrowleyml@gmail.com>
> wrote:
> > > On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com>
> wrote:
> > > > Perhaps we could spend some planner cycles proving inner_unique for
> > > > anti joins, so that Memoize nodes can be considered for them?
>
> > > Worth a try. It should be pretty easy to enable, as far as I can see.
> > > It might just be a case of shuffling the cases around in the switch
> > > statement in add_paths_to_joinrel().
>
> > Right.  Here is a patch for that.
>
> Here's a rebased version with no other changes.
>
> David, Andrei, would you like to take another look?  I'm planning to
> push it soon.
>
> Thanks
> Richard
>

Re: Memoize ANTI and SEMI JOIN inner

Andrei Lepikhov <lepihov@gmail.com> — 2025-07-02T09:01:39Z

On 2/7/2025 05:48, wenhui qiu wrote:
> HI
> 
>  > - if (!extra->inner_unique && (jointype == JOIN_SEMI ||
>  > - jointype == JOIN_ANTI))
>  > + if ((jointype == JOIN_SEMI || jointype == JOIN_ANTI) &&
>  > + !extra->inner_unique)
> To be nitpicky, this change is meant to align with the earlier comment 
> modifications to improve code readability, right? Everything else looks 
> good to me."
Yep, I also found only this flaw.
Comments looks clear to me, test is quite stable.

May be correct the line:
INSERT INTO tab_anti SELECT i%3, false FROM generate_series(1,100)i;
with a backspace or an 'AS' keyword?

-- 
regards, Andrei Lepikhov