Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Enable use of Memoize for ANTI joins

  1. Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-03-06T13:08:59Z

    Hi,
    
    In case of NestLoop with parameterised inner semi-join for each outer 
    tuple requires only a single tuple from its inner relation to produce a 
    result. It seems that the same principle applies to an anti-join. This 
    approach could effectively allow the Memoize node to enhance the 
    performance of pulled-up EXISTS and NOT EXISTS sublinks.
    
    In attachment see a sketch of the feature. Since we are using single_row 
    mode, adapting this method to cache semi-join inner results should not 
    be extremely complex. However, I am unsure about potential corner cases 
    and would appreciate any feedback or criticisms regarding this approach.
    
    -- 
    regards, Andrei Lepikhov
    
  2. Re: Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-03-19T17:15:54Z

    On 6/3/2025 14:08, Andrei Lepikhov wrote:
    > Hi,
    > 
    > In case of NestLoop with parameterised inner semi-join for each outer 
    > tuple requires only a single tuple from its inner relation to produce a 
    > result. It seems that the same principle applies to an anti-join. This 
    > approach could effectively allow the Memoize node to enhance the 
    > performance of pulled-up EXISTS and NOT EXISTS sublinks.
    > 
    > In attachment see a sketch of the feature. Since we are using single_row 
    > mode, adapting this method to cache semi-join inner results should not 
    > be extremely complex. However, I am unsure about potential corner cases 
    > and would appreciate any feedback or criticisms regarding this approach.
    I found a corner case that breaks this approach: when NestLoop has a 
    join clause or a filter, it may filter inner tuples, thus scanning more 
    than only one time for each outer. You can see the reproduction script 
    in the attachment. Each of the reproduction queries throws the error:
    ERROR: cache entry already complete
    
    How can we be sure that semi or anti-join needs only one tuple? I think 
    it would be enough to control the absence of join clauses and filters in 
    the join. Unfortunately, we only have such a guarantee in the plan 
    creation stage (maybe even setrefs.c). So, it seems we need to invent an 
    approach like AlternativeSubplan.
    
    -- 
    regards, Andrei Lepikhov
  3. Re: Memoize ANTI and SEMI JOIN inner

    David Rowley <dgrowleyml@gmail.com> — 2025-03-20T06:02:16Z

    On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> wrote:
    > How can we be sure that semi or anti-join needs only one tuple? I think
    > it would be enough to control the absence of join clauses and filters in
    > the join. Unfortunately, we only have such a guarantee in the plan
    > creation stage (maybe even setrefs.c). So, it seems we need to invent an
    > approach like AlternativeSubplan.
    
    I suggest looking at what 9e215378d did.  You might be able to also
    allow semi and anti-joins providing the cache keys cover the entire
    join condition. I think this might be safe as Nested Loop will only
    ask its inner subnode for the first match before skipping to the next
    outer row and with anti-join, there's no reason to look for additional
    rows after the first. Semi-join and unique joins do the same thing in
    nodeNestloop.c. To save doing additional checks at run-time, the code
    does:
    
    nlstate->js.single_match = (node->join.inner_unique ||
    node->join.jointype == JOIN_SEMI);
    
    For making this work, I think the attached should be about the guts of
    the code changes. I didn't look at the comments. Right now I can't
    think of any reason why this can't be done, but some experimentation
    might reveal some reason that it can't.
    
    David
    
  4. Re: Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-03-21T15:56:09Z

    On 20/3/2025 07:02, David Rowley wrote:
    > On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> wrote:
    >> How can we be sure that semi or anti-join needs only one tuple? I think
    >> it would be enough to control the absence of join clauses and filters in
    >> the join. Unfortunately, we only have such a guarantee in the plan
    >> creation stage (maybe even setrefs.c). So, it seems we need to invent an
    >> approach like AlternativeSubplan.
    > 
    > I suggest looking at what 9e215378d did.  You might be able to also
    > allow semi and anti-joins providing the cache keys cover the entire
    > join condition. I think this might be safe as Nested Loop will only
    > ask its inner subnode for the first match before skipping to the next
    > outer row and with anti-join, there's no reason to look for additional
    > rows after the first. Semi-join and unique joins do the same thing in
    > nodeNestloop.c. To save doing additional checks at run-time, the code
    > does:
    Thank you for the clue! I almost took the wrong direction.
    I have attached the new patch, which includes corrected comments for 
    better clarification of the changes, as well as some additional tests.
    I now feel much more confident about this version since I have resolved 
    that concern.
    
    -- 
    regards, Andrei Lepikhov
  5. Re: Memoize ANTI and SEMI JOIN inner

    Alena Rybakina <a.rybakina@postgrespro.ru> — 2025-03-31T02:33:49Z

    Hi!
    
    On 21.03.2025 18:56, Andrei Lepikhov wrote:
    > On 20/3/2025 07:02, David Rowley wrote:
    >> On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> wrote:
    >>> How can we be sure that semi or anti-join needs only one tuple? I think
    >>> it would be enough to control the absence of join clauses and 
    >>> filters in
    >>> the join. Unfortunately, we only have such a guarantee in the plan
    >>> creation stage (maybe even setrefs.c). So, it seems we need to 
    >>> invent an
    >>> approach like AlternativeSubplan.
    >>
    >> I suggest looking at what 9e215378d did.  You might be able to also
    >> allow semi and anti-joins providing the cache keys cover the entire
    >> join condition. I think this might be safe as Nested Loop will only
    >> ask its inner subnode for the first match before skipping to the next
    >> outer row and with anti-join, there's no reason to look for additional
    >> rows after the first. Semi-join and unique joins do the same thing in
    >> nodeNestloop.c. To save doing additional checks at run-time, the code
    >> does:
    > Thank you for the clue! I almost took the wrong direction.
    > I have attached the new patch, which includes corrected comments for 
    > better clarification of the changes, as well as some additional tests.
    > I now feel much more confident about this version since I have 
    > resolved that concern.
    >
    
    I reviewed your patch and made a couple of suggestions.
    
    The first change is related to your comment (and the one before it). I 
    fixed some grammar issues and simplified the wording to make it clearer 
    and easier to understand.
    
    The second change involves adding an Assert when generating the Memoize 
    path. Based on the existing comment and the surrounding logic (shown 
    below),
    I believe it's worth asserting that both inner_unique and single_mode 
    are not true at the same time — just as a safety check.
    /*
    * We may do this for SEMI or ANTI joins when they need only one tuple from
    * the inner side to produce the result. Following if condition checks that
    * rule.
    *
    * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
    * = true. Should we? See add_paths_to_joinrel()
    */
    if(!extra->inner_unique&& (jointype== JOIN_SEMI||
    jointype== JOIN_ANTI))
    single_mode= true;
    
    -- 
    Regards,
    Alena Rybakina
    Postgres Professional
    
  6. Re: Memoize ANTI and SEMI JOIN inner

    Alena Rybakina <a.rybakina@postgrespro.ru> — 2025-03-31T02:50:07Z

    I realized that I uploaded my diff file with a small mistake - sorry 
    about that. I've corrected it with this message so your tests can pass 
    in the CI.
    
    On 31.03.2025 05:33, Alena Rybakina wrote:
    >
    > Hi!
    >
    > On 21.03.2025 18:56, Andrei Lepikhov wrote:
    >> On 20/3/2025 07:02, David Rowley wrote:
    >>> On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepihov@gmail.com> 
    >>> wrote:
    >>>> How can we be sure that semi or anti-join needs only one tuple? I 
    >>>> think
    >>>> it would be enough to control the absence of join clauses and 
    >>>> filters in
    >>>> the join. Unfortunately, we only have such a guarantee in the plan
    >>>> creation stage (maybe even setrefs.c). So, it seems we need to 
    >>>> invent an
    >>>> approach like AlternativeSubplan.
    >>>
    >>> I suggest looking at what 9e215378d did.  You might be able to also
    >>> allow semi and anti-joins providing the cache keys cover the entire
    >>> join condition. I think this might be safe as Nested Loop will only
    >>> ask its inner subnode for the first match before skipping to the next
    >>> outer row and with anti-join, there's no reason to look for additional
    >>> rows after the first. Semi-join and unique joins do the same thing in
    >>> nodeNestloop.c. To save doing additional checks at run-time, the code
    >>> does:
    >> Thank you for the clue! I almost took the wrong direction.
    >> I have attached the new patch, which includes corrected comments for 
    >> better clarification of the changes, as well as some additional tests.
    >> I now feel much more confident about this version since I have 
    >> resolved that concern.
    >>
    >
    > I reviewed your patch and made a couple of suggestions.
    >
    > The first change is related to your comment (and the one before it). I 
    > fixed some grammar issues and simplified the wording to make it 
    > clearer and easier to understand.
    >
    > The second change involves adding an Assert when generating the 
    > Memoize path. Based on the existing comment and the surrounding logic 
    > (shown below),
    > I believe it's worth asserting that both inner_unique and single_mode 
    > are not true at the same time — just as a safety check.
    > /*
    > * We may do this for SEMI or ANTI joins when they need only one tuple from
    > * the inner side to produce the result. Following if condition checks that
    > * rule.
    > *
    > * XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
    > * = true. Should we? See add_paths_to_joinrel()
    > */
    > if(!extra->inner_unique&& (jointype== JOIN_SEMI||
    > jointype== JOIN_ANTI))
    > single_mode= true;
    
    -- 
    Regards,
    Alena Rybakina
    Postgres Professional
    
  7. Re: Memoize ANTI and SEMI JOIN inner

    David Rowley <dgrowleyml@gmail.com> — 2025-03-31T03:04:18Z

    On Mon, 31 Mar 2025 at 15:33, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
    > I believe it's worth asserting that both inner_unique and single_mode are not true at the same time — just as a safety check.
    
    add_paths_to_joinrel() just chooses not to populate inner_unique for
    SEMI and ANTI joins because, as of today's master, it's pretty
    pointless to determine that because the executor will short-circuit
    and skip to the next outer tuple for those join types anyway. I don't
    follow why having both these flags set would cause trouble. It seems
    perfectly legitimate that add_paths_to_joinrel() could choose to set
    the inner_unique flag for these join types, and if it did, the Assert
    you're proposing would fail for no good reason.
    
    David
    
    
    
    
  8. Re: Memoize ANTI and SEMI JOIN inner

    Alena Rybakina <a.rybakina@postgrespro.ru> — 2025-03-31T03:21:43Z

    On 31.03.2025 06:04, David Rowley wrote:
    > On Mon, 31 Mar 2025 at 15:33, Alena Rybakina<a.rybakina@postgrespro.ru>  wrote:
    >> I believe it's worth asserting that both inner_unique and single_mode are not true at the same time — just as a safety check.
    > add_paths_to_joinrel() just chooses not to populate inner_unique for
    > SEMI and ANTI joins because, as of today's master, it's pretty
    > pointless to determine that because the executor will short-circuit
    > and skip to the next outer tuple for those join types anyway. I don't
    > follow why having both these flags set would cause trouble. It seems
    > perfectly legitimate that add_paths_to_joinrel() could choose to set
    > the inner_unique flag for these join types, and if it did, the Assert
    > you're proposing would fail for no good reason.
    >
    I tend to agree with you that someone might set this flag to true for 
    these join types in the future.
    
    However, is it necessary to check that extra->inner_unique must be false 
    for SEMI/ANTI joins here, or am I missing something? It looks a little 
    confusing at this point.
    
    if(!extra->inner_unique&& (jointype== JOIN_SEMI|| jointype== JOIN_ANTI))
    
         single_mode= true;
    
    --
    
    Regards,
    Alena Rybakina
    Postgres Professional
    
  9. Re: Memoize ANTI and SEMI JOIN inner

    David Rowley <dgrowleyml@gmail.com> — 2025-03-31T03:33:03Z

    On Mon, 31 Mar 2025 at 16:21, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
    > However, is it necessary to check that extra->inner_unique must be false for SEMI/ANTI joins here, or am I missing something? It looks a little confusing at this point.
    
    If it is necessary, I don't see the reason for it. It was me that
    worked on unique joins and I see no reason why a SEMI or ANTI join
    couldn't be marked as unique. The reason they're not today is that the
    only point of the unique join optimisation is so that during
    execution, the join nodes could skip to the next outer tuple after
    matching the current outer to an inner.  If the join is unique, then
    there are no more join partners to find for the current outer after
    matching it up once. With SEMI and ANTI joins, we skip to the next
    outer tuple after finding a match anyway, so there's no point in going
    to the trouble of setting the inner_unique flag.
    
    I can't say definitively that we won't find a reason in the future
    that we should set inner_unique for SEMI/ANTI joins, so I don't follow
    the need for the Assert.
    
    Maybe you're seeing something that I'm not. What do you think will
    break if both flags are true?
    
    David
    
    
    
    
  10. Re: Memoize ANTI and SEMI JOIN inner

    Alena Rybakina <a.rybakina@postgrespro.ru> — 2025-03-31T06:34:54Z

    On 31.03.2025 06:33, David Rowley wrote:
    > On Mon, 31 Mar 2025 at 16:21, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
    >> However, is it necessary to check that extra->inner_unique must be false for SEMI/ANTI joins here, or am I missing something? It looks a little confusing at this point.
    > If it is necessary, I don't see the reason for it. It was me that
    > worked on unique joins and I see no reason why a SEMI or ANTI join
    > couldn't be marked as unique. The reason they're not today is that the
    > only point of the unique join optimisation is so that during
    > execution, the join nodes could skip to the next outer tuple after
    > matching the current outer to an inner.  If the join is unique, then
    > there are no more join partners to find for the current outer after
    > matching it up once. With SEMI and ANTI joins, we skip to the next
    > outer tuple after finding a match anyway, so there's no point in going
    > to the trouble of setting the inner_unique flag.
    >
    > I can't say definitively that we won't find a reason in the future
    > that we should set inner_unique for SEMI/ANTI joins, so I don't follow
    > the need for the Assert.
    >
    > Maybe you're seeing something that I'm not. What do you think will
    > break if both flags are true?
    >
    Actually, I was mainly confused by the code itself - the check seemed to 
    contradict the explanation. It looked like we were enforcing that
    inner_unique must be false for SEMI/ANTI joins, even though it's not 
    actually important for those join types.
    That’s why I originally proposed either adding an Assert or removing 
    this flag from check altogether, just to make it more explicit.
    
    So, I agree with your explanation — by the definition of SEMI and ANTI 
    joins, there's no need to set inner_unique, and also no need to assert 
    against it.
    These joins skip to the next outer tuple once they find a match (or fail 
    to find one, in the case of ANTI).
    
    I updated the diff, where I left changes only in the code comment.
    
    -- 
    Regards,
    Alena Rybakina
    Postgres Professional
    
  11. Re: Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-03-31T07:45:18Z

    On 3/31/25 05:33, David Rowley wrote:
    > I can't say definitively that we won't find a reason in the future
    > that we should set inner_unique for SEMI/ANTI joins, so I don't follow
    > the need for the Assert.
    > 
    > Maybe you're seeing something that I'm not. What do you think will
    > break if both flags are true?
    I considered targeting PG 19 and July Comitfest for this feature, but 
    currently, I don't see any further development needed here. What are 
    your thoughts, David? Does it make sense to commit this to PG 18?
    
    I have no reason to rush the process, but this feature seems beneficial 
    for practice. When the internal structure is a bushy join tree, 
    producing even a single tuple can be costly. SQL Server's Spool node 
    addresses this issue, and people sometimes experience confusion 
    detecting degradation during migration with specific queries.
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  12. Re: Memoize ANTI and SEMI JOIN inner

    Richard Guo <guofenglinux@gmail.com> — 2025-03-31T09:03:43Z

    On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
    > For making this work, I think the attached should be about the guts of
    > the code changes. I didn't look at the comments. Right now I can't
    > think of any reason why this can't be done, but some experimentation
    > might reveal some reason that it can't.
    
    I reviewed this patch and I have some concerns about the following
    code:
    
        if (extra->inner_unique &&
            (inner_path->param_info == NULL ||
             bms_num_members(inner_path->param_info->ppi_serials) <
             list_length(extra->restrictlist)))
            return NULL;
    
    I understand that this check is used to ensure that the entire join
    condition is parameterized in the case of unique joins, so that we can
    safely mark the cache entry as complete after reading the first tuple.
    However, ppi_clauses includes join clauses available from all outer
    rels, not just the current outer rel, while extra->restrictlist only
    includes the restriction clauses for the current join.  This means the
    check could pass even if a restriction clause isn't parameterized, as
    long as another join clause, which doesn't belong to the current join,
    is included in ppi_clauses.
    
    Thanks
    Richard
    
    
    
    
  13. Re: Memoize ANTI and SEMI JOIN inner

    David Rowley <dgrowleyml@gmail.com> — 2025-03-31T09:39:14Z

    On Mon, 31 Mar 2025 at 22:03, Richard Guo <guofenglinux@gmail.com> wrote:
    > I reviewed this patch and I have some concerns about the following
    > code:
    >
    >     if (extra->inner_unique &&
    >         (inner_path->param_info == NULL ||
    >          bms_num_members(inner_path->param_info->ppi_serials) <
    >          list_length(extra->restrictlist)))
    >         return NULL;
    >
    > I understand that this check is used to ensure that the entire join
    > condition is parameterized in the case of unique joins, so that we can
    > safely mark the cache entry as complete after reading the first tuple.
    > However, ppi_clauses includes join clauses available from all outer
    > rels, not just the current outer rel, while extra->restrictlist only
    > includes the restriction clauses for the current join.  This means the
    > check could pass even if a restriction clause isn't parameterized, as
    > long as another join clause, which doesn't belong to the current join,
    > is included in ppi_clauses.
    
    Shouldn't you be more concerned about master here than the patch given
    that the code you pasted is from master?
    
    David
    
    
    
    
  14. Re: Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-03-31T09:46:08Z

    On 3/31/25 11:03, Richard Guo wrote:
    > On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
    >> For making this work, I think the attached should be about the guts of
    >> the code changes. I didn't look at the comments. Right now I can't
    >> think of any reason why this can't be done, but some experimentation
    >> might reveal some reason that it can't.
    > 
    > I reviewed this patch and I have some concerns about the following
    > code:
    > 
    >      if (extra->inner_unique &&
    >          (inner_path->param_info == NULL ||
    >           bms_num_members(inner_path->param_info->ppi_serials) <
    >           list_length(extra->restrictlist)))
    >          return NULL;
    > 
    > I understand that this check is used to ensure that the entire join
    > condition is parameterized in the case of unique joins, so that we can
    > safely mark the cache entry as complete after reading the first tuple.
    > However, ppi_clauses includes join clauses available from all outer
    > rels, not just the current outer rel, while extra->restrictlist only
    > includes the restriction clauses for the current join.  This means the
    > check could pass even if a restriction clause isn't parameterized, as
    > long as another join clause, which doesn't belong to the current join,
    > is included in ppi_clauses.
    Initially, I had the same concern. But if ppi_clauses contains a qual, 
    it should refer to this join and, as a result, be in the 
    extra->restrictlist, isn't it?
    I thought about references to the other side of an OUTER JOIN, but if 
    the subquery refers to LHS, it just not be transformed to the SEMI/ANTI 
    join.
    Anyway, if you provide an example or just a sketch, I will be happy to 
    discover it.
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  15. Re: Memoize ANTI and SEMI JOIN inner

    Richard Guo <guofenglinux@gmail.com> — 2025-03-31T09:50:04Z

    On Mon, Mar 31, 2025 at 6:39 PM David Rowley <dgrowleyml@gmail.com> wrote:
    > On Mon, 31 Mar 2025 at 22:03, Richard Guo <guofenglinux@gmail.com> wrote:
    > > I reviewed this patch and I have some concerns about the following
    > > code:
    > >
    > >     if (extra->inner_unique &&
    > >         (inner_path->param_info == NULL ||
    > >          bms_num_members(inner_path->param_info->ppi_serials) <
    > >          list_length(extra->restrictlist)))
    > >         return NULL;
    > >
    > > I understand that this check is used to ensure that the entire join
    > > condition is parameterized in the case of unique joins, so that we can
    > > safely mark the cache entry as complete after reading the first tuple.
    > > However, ppi_clauses includes join clauses available from all outer
    > > rels, not just the current outer rel, while extra->restrictlist only
    > > includes the restriction clauses for the current join.  This means the
    > > check could pass even if a restriction clause isn't parameterized, as
    > > long as another join clause, which doesn't belong to the current join,
    > > is included in ppi_clauses.
    >
    > Shouldn't you be more concerned about master here than the patch given
    > that the code you pasted is from master?
    
    Right.  This code is from the master branch, not the patch.  It caught
    my attention while I was reviewing the patch and noticed it modifies
    this code.
    
    Thanks
    Richard
    
    
    
    
  16. Re: Memoize ANTI and SEMI JOIN inner

    Richard Guo <guofenglinux@gmail.com> — 2025-03-31T10:18:37Z

    On Mon, Mar 31, 2025 at 6:46 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    > On 3/31/25 11:03, Richard Guo wrote:
    > > I reviewed this patch and I have some concerns about the following
    > > code:
    > >
    > >      if (extra->inner_unique &&
    > >          (inner_path->param_info == NULL ||
    > >           bms_num_members(inner_path->param_info->ppi_serials) <
    > >           list_length(extra->restrictlist)))
    > >          return NULL;
    > >
    > > I understand that this check is used to ensure that the entire join
    > > condition is parameterized in the case of unique joins, so that we can
    > > safely mark the cache entry as complete after reading the first tuple.
    > > However, ppi_clauses includes join clauses available from all outer
    > > rels, not just the current outer rel, while extra->restrictlist only
    > > includes the restriction clauses for the current join.  This means the
    > > check could pass even if a restriction clause isn't parameterized, as
    > > long as another join clause, which doesn't belong to the current join,
    > > is included in ppi_clauses.
    
    > Initially, I had the same concern. But if ppi_clauses contains a qual,
    > it should refer to this join and, as a result, be in the
    > extra->restrictlist, isn't it?
    
    Hmm, I don't think so.  As I mentioned upthread, ppi_clauses includes
    join clauses available from all outer rels, not just the current one.
    So a clause included in ppi_clauses is not necessarily included in
    extra->restrictlist.  As an example, consider
    
    create table t (a int, b int);
    
    explain (costs off)
    select * from t t1 join t t2 join
        lateral (select *, t1.a+t2.a as x from t t3 offset 0) t3
        on t2.a = t3.a
      on t1.b = t3.b;
                           QUERY PLAN
    ---------------------------------------------------------
     Nested Loop
       ->  Seq Scan on t t2
       ->  Nested Loop
             ->  Seq Scan on t t1
             ->  Subquery Scan on t3
                   Filter: ((t2.a = t3.a) AND (t1.b = t3.b))
                   ->  Seq Scan on t t3_1
    (7 rows)
    
    t3's ppi_clauses includes "t2.a = t3.a" and "t1.b = t3.b", while t1/t3
    join's restrictlist only includes "t1.b = t3.b".
    
    Thanks
    Richard
    
    
    
    
  17. Re: Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-03-31T10:33:12Z

    On 3/31/25 12:18, Richard Guo wrote:
    > On Mon, Mar 31, 2025 at 6:46 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    >   Nested Loop
    >     ->  Seq Scan on t t2
    >     ->  Nested Loop
    >           ->  Seq Scan on t t1
    >           ->  Subquery Scan on t3
    >                 Filter: ((t2.a = t3.a) AND (t1.b = t3.b))
    >                 ->  Seq Scan on t t3_1
    > (7 rows)
    > 
    > t3's ppi_clauses includes "t2.a = t3.a" and "t1.b = t3.b", while t1/t3
    > join's restrictlist only includes "t1.b = t3.b".
    I attempted to make your query ab it closer to our case:
    
    SET enable_mergejoin = f;
    SET enable_hashjoin = f;
    SET enable_material = f;
    CREATE INDEX ON t(a);
    
    explain (costs off)
    select * from t t1 join t t2
       on EXISTS (select *, t1.a+t2.a as x from t t3
         WHERE t2.a = t3.a AND t1.b = t3.b);
    
    and I don't get the case. As I see, ANTI/SEMI join just transforms to 
    the regular join and it is still not the case. May you be more specific?
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  18. Re: Memoize ANTI and SEMI JOIN inner

    Richard Guo <guofenglinux@gmail.com> — 2025-04-01T07:18:58Z

    On Mon, Mar 31, 2025 at 7:33 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    > and I don't get the case. As I see, ANTI/SEMI join just transforms to
    > the regular join and it is still not the case. May you be more specific?
    
    Upthread, you said that a qual contained in ppi_clauses will also be
    included in extra->restrictlist.  I provided this example to show that
    this is not true.  And it seems to me that this discrepancy makes the
    check I mentioned earlier not reliable in all cases.  As I explained
    earlier, this check could pass even if a restriction clause isn't
    parameterized, as long as another join clause, which doesn't belong to
    the current join, is included in ppi_clauses.  This is not correct.
    
    This isn't about your patch, but about the master code.  However, if
    this code is incorrect, your patch will also behave incorrectly, since
    you patch relies on and extends this check.
    
    Thanks
    Richard
    
    
    
    
  19. Re: Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-04-04T13:49:56Z

    On 4/1/25 09:18, Richard Guo wrote:
    > On Mon, Mar 31, 2025 at 7:33 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    >> and I don't get the case. As I see, ANTI/SEMI join just transforms to
    >> the regular join and it is still not the case. May you be more specific?
    > 
    > Upthread, you said that a qual contained in ppi_clauses will also be
    > included in extra->restrictlist.  I provided this example to show that
    > this is not true.  And it seems to me that this discrepancy makes the
    > check I mentioned earlier not reliable in all cases.  As I explained
    > earlier, this check could pass even if a restriction clause isn't
    > parameterized, as long as another join clause, which doesn't belong to
    > the current join, is included in ppi_clauses.  This is not correct.
    Ok, keep aside the question of why we discuss it here.
    After second thought, I think the current check is correct at the 
    moment. Looking into the inner_unique proving machinery, I see that the 
    inner may be decided to be unique if:
    1. It has a unique index on joining columns - in the case of NestLoop, 
    these clauses inevitably go down as parameters.
    2. degenerate case like x=1 where x is unique column - not our case, 
    because x=1 goes down
    3. DISTINCT clause - in this case, we can't transform the sublink at 
    early stages, and using the already planned subquery --> upper rel 
    clause will not be pushed to the unique inner.
    
    So, I see nothing criminal in this clause - it may be a little more 
    specific to be sure we will be safe in case of improvements in the 
    inner_unique proof logic or subquery parameterisation, but that's it. If 
    my analysis is correct, we only need to change the comment.
    As usual, it would be nice to see an example of incorrect behaviour if 
    I'm wrong.
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  20. Re: Memoize ANTI and SEMI JOIN inner

    Richard Guo <guofenglinux@gmail.com> — 2025-04-09T06:48:33Z

    On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
    > For making this work, I think the attached should be about the guts of
    > the code changes. I didn't look at the comments. Right now I can't
    > think of any reason why this can't be done, but some experimentation
    > might reveal some reason that it can't.
    
    I conducted some experiments, and I'm afraid it's not safe to consider
    Memoize for semi or anti joins, unless the inner side is provably
    unique.  As an example, please consider:
    
    create table t (a int, b boolean);
    insert into t select i%3, false from generate_series(1,100)i;
    analyze t;
    
    select * from t t1 where t1.a in
     (select a from t t2 where t2.b in
      (select t1.b from t t3 where t2.a > 1 offset 0));
    ERROR:  cache entry already complete
    
    With the proposed patch, this query results in an error.
    
    The problem is that join clauses from the upper level may be moved to
    the semi join.  For a given outer tuple, the first inner tuple that
    satisfies the current parameters will mark the cache entry as complete
    because singlerow is set to true.  However, if that inner tuple and
    the current outer tuple don't satisfy the join clauses, the second
    inner tuple that satisfies the parameters will complain that the cache
    entry is already marked as complete.
    
    If the inner side is provably unique, there will be no such problem,
    as there won't be a second matching tuple.  OTOH, in this case, the
    semi join will be reduced to an inner join by reduce_unique_semijoins.
    Therefore, it doesn't make much sense to prove inner_unique for semi
    joins in add_paths_to_joinrel.
    
    Perhaps we could spend some planner cycles proving inner_unique for
    anti joins, so that Memoize nodes can be considered for them?
    
    Thanks
    Richard
    
    
    
    
  21. Re: Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-04-09T08:03:41Z

    On 4/9/25 08:48, Richard Guo wrote:
    > Perhaps we could spend some planner cycles proving inner_unique for
    > anti joins, so that Memoize nodes can be considered for them?
    Thanks for working on it!
    It is an example I have eagerly wanted to find.
    So, no rush this feature now, I add it to the July commitfest in hope we 
    have enough time to resolve these concerns.
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  22. Re: Memoize ANTI and SEMI JOIN inner

    David Rowley <dgrowleyml@gmail.com> — 2025-04-09T09:18:25Z

    On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com> wrote:
    >
    > On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
    > > For making this work, I think the attached should be about the guts of
    > > the code changes. I didn't look at the comments. Right now I can't
    > > think of any reason why this can't be done, but some experimentation
    > > might reveal some reason that it can't.
    >
    > I conducted some experiments, and I'm afraid it's not safe to consider
    > Memoize for semi or anti joins, unless the inner side is provably
    > unique.  As an example, please consider:
    
    Thanks for checking that. I was just looking at the spot you'd need to
    adjust to prove the inner_unique for anti joins and found I'd written:
    
    /*
    * XXX it may be worth proving this to allow a Memoize to be
    * considered for Nested Loop Semi/Anti Joins.
    */
    
    Looks like I must have known that at one point in time...
    
    > Perhaps we could spend some planner cycles proving inner_unique for
    > anti joins, so that Memoize nodes can be considered for them?
    
    Worth a try. It should be pretty easy to enable, as far as I can see.
    It might just be a case of shuffling the cases around in the switch
    statement in add_paths_to_joinrel().
    
    David
    
    
    
    
  23. Re: Memoize ANTI and SEMI JOIN inner

    Richard Guo <guofenglinux@gmail.com> — 2025-04-10T10:36:14Z

    On Wed, Apr 9, 2025 at 6:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
    > On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com> wrote:
    > > Perhaps we could spend some planner cycles proving inner_unique for
    > > anti joins, so that Memoize nodes can be considered for them?
    
    > Worth a try. It should be pretty easy to enable, as far as I can see.
    > It might just be a case of shuffling the cases around in the switch
    > statement in add_paths_to_joinrel().
    
    Right.  Here is a patch for that.
    
    Thanks
    Richard
    
  24. Re: Memoize ANTI and SEMI JOIN inner

    Andres Freund <andres@anarazel.de> — 2025-05-13T13:51:42Z

    Hi,
    
    On 2025-04-10 19:36:14 +0900, Richard Guo wrote:
    > On Wed, Apr 9, 2025 at 6:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
    > > On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com> wrote:
    > > > Perhaps we could spend some planner cycles proving inner_unique for
    > > > anti joins, so that Memoize nodes can be considered for them?
    >
    > > Worth a try. It should be pretty easy to enable, as far as I can see.
    > > It might just be a case of shuffling the cases around in the switch
    > > statement in add_paths_to_joinrel().
    >
    > Right.  Here is a patch for that.
    
    This is failing on CI:
    https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5636
    https://api.cirrus-ci.com/v1/artifact/task/5411026402803712/testrun/build-32/testrun/regress/regress/regression.diffs
    
    diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/memoize.out /tmp/cirrus-ci-build/build-32/testrun/regress/regress/results/memoize.out
    --- /tmp/cirrus-ci-build/src/test/regress/expected/memoize.out	2025-04-12 11:24:10.866868945 +0000
    +++ /tmp/cirrus-ci-build/build-32/testrun/regress/regress/results/memoize.out	2025-04-12 11:32:44.454864476 +0000
    @@ -525,7 +525,7 @@
                          ->  Unique (actual rows=2.67 loops=N)
                                ->  Sort (actual rows=67.33 loops=N)
                                      Sort Key: t2_1.a
    -                                 Sort Method: quicksort  Memory: 27kB
    +                                 Sort Method: quicksort  Memory: 18kB
                                      ->  Seq Scan on tab_anti t2_1 (actual rows=100.00 loops=N)
     (15 rows)
    
    
    There shouldn't be Memory mentioned in tests added to the tree.
    
    Greetings,
    
    Andres
    
    
    
    
  25. Re: Memoize ANTI and SEMI JOIN inner

    Richard Guo <guofenglinux@gmail.com> — 2025-05-14T02:13:56Z

    On Tue, May 13, 2025 at 10:51 PM Andres Freund <andres@anarazel.de> wrote:
    > This is failing on CI:
    > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5636
    > https://api.cirrus-ci.com/v1/artifact/task/5411026402803712/testrun/build-32/testrun/regress/regress/regression.diffs
    >
    > diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/memoize.out /tmp/cirrus-ci-build/build-32/testrun/regress/regress/results/memoize.out
    > --- /tmp/cirrus-ci-build/src/test/regress/expected/memoize.out  2025-04-12 11:24:10.866868945 +0000
    > +++ /tmp/cirrus-ci-build/build-32/testrun/regress/regress/results/memoize.out   2025-04-12 11:32:44.454864476 +0000
    > @@ -525,7 +525,7 @@
    >                       ->  Unique (actual rows=2.67 loops=N)
    >                             ->  Sort (actual rows=67.33 loops=N)
    >                                   Sort Key: t2_1.a
    > -                                 Sort Method: quicksort  Memory: 27kB
    > +                                 Sort Method: quicksort  Memory: 18kB
    >                                   ->  Seq Scan on tab_anti t2_1 (actual rows=100.00 loops=N)
    >  (15 rows)
    >
    >
    > There shouldn't be Memory mentioned in tests added to the tree.
    
    Thanks for pointing this out.  You're right — Memory usage should be
    hidden from the output of EXPLAIN ANALYZE.  I'm a bit surprised the
    existing explain_memoize function doesn't already handle that; I
    guess it's because the plans in memoize.sql don't currently include
    any Sort nodes.  In any case, I've added a regexp_replace to filter
    out 'Memory: \d+kB' in explain_memoize.
    
    Thanks
    Richard
    
  26. Re: Memoize ANTI and SEMI JOIN inner

    Richard Guo <guofenglinux@gmail.com> — 2025-07-02T02:08:01Z

    On Thu, Apr 10, 2025 at 7:36 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > On Wed, Apr 9, 2025 at 6:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
    > > On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com> wrote:
    > > > Perhaps we could spend some planner cycles proving inner_unique for
    > > > anti joins, so that Memoize nodes can be considered for them?
    
    > > Worth a try. It should be pretty easy to enable, as far as I can see.
    > > It might just be a case of shuffling the cases around in the switch
    > > statement in add_paths_to_joinrel().
    
    > Right.  Here is a patch for that.
    
    Here's a rebased version with no other changes.
    
    David, Andrei, would you like to take another look?  I'm planning to
    push it soon.
    
    Thanks
    Richard
    
  27. Re: Memoize ANTI and SEMI JOIN inner

    wenhui qiu <qiuwenhuifx@gmail.com> — 2025-07-02T03:48:11Z

    HI
    
    > - if (!extra->inner_unique && (jointype == JOIN_SEMI ||
    > - jointype == JOIN_ANTI))
    > + if ((jointype == JOIN_SEMI || jointype == JOIN_ANTI) &&
    > + !extra->inner_unique)
    To be nitpicky, this change is meant to align with the earlier comment
    modifications to improve code readability, right? Everything else looks
    good to me."
    
    Thanks
    
    On Wed, Jul 2, 2025 at 10:08 AM Richard Guo <guofenglinux@gmail.com> wrote:
    
    > On Thu, Apr 10, 2025 at 7:36 PM Richard Guo <guofenglinux@gmail.com>
    > wrote:
    > > On Wed, Apr 9, 2025 at 6:18 PM David Rowley <dgrowleyml@gmail.com>
    > wrote:
    > > > On Wed, 9 Apr 2025 at 18:48, Richard Guo <guofenglinux@gmail.com>
    > wrote:
    > > > > Perhaps we could spend some planner cycles proving inner_unique for
    > > > > anti joins, so that Memoize nodes can be considered for them?
    >
    > > > Worth a try. It should be pretty easy to enable, as far as I can see.
    > > > It might just be a case of shuffling the cases around in the switch
    > > > statement in add_paths_to_joinrel().
    >
    > > Right.  Here is a patch for that.
    >
    > Here's a rebased version with no other changes.
    >
    > David, Andrei, would you like to take another look?  I'm planning to
    > push it soon.
    >
    > Thanks
    > Richard
    >
    
  28. Re: Memoize ANTI and SEMI JOIN inner

    Andrei Lepikhov <lepihov@gmail.com> — 2025-07-02T09:01:39Z

    On 2/7/2025 05:48, wenhui qiu wrote:
    > HI
    > 
    >  > - if (!extra->inner_unique && (jointype == JOIN_SEMI ||
    >  > - jointype == JOIN_ANTI))
    >  > + if ((jointype == JOIN_SEMI || jointype == JOIN_ANTI) &&
    >  > + !extra->inner_unique)
    > To be nitpicky, this change is meant to align with the earlier comment 
    > modifications to improve code readability, right? Everything else looks 
    > good to me."
    Yep, I also found only this flaw.
    Comments looks clear to me, test is quite stable.
    
    May be correct the line:
    INSERT INTO tab_anti SELECT i%3, false FROM generate_series(1,100)i;
    with a backspace or an 'AS' keyword?
    
    -- 
    regards, Andrei Lepikhov