Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Fix misuse of Relids for storing attribute numbers

  2. Reduce "Var IS [NOT] NULL" quals during constant folding

  3. Centralize collection of catalog info needed early in the planner

  4. Expand virtual generated columns before sublink pull-up

  5. Expand virtual generated columns in the planner

  1. Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-03-21T09:14:40Z

    Currently, we have an optimization that reduces an IS [NOT] NULL qual
    on a NOT NULL column to constant true or constant false, provided we
    can prove that the input expression of the NullTest is not nullable by
    any outer joins.  This deduction happens pretty late in planner,
    during the distribution of quals to relations in query_planner().  As
    mentioned in [1], doing it at such a late stage has some drawbacks.
    
    Ideally, this deduction should happen during constant folding.
    However, we don't have the per-relation information about which
    columns are defined as NOT NULL available at that point.  That
    information is collected from catalogs when building RelOptInfos for
    base or other relations.
    
    I'm wondering whether we can collect that information while building
    the RangeTblEntry for a base or other relation, so that it's available
    before constant folding.  This could also enable other optimizations,
    such as checking if a NOT IN subquery's output columns and its
    left-hand expressions are all certainly not NULL, in which case we can
    convert it to an anti-join.
    
    Attached is a draft patch to reduce NullTest on a NOT NULL column in
    eval_const_expressions.
    
    Initially, I planned to get rid of restriction_is_always_true and
    restriction_is_always_false altogether, since we now perform the
    reduction of "Var IS [NOT] NULL" quals in eval_const_expressions.
    However, removing them would prevent us from reducing some IS [NOT]
    NULL quals that we were previously able to reduce, because (a) the
    self-join elimination may introduce new IS NOT NULL quals after the
    constant folding, and (b) if some outer joins are converted to inner
    joins, previously irreducible NullTest quals may become reducible.
    
    So I think maybe we'd better keep restriction_is_always_true and
    restriction_is_always_false as-is.
    
    Any thoughts?
    
    [1] https://postgr.es/m/2323997.1740623184@sss.pgh.pa.us
    
    Thanks
    Richard
    
  2. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-03-21T14:21:28Z

    On Fri, Mar 21, 2025 at 6:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > I'm wondering whether we can collect that information while building
    > the RangeTblEntry for a base or other relation, so that it's available
    > before constant folding.  This could also enable other optimizations,
    > such as checking if a NOT IN subquery's output columns and its
    > left-hand expressions are all certainly not NULL, in which case we can
    > convert it to an anti-join.
    >
    > Attached is a draft patch to reduce NullTest on a NOT NULL column in
    > eval_const_expressions.
    
    FWIW, reducing "Var IS [NOT] NULL" quals during constant folding can
    somewhat influence the decision on join ordering later.  For instance,
    
    create table t (a int not null, b int);
    
    select * from t t1 left join
      (t t2 left join t t3 on t2.a is not null)
      on t1.b = t2.b;
    
    For this query, "t2.a is not null" is reduced to true during constant
    folding and then ignored, which leads to us being unable to commute
    t1/t2 join with t2/t3 join.
    
    OTOH, constant-folding NullTest for Vars may enable join orders that
    were previously impossible.  For instance,
    
    select * from t t1 left join
      (t t2 left join t t3 on t2.a is null or t2.b = t3.b)
      on t1.b = t2.b;
    
    Previously the t2/t3 join's clause is not strict for t2 due to the IS
    NULL qual, which prevents t2/t3 join from commuting with t1/t2 join.
    Now, the IS NULL qual is removed during constant folding, allowing us
    to generate a plan with the join order (t1/t2)/t3.
    
    Not quite sure if this is something we need to worry about.
    
    Thanks
    Richard
    
    
    
    
  3. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Robert Haas <robertmhaas@gmail.com> — 2025-03-21T16:12:46Z

    On Fri, Mar 21, 2025 at 10:21 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > Not quite sure if this is something we need to worry about.
    
    I haven't really dug into this but I bet it's not that serious, in the
    sense that we could probably work around it with more logic if we
    really need to.
    
    However, I'm a bit concerned about the overall premise of the patch
    set. It feels like it is moving something that really ought to happen
    at optimization time back to parse time. I have a feeling that's going
    to break something, although I am not sure right now exactly what.
    Wouldn't it be better to have this still happen in the planner, but
    sooner than it does now?
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  4. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-03-21T17:21:43Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > However, I'm a bit concerned about the overall premise of the patch
    > set. It feels like it is moving something that really ought to happen
    > at optimization time back to parse time. I have a feeling that's going
    > to break something, although I am not sure right now exactly what.
    
    Ugh, no, that is *completely* unworkable.  Suppose that the user
    does CREATE VIEW, and the parse tree recorded for that claims that
    column X is not-nullable.  Then the user drops the not-null
    constraint, and then asks to execute the view.  We'll optimize on
    the basis of stale information.
    
    The way to make this work is what I said before: move the planner's
    collection of relation information to somewhere a bit earlier in
    the planner.  But not to outside the planner.
    
    			regards, tom lane
    
    
    
    
  5. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    David G. Johnston <david.g.johnston@gmail.com> — 2025-03-21T17:49:30Z

    On Fri, Mar 21, 2025 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Robert Haas <robertmhaas@gmail.com> writes:
    > > However, I'm a bit concerned about the overall premise of the patch
    > > set. It feels like it is moving something that really ought to happen
    > > at optimization time back to parse time. I have a feeling that's going
    > > to break something, although I am not sure right now exactly what.
    >
    > Ugh, no, that is *completely* unworkable.  Suppose that the user
    > does CREATE VIEW, and the parse tree recorded for that claims that
    > column X is not-nullable.  Then the user drops the not-null
    > constraint, and then asks to execute the view.  We'll optimize on
    > the basis of stale information.
    >
    > The way to make this work is what I said before: move the planner's
    > collection of relation information to somewhere a bit earlier in
    > the planner.  But not to outside the planner.
    >
    
    Reading this reminded me of the existing issue in [1] where we've broken
    session isolation of temporary relation data.  There it feels like we are
    making decisions in the parser that really belong in the planner since
    catalog data is needed to determine relpersistence in many cases.  If we
    are looking for a spot "earlier in the planner" to attach relation
    information, figuring out how to use that to improve matters related to
    relpersistence seems warranted.
    
    David J.
    
    [1]
    https://www.postgresql.org/message-id/4xxau766dofbwugeyvjftra3g5f7ifaal2clgrbpr7jqotr4av%40d3ige2krpoza
    
  6. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-03-23T08:50:25Z

    On Sat, Mar 22, 2025 at 1:12 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > However, I'm a bit concerned about the overall premise of the patch
    > set. It feels like it is moving something that really ought to happen
    > at optimization time back to parse time. I have a feeling that's going
    > to break something, although I am not sure right now exactly what.
    > Wouldn't it be better to have this still happen in the planner, but
    > sooner than it does now?
    
    You're right.  It's just flat wrong to collect catalog information in
    the parser and use it in the planner.  As Tom pointed out, the catalog
    information could change in between, which would cause us to use stale
    data.
    
    Yeah, this should still happen in the planner, perhaps before
    pull_up_sublinks, if we plan to leverage that info to convert NOT IN
    to anti-join.
    
    Thanks
    Richard
    
    
    
    
  7. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-03-23T09:25:15Z

    On Sat, Mar 22, 2025 at 2:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Ugh, no, that is *completely* unworkable.  Suppose that the user
    > does CREATE VIEW, and the parse tree recorded for that claims that
    > column X is not-nullable.  Then the user drops the not-null
    > constraint, and then asks to execute the view.  We'll optimize on
    > the basis of stale information.
    
    Thanks for pointing this out.
    
    > The way to make this work is what I said before: move the planner's
    > collection of relation information to somewhere a bit earlier in
    > the planner.  But not to outside the planner.
    
    I'm considering moving the collection of attnotnull information before
    pull_up_sublinks, in hopes of leveraging this info to pull up NOT IN
    in the future, something like attached.
    
    Maybe we could also collect the attgenerated information in the same
    routine, making life easier for expand_virtual_generated_columns.
    
    Another issue I found is that in convert_EXISTS_to_ANY, we pass the
    parent's root to eval_const_expressions, which can cause problems when
    reducing "Var IS [NOT] NULL" quals.  To fix, v2 patch constructs up a
    dummy PlannerInfo with "glob" link set to the parent's and "parser"
    link set to the subquery.  I believe these are the only fields used.
    
    Thanks
    Richard
    
  8. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-03-26T02:16:26Z

    On Sun, Mar 23, 2025 at 6:25 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > On Sat, Mar 22, 2025 at 2:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > The way to make this work is what I said before: move the planner's
    > > collection of relation information to somewhere a bit earlier in
    > > the planner.  But not to outside the planner.
    
    > I'm considering moving the collection of attnotnull information before
    > pull_up_sublinks, in hopes of leveraging this info to pull up NOT IN
    > in the future, something like attached.
    
    Here is an updated version of the patch with some cosmetic changes and
    a more readable commit message.  I'm wondering if it's good enough to
    be pushed.  Any comments?
    
    Thanks
    Richard
    
  9. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Tender Wang <tndrwang@gmail.com> — 2025-03-26T06:06:11Z

    Richard Guo <guofenglinux@gmail.com> 于2025年3月26日周三 10:16写道:
    
    > On Sun, Mar 23, 2025 at 6:25 PM Richard Guo <guofenglinux@gmail.com>
    > wrote:
    > > On Sat, Mar 22, 2025 at 2:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > > The way to make this work is what I said before: move the planner's
    > > > collection of relation information to somewhere a bit earlier in
    > > > the planner.  But not to outside the planner.
    >
    > > I'm considering moving the collection of attnotnull information before
    > > pull_up_sublinks, in hopes of leveraging this info to pull up NOT IN
    > > in the future, something like attached.
    >
    > Here is an updated version of the patch with some cosmetic changes and
    > a more readable commit message.  I'm wondering if it's good enough to
    > be pushed.  Any comments?
    >
    
    The comment about  notnullattnums in struct RangeTblEntry says that:
    * notnullattnums is zero-based set containing attnums of NOT NULL
    * columns.
    
    But in get_relation_notnullatts():
    rte->notnullattnums = bms_add_member(rte->notnullattnums,
                                                                        i + 1);
    
    The notnullattnums seem to be 1-based.
    
    
    -- 
    Thanks,
    Tender Wang
    
  10. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-03-26T06:31:06Z

    On Wed, Mar 26, 2025 at 3:06 PM Tender Wang <tndrwang@gmail.com> wrote:
    > The comment about  notnullattnums in struct RangeTblEntry says that:
    > * notnullattnums is zero-based set containing attnums of NOT NULL
    > * columns.
    >
    > But in get_relation_notnullatts():
    > rte->notnullattnums = bms_add_member(rte->notnullattnums,
    >                                                                     i + 1);
    >
    > The notnullattnums seem to be 1-based.
    
    This corresponds to the attribute numbers in Var nodes; you can
    consider zero as representing a whole-row Var.
    
    Thanks
    Richard
    
    
    
    
  11. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    David Rowley <dgrowleyml@gmail.com> — 2025-03-26T09:44:46Z

    On Wed, 26 Mar 2025 at 19:31, Richard Guo <guofenglinux@gmail.com> wrote:
    >
    > On Wed, Mar 26, 2025 at 3:06 PM Tender Wang <tndrwang@gmail.com> wrote:
    > > The comment about  notnullattnums in struct RangeTblEntry says that:
    > > * notnullattnums is zero-based set containing attnums of NOT NULL
    > > * columns.
    > >
    > > But in get_relation_notnullatts():
    > > rte->notnullattnums = bms_add_member(rte->notnullattnums,
    > >                                                                     i + 1);
    > >
    > > The notnullattnums seem to be 1-based.
    >
    > This corresponds to the attribute numbers in Var nodes; you can
    > consider zero as representing a whole-row Var.
    
    Yeah, and a negative number is a system attribute, which the Bitmapset
    can't represent... The zero-based comment is meant to inform the
    reader that they don't need to offset by
    FirstLowInvalidHeapAttributeNumber when indexing the Bitmapset. If
    there's some confusion about that then maybe the wording could be
    improved. I used "zero-based" because I wanted to state what it was
    and that was the most brief terminology that I could think of to do
    that. The only other way I thought about was to say that "it's not
    offset by FirstLowInvalidHeapAttributeNumber", but I thought it was
    better to say what it is rather than what it isn't.
    
    I'm open to suggestions if people are confused about this.
    
    David
    
    
    
    
  12. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-03-27T07:10:35Z

    On Wed, Mar 26, 2025 at 6:45 PM David Rowley <dgrowleyml@gmail.com> wrote:
    > On Wed, 26 Mar 2025 at 19:31, Richard Guo <guofenglinux@gmail.com> wrote:
    > > On Wed, Mar 26, 2025 at 3:06 PM Tender Wang <tndrwang@gmail.com> wrote:
    > > > The comment about  notnullattnums in struct RangeTblEntry says that:
    > > > * notnullattnums is zero-based set containing attnums of NOT NULL
    > > > * columns.
    > > >
    > > > But in get_relation_notnullatts():
    > > > rte->notnullattnums = bms_add_member(rte->notnullattnums,
    > > >                                                                     i + 1);
    > > >
    > > > The notnullattnums seem to be 1-based.
    
    > > This corresponds to the attribute numbers in Var nodes; you can
    > > consider zero as representing a whole-row Var.
    
    > Yeah, and a negative number is a system attribute, which the Bitmapset
    > can't represent... The zero-based comment is meant to inform the
    > reader that they don't need to offset by
    > FirstLowInvalidHeapAttributeNumber when indexing the Bitmapset. If
    > there's some confusion about that then maybe the wording could be
    > improved. I used "zero-based" because I wanted to state what it was
    > and that was the most brief terminology that I could think of to do
    > that. The only other way I thought about was to say that "it's not
    > offset by FirstLowInvalidHeapAttributeNumber", but I thought it was
    > better to say what it is rather than what it isn't.
    >
    > I'm open to suggestions if people are confused about this.
    
    I searched the current terminology used in code and can find "offset
    by FirstLowInvalidHeapAttributeNumber", but not "not offset by
    FirstLowInvalidHeapAttributeNumber".  I think "zero-based" should be
    sufficient to indicate that this bitmapset is offset by zero, not by
    FirstLowInvalidHeapAttributeNumber.  So I'm fine to go with
    "zero-based".
    
    I'm planning to push this patch soon, barring any objections.
    
    Thanks
    Richard
    
    
    
    
  13. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-03-27T13:53:22Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > I'm planning to push this patch soon, barring any objections.
    
    FWIW, I have not reviewed it at all.
    
    			regards, tom lane
    
    
    
    
  14. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-03-27T14:08:13Z

    On Thu, Mar 27, 2025 at 10:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Richard Guo <guofenglinux@gmail.com> writes:
    > > I'm planning to push this patch soon, barring any objections.
    
    > FWIW, I have not reviewed it at all.
    
    Oh, sorry.  I'll hold off on pushing it.
    
    Thanks
    Richard
    
    
    
    
  15. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Robert Haas <robertmhaas@gmail.com> — 2025-03-31T16:55:45Z

    On Thu, Mar 27, 2025 at 10:08 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > On Thu, Mar 27, 2025 at 10:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > Richard Guo <guofenglinux@gmail.com> writes:
    > > > I'm planning to push this patch soon, barring any objections.
    >
    > > FWIW, I have not reviewed it at all.
    >
    > Oh, sorry.  I'll hold off on pushing it.
    
    As a general point, non-trivial patches should really get some
    substantive review before they are pushed. Please don't be in a rush
    to commit. It is very common for the time from when a patch is first
    posted to commit to be 3-6 months even for committers. Posting a
    brand-new feature patch on March 21st and then pressing to commit on
    March 27th is really not something you should be doing. I think it's
    particularly inappropriate here where you actually got a review that
    pointed out a serious design problem and then had to change the
    design. If you didn't get it right on the first try, you shouldn't be
    too confident that you did it perfectly the second time, either.
    
    I took a look at this today and I'm not entirely comfortable with this:
    
    + rel = table_open(rte->relid, NoLock);
    +
    + /* Record NOT NULL columns for this relation. */
    + get_relation_notnullatts(rel, rte);
    +
    + table_close(rel, NoLock);
    
    As a general principle, I have found that it's usually a sign that
    something has been designed poorly when you find yourself wanting to
    open a relation, get exactly one piece of information, and close the
    relation again. That is why, today, all the information that the
    planner needs about a particular relation is retrieved by
    get_relation_info(). We do not just wander around doing random catalog
    lookups wherever we need some critical detail. This patch increases
    the number of places where we fetch relation data from 1 to 2, but
    it's still the case that almost everything happens in
    get_relation_info(), and there's now just exactly this 1 thing that is
    done in a different place. That doesn't seem especially nice. I
    thought the idea was going to be to move get_relation_info() to an
    earlier stage, not split one thing out of it while leaving everything
    else the same.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  16. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-04-01T06:34:22Z

    On Tue, Apr 1, 2025 at 1:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > As a general principle, I have found that it's usually a sign that
    > something has been designed poorly when you find yourself wanting to
    > open a relation, get exactly one piece of information, and close the
    > relation again. That is why, today, all the information that the
    > planner needs about a particular relation is retrieved by
    > get_relation_info(). We do not just wander around doing random catalog
    > lookups wherever we need some critical detail. This patch increases
    > the number of places where we fetch relation data from 1 to 2, but
    > it's still the case that almost everything happens in
    > get_relation_info(), and there's now just exactly this 1 thing that is
    > done in a different place. That doesn't seem especially nice. I
    > thought the idea was going to be to move get_relation_info() to an
    > earlier stage, not split one thing out of it while leaving everything
    > else the same.
    
    I initially considered moving get_relation_info() to an earlier stage,
    where we would collect all the per-relation data by relation OID.
    Later, when building the RelOptInfos, we could link them to the
    per-relation-OID data.
    
    However, I gave up this idea because I realized it would require
    retrieving a whole bundle of catalog information that isn't needed
    until after the RelOptInfos are built, such as max_attr, pages,
    tuples, reltablespace, parallel_workers, extended statistics, etc.
    And we may also need to create the IndexOptInfos for the relation's
    indexes.  It seems to me that it's not a trivial task to move
    get_relation_info() before building the RelOptInfos, and more
    importantly, it's unnecessary most of the time.
    
    In other words, of the many pieces of catalog information we need to
    retrieve for a relation, only a small portion is needed at an early
    stage.  As far as I can see, this small portion includes:
    
    * relhassubclass, which we retrieve immediately after we have finished
    adding rangetable entries.
    
    * attgenerated, which we retrieve in expand_virtual_generated_columns.
    
    * attnotnull, as discussed here, which should ideally be retrieved
    before pull_up_sublinks.
    
    My idea is to retrieve only this small portion at an early stage, and
    still defer collecting the majority of the catalog information until
    building the RelOptInfos.  It might be possible to optimize by
    retrieving this small portion in one place instead of three, but
    moving the entire get_relation_info() to an earlier stage doesn't seem
    like a good idea to me.
    
    It could be argued that the separation of catalog information
    collection isn't very great, but it seems this isn't something new in
    this patch.  So I respectfully disagree with your statement that "all
    the information that the planner needs about a particular relation is
    retrieved by get_relation_info()", and that "there's now just exactly
    this 1 thing that is done in a different place".  For instance,
    relhassubclass and attgenerated are retrieved before expression
    preprocessing, a relation's constraint expressions are retrieved when
    setting the relation's size estimates, and more.
    
    Thanks
    Richard
    
    
    
    
  17. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Robert Haas <robertmhaas@gmail.com> — 2025-04-01T19:34:45Z

    On Tue, Apr 1, 2025 at 2:34 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > However, I gave up this idea because I realized it would require
    > retrieving a whole bundle of catalog information that isn't needed
    > until after the RelOptInfos are built, such as max_attr, pages,
    > tuples, reltablespace, parallel_workers, extended statistics, etc.
    
    Why is that bad? I mean, if we're going to need that information
    anyway, then gathering it at earlier stage doesn't hurt. Of course, if
    we move it too early, say before partition pruning, then we might
    gather information we don't really need and hurt performance. But
    otherwise it doesn't seem to hurt anything.
    
    > And we may also need to create the IndexOptInfos for the relation's
    > indexes.  It seems to me that it's not a trivial task to move
    > get_relation_info() before building the RelOptInfos, and more
    > importantly, it's unnecessary most of the time.
    
    But again, if the work is going to have to be done anyway, who cares?
    
    > It could be argued that the separation of catalog information
    > collection isn't very great, but it seems this isn't something new in
    > this patch.  So I respectfully disagree with your statement that "all
    > the information that the planner needs about a particular relation is
    > retrieved by get_relation_info()", and that "there's now just exactly
    > this 1 thing that is done in a different place".  For instance,
    > relhassubclass and attgenerated are retrieved before expression
    > preprocessing, a relation's constraint expressions are retrieved when
    > setting the relation's size estimates, and more.
    
    Nonetheless I think we ought to be trying to consolidate more things
    into get_relation_info(), not disperse some of the things that are
    there to other places.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  18. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-04-02T02:14:10Z

    On Wed, Apr 2, 2025 at 4:34 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > On Tue, Apr 1, 2025 at 2:34 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > > However, I gave up this idea because I realized it would require
    > > retrieving a whole bundle of catalog information that isn't needed
    > > until after the RelOptInfos are built, such as max_attr, pages,
    > > tuples, reltablespace, parallel_workers, extended statistics, etc.
    
    > Why is that bad? I mean, if we're going to need that information
    > anyway, then gathering it at earlier stage doesn't hurt. Of course, if
    > we move it too early, say before partition pruning, then we might
    > gather information we don't really need and hurt performance. But
    > otherwise it doesn't seem to hurt anything.
    
    The attnotnull catalog information being discussed here is intended
    for use during constant folding (and possibly sublink pull-up), which
    occurs long before partition pruning.  Am I missing something?
    
    Additionally, I'm doubtful that the collection of relhassubclass can
    be moved after partition pruning.  How can we determine whether a
    relation is inheritable without retrieving its relhassubclass
    information?
    
    As for attgenerated, I also doubt that it can be retrieved after
    partition pruning.  It is used to expand virtual generated columns,
    which can result in new PlaceHolderVars.  This means it must be done
    before deconstruct_jointree, as make_outerjoininfo requires all active
    PlaceHolderVars to be present in root->placeholder_list.
    
    If these pieces of information cannot be retrieved after partition
    pruning, and for performance reasons we don't want to move the
    gathering of the majority of the catalog information before partition
    pruning, then it seems to me that moving get_relation_info() to an
    earlier stage might not be very meaningful.  What do you think?
    
    > > It could be argued that the separation of catalog information
    > > collection isn't very great, but it seems this isn't something new in
    > > this patch.  So I respectfully disagree with your statement that "all
    > > the information that the planner needs about a particular relation is
    > > retrieved by get_relation_info()", and that "there's now just exactly
    > > this 1 thing that is done in a different place".  For instance,
    > > relhassubclass and attgenerated are retrieved before expression
    > > preprocessing, a relation's constraint expressions are retrieved when
    > > setting the relation's size estimates, and more.
    
    > Nonetheless I think we ought to be trying to consolidate more things
    > into get_relation_info(), not disperse some of the things that are
    > there to other places.
    
    I agree with the general idea of more consolidation and less
    dispersion, but squashing all information collection into
    get_relation_info() seems quite challenging.  However, I think we can
    make at least one optimization, as I mentioned upthread — retrieving
    the small portion of catalog information needed at an early stage in
    one place instead of three.  Perhaps we could start with moving the
    retrieval of relhassubclass into collect_relation_attrs() and rename
    this function to better reflect this change.
    
    Thanks
    Richard
    
    
    
    
  19. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Robert Haas <robertmhaas@gmail.com> — 2025-04-04T19:14:20Z

    On Tue, Apr 1, 2025 at 10:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > The attnotnull catalog information being discussed here is intended
    > for use during constant folding (and possibly sublink pull-up), which
    > occurs long before partition pruning.  Am I missing something?
    
    Hmm, OK, so you think that we need to gather this information early,
    so that we can do constant folding correctly, but you don't want to
    gather everything that get_relation_info() does at this stage, because
    then we're doing extra work on partitions that might later be pruned.
    Is that correct?
    
    > Additionally, I'm doubtful that the collection of relhassubclass can
    > be moved after partition pruning.  How can we determine whether a
    > relation is inheritable without retrieving its relhassubclass
    > information?
    
    We can't -- but notice that we open the relation before fetching
    relhassubclass, and then pass down the Relation object to where
    get_relation_info() is ultimately called, so that we do not repeatedly
    open and close the Relation. I don't know if I can say exactly what's
    going to go wrong if we add an extra table_open()/table_close() as you
    do in the patch, but I've seen enough performance and correctness
    problems with such code over the years to make me skeptical.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  20. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-04-07T02:59:12Z

    On Sat, Apr 5, 2025 at 4:14 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > On Tue, Apr 1, 2025 at 10:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > > The attnotnull catalog information being discussed here is intended
    > > for use during constant folding (and possibly sublink pull-up), which
    > > occurs long before partition pruning.  Am I missing something?
    
    > Hmm, OK, so you think that we need to gather this information early,
    > so that we can do constant folding correctly, but you don't want to
    > gather everything that get_relation_info() does at this stage, because
    > then we're doing extra work on partitions that might later be pruned.
    > Is that correct?
    
    That's correct.  As I mentioned earlier, I believe attnotnull isn't
    the only piece of information we need to gather early on.  My general
    idea is to separate the collection of catalog information into two
    phases:
    
    * Phase 1 occurs at an early stage and collects the small portion of
    catalog information that is needed for constant folding, setting the
    inh flag for a relation, or expanding virtual generated columns.  All
    these happen very early in the planner, before partition pruning.
    
    * Phase 2 collects the majority of the catalog information and occurs
    when building the RelOptInfos, like what get_relation_info does.
    
    FWIW, aside from partition pruning, I suspect there may be other cases
    where a relation doesn't end up having a RelOptInfo created for it.
    And the comment for add_base_rels_to_query further strengthens my
    suspicion.
    
     * Note: the reason we find the baserels by searching the jointree, rather
     * than scanning the rangetable, is that the rangetable may contain RTEs
     * for rels not actively part of the query, for example views.  We don't
     * want to make RelOptInfos for them.
    
    If my suspicion is correct, then partition pruning isn't the only
    reason we might not want to move get_relation_info to an earlier
    stage.
    
    > > Additionally, I'm doubtful that the collection of relhassubclass can
    > > be moved after partition pruning.  How can we determine whether a
    > > relation is inheritable without retrieving its relhassubclass
    > > information?
    >
    > We can't -- but notice that we open the relation before fetching
    > relhassubclass, and then pass down the Relation object to where
    > get_relation_info() is ultimately called, so that we do not repeatedly
    > open and close the Relation. I don't know if I can say exactly what's
    > going to go wrong if we add an extra table_open()/table_close() as you
    > do in the patch, but I've seen enough performance and correctness
    > problems with such code over the years to make me skeptical.
    
    I'm confused here.  AFAICS, we don't open the relation before fetching
    relhassubclass, according to the code that sets the inh flag in
    subquery_planner.  Additionally, I do not see we pass down the
    Relation object to get_relation_info.  In get_relation_info, we call
    table_open to obtain the Relation object, use it to retrieve the
    catalog information, and then call table_close to close the Relation.
    
    Am I missing something, or do you mean that the relcache entry is
    actually built earlier, and that table_open/table_close call in
    get_relation_info merely increments/decrements the reference count?
    
    IIUC, you're concerned about calling table_open/table_close to
    retrieve catalog information.  Do you know of a better way to retrieve
    catalog information without calling table_open/table_close?  I find
    the table_open/table_close pattern is quite common in the current
    code.  In addition to get_relation_info(), I've also seen it in
    get_relation_constraints(), get_relation_data_width(), and others.
    
    Thanks
    Richard
    
    
    
    
  21. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Robert Haas <robertmhaas@gmail.com> — 2025-04-10T19:45:28Z

    On Sun, Apr 6, 2025 at 10:59 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > That's correct.  As I mentioned earlier, I believe attnotnull isn't
    > the only piece of information we need to gather early on.  My general
    > idea is to separate the collection of catalog information into two
    > phases:
    >
    > * Phase 1 occurs at an early stage and collects the small portion of
    > catalog information that is needed for constant folding, setting the
    > inh flag for a relation, or expanding virtual generated columns.  All
    > these happen very early in the planner, before partition pruning.
    >
    > * Phase 2 collects the majority of the catalog information and occurs
    > when building the RelOptInfos, like what get_relation_info does.
    
    OK. Maybe I shouldn't be worrying about the table_open() /
    table_close() here, because I see that you are right that
    has_subclass() is nearby, which admittedly does not involve opening
    the relation, but it does involve fetching from the syscache a tuple
    that we wouldn't need to fetch if we had a Relation available at that
    point. And I see now that expand_virtual_generated_columns() is also
    in that area and works similar to your proposed function in that it
    just opens and closes all the relations. Perhaps that's just the way
    we do things at this very early stage of the planner? But I feel like
    it might make sense to do some reorganization of this code, though, so
    that it more resembles the phase 1 and phase 2 as you describe them.
    Both expand_virtual_generated_columns() and collect_relation_attrs()
    care about exactly those relations with rte->rtekind == RTE_RELATION,
    and even if we have to open and close all of those relations once to
    do this processing, perhaps we can avoid doing it twice, and maybe
    avoid the has_subclass() call along the way? Maybe we can hoist a loop
    over parse->rtable up into subquery_planner and then have it call a
    virtual-column expansion function and a non-null collection function
    once per RTE_RELATION entry?
    
    A related point that I'm noticing is that you record the not-NULL
    information in the RangeTblEntry. I wonder whether that's going to be
    a problem, because I think of the RangeTblEntry as a parse-time
    structure and the RelOptInfo as a plan-time structure, meaning that we
    shouldn't scribble on the former and that we should record any
    plan-time details we need in the latter. I understand that the problem
    is precisely that the RelOptInfo isn't yet available, but I'm not sure
    that makes it OK to use the RangeTblEntry instead.
    
    > I'm confused here.  AFAICS, we don't open the relation before fetching
    > relhassubclass, according to the code that sets the inh flag in
    > subquery_planner.  Additionally, I do not see we pass down the
    > Relation object to get_relation_info.  In get_relation_info, we call
    > table_open to obtain the Relation object, use it to retrieve the
    > catalog information, and then call table_close to close the Relation.
    
    You're right. I don't know what I was thinking.
    
    > IIUC, you're concerned about calling table_open/table_close to
    > retrieve catalog information.  Do you know of a better way to retrieve
    > catalog information without calling table_open/table_close?  I find
    > the table_open/table_close pattern is quite common in the current
    > code.  In addition to get_relation_info(), I've also seen it in
    > get_relation_constraints(), get_relation_data_width(), and others.
    
    You're also right about this.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  22. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-04-10T19:54:39Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > OK. Maybe I shouldn't be worrying about the table_open() /
    > table_close() here, because I see that you are right that
    > has_subclass() is nearby, which admittedly does not involve opening
    > the relation, but it does involve fetching from the syscache a tuple
    > that we wouldn't need to fetch if we had a Relation available at that
    > point. And I see now that expand_virtual_generated_columns() is also
    > in that area and works similar to your proposed function in that it
    > just opens and closes all the relations. Perhaps that's just the way
    > we do things at this very early stage of the planner? But I feel like
    > it might make sense to do some reorganization of this code, though, so
    > that it more resembles the phase 1 and phase 2 as you describe them.
    
    Indeed, I think those are hacks that we should get rid of, not
    emulate.  Note in particular that expand_virtual_generated_columns
    is new in v18 and has exactly zero credibility as precedent.  In fact,
    I'm probably going to harass Peter about fixing it before v18 ships.
    Randomly adding table_opens in the planner is not a route to high
    planning performance.
    
    > A related point that I'm noticing is that you record the not-NULL
    > information in the RangeTblEntry.
    
    Did we not just complain about that w.r.t. the v1 version of this
    patch?  RangeTblEntry is not where to store this info.  We need
    a new data structure, and IMO the data structure should be a hashtable
    based on table OID, not relid.  That way we can amortize across
    multiple references to the same table within a query.
    
    			regards, tom lane
    
    
    
    
  23. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Robert Haas <robertmhaas@gmail.com> — 2025-04-10T20:07:25Z

    On Thu, Apr 10, 2025 at 3:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > A related point that I'm noticing is that you record the not-NULL
    > > information in the RangeTblEntry.
    >
    > Did we not just complain about that w.r.t. the v1 version of this
    > patch?  RangeTblEntry is not where to store this info.  We need
    > a new data structure, and IMO the data structure should be a hashtable
    > based on table OID, not relid.  That way we can amortize across
    > multiple references to the same table within a query.
    
    It's not quite the same complaint, because the earlier complaint was
    that it was actually being done at parse time, and this complaint is
    that it is scribbling on a parse-time data structure.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  24. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-04-10T20:33:57Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > It's not quite the same complaint, because the earlier complaint was
    > that it was actually being done at parse time, and this complaint is
    > that it is scribbling on a parse-time data structure.
    
    Ah, right.  But that's still not the direction we want to be
    going in [1].
    
    			regards, tom lane
    
    [1] https://www.postgresql.org/message-id/2531459.1743871597%40sss.pgh.pa.us
    
    
    
    
  25. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-04-11T06:51:30Z

    On Fri, Apr 11, 2025 at 4:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > OK. Maybe I shouldn't be worrying about the table_open() /
    > table_close() here, because I see that you are right that
    > has_subclass() is nearby, which admittedly does not involve opening
    > the relation, but it does involve fetching from the syscache a tuple
    > that we wouldn't need to fetch if we had a Relation available at that
    > point. And I see now that expand_virtual_generated_columns() is also
    > in that area and works similar to your proposed function in that it
    > just opens and closes all the relations. Perhaps that's just the way
    > we do things at this very early stage of the planner? But I feel like
    > it might make sense to do some reorganization of this code, though, so
    > that it more resembles the phase 1 and phase 2 as you describe them.
    > Both expand_virtual_generated_columns() and collect_relation_attrs()
    > care about exactly those relations with rte->rtekind == RTE_RELATION,
    > and even if we have to open and close all of those relations once to
    > do this processing, perhaps we can avoid doing it twice, and maybe
    > avoid the has_subclass() call along the way?
    
    Yeah, I agree on this.  This is the optimization I mentioned upthread
    in the last paragraph of [1] - retrieving the small portion of catalog
    information needed at an early stage in one place instead of three.
    Hopefully, this way we only need one table_open/table_close at the
    early stage of the planner.
    
    > Maybe we can hoist a loop
    > over parse->rtable up into subquery_planner and then have it call a
    > virtual-column expansion function and a non-null collection function
    > once per RTE_RELATION entry?
    
    Hmm, I'm afraid there might be some difficulty with this approach.
    The virtual-column expansion needs to be done after sublink pull-up to
    ensure that the virtual-column references within the SubLinks that
    should be transformed into joins can get expanded, while non-null
    collection needs to be done before sublink pull-up since we might want
    to leverage the non-null information to convert NOT IN sublinks to
    anti joins.
    
    What I had in mind is that we hoist a loop over parse->rtable before
    pull_up_sublinks to gather information about which columns of each
    relation are defined as NOT NULL and which are virtually generated.
    These information will be used in sublink pull-up and virtual-column
    expansion.  We also call has_subclass for each relation that is maked
    'inh' within that loop and clear the inh flag if needed.
    
    This seems to require a fair amount of code changes, so I'd like to
    get some feedback on this approach before proceeding with the
    implementation.
    
    > A related point that I'm noticing is that you record the not-NULL
    > information in the RangeTblEntry. I wonder whether that's going to be
    > a problem, because I think of the RangeTblEntry as a parse-time
    > structure and the RelOptInfo as a plan-time structure, meaning that we
    > shouldn't scribble on the former and that we should record any
    > plan-time details we need in the latter. I understand that the problem
    > is precisely that the RelOptInfo isn't yet available, but I'm not sure
    > that makes it OK to use the RangeTblEntry instead.
    
    Fair point.  We should do our best to refrain from scribbling on the
    parsetree in the planner.  I initially went with the hashtable
    approach as Tom suggested, but later found it quite handy to store the
    not-null information in the RangeTblEntry, especially since we do
    something similar with rte->inh.  However, I've come to realize that
    inh may not be a good example to follow after all, so I'll go back to
    the hashtable method.
    
    Thank you for pointing that out before I went too far down the wrong
    path.
    
    [1] https://postgr.es/m/CAMbWs4-DryEm_U-juPn3HwUiwZRXW3jhfX18b_AFgrgihgq4kg@mail.gmail.com
    
    Thanks
    Richard
    
    
    
    
  26. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-05-01T08:33:13Z

    On Fri, Apr 11, 2025 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > On Fri, Apr 11, 2025 at 4:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > > OK. Maybe I shouldn't be worrying about the table_open() /
    > > table_close() here, because I see that you are right that
    > > has_subclass() is nearby, which admittedly does not involve opening
    > > the relation, but it does involve fetching from the syscache a tuple
    > > that we wouldn't need to fetch if we had a Relation available at that
    > > point. And I see now that expand_virtual_generated_columns() is also
    > > in that area and works similar to your proposed function in that it
    > > just opens and closes all the relations. Perhaps that's just the way
    > > we do things at this very early stage of the planner? But I feel like
    > > it might make sense to do some reorganization of this code, though, so
    > > that it more resembles the phase 1 and phase 2 as you describe them.
    > > Both expand_virtual_generated_columns() and collect_relation_attrs()
    > > care about exactly those relations with rte->rtekind == RTE_RELATION,
    > > and even if we have to open and close all of those relations once to
    > > do this processing, perhaps we can avoid doing it twice, and maybe
    > > avoid the has_subclass() call along the way?
    >
    > Yeah, I agree on this.  This is the optimization I mentioned upthread
    > in the last paragraph of [1] - retrieving the small portion of catalog
    > information needed at an early stage in one place instead of three.
    > Hopefully, this way we only need one table_open/table_close at the
    > early stage of the planner.
    
    Here is the patchset that implements this optimization.  0001 moves
    the expansion of virtual generated columns to occur before sublink
    pull-up.  0002 introduces a new function, preprocess_relation_rtes,
    which scans the rangetable for relation RTEs and performs inh flag
    updates and virtual generated column expansion in a single loop, so
    that only one table_open/table_close call is required for each
    relation.  0003 collects NOT NULL attribute information for each
    relation within the same loop, stores it in a relation OID based hash
    table, and uses this information to reduce NullTest quals during
    constant folding.
    
    I think the code now more closely resembles the phase 1 and phase 2
    described earlier: it collects all required early-stage catalog
    information within a single loop over the rangetable, allowing each
    relation to be opened and closed only once.  It also avoids the
    has_subclass() call along the way.
    
    Thanks
    Richard
    
  27. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Chengpeng Yan <chengpeng_yan@outlook.com> — 2025-05-03T10:48:44Z

    
    > On May 1, 2025, at 16:33, Richard Guo <guofenglinux@gmail.com> wrote:
    > 
    > Here is the patchset that implements this optimization.  0001 moves
    > the expansion of virtual generated columns to occur before sublink
    > pull-up.  0002 introduces a new function, preprocess_relation_rtes,
    > which scans the rangetable for relation RTEs and performs inh flag
    > updates and virtual generated column expansion in a single loop, so
    > that only one table_open/table_close call is required for each
    > relation.  0003 collects NOT NULL attribute information for each
    > relation within the same loop, stores it in a relation OID based hash
    > table, and uses this information to reduce NullTest quals during
    > constant folding.
    > 
    > I think the code now more closely resembles the phase 1 and phase 2
    > described earlier: it collects all required early-stage catalog
    > information within a single loop over the rangetable, allowing each
    > relation to be opened and closed only once.  It also avoids the
    > has_subclass() call along the way.
    > 
    > Thanks
    > Richard
    > <v4-0001-Expand-virtual-generated-columns-before-sublink-p.patch><v4-0002-Centralize-collection-of-catalog-info-needed-earl.patch><v4-0003-Reduce-Var-IS-NOT-NULL-quals-during-constant-fold.patch>
    
    Hi,
    
    I've been following the V4 patches (focusing on 1 and 2 for now): Patch 2's preprocess_relation_rtes is a nice improvement for efficiently gathering early catalog info like inh and attgenerated definitions in one pass.
    
    However, Patch 1 needs to add expansion calls inside specific pull-up functions (like convert_EXISTS_sublink_to_join) because the main expansion work was moved before pull_up_sublinks.
    
    Could we perhaps simplify this? What if preprocess_relation_rtes only collected the attgenerated definitions (storing them, maybe in a hashtable like planned for attnotnull in Patch 3), but didn't perform the actual expansion (Var replacement)?
    
    Then, we could perform the actual expansion (Var replacement) in a separate, single, global step later on. Perhaps after pull_up_sublinks (closer to the original timing), or maybe even later still, for instance after flatten_simple_union_all, once the main query structure including pulled-up subqueries/links has stabilized? A unified expansion after the major structural changes seems cleaner. I'm not sure where is the better position now.
    
    This might avoid the need for the extra expansion calls within convert_EXISTS_sublink_to_join, etc., keeping the information gathering separate from the expression transformation and potentially making the overall flow a bit cleaner.
    
    Any thoughts?
    
    Thanks,
    
    Chengpeng Yan
    
    
    
  28. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-05-07T06:55:51Z

    On Sat, May 3, 2025 at 7:48 PM Chengpeng Yan <chengpeng_yan@outlook.com> wrote:
    > I've been following the V4 patches (focusing on 1 and 2 for now): Patch 2's preprocess_relation_rtes is a nice improvement for efficiently gathering early catalog info like inh and attgenerated definitions in one pass.
    >
    > However, Patch 1 needs to add expansion calls inside specific pull-up functions (like convert_EXISTS_sublink_to_join) because the main expansion work was moved before pull_up_sublinks.
    >
    > Could we perhaps simplify this? What if preprocess_relation_rtes only collected the attgenerated definitions (storing them, maybe in a hashtable like planned for attnotnull in Patch 3), but didn't perform the actual expansion (Var replacement)?
    >
    > Then, we could perform the actual expansion (Var replacement) in a separate, single, global step later on. Perhaps after pull_up_sublinks (closer to the original timing), or maybe even later still, for instance after flatten_simple_union_all, once the main query structure including pulled-up subqueries/links has stabilized? A unified expansion after the major structural changes seems cleaner. I'm not sure where is the better position now.
    >
    > This might avoid the need for the extra expansion calls within convert_EXISTS_sublink_to_join, etc., keeping the information gathering separate from the expression transformation and potentially making the overall flow a bit cleaner.
    >
    > Any thoughts?
    
    This approach is possible, but I chose not to go that route because 1)
    it would require an additional loop over the rangetable; 2) it would
    involve collecting and storing in hash table a lot more information
    that is only used during the expansion of virtual generated columns.
    This includes not only the attgenerated attributes of columns you
    mentioned, but also the default values of columns and the total number
    of attributes in the tuple.
    
    Therefore, it seems to me that expanding the virtual generated columns
    within the same loop is cleaner and more efficient.
    
    Please note that even if we move the expansion of virtual generated
    columns into a separate loop, it still needs to occur before subquery
    pull-up.  This is because we must ensure that RTE_RELATION RTEs do not
    have lateral markers.  In other words, the expansion still needs to
    take place within the subquery pull-up function.
    
    Thanks
    Richard
    
    
    
    
  29. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Robert Haas <robertmhaas@gmail.com> — 2025-05-22T14:14:34Z

    On Thu, May 1, 2025 at 4:33 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > Here is the patchset that implements this optimization.  0001 moves
    > the expansion of virtual generated columns to occur before sublink
    > pull-up.  0002 introduces a new function, preprocess_relation_rtes,
    > which scans the rangetable for relation RTEs and performs inh flag
    > updates and virtual generated column expansion in a single loop, so
    > that only one table_open/table_close call is required for each
    > relation.  0003 collects NOT NULL attribute information for each
    > relation within the same loop, stores it in a relation OID based hash
    > table, and uses this information to reduce NullTest quals during
    > constant folding.
    >
    > I think the code now more closely resembles the phase 1 and phase 2
    > described earlier: it collects all required early-stage catalog
    > information within a single loop over the rangetable, allowing each
    > relation to be opened and closed only once.  It also avoids the
    > has_subclass() call along the way.
    
    Before we commit to something along these lines, I think we need to
    understand whether Tom intends to press Peter for some bigger change
    around expand_virtual_generated_columns.
    
    If Tom doesn't respond right away, I suggest that we need to add an
    open item for http://postgr.es/m/602561.1744314879@sss.pgh.pa.us
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  30. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-05-22T14:51:30Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > Before we commit to something along these lines, I think we need to
    > understand whether Tom intends to press Peter for some bigger change
    > around expand_virtual_generated_columns.
    > If Tom doesn't respond right away, I suggest that we need to add an
    > open item for http://postgr.es/m/602561.1744314879@sss.pgh.pa.us
    
    I think that we do need to do something about that, but it may be
    in the too-late-for-v18 category by now.  Not sure.  I definitely
    don't love the idea of table_open'ing every table in every query
    an extra time just to find out (about 99.44% of the time) that it
    does not have any virtual generated columns.
    
    I wonder if a better answer would be to make the rewriter responsible
    for this.  If you hold your head at the correct angle, a table with
    virtual generated columns looks a good deal like a view, and we don't
    ask the planner to handle those.
    
    BTW, in my mind the current thread is certainly v19 material,
    so I have not looked at Richard's patch yet.
    
    			regards, tom lane
    
    
    
    
  31. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-05-23T03:03:38Z

    On Thu, May 22, 2025 at 11:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I wonder if a better answer would be to make the rewriter responsible
    > for this.  If you hold your head at the correct angle, a table with
    > virtual generated columns looks a good deal like a view, and we don't
    > ask the planner to handle those.
    
    In Peter's initial commit (83ea6c540), it was the rewriter that was
    responsible for expanding virtual generated columns.  However, this
    approach introduced several problems (see the reports starting from
    [1]).  In some cases, we can't simply replace Var nodes that reference
    virtual columns with their corresponding generation expressions.  To
    preserve correctness, we may need to wrap those expressions in
    PlaceHolderVars — for example, when the Vars come from the nullable
    side of an outer join or are used in grouping sets.
    
    So in commit 1e4351af3, Dean and I proposed moving the expansion of
    virtual generated columns into the planner, so that we can insert
    PlaceHolderVars when needed.
    
    Yeah, the extra table_open call is annoying.  In this patchset, we're
    performing some additional tasks while the relation is open — such as
    retrieving relhassubclass and attnotnull information.  We also get rid
    of the has_subclass() call along the way.  Maybe this would help
    justify the added cost?
    
    [1] https://postgr.es/m/75eb1a6f-d59f-42e6-8a78-124ee808cda7@gmail.com
    
    Thanks
    Richard
    
    
    
    
  32. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-05-28T09:28:42Z

    On Thu, May 22, 2025 at 11:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > BTW, in my mind the current thread is certainly v19 material,
    > so I have not looked at Richard's patch yet.
    
    Yeah, this patchset is targeted for v19.  Maybe we could be more
    aggressive and have 0001 and 0002 in v18?  (no chance for 0003 though)
    
    This patchset does not apply anymore due to 2c0ed86d3.  Here is a new
    rebase.
    
    Thanks
    Richard
    
  33. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-06-30T07:26:44Z

    On Wed, May 28, 2025 at 6:28 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > Yeah, this patchset is targeted for v19.  Maybe we could be more
    > aggressive and have 0001 and 0002 in v18?  (no chance for 0003 though)
    >
    > This patchset does not apply anymore due to 2c0ed86d3.  Here is a new
    > rebase.
    
    This patchset does not apply anymore, due to 5069fef1c this time.
    Here is a new rebase.
    
    Thanks
    Richard
    
  34. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Andrei Lepikhov <lepihov@gmail.com> — 2025-07-01T13:57:18Z

    On 30/6/2025 09:26, Richard Guo wrote:
    > On Wed, May 28, 2025 at 6:28 PM Richard Guo <guofenglinux@gmail.com> wrote:
    >> Yeah, this patchset is targeted for v19.  Maybe we could be more
    >> aggressive and have 0001 and 0002 in v18?  (no chance for 0003 though)
    >>
    >> This patchset does not apply anymore due to 2c0ed86d3.  Here is a new
    >> rebase.
    > 
    > This patchset does not apply anymore, due to 5069fef1c this time.
    > Here is a new rebase.
    I like the general idea of this work. But I wonder, why is a new hash 
    table designed to store only the notnullattnums field? From the 
    discussion, it is not apparent why not to cache all (or most of) the 
    data needed for get_relation_info. In cases where multiple subqueries 
    reference the same table, it could save some cycles and memory.
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  35. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-07-02T01:24:25Z

    On Tue, Jul 1, 2025 at 10:57 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    > I like the general idea of this work. But I wonder, why is a new hash
    > table designed to store only the notnullattnums field? From the
    > discussion, it is not apparent why not to cache all (or most of) the
    > data needed for get_relation_info. In cases where multiple subqueries
    > reference the same table, it could save some cycles and memory.
    
    I think this idea was already thoroughly discussed earlier in this
    thread when Robert proposed moving get_relation_info() to an earlier
    stage.  One reason against it is that not every RTE_RELATION relation
    will be actively part of the query.  Collecting the whole bundle of
    catalog information for such relations is wasteful and can negatively
    impact performance.
    
    Thanks
    Richard
    
    
    
    
  36. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Andrei Lepikhov <lepihov@gmail.com> — 2025-07-02T07:32:43Z

    On 2/7/2025 03:24, Richard Guo wrote:
    > On Tue, Jul 1, 2025 at 10:57 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    >> I like the general idea of this work. But I wonder, why is a new hash
    >> table designed to store only the notnullattnums field? From the
    >> discussion, it is not apparent why not to cache all (or most of) the
    >> data needed for get_relation_info. In cases where multiple subqueries
    >> reference the same table, it could save some cycles and memory.
    > 
    > I think this idea was already thoroughly discussed earlier in this
    > thread when Robert proposed moving get_relation_info() to an earlier
    > stage.  One reason against it is that not every RTE_RELATION relation
    > will be actively part of the query.  Collecting the whole bundle of
    > catalog information for such relations is wasteful and can negatively
    > impact performance.
    I'm trying to understand the phrase "not every relation ...". Could you 
    clarify that? I know that Postgres can eliminate some self-joins and 
    outer joins, and might determine that a WHERE clause is always false, 
    etc. However, these cases seem to be rare, especially when users refine 
    their queries. Additionally, AFAICS, this is not an issue for partition 
    pruning.
    
    Generally, I believe these optimisations should have a positive impact. 
    So, I think "not actively participate" might mean something different.
    
    I must say that I appreciate Tom's idea and see significant benefits in 
    making the parse tree a read-only structure. In complex queries, it can 
    be frustrating to make copies of the parse tree, leading to complaints 
    from users about insufficient memory allocation. This is why, in our 
    enterprise fork, we support a specific option to avoid copying the parse 
    tree multiple times.
    
    Therefore, it would be better to find a way to refactor the 
    `preprocess_relation_rtes` function to gather table statistics lazily 
    into the hash table when they are needed. For example, we could do this 
    at the moment of creating the `RelOptInfo` or before a subquery pull-up, 
    without modifying the RTE at all.
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  37. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-07-02T09:14:48Z

    On Wed, Jul 2, 2025 at 4:32 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    > I must say that I appreciate Tom's idea and see significant benefits in
    > making the parse tree a read-only structure. In complex queries, it can
    > be frustrating to make copies of the parse tree, leading to complaints
    > from users about insufficient memory allocation. This is why, in our
    > enterprise fork, we support a specific option to avoid copying the parse
    > tree multiple times.
    
    I don't see how the changes in this patchset violate Tom's proposal
    regarding keeping the parse tree read-only.  The only potential issue
    I can see is that we may clear the rte->inh flag in some cases -- but
    that behavior has existed for a long time, not starting from this
    patchset.
    
    > Therefore, it would be better to find a way to refactor the
    > `preprocess_relation_rtes` function to gather table statistics lazily
    > into the hash table when they are needed. For example, we could do this
    > at the moment of creating the `RelOptInfo` or before a subquery pull-up,
    > without modifying the RTE at all.
    
    All the catalog information collected in preprocess_relation_rtes() is
    needed very early in the planner.  I don't see how we could move that
    logic to a later stage, such as at the moment of creating RelOptInfos
    as you mentioned.
    
    Thanks
    Richard
    
    
    
    
  38. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Andrei Lepikhov <lepihov@gmail.com> — 2025-07-02T09:44:18Z

    On 2/7/2025 11:14, Richard Guo wrote:
    > On Wed, Jul 2, 2025 at 4:32 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    >> I must say that I appreciate Tom's idea and see significant benefits in
    >> making the parse tree a read-only structure. In complex queries, it can
    >> be frustrating to make copies of the parse tree, leading to complaints
    >> from users about insufficient memory allocation. This is why, in our
    >> enterprise fork, we support a specific option to avoid copying the parse
    >> tree multiple times.
    > 
    > I don't see how the changes in this patchset violate Tom's proposal
    > regarding keeping the parse tree read-only.  The only potential issue
    > I can see is that we may clear the rte->inh flag in some cases -- but
    > that behavior has existed for a long time, not starting from this
    > patchset.
    I think the 1e4351a solution was a little too fast and it changes the 
    parse tree inside the planner. To achieve a read-only parse tree, we 
    will need to redesign it.
    >> Therefore, it would be better to find a way to refactor the
    >> `preprocess_relation_rtes` function to gather table statistics lazily
    >> into the hash table when they are needed. For example, we could do this
    >> at the moment of creating the `RelOptInfo` or before a subquery pull-up,
    >> without modifying the RTE at all.
    > All the catalog information collected in preprocess_relation_rtes() is
    > needed very early in the planner.  I don't see how we could move that
    > logic to a later stage, such as at the moment of creating RelOptInfos
    > as you mentioned.
    I apologise for the confusion in my previous message. I am not 
    suggesting that we postpone this. Instead, I would like an explanation 
    of why you believe that accessing the table statistics earlier could 
    negatively impact planner performance. As I mentioned before, I have 
    only envisioned rare instances where join eliminations may reduce the 
    number of relations and clause evaluations resulting in a constant.
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  39. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-07-03T00:30:07Z

    On Wed, Jul 2, 2025 at 6:44 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    > On 2/7/2025 11:14, Richard Guo wrote:
    > > On Wed, Jul 2, 2025 at 4:32 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    > >> Therefore, it would be better to find a way to refactor the
    > >> `preprocess_relation_rtes` function to gather table statistics lazily
    > >> into the hash table when they are needed. For example, we could do this
    > >> at the moment of creating the `RelOptInfo` or before a subquery pull-up,
    > >> without modifying the RTE at all.
    
    > > All the catalog information collected in preprocess_relation_rtes() is
    > > needed very early in the planner.  I don't see how we could move that
    > > logic to a later stage, such as at the moment of creating RelOptInfos
    > > as you mentioned.
    
    > I apologise for the confusion in my previous message. I am not
    > suggesting that we postpone this. Instead, I would like an explanation
    > of why you believe that accessing the table statistics earlier could
    > negatively impact planner performance. As I mentioned before, I have
    > only envisioned rare instances where join eliminations may reduce the
    > number of relations and clause evaluations resulting in a constant.
    
    I wonder how you arrived at the conclusion that these cases are rare.
    If they truly are, then why have we invested so much effort in
    optimizing for them?
    
    I also wonder why you think we should collect all catalog information
    at the very early stage of the planner, given that most of it is only
    used much later -- after RelOptInfos have been created.  If the goal
    is to avoid redundant catalog retrieval for the same relation in
    get_relation_info(), perhaps adding a caching mechanism within that
    function would be a more targeted solution.  I don't see a strong
    reason for moving get_relation_info() to the very beginning of the
    planner.
    
    Thanks
    Richard
    
    
    
    
  40. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Andrei Lepikhov <lepihov@gmail.com> — 2025-07-03T09:08:54Z

    On 3/7/2025 02:30, Richard Guo wrote:
    > On Wed, Jul 2, 2025 at 6:44 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
    >> I apologise for the confusion in my previous message. I am not
    >> suggesting that we postpone this. Instead, I would like an explanation
    >> of why you believe that accessing the table statistics earlier could
    >> negatively impact planner performance. As I mentioned before, I have
    >> only envisioned rare instances where join eliminations may reduce the
    >> number of relations and clause evaluations resulting in a constant.
    > 
    > I wonder how you arrived at the conclusion that these cases are rare.
    > If they truly are, then why have we invested so much effort in
    > optimizing for them?
    There is no direct connection between effort and frequency; it primarily 
    depends on personal desire. As you might find, much of the effort goes 
    into convincing the community.
    These specific cases should be rare from the Postgres perspective, the 
    planner's code remains simple based on the assumption that crafting the 
    appropriate query is the user's responsibility.
    
    > 
    > I also wonder why you think we should collect all catalog information
    > at the very early stage of the planner, given that most of it is only
    > used much later -- after RelOptInfos have been created.  If the goal
    > is to avoid redundant catalog retrieval for the same relation in
    > get_relation_info(), perhaps adding a caching mechanism within that
    > function would be a more targeted solution.  I don't see a strong
    > reason for moving get_relation_info() to the very beginning of the
    > planner.
    This indicates that there is still room for further exploration and 
    discussion. For starters, the 'Redundant NullTest' issue is not the only 
    concern. Additionally, Postgres processes pull-up transformation blindly 
    without considering the cost model. However, each pull-up has its corner 
    case, and in practice, we often see new complaints arise after a new 
    pull-up technique is committed. One possible solution I envision could 
    be to examine indexes and/or make raw initial estimations to avoid 
    problematic pull-up cases.
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  41. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-07-09T06:32:47Z

    On Mon, Jun 30, 2025 at 4:26 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > On Wed, May 28, 2025 at 6:28 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > > This patchset does not apply anymore due to 2c0ed86d3.  Here is a new
    > > rebase.
    
    > This patchset does not apply anymore, due to 5069fef1c this time.
    > Here is a new rebase.
    
    Here is a new rebase.  I moved the call to preprocess_relation_rtes to
    a later point within convert_EXISTS_sublink_to_join, so we can avoid
    the work if it turns out that the EXISTS SubLink cannot be flattened.
    Nothing essential has changed.
    
    The NOT-IN pullup work depends on the changes in this patchset (it
    also relies on the not-null information), so I'd like to move it
    forward.
    
    Hi Tom, Robert -- just to be sure, are you planning to take another
    look at it?
    
    Thanks
    Richard
    
  42. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-07-16T01:57:51Z

    On Wed, Jul 9, 2025 at 3:32 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > On Mon, Jun 30, 2025 at 4:26 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > > On Wed, May 28, 2025 at 6:28 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > > > This patchset does not apply anymore due to 2c0ed86d3.  Here is a new
    > > > rebase.
    >
    > > This patchset does not apply anymore, due to 5069fef1c this time.
    > > Here is a new rebase.
    >
    > Here is a new rebase.  I moved the call to preprocess_relation_rtes to
    > a later point within convert_EXISTS_sublink_to_join, so we can avoid
    > the work if it turns out that the EXISTS SubLink cannot be flattened.
    > Nothing essential has changed.
    >
    > The NOT-IN pullup work depends on the changes in this patchset (it
    > also relies on the not-null information), so I'd like to move it
    > forward.
    >
    > Hi Tom, Robert -- just to be sure, are you planning to take another
    > look at it?
    
    I'm aiming to push this patchset next week, barring any objections.
    
    Thanks
    Richard
    
    
    
    
  43. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-07-22T02:55:14Z

    On Wed, Jul 16, 2025 at 10:57 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > On Wed, Jul 9, 2025 at 3:32 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > > Here is a new rebase.  I moved the call to preprocess_relation_rtes to
    > > a later point within convert_EXISTS_sublink_to_join, so we can avoid
    > > the work if it turns out that the EXISTS SubLink cannot be flattened.
    > > Nothing essential has changed.
    > >
    > > The NOT-IN pullup work depends on the changes in this patchset (it
    > > also relies on the not-null information), so I'd like to move it
    > > forward.
    > >
    > > Hi Tom, Robert -- just to be sure, are you planning to take another
    > > look at it?
    
    > I'm aiming to push this patchset next week, barring any objections.
    
    Hearing nothing, I've gone ahead and pushed the patchset.  Thanks for
    all the reviews and discussion.
    
    Thanks
    Richard
    
    
    
    
  44. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Tomas Vondra <tomas@vondra.me> — 2025-07-29T18:45:38Z

    On 7/22/25 04:55, Richard Guo wrote:
    > On Wed, Jul 16, 2025 at 10:57 AM Richard Guo <guofenglinux@gmail.com> wrote:
    >> On Wed, Jul 9, 2025 at 3:32 PM Richard Guo <guofenglinux@gmail.com> wrote:
    >>> Here is a new rebase.  I moved the call to preprocess_relation_rtes to
    >>> a later point within convert_EXISTS_sublink_to_join, so we can avoid
    >>> the work if it turns out that the EXISTS SubLink cannot be flattened.
    >>> Nothing essential has changed.
    >>>
    >>> The NOT-IN pullup work depends on the changes in this patchset (it
    >>> also relies on the not-null information), so I'd like to move it
    >>> forward.
    >>>
    >>> Hi Tom, Robert -- just to be sure, are you planning to take another
    >>> look at it?
    > 
    >> I'm aiming to push this patchset next week, barring any objections.
    > 
    > Hearing nothing, I've gone ahead and pushed the patchset.  Thanks for
    > all the reviews and discussion.
    > 
    
    Hi Richard,
    
    Does this mean we can close the PG18 open item tracking this?
    
      * virtual generated columns and planning speed
        Owner: Peter Eisentraut
    
    
    If I understand correctly, the commits went only to master, which means
    PG18 still does the table_open/table_close calls Tom complained about in
    [1] and [2].
    
    I think it'd be perfectly fine if it only affected cases with virtual
    generated columns, but AFAICS we do the open/close call for every
    relation. Has anyone tried to measure if the impact is measurable? I
    suspect it's negligible, we already hold a lock on the rel anyway
    (right?). But has anyone tried to measure if that's true?
    
    If it turns out to be expensive, that might be an argument to backpatch
    the changes after all - the commits seem fairly small/non-invasive.
    
    
    regards
    
    
    [1] https://www.postgresql.org/message-id/602561.1744314879%40sss.pgh.pa.us
    
    [2] https://www.postgresql.org/message-id/1514756.1747925490%40sss.pgh.pa.us
    
    
    -- 
    Tomas Vondra
    
    
    
    
    
  45. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-07-30T06:17:38Z

    On Wed, Jul 30, 2025 at 3:45 AM Tomas Vondra <tomas@vondra.me> wrote:
    > Does this mean we can close the PG18 open item tracking this?
    >
    >   * virtual generated columns and planning speed
    >     Owner: Peter Eisentraut
    >
    >
    > If I understand correctly, the commits went only to master, which means
    > PG18 still does the table_open/table_close calls Tom complained about in
    > [1] and [2].
    
    You're right.  This patchset is intended only for master, so it
    doesn't address that open item for v18.
    
    > I think it'd be perfectly fine if it only affected cases with virtual
    > generated columns, but AFAICS we do the open/close call for every
    > relation. Has anyone tried to measure if the impact is measurable? I
    > suspect it's negligible, we already hold a lock on the rel anyway
    > (right?). But has anyone tried to measure if that's true?
    
    I ran a naive test on v18: selecting from 10 tables, comparing the
    unmodified v18 with a hacked version where the call to
    expand_virtual_generated_columns() was removed, 3 times each.  Here
    are the planning times I observed.
    
    create table t (a int, b int, c int);
    
    explain (costs off)
    select * from t t1
     natural join t t2
     natural join t t3
     natural join t t4
     natural join t t5
     natural join t t6
     natural join t t7
     natural join t t8
     natural join t t9
     natural join t t10
    ;
    
    -- unmodified v18
    Time: 133.244 ms
    Time: 132.831 ms
    Time: 132.345 ms
    
    -- the hacked version
    Time: 132.756 ms
    Time: 132.745 ms
    Time: 133.728 ms
    
    I didn't observe measurable impact, but perhaps others can run more
    representative tests and demonstrate otherwise.
    
    (I recall Peter E. mentioned he might run some tests to measure the
    impact.  Not sure if he's had the time to do that yet.)
    
    > If it turns out to be expensive, that might be an argument to backpatch
    > the changes after all - the commits seem fairly small/non-invasive.
    
    The main goal of this patchset is to collect NOT NULL information
    early in the planning phase and use it to reduce NullTest quals during
    constant folding.
    
    It doesn't eliminate the added table_open call, but it does centralize
    the collection of all required early-stage catalog information into a
    single table_open/table_close call, which may help justify the added
    overhead.  However, I think Tom's proposal is to move the expansion
    of virtual generated columns to the rewriter, so I'm not sure whether
    backpatching this patchset to v18 would fully address his concerns.
    
    (I had previously proposed including 0001 and 0002 in v18, but I
    dropped the idea due to a lack of support.)
    
    Thanks
    Richard
    
    
    
    
  46. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-07-30T07:55:43Z

    On Wed, Jul 30, 2025 at 3:17 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > create table t (a int, b int, c int);
    >
    > explain (costs off)
    > select * from t t1
    >  natural join t t2
    >  natural join t t3
    >  natural join t t4
    >  natural join t t5
    >  natural join t t6
    >  natural join t t7
    >  natural join t t8
    >  natural join t t9
    >  natural join t t10
    > ;
    
    FWIW, for this query, I've observed that table_open/table_close are
    also called for each RTE_RELATION in build_physical_tlist().  Not sure
    if we should also be concerned about those calls.
    
    It's not clear to me how much performance impact an extra table_open
    might have, especially when the lock is already held, and the relation
    is likely present in the relcache.
    
    Thanks
    Richard
    
    
    
    
  47. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Nathan Bossart <nathandbossart@gmail.com> — 2025-08-19T17:38:49Z

    On Wed, Jul 30, 2025 at 03:17:38PM +0900, Richard Guo wrote:
    > On Wed, Jul 30, 2025 at 3:45 AM Tomas Vondra <tomas@vondra.me> wrote:
    >> Does this mean we can close the PG18 open item tracking this?
    >>
    >>   * virtual generated columns and planning speed
    >>     Owner: Peter Eisentraut
    >>
    >>
    >> If I understand correctly, the commits went only to master, which means
    >> PG18 still does the table_open/table_close calls Tom complained about in
    >> [1] and [2].
    > 
    > You're right.  This patchset is intended only for master, so it
    > doesn't address that open item for v18.
    > 
    >> I think it'd be perfectly fine if it only affected cases with virtual
    >> generated columns, but AFAICS we do the open/close call for every
    >> relation. Has anyone tried to measure if the impact is measurable? I
    >> suspect it's negligible, we already hold a lock on the rel anyway
    >> (right?). But has anyone tried to measure if that's true?
    > 
    > I ran a naive test on v18: selecting from 10 tables, comparing the
    > unmodified v18 with a hacked version where the call to
    > expand_virtual_generated_columns() was removed, 3 times each.  Here
    > are the planning times I observed.
    > 
    > [...]
    > 
    > I didn't observe measurable impact, but perhaps others can run more
    > representative tests and demonstrate otherwise.
    > 
    > (I recall Peter E. mentioned he might run some tests to measure the
    > impact.  Not sure if he's had the time to do that yet.)
    
    There is still an open item for this one, but it's not clear whether we are
    planning to do anything about this for v18, especially since nobody has
    shown measurable performance impact.  Does anyone want to argue for
    addressing this for v18, or shall we close the open item as "Won't Fix"?
    
    -- 
    nathan
    
    
    
    
  48. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-08-20T01:29:03Z

    On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
    > On Wed, Jul 30, 2025 at 03:17:38PM +0900, Richard Guo wrote:
    > > I didn't observe measurable impact, but perhaps others can run more
    > > representative tests and demonstrate otherwise.
    > >
    > > (I recall Peter E. mentioned he might run some tests to measure the
    > > impact.  Not sure if he's had the time to do that yet.)
    
    > There is still an open item for this one, but it's not clear whether we are
    > planning to do anything about this for v18, especially since nobody has
    > shown measurable performance impact.  Does anyone want to argue for
    > addressing this for v18, or shall we close the open item as "Won't Fix"?
    
    I don't think we're likely to do anything about this for v18.
    Actually, I still doubt that the extra table_open call brings any
    measurable performance impact, especially since the lock is already
    held and the relation is likely already present in the relcache.
    
    Also, I still don't think moving the expansion of virtual generated
    columns to the rewriter (as Tom proposed) is a better idea.  It turned
    out to have several problems that need to be fixed with the help of
    PHVs, which is why we moved the expansion into the planner.
    
    Thanks
    Richard
    
    
    
    
  49. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Nathan Bossart <nathandbossart@gmail.com> — 2025-08-20T14:10:58Z

    On Wed, Aug 20, 2025 at 10:29:03AM +0900, Richard Guo wrote:
    > On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
    >> There is still an open item for this one, but it's not clear whether we are
    >> planning to do anything about this for v18, especially since nobody has
    >> shown measurable performance impact.  Does anyone want to argue for
    >> addressing this for v18, or shall we close the open item as "Won't Fix"?
    > 
    > I don't think we're likely to do anything about this for v18.
    > Actually, I still doubt that the extra table_open call brings any
    > measurable performance impact, especially since the lock is already
    > held and the relation is likely already present in the relcache.
    > 
    > Also, I still don't think moving the expansion of virtual generated
    > columns to the rewriter (as Tom proposed) is a better idea.  It turned
    > out to have several problems that need to be fixed with the help of
    > PHVs, which is why we moved the expansion into the planner.
    
    Okay.  I have marked the v18 open item as "Won't Fix".
    
    -- 
    nathan
    
    
    
    
  50. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-08-21T01:07:33Z

    On Wed, Aug 20, 2025 at 11:11 PM Nathan Bossart
    <nathandbossart@gmail.com> wrote:
    > On Wed, Aug 20, 2025 at 10:29:03AM +0900, Richard Guo wrote:
    > > On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
    > >> There is still an open item for this one, but it's not clear whether we are
    > >> planning to do anything about this for v18, especially since nobody has
    > >> shown measurable performance impact.  Does anyone want to argue for
    > >> addressing this for v18, or shall we close the open item as "Won't Fix"?
    
    > > I don't think we're likely to do anything about this for v18.
    > > Actually, I still doubt that the extra table_open call brings any
    > > measurable performance impact, especially since the lock is already
    > > held and the relation is likely already present in the relcache.
    > >
    > > Also, I still don't think moving the expansion of virtual generated
    > > columns to the rewriter (as Tom proposed) is a better idea.  It turned
    > > out to have several problems that need to be fixed with the help of
    > > PHVs, which is why we moved the expansion into the planner.
    
    > Okay.  I have marked the v18 open item as "Won't Fix".
    
    Thank you for helping with this.
    
    Thanks
    Richard
    
    
    
    
  51. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Junwang Zhao <zhjwpku@gmail.com> — 2025-09-07T11:11:52Z

    Hi,
    
    On Thu, Aug 21, 2025 at 9:07 AM Richard Guo <guofenglinux@gmail.com> wrote:
    >
    > On Wed, Aug 20, 2025 at 11:11 PM Nathan Bossart
    > <nathandbossart@gmail.com> wrote:
    > > On Wed, Aug 20, 2025 at 10:29:03AM +0900, Richard Guo wrote:
    > > > On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
    > > >> There is still an open item for this one, but it's not clear whether we are
    > > >> planning to do anything about this for v18, especially since nobody has
    > > >> shown measurable performance impact.  Does anyone want to argue for
    > > >> addressing this for v18, or shall we close the open item as "Won't Fix"?
    >
    > > > I don't think we're likely to do anything about this for v18.
    > > > Actually, I still doubt that the extra table_open call brings any
    > > > measurable performance impact, especially since the lock is already
    > > > held and the relation is likely already present in the relcache.
    > > >
    > > > Also, I still don't think moving the expansion of virtual generated
    > > > columns to the rewriter (as Tom proposed) is a better idea.  It turned
    > > > out to have several problems that need to be fixed with the help of
    > > > PHVs, which is why we moved the expansion into the planner.
    >
    > > Okay.  I have marked the v18 open item as "Won't Fix".
    >
    > Thank you for helping with this.
    >
    > Thanks
    > Richard
    >
    >
    
    While reading this thread, I found that it uses *Relids* to collect NOT NULL
    attribute numbers, I think this might be an oversight, since ISTM that
    Relids is used to represent the index of the relation in the range table.
    
    I searched the code base and it seems nowhere to use Relids to represent
    attribute numbers, and there is a *notnullattnums* field in RelOptInfo:
    
    /* zero-based set containing attnums of NOT NULL columns */
    Bitmapset *notnullattnums;
    
    So I think it would be better to be consistent, anyway I post a trivial patch
    if the community agrees with me.
    
    -- 
    Regards
    Junwang Zhao
    
  52. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Tender Wang <tndrwang@gmail.com> — 2025-09-08T00:51:39Z

    Junwang Zhao <zhjwpku@gmail.com> 于2025年9月7日周日 19:12写道:
    
    > Hi,
    >
    > On Thu, Aug 21, 2025 at 9:07 AM Richard Guo <guofenglinux@gmail.com>
    > wrote:
    > >
    > > On Wed, Aug 20, 2025 at 11:11 PM Nathan Bossart
    > > <nathandbossart@gmail.com> wrote:
    > > > On Wed, Aug 20, 2025 at 10:29:03AM +0900, Richard Guo wrote:
    > > > > On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <
    > nathandbossart@gmail.com> wrote:
    > > > >> There is still an open item for this one, but it's not clear
    > whether we are
    > > > >> planning to do anything about this for v18, especially since nobody
    > has
    > > > >> shown measurable performance impact.  Does anyone want to argue for
    > > > >> addressing this for v18, or shall we close the open item as "Won't
    > Fix"?
    > >
    > > > > I don't think we're likely to do anything about this for v18.
    > > > > Actually, I still doubt that the extra table_open call brings any
    > > > > measurable performance impact, especially since the lock is already
    > > > > held and the relation is likely already present in the relcache.
    > > > >
    > > > > Also, I still don't think moving the expansion of virtual generated
    > > > > columns to the rewriter (as Tom proposed) is a better idea.  It
    > turned
    > > > > out to have several problems that need to be fixed with the help of
    > > > > PHVs, which is why we moved the expansion into the planner.
    > >
    > > > Okay.  I have marked the v18 open item as "Won't Fix".
    > >
    > > Thank you for helping with this.
    > >
    > > Thanks
    > > Richard
    > >
    > >
    >
    > While reading this thread, I found that it uses *Relids* to collect NOT
    > NULL
    > attribute numbers, I think this might be an oversight, since ISTM that
    > Relids is used to represent the index of the relation in the range table.
    >
    > I searched the code base and it seems nowhere to use Relids to represent
    > attribute numbers, and there is a *notnullattnums* field in RelOptInfo:
    >
    > /* zero-based set containing attnums of NOT NULL columns */
    > Bitmapset *notnullattnums;
    >
    > So I think it would be better to be consistent, anyway I post a trivial
    > patch
    > if the community agrees with me.
    >
    > --
    > Regards
    > Junwang Zhao
    >
    
    +1
    >From the code readability perspective, Bitmapset* seems better.
    -- 
    Thanks,
    Tender Wang
    
  53. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-09-08T08:21:30Z

    On Sun, Sep 7, 2025 at 8:12 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
    > While reading this thread, I found that it uses *Relids* to collect NOT NULL
    > attribute numbers, I think this might be an oversight, since ISTM that
    > Relids is used to represent the index of the relation in the range table.
    
    Nice catch; it's better to use Bitmapset * rather than Relids in this
    scenario.  That was my oversight; will fix it.
    
    > So I think it would be better to be consistent, anyway I post a trivial patch
    > if the community agrees with me.
    
    Your patch misses one spot: the notnullattnums in
    get_relation_notnullatts() should also be fixed.  Otherwise it LGTM.
    
    - Richard
    
    
    
    
  54. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Junwang Zhao <zhjwpku@gmail.com> — 2025-09-08T13:07:55Z

    On Mon, Sep 8, 2025 at 4:21 PM Richard Guo <guofenglinux@gmail.com> wrote:
    >
    > On Sun, Sep 7, 2025 at 8:12 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
    > > While reading this thread, I found that it uses *Relids* to collect NOT NULL
    > > attribute numbers, I think this might be an oversight, since ISTM that
    > > Relids is used to represent the index of the relation in the range table.
    >
    > Nice catch; it's better to use Bitmapset * rather than Relids in this
    > scenario.  That was my oversight; will fix it.
    >
    > > So I think it would be better to be consistent, anyway I post a trivial patch
    > > if the community agrees with me.
    >
    > Your patch misses one spot: the notnullattnums in
    > get_relation_notnullatts() should also be fixed.  Otherwise it LGTM.
    
    True, attached v2 adds that missing spot, thanks for the review.
    
    >
    > - Richard
    
    
    
    -- 
    Regards
    Junwang Zhao
    
  55. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    Richard Guo <guofenglinux@gmail.com> — 2025-09-12T02:21:15Z

    On Mon, Sep 8, 2025 at 10:08 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
    > On Mon, Sep 8, 2025 at 4:21 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > > Your patch misses one spot: the notnullattnums in
    > > get_relation_notnullatts() should also be fixed.  Otherwise it LGTM.
    
    > True, attached v2 adds that missing spot, thanks for the review.
    
    Pushed.  Thanks for the report and fix.
    
    - Richard
    
    
    
    
  56. Re: Reduce "Var IS [NOT] NULL" quals during constant folding

    BharatDB <bharatdbpg@gmail.com> — 2025-09-18T12:46:02Z

    Dear Team,
    
    In continuation with the previous mail
    (CAAh00ETEMEXntw1gxp=xP+4sqrz80tK1R4VEhTpqH9CJpxs-wA) regarding the
    optimizations in PostgreSQL 18 to simplify query plans by folding away Var
    IS [NOT] NULL checks on columns declared NOT NULL. I experimented with two
    approaches, but both hit significant errors:
    
    *1. PlannerInfo-level hash table (HTAB *rel_notnull_info)*
    
       - The idea was to collect NOT NULL constraint info early and use it for
       constant folding.
       - gen_node_support.pl cannot handle non-serializable HTAB* fields when
       generating node serialization code, leading to compilation errors (“could
       not handle type HTAB*”).
       - Workarounds (e.g., /* nonserialized */ comments) fail due to comment
       stripping, and marking the whole PlannerInfo with
    pg_node_attr(no_copy_equal,
       no_read_write) risks breaking features like parallel query execution or
       plan caching.
       - Other limitations include potential ABI stability issues from
       modifying node structs, increased memory usage from hash tables in nodes,
       and the preference for per-relation data structures (e.g., in RelOptInfo)
       over global ones.
       - A global hash table is a viable alternative but complicates subquery
       handling.
    
    *2. Planner-level relattrinfo_htab for column nullability*
    
       - This avoids touching node serialization, but still suffers from
       practical issues.
       - It crashes during initdb because catalog state is unavailable in
       bootstrap mode, requires fragile lifecycle management to avoid memory leaks
       or stale entries which leads to risking leaks or stale state, and largely
       duplicates the existing var_is_nonnullable() logic.
       - In practice, it yields minimal performance benefit since constant
       folding and nullability inference are largely handled in core
    
    I’d appreciate feedback on whether pursuing either direction makes sense,
    or whether improvements should instead focus on extending the existing
    var_is_nonnullable() framework.
    
    Sincerely,
    Soumya
    
    On Fri, Sep 12, 2025 at 7:51 AM Richard Guo <guofenglinux@gmail.com> wrote:
    
    > On Mon, Sep 8, 2025 at 10:08 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
    > > On Mon, Sep 8, 2025 at 4:21 PM Richard Guo <guofenglinux@gmail.com>
    > wrote:
    > > > Your patch misses one spot: the notnullattnums in
    > > > get_relation_notnullatts() should also be fixed.  Otherwise it LGTM.
    >
    > > True, attached v2 adds that missing spot, thanks for the review.
    >
    > Pushed.  Thanks for the report and fix.
    >
    > - Richard
    >
    >
    >