Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Assign each subquery a unique name prior to planning it.

  2. Keep track of what RTIs a Result node is scanning.

  1. plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-05-19T18:01:55Z

    Hi,
    
    A couple of people at pgconf.dev seemed to want to know more about my
    ongoing plan shape work, so here are the patches I have currently.
    This is a long way from something that actually looks like a usable
    feature, but these are bits of infrastructure that I think will be
    necessary to get to a usable feature. As a recap, my overall goal here
    is to make it so that you can examine a finished plan, figure out what
    decisions the planner made, and then somehow get the planner to make
    those same decisions over again in a future planning cycle. Since
    doing this for all types of planner decisions seems too difficult for
    an initial goal, I'm focusing on scans and joins for now. A further
    goal is that I want it to be possible for extensions to use this
    infrastructure to implement a variety of different policies that they
    might feel to be beneficial, so I'm looking to minimize the amount of
    stuff that has to be done in core PostgreSQL or can only be used by
    core PostgreSQL.
    
    So far I've identified two main problems. First, you need to be able
    to reconstruct the planner decisions from the final plan, which you
    almost can do already but we're missing a few key pieces of
    information in the final plan tree. Second, you need to be able to
    write those decisions down in a way that can be correctly and
    unambiguously reinterpreted during a future planning cycle for the
    same query. For example, if we say that the planner chose a sequential
    scan of table foo, what exactly does that mean? Table foo could appear
    multiple times in the query, either in different subqueries or the
    same one, and it could be a partitioned table with a different scan
    method for each partition.
    
    Let's start by talking about problem #1. I've found two subproblems in
    this area so far. The first is that a Result node does not store the
    relids of the scan or join that it replaces. Note that a Result note
    whose job is to apply a one-time filter or a projection to some
    subordinate node is not an issue here -- we can just look through the
    Result node to whatever scan or join is beneath it. The concern here
    is about the case where a scan or join is proven empty and entirely
    replaced by a Result node that has "One-Time Filter: false". Patch
    0001 adds that field, and patch 0002 teaches ExplainPreScanNodes about
    it, which results in a number of regression test output changes that I
    personally consider to be improvements -- with these patches, we
    properly qualify some column references with a table alias as EXPLAIN
    does in all other cases, as opposed to printing them as bare column
    names with no alias. Patch 0003 checks that this is the only problem
    of this type that is visible at the stage where we are constructing
    join paths.
    
    Still talking about problem #1, the second subproblem I've identified
    is that during setrefs processing, we elide trivial SubqueryScan,
    Append, and MergeAppend nodes from the final plan. So during planning
    we might see, for example, that a certain join is between RTI 4 and
    RTI 5 and it's, say, a hash join. But after setrefs processing, it may
    appear that RTI was joined to, say, RTI 13, which might not even have
    been part of the same subquery level. If we record that we want to see
    RTI 4 joined to RTI 13 via a hash join, that will be completely
    useless -- the join planning code will never see those two RTIs as
    options to be joined to each other. What I've done in 0006 is made it
    so that we keep a record of each node elided during setrefs
    processing. This list of elided nodes is stored in the PlannedStmt
    outside of the portion of the tree that actually gets executed, so
    that code that is doing plan tree inspection can look at it but
    execution doesn't get any slower (except possibly by having to copy a
    slightly larger amount of data around when reading and writing
    PlannedStmt objects, which seems like it should be negligible).
    
    Now let's talk about problem #2. I believe that we do not actually
    want to refer to what happened to RTI 4 and RTI 5 as I mooted in the
    previous paragraph, but rather to refer to relations by some kind of
    name. However, we can't use the names shown in the EXPLAIN output,
    because those are not really present in the plan and are only assigned
    on-the-fly by EXPLAIN; hence, they can't be known at plan time. Since
    planning proceeds one subquery at a time, I think the right way to
    approach this problem is to first name the subquery and then to name
    the table within that subquery. Subqueries sort of have names right
    now, at least some of them, but it's an odd system: a CTE subquery,
    for example, has the name mentioned by the user, but other kinds of
    subplans just get names like "InitPlan 3" or "SubPlan 2". The real
    problem, though, is that those names are only assigned after we've
    FINISHED planned the subquery. If we begin planning our very first
    subquery, it might turn out to be InitPlan 1 or SubPlan 1, or if while
    planning it we recurse into some further subquery then *that* subquery
    might become InitPlan 1 or SubPlan 1 and OUR subquery might become
    InitPlan 2 or SubPlan 2 (or higher, if we find more subqueries and
    recurse into them too). Thus, being given some information about how
    the user wants, say, SubPlan 2 to be planned is completely useless
    because we won't know whether that is us until after we've done the
    planning that the user is trying to influence.
    
    To solve that problem, I decided to arrange for every subquery to have
    a unique name that is assigned before we begin planning it. Patch 0004
    does this. Then, 0005 records those names in the final plan. That's
    enough that you can look at the scanrelid (or apprelids, etc.) of a
    Plan node and relate that back to a named subquery and a particular
    RTI within that subquery. There is still the problem of how to name
    relations within a single subquery, since it's possible to reuse
    aliases within the same subquery level (simple cases are rejected, but
    there are at least two ways to bypass the error check, and they look
    intentional, so we can't just block it). Then, references to a certain
    alias name can be further multiplied by inheritance expansion. This is
    all a bit hairy but I haven't really found any fundamental problems
    here that keep you from deciding on a workable naming convention.
    
    Hope you find this interesting. If you do, let me know what you think.
    
    Thanks,
    
    --
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  2. Re: plan shape work

    Tomas Vondra <tomas@vondra.me> — 2025-05-20T10:05:05Z

    On 5/19/25 20:01, Robert Haas wrote:
    > Hi,
    > 
    > A couple of people at pgconf.dev seemed to want to know more about my
    > ongoing plan shape work, so here are the patches I have currently.
    > This is a long way from something that actually looks like a usable
    > feature, but these are bits of infrastructure that I think will be
    > necessary to get to a usable feature. As a recap, my overall goal here
    > is to make it so that you can examine a finished plan, figure out what
    > decisions the planner made, and then somehow get the planner to make
    > those same decisions over again in a future planning cycle. Since
    > doing this for all types of planner decisions seems too difficult for
    > an initial goal, I'm focusing on scans and joins for now. A further
    > goal is that I want it to be possible for extensions to use this
    > infrastructure to implement a variety of different policies that they
    > might feel to be beneficial, so I'm looking to minimize the amount of
    > stuff that has to be done in core PostgreSQL or can only be used by
    > core PostgreSQL.
    > 
    > ...
    
    Thanks for the overview. I don't have any immediate feedback, but it
    sounds like it might be related to the "making planner decisions clear"
    session from the unconference ...
    
    The basic premise of that session was about how to give users better
    info about the planner decisions - why paths were selected/rejected,
    etc. A simple example would be "why was the index not used", and the
    possible answers include "dominated by cost by another path" or "does
    not match the index keys" etc.
    
    I wonder if this work might be useful for something like that.
    
    
    regards
    
    -- 
    Tomas Vondra
    
    
    
    
    
  3. Re: plan shape work

    Maciek Sakrejda <maciek@pganalyze.com> — 2025-05-20T19:09:04Z

    +1, this seems like it could be very useful. A somewhat related issue
    is being able to tie plan nodes back to the query text: it can be hard
    to understand the planner's decisions if it's not even clear what part
    of the query it's making decisions about. I'm sure this is not an easy
    problem in general, but I wonder if you think that could be improved
    in the course of this work, or if you have other thoughts about it.
    
    Thanks,
    Maciek
    
    
    
    
  4. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-05-21T13:58:51Z

    On Tue, May 20, 2025 at 2:45 PM Tomas Vondra <tomas@vondra.me> wrote:
    > Thanks for the overview. I don't have any immediate feedback, but it
    > sounds like it might be related to the "making planner decisions clear"
    > session from the unconference ...
    >
    > The basic premise of that session was about how to give users better
    > info about the planner decisions - why paths were selected/rejected,
    > etc. A simple example would be "why was the index not used", and the
    > possible answers include "dominated by cost by another path" or "does
    > not match the index keys" etc.
    >
    > I wonder if this work might be useful for something like that.
    
    I've been wondering that, too. There's definitely some indirect ways
    in which that might be the case. For example, I think this work would
    lend itself to saying "hey, try planning this query, but for that
    table over there, use an index scan on this table." Then, it either
    still doesn't -- meaning the index isn't usable for some reason -- or
    it does and you can see the resulting plan with presumably higher cost
    and maybe infer why it didn't happen. That's better than today, where
    we have only very crude tools that let us do things like disable an
    entire scan type for the entire query, and I think it would make it a
    lot easier and less frustrating for a knowledgeable user to figure out
    why things are happening.
    
    But even though I think that would be better than today, I'm not sure
    it rises to the level of actually being good, because I think it still
    requires a fairly knowledgeable operator to figure things out, and you
    probably have to experiment a bunch to understand the situation
    instead of, say, being able to just look at the EXPLAIN plan and see
    the answer. I think being able to look at the EXPLAIN plan and see the
    answer, without needing a bunch of poking around, would be the ideal
    scenario here.
    
    But in some sense this is the same problem as understanding how an AI
    neural network is reasoning. The answer to "why did the planner pick
    plan X" is always "X was the cheapest possible plan". Ideas like "we
    chose a merge join because both tables are large enough that neither
    would fit into a hash table conveniently" are human explanations of
    why the math had the effect that it did; they are not how the planner
    actually reasons. So it's not just a matter of exposing the actual
    reasoning process to the user, because the computer is not reasoning
    in a way that a human would. It would have to be a matter of exposing
    some kind of other information that would allow the human being to
    comprehend easily what led the machine's algorithm to a certain
    conclusion; and it is not obvious how to get there.
    
    I have a sense - possibly an incorrect one - that the core of the
    problem here is that the planner considers lots of very similar
    alternatives. A hypothetical feature that showed the second-cheapest
    plan would be all but useless, because the second-cheapest plan would
    just be a very minor variation of the cheapest plan in almost all
    cases. One idea that crossed my mind was to display information in
    EXPLAIN about what would have happened if we'd done something really
    different. For instance, suppose that at a certain level of the plan
    tree we actually chose a merge join, but we also show the estimated
    cost of the cheapest hash join (if any) and the cheapest nested loop
    (if any) that we considered at that level. The user might be able to
    draw useful conclusions based on whether those numbers were altogether
    absent (i.e. that join type was not viable at all) or whether the cost
    was a little higher or a lot higher than that of the path actually
    chosen. For scans, you could list which indexes were believed to be
    usable and perhaps what the cost would have been for the cheapest one
    not actually selected; and what the cost of a sequential scan would
    have been if you hadn't picked one.
    
    I'm not sure how useful this would be, so the whole idea might
    actually suck, or maybe it's sort of the right idea but needs a bunch
    of refinement to really be useful. I don't have a better idea right
    now, though.
    
    If there are any notes that were taken during that unconference
    session, please point me in the right direction; I was in another
    session at that time but would read any available notes with interest.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  5. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-05-21T14:28:47Z

    On Tue, May 20, 2025 at 3:09 PM Maciek Sakrejda <maciek@pganalyze.com> wrote:
    > +1, this seems like it could be very useful. A somewhat related issue
    > is being able to tie plan nodes back to the query text: it can be hard
    > to understand the planner's decisions if it's not even clear what part
    > of the query it's making decisions about. I'm sure this is not an easy
    > problem in general, but I wonder if you think that could be improved
    > in the course of this work, or if you have other thoughts about it.
    
    Thanks. I don't really have any ideas about the problem you mention,
    perhaps partly because I haven't experienced it too much. I mean, I
    have sometimes been confused about which parts of the query go with
    which parts of the EXPLAIN, but I think in my experience so far that
    is mostly because either (1) both the query and the EXPLAIN output are
    super long and maybe also super-wide and therefore it's hard to
    correlate things by eye or (2) somebody wrote a query where they use
    the same table and/or table alias over and over again in different
    parts of the query and so it's hard to tell which reference goes with
    which. Neither of those problems seems all that exciting to me from a
    dev perspective: if you're calling everything a or x or orders or
    something, maybe don't do that, and if your query is 1500 characters
    long, I guess you need to budget some time to align that with the
    query plan. I don't really know how much we can do here. But maybe
    there are cases that I haven't seen where something better is
    possible, or perhaps you have some good idea that I haven't
    considered.
    
    (If I'm honest, I do have an idea that I think might very
    significantly improve the readability of EXPLAIN output. I think it
    would make it much less wide in normal cases without making it much
    longer. This has been percolating in my brain for a few years now and
    I have the vague intention of proposing it at some point, but not
    until I'm good and ready to be flamed to a well-done crisp, because
    I'm quite sure there will be more than one opinion on the merits.)
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  6. Re: plan shape work

    Maciek Sakrejda <m.sakrejda@gmail.com> — 2025-05-21T16:02:13Z

    On Wed, May 21, 2025 at 7:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Tue, May 20, 2025 at 3:09 PM Maciek Sakrejda <maciek@pganalyze.com> wrote:
    > > +1, this seems like it could be very useful. A somewhat related issue
    > > is being able to tie plan nodes back to the query text: it can be hard
    > > to understand the planner's decisions if it's not even clear what part
    > > of the query it's making decisions about. I'm sure this is not an easy
    > > problem in general, but I wonder if you think that could be improved
    > > in the course of this work, or if you have other thoughts about it.
    >
    > Thanks. I don't really have any ideas about the problem you mention,
    > perhaps partly because I haven't experienced it too much.
    
    That may be due to your extensive experience with Postgres and EXPLAIN plans.
    
    > I mean, I
    > have sometimes been confused about which parts of the query go with
    > which parts of the EXPLAIN, but I think in my experience so far that
    > is mostly because either (1) both the query and the EXPLAIN output are
    > super long and maybe also super-wide and therefore it's hard to
    > correlate things by eye or (2) somebody wrote a query where they use
    > the same table and/or table alias over and over again in different
    > parts of the query and so it's hard to tell which reference goes with
    > which. Neither of those problems seems all that exciting to me from a
    > dev perspective: if you're calling everything a or x or orders or
    > something, maybe don't do that, and if your query is 1500 characters
    > long, I guess you need to budget some time to align that with the
    > query plan.
    
    Fair enough, although the people trying to make sense of EXPLAIN plans
    are sometimes not the same ones who are writing the queries. And
    sometimes the queries are not written by people at all but by ORMs
    (or—heaven help us—vibe coded). "Don't do X" is a reasonable response
    to "It hurts when I do X," but it doesn't really solve the user's
    problem. That said, it's hard to argue with "We don't have any good
    ideas on how to improve this right now, and it's not a total dumpster
    fire, so we'll focus on other work."
    
    > I don't really know how much we can do here. But maybe
    > there are cases that I haven't seen where something better is
    > possible, or perhaps you have some good idea that I haven't
    > considered.
    
    No great ideas here. I thought initially that a good solution would be
    to have structured EXPLAIN output include something like "Query Text
    Start Index" and "Query Text End Index" fields for each node, but I
    realized that this doesn't really work for multiple joins (and
    probably other cases). Maybe "Query Text Indices", as a list of pairs?
    But from the little I know about the planner, that seems like any sort
    of tracking back to the source would be hard to implement. And it only
    really solves the problem for external EXPLAIN viewers, and only ones
    that put in the work to support this. I'm not sure if the problem can
    be meaningfully addressed for text format, but maybe that's another
    reason not to spend time on it in core.
    
    > (If I'm honest, I do have an idea that I think might very
    > significantly improve the readability of EXPLAIN output. I think it
    > would make it much less wide in normal cases without making it much
    > longer. This has been percolating in my brain for a few years now and
    > I have the vague intention of proposing it at some point, but not
    > until I'm good and ready to be flamed to a well-done crisp, because
    > I'm quite sure there will be more than one opinion on the merits.)
    
    I'm intrigued, and happy to stand by with an extinguisher. The road to
    great ideas is paved with bad ideas.
    
    Thanks,
    Maciek
    
    
    
    
  7. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-05-21T16:27:32Z

    On Wed, May 21, 2025 at 12:03 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
    > That may be due to your extensive experience with Postgres and EXPLAIN plans.
    
    Yes, that is very possible. All things being equal, it helps to have
    done something a lot of times.
    
    > Fair enough, although the people trying to make sense of EXPLAIN plans
    > are sometimes not the same ones who are writing the queries. And
    > sometimes the queries are not written by people at all but by ORMs
    > (or—heaven help us—vibe coded). "Don't do X" is a reasonable response
    > to "It hurts when I do X," but it doesn't really solve the user's
    > problem. That said, it's hard to argue with "We don't have any good
    > ideas on how to improve this right now, and it's not a total dumpster
    > fire, so we'll focus on other work."
    
    +1 to all of that.
    
    > No great ideas here. I thought initially that a good solution would be
    > to have structured EXPLAIN output include something like "Query Text
    > Start Index" and "Query Text End Index" fields for each node, but I
    > realized that this doesn't really work for multiple joins (and
    > probably other cases). Maybe "Query Text Indices", as a list of pairs?
    > But from the little I know about the planner, that seems like any sort
    > of tracking back to the source would be hard to implement. And it only
    > really solves the problem for external EXPLAIN viewers, and only ones
    > that put in the work to support this. I'm not sure if the problem can
    > be meaningfully addressed for text format, but maybe that's another
    > reason not to spend time on it in core.
    
    I'm not gonna say you couldn't make something like that work, but it
    sounds like a lot of effort for a hypothetical piece of external
    visualization software that might or might not produce satisfying
    results. My advice to anyone wanting to pursue this idea would be:
    make a totally fake POC first. Get a sample query with at least a
    moderately complex plan, get the EXPLAIN output, manually generate
    whatever data you think PostgreSQL ought to be able to spit out, and
    do a mock-up of an external viewer. When you're happy with the
    results, show it to some other people and see if they also like it. We
    can have the discussion about whether to include anything in core and
    what it should be after that. I definitely would not rule out the
    possibility that something like this could turn out to be really cool
    -- maybe hovering over stuff and having the corresponding part of the
    plan get highlighted will turn out to be awesome. But I think it might
    also turn out that there are things where it's not quite clear what
    you can or should usefully highlight, like target-list items, or for
    example a case where the query says that a.x = b.x and b.x = c.x but
    in the actual plan we use evaluate a.x = c.x, an expression not
    appearing anywhere in the query text. The legwork of sorting some of
    that kind of stuff out should really happen before making a feature
    proposal.
    
    > I'm intrigued, and happy to stand by with an extinguisher. The road to
    > great ideas is paved with bad ideas.
    
    Thanks. That proposal is a task for another day, but I appreciate the sentiment.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  8. Re: plan shape work

    Andy Fan <zhihuifan1213@163.com> — 2025-05-22T00:54:49Z

    Robert Haas <robertmhaas@gmail.com> writes:
    
    > Hi,
    >
    > ... As a recap, my overall goal here
    > is to make it so that you can examine a finished plan, figure out what
    > decisions the planner made, and then somehow get the planner to make
    > those same decisions over again in a future planning cycle.
    
    I am feeling that this is similar with Oracle's outline feature, where
    the final plan is examined and then a series of hints are stored and
    then during the replanning of the the same query, these hints will be
    applied to planner.  If one of the hints is not appliable any more, like
    the index is unusable, it is just ignored.
    
    > This list of elided nodes is stored in the PlannedStmt
    
    I did a quick check on the attached patches and I can see some more
    information is added into PlannedStmt. then my question are the
    PlannedStmt is not avaiable during the future planning cycle, then how
    does these information would be helpful on the feture planning? and I'm
    strange that there are little changes on the optimizer part. Does this
    patchset just a preparation work for reconstructing a same plan in
    future? 
    
    > outside of the portion of the tree that actually gets executed, so
    > that code that is doing plan tree inspection can look at it but
    > execution doesn't get any slower.
    
    Thank you for sharing this!
    
    -- 
    Best Regards
    Andy Fan
    
    
    
    
    
  9. Re: plan shape work

    Andy Fan <zhihuifan1213@163.com> — 2025-05-22T08:14:23Z

    Andy Fan <zhihuifan1213@163.com> writes:
    
    >
    >> This list of elided nodes is stored in the PlannedStmt
    >
    > I did a quick check on the attached patches and I can see some more
    > information is added into PlannedStmt. then my question are the
    > PlannedStmt is not avaiable during the future planning cycle, then how
    > does these information would be helpful on the feture planning? and I'm
    > strange that there are little changes on the optimizer part. Does this
    > patchset just a preparation work for reconstructing a same plan in
    > future? 
    
    I'm sure it is a preliminary work for reconstructing a same plan in
    future, sorry for this noise. 
    
    -- 
    Best Regards
    Andy Fan
    
    
    
    
    
  10. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-08-26T14:58:33Z

    On Mon, May 19, 2025 at 2:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > A couple of people at pgconf.dev seemed to want to know more about my
    > ongoing plan shape work, so here are the patches I have currently.
    
    Here's an updated patch set. My goal for the September CommitFest is
    to get patches 0001-0004 committed. Of course, if there are too many
    objections or too little review, that might not happen, but that's my
    goal.
    
    This patch set is basically unchanged from the previous patch set,
    except that I've added one new patch. 0007 records information about
    Append node consolidation into the final plan tree. Without this, when
    we build an AppendPath or MergeAppendPath and pull up the subpaths
    from a similar underlying node, we can lose the RTIs from the
    subordinate node, making it very difficult to analyze the plan after
    the fact.
    
    Just to remark a bit further on the structure of the patch set,
    0001-0003 are closely related. The only one I really need committed in
    order to move forward is 0001, but I think the others are a good idea.
    There is probably room for some bikeshedding on the output produced by
    0002. Then after that, 0004 stands alone as an incredibly important
    and foundational patch: without it, there's no way to know what the
    name of a subplan will be until after it's already been planned. I am
    fairly confident in the approach that I've taken here, but it does
    cause user-visible changes in EXPLAIN output about which people might
    conceivably have strong opinions. Getting agreement either on what
    I've done here or some variant of the approach is essential for me to
    be able to move forward. Then, 0005-0007 all have to do with
    preserving in the final plan various details that today would be
    discarded at the end of planning. While I'm happy to have comments on
    these now, I'm still not completely confident that I've found all
    issues in this area or handled them perfectly; hence, I'm not in a
    hurry to move forward with those just yet.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  11. Re: plan shape work

    Bruce Momjian <bruce@momjian.us> — 2025-08-26T20:16:34Z

    On Tue, Aug 26, 2025 at 10:58:33AM -0400, Robert Haas wrote:
    > During planning, there is one range table per subquery; at the end if
    > planning, those separate range tables are flattened into a single
    > range table. Prior to this change, it was impractical for code
    > examining the final plan to understand which parts of the flattened
    > range table came from which subquery's range table.
    > 
    > If the only consumer of the final plan is the executor, that is
    > completely fine. However, if some code wants to examine the final
    > plan, or what happens when we execute it, and extract information from
    > it that be used in future planning cycles, it's inconvenient.
    
    I am very interested in how plans can be used for future planning.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EDB                                      https://enterprisedb.com
    
      Do not let urgent matters crowd out time for investment in the future.
    
    
    
    
  12. Re: plan shape work

    Andrei Lepikhov <lepihov@gmail.com> — 2025-08-27T09:41:00Z

    On 19/5/2025 20:01, Robert Haas wrote:
    > Hope you find this interesting. If you do, let me know what you think.Thanks for looking in this direction!
    
    Since 2017, we designed features that should 'memorise' the experience 
    of previous executions, checking the PlanState tree and instrumentation 
    after execution.
    The first dumb prototype you should know - AQO. The following, more 
    robust code is 'replan' [1]. These two features utilise the core patch, 
    and I hope this patch can be removed after a few adjustments to the core.
    In fact, 'replan' is used to stop execution in the middle (if the 
    time/memory usage limit is achieved), pass through the state, collect 
    valuable data and execute again. That's why using data for the 
    subsequent execution needs a matching query tree, and that data may be 
    irrelevant for a different set of constants.
    
    The lessons learned during design and after some usage in real life:
    1. Subplans should be uniquely identified during the planning. Like you 
    mentioned, two similar subplans on the same query level may be executed 
    differently and provide different row numbers to the 'knowledge base'. I 
    used the subquery_planner hack that generated a unique ID for each subplan.
    2. Each node should be identified - I used a kind of signature at each 
    RelOptInfo node - just a hash generated by restrictinfos and underlying 
    signatures. This approach needs a create_plan hook to copy the signature 
    to the Plan nodes.
    3. A great source of fluctuations is 'never executed' nodes - because 
    depending on the constant, the subtree may never be executed or produce 
    tons of tuples - I resolved it by just ignoring 'never executed' 
    results, no in-core changes needed.
    4. A tree of paths may implement a single RelOptInfo. I have saved the 
    signature at the top of the Plan node for the corresponding RelOptInfo 
    and have never encountered any problems yet. It limits the use of 
    gathered data on cardinality/group number/peak memory consumption 
    somewhat, but not significantly.
    
    I think the node extension fields and hooks, debated in the thread [2], 
    may be enough to enable extensions to implement such a feature.
    
    What's more, I personally prefer to have a hook that allows extension 
    checking of a condition during execution - it may be done inside the 
    ExecProcNode() by calling the node-specific hook, which an extension may 
    initialise in the PlanState structure before the start of execution.
    Additionally, it would be beneficial to have a hook for error processing 
    at the top of the portal execution code - this is a key point for 
    managing query resources. If we need to stop and re-execute the query, 
    it is the only place where we can release all the resources assigned.
    One more helpful thing - an option to postpone receiver sending data out 
    of the instance-side, even result headings. We may want to decide on the 
    query plan after some tuples are produced, and if something has already 
    been sent to the user, we can't just stop and rebuild the plan.
    
    [1] https://postgrespro.com/docs/enterprise/16/realtime-query-replanning
    [2] 
    https://www.postgresql.org/message-id/flat/CA%2BTgmoYxfg90rw13%2BJcYwn4dwSC%2Bagw7o8-A%2BfA3M0fh96pg8w%40mail.gmail.com
    
    -- 
    regards, Andrei Lepikhov
    
    
    
    
  13. Re: plan shape work

    Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-28T17:22:06Z

    Hi Robert,
    
    On Tue, Aug 26, 2025 at 7:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Mon, May 19, 2025 at 2:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > > A couple of people at pgconf.dev seemed to want to know more about my
    > > ongoing plan shape work, so here are the patches I have currently.
    >
    > Here's an updated patch set. My goal for the September CommitFest is
    > to get patches 0001-0004 committed. Of course, if there are too many
    > objections or too little review, that might not happen, but that's my
    > goal.
    >
    
    Thanks for the patches!
    
    I don’t know enough about the history in this area to object to your
    approach or suggest an alternative design. That said, I’ve reviewed
    patches 0001-0004, and as individual patches they make sense to me.
    
    Below are some more detailed comments, which would only be relevant if
    you decide to proceed in this direction.
    
    0002:
    
                             QUERY PLAN
     ----------------------------------------------------------
      Nested Loop Left Join
    -   Output: t1.i, (1), t2.i2, i3, t4.i4
    +   Output: t1.i, (1), t2.i2, t3.i3, t4.i4
        ->  Nested Loop Left Join
    -         Output: t1.i, t2.i2, (1), i3
    +         Output: t1.i, t2.i2, (1), t3.i3
              Join Filter: false
              ->  Hash Left Join
                    Output: t1.i, t2.i2, (1)
    
    These plan changes introduced by 0002, which adds schema qualifiers,
    make me very happy. I think it’s a nice improvement on its own.
    
    In reply to 0002's commit message:
    > XXX. I have broken this out as a separate commit for now; however,
    > it could be merged with the commit to add 'relids' to 'Result'; or
    > the patch series could even be rejiggered to present this as the
    > primary benefit of that change, leaving the EXPLAIN changes as a
    > secondary benefit, instead of the current organization, which does
    > the reverse.
    
    I’m fine with the current organization. I agree that if we just
    compare the EXPLAIN changes in 0001, which add additional “Replaces:”
    information for the simple Result node, with the EXPLAIN changes in
    0002, the changes in 0002 are arguably more attractive. However, I
    think the EXPLAIN changes in 0001 are a more direct reflection of what
    the rest of 0001 is trying to achieve: keeping track of the RTIs a
    Result node is scanning. The changes in 0002 feel more like a side
    benefit.
    
    With that said, this is just my personal preference. If other
    reviewers feel differently, I won’t object.
    
    0003:
    
    In get_scanned_rtindexes():
    +       case T_NestLoop:
    +           {
    +               Bitmapset  *outer_scanrelids;
    +               Bitmapset  *inner_scanrelids;
    +               Bitmapset  *combined_scanrelids;
    +
    +               outer_scanrelids =
    +                   get_scanned_rtindexes(root, plan->lefttree);
    +               inner_scanrelids =
    +                   get_scanned_rtindexes(root, plan->righttree);
    +               combined_scanrelids =
    +                   bms_union(outer_scanrelids, inner_scanrelids);
    +               inner_scanrelids = remove_join_rtis(root, inner_scanrelids);
    +
    +               return remove_join_rtis(root, combined_scanrelids);
    +               break;
    +           }
    
    It looks like there is an redundant assignment to inner_scanrelids:
    +               inner_scanrelids = remove_join_rtis(root, inner_scanrelids);
    
    0004:
    
    There is a compiler warning reported in the CommitFest build:
    https://cirrus-ci.com/task/6248981396717568
    
    [23:03:57.811] subselect.c: In function ‘sublinktype_to_string’:
    [23:03:57.811] subselect.c:3232:1: error: control reaches end of non-void
    function [-Werror=return-type]
    [23:03:57.811] 3232 | }
    [23:03:57.811] | ^
    [23:03:57.811] cc1: all warnings being treated as errors
    [23:03:57.812] make[4]: *** [<builtin>: subselect.o] Error 1
    [23:03:57.812] make[3]: *** [../../../src/backend/common.mk:37:
    plan-recursive] Error 2
    [23:03:57.812] make[2]: *** [common.mk:37: optimizer-recursive] Error 2
    [23:03:57.812] make[2]: *** Waiting for unfinished jobs....
    [23:04:05.513] make[1]: *** [Makefile:42: all-backend-recurse] Error 2
    [23:04:05.514] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2
    
    You might want to add a return to get rid of the warning.
    
    Still in 0004:
    --- a/src/backend/optimizer/plan/planagg.c
    +++ b/src/backend/optimizer/plan/planagg.c
    @@ -339,6 +340,8 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo
    *mminfo,
        memcpy(subroot, root, sizeof(PlannerInfo));
        subroot->query_level++;
        subroot->parent_root = root;
    +   subroot->plan_name = choose_plan_name(root->glob, "minmax", true);
    +
        /* reset subplan-related stuff */
        subroot->plan_params = NIL;
        subroot->outer_params = NULL;
    @@ -359,6 +362,9 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo
    *mminfo,
        /* and we haven't created PlaceHolderInfos, either */
        Assert(subroot->placeholder_list == NIL);
    
    +   /* Add this to list of all PlannerInfo objects. */
    +   root->glob->allroots = lappend(root->glob->allroots, root);
    +
    
    In the last diff, it should add "subroot" instead of "root" to the
    list of all PlannerInfos. Currently, if there are multiple minmax
    expressions, we end up with the following plan showing duplicate
    names:
    
    test=# explain (costs off) SELECT MIN(value), MAX(value) FROM test_minmax;
                                               QUERY PLAN
    -------------------------------------------------------------------------------------------------
     Result
       Replaces: Aggregate
       InitPlan minmax_1
         ->  Limit
               ->  Index Only Scan using test_minmax_value_idx on test_minmax
                     Index Cond: (value IS NOT NULL)
       InitPlan minmax_1
         ->  Limit
               ->  Index Only Scan Backward using test_minmax_value_idx on
    test_minmax test_minmax_1
                     Index Cond: (value IS NOT NULL)
    (10 rows)
    
    Best,
    Alex
    
  14. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-02T20:07:17Z

    On Thu, Aug 28, 2025 at 1:22 PM Alexandra Wang
    <alexandra.wang.oss@gmail.com> wrote:
    > Thanks for the patches!
    
    Thanks for the review. Responding just briefly to avoid quoting too much text:
    
    - I'm glad to hear that you like 0002 and consider it an improvement
    independent of what follows.
    - I'm glad to hear that you're OK with the current split between 0001 and 0002.
    - I would like opinions on those topics from more people.
    - I have attempted to fix all of the other mistakes that you pointed
    out in the attached v3, which is also rebased.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  15. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-04T08:21:10Z

    On Wed, Sep 3, 2025 at 5:07 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > Thanks for the review. Responding just briefly to avoid quoting too much text:
    >
    > - I'm glad to hear that you like 0002 and consider it an improvement
    > independent of what follows.
    > - I'm glad to hear that you're OK with the current split between 0001 and 0002.
    > - I would like opinions on those topics from more people.
    > - I have attempted to fix all of the other mistakes that you pointed
    > out in the attached v3, which is also rebased.
    
    I've reviewed 0001 to 0003, and they all make sense to me.  The
    changes to the EXPLAIN output in 0001 and 0002 are very nice
    improvements.
    
    I found some issues with 0003 though.  It seems get_scanned_rtindexes
    is intended to return RTI sets with outer join relids excluded.  For
    some node types, such as Append and MergeAppend, it fails to do so,
    which can cause the assertion in assert_join_preserves_scan_rtis to
    fail.  For example:
    
    create table p (a int, b int) partition by range(a);
    create table p1 partition of p for values from (0) to (10);
    create table p2 partition of p for values from (10) to (20);
    
    set enable_partitionwise_join to on;
    
    explain (costs off)
    select * from p t1
        left join p t2 on t1.a = t2.a
        left join p t3 on t2.b = t3.b;
    server closed the connection unexpectedly
    
    Besides, to exclude outer join relids, it iterates over the RTI sets,
    checks each RTE for type RTE_JOIN, and bms_del_member it if found (cf.
    remove_join_rtis).  I think a simpler approach would be to leverage
    PlannerInfo.outer_join_rels:
    
        scanrelids = bms_difference(scanrelids, root->outer_join_rels);
    
    Therefore, I suggest that we don't try to remove the join RTIs in
    get_scanned_rtindexes.  Instead, we do that in
    assert_join_preserves_scan_rtis -- before comparing the RTIs from the
    outer and inner subplans with the join's RTIs -- by leveraging
    PlannerInfo.outer_join_rels.  And remove_join_rtis can be retired.
    
    - Richard
    
    
    
    
  16. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-05T15:51:07Z

    On Thu, Sep 4, 2025 at 4:21 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > I found some issues with 0003 though.  It seems get_scanned_rtindexes
    > is intended to return RTI sets with outer join relids excluded.  For
    > some node types, such as Append and MergeAppend, it fails to do so,
    > which can cause the assertion in assert_join_preserves_scan_rtis to
    > fail.  For example:
    >
    > create table p (a int, b int) partition by range(a);
    > create table p1 partition of p for values from (0) to (10);
    > create table p2 partition of p for values from (10) to (20);
    >
    > set enable_partitionwise_join to on;
    >
    > explain (costs off)
    > select * from p t1
    >     left join p t2 on t1.a = t2.a
    >     left join p t3 on t2.b = t3.b;
    > server closed the connection unexpectedly
    
    Ouch. Good catch.
    
    > Besides, to exclude outer join relids, it iterates over the RTI sets,
    > checks each RTE for type RTE_JOIN, and bms_del_member it if found (cf.
    > remove_join_rtis).  I think a simpler approach would be to leverage
    > PlannerInfo.outer_join_rels:
    >
    >     scanrelids = bms_difference(scanrelids, root->outer_join_rels);
    
    I was not aware of outer_join_rels, so thank you for pointing it out.
    However, consider this query:
    
    select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode;
    
    Here, we end up with a three-item range table: one for a, one for b,
    and one for the join. But the join is not an outer join, and does not
    appear in root->outer_join_rels. Therefore, I'm not sure we can rely
    on outer_join_rels in this scenario.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  17. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-05T16:00:55Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > I was not aware of outer_join_rels, so thank you for pointing it out.
    > However, consider this query:
    
    > select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode;
    
    > Here, we end up with a three-item range table: one for a, one for b,
    > and one for the join. But the join is not an outer join, and does not
    > appear in root->outer_join_rels. Therefore, I'm not sure we can rely
    > on outer_join_rels in this scenario.
    
    Plain (not-outer) joins will never be included in a relid set in the
    first place.
    
    			regards, tom lane
    
    
    
    
  18. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-05T16:40:22Z

    On Fri, Sep 5, 2025 at 12:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Plain (not-outer) joins will never be included in a relid set in the
    > first place.
    
    Ah, well then Richard's idea might work! Let me try it and see what happens...
    
    Thanks,
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  19. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-05T17:38:43Z

    On Fri, Sep 5, 2025 at 12:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > Ah, well then Richard's idea might work! Let me try it and see what happens...
    
    Here's the result. Seem to work OK.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  20. Re: plan shape work

    Dilip Kumar <dilipbalaut@gmail.com> — 2025-09-08T05:05:32Z

    On Wed, May 21, 2025 at 7:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Tue, May 20, 2025 at 2:45 PM Tomas Vondra <tomas@vondra.me> wrote:
    
    > I have a sense - possibly an incorrect one - that the core of the
    > problem here is that the planner considers lots of very similar
    > alternatives. A hypothetical feature that showed the second-cheapest
    > plan would be all but useless, because the second-cheapest plan would
    > just be a very minor variation of the cheapest plan in almost all
    > cases. One idea that crossed my mind was to display information in
    > EXPLAIN about what would have happened if we'd done something really
    > different. For instance, suppose that at a certain level of the plan
    > tree we actually chose a merge join, but we also show the estimated
    > cost of the cheapest hash join (if any) and the cheapest nested loop
    > (if any) that we considered at that level. The user might be able to
    > draw useful conclusions based on whether those numbers were altogether
    > absent (i.e. that join type was not viable at all) or whether the cost
    > was a little higher or a lot higher than that of the path actually
    > chosen. For scans, you could list which indexes were believed to be
    > usable and perhaps what the cost would have been for the cheapest one
    > not actually selected; and what the cost of a sequential scan would
    > have been if you hadn't picked one.
    >
    > I'm not sure how useful this would be, so the whole idea might
    > actually suck, or maybe it's sort of the right idea but needs a bunch
    > of refinement to really be useful. I don't have a better idea right
    > now, though.
    
    Having detailed information on the costs of alternative join
    methods/scan method, even when a different method is chosen, would be
    valuable information. For example, if a merge join is selected for
    tables t1 and t2 in a subquery, showing the estimated costs for both a
    hash join and a nested loop join would provide a more complete picture
    of the planner's decision-making process.
    
    And I believe, this information would be particularly useful if the
    cost of a non-selected plan, such as a nested loop join, is very close
    to the cost of the chosen merge join. In such cases, a database
    administrator or query optimizer could use this insight to manually
    override the planner's choice and opt for the nested loop join for
    specific tables in a subquery. This level of detail would empower
    users to fine-tune query performance and explore alternative execution
    strategies.
    
    IIUC, one of the goal of this work is where operator can say I want to
    use this scan method while scanning a particular table in a particular
    subquery, that means if the planner can give the information about non
    selected paths as well then it would be really helpful in making this
    process more smooth otherwise without much information on what path
    got rejected its very hard to provide hints.
    
    -- 
    Regards,
    Dilip Kumar
    Google
    
    
    
    
  21. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-08T09:51:33Z

    On Sat, Sep 6, 2025 at 1:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    > > I was not aware of outer_join_rels, so thank you for pointing it out.
    > > However, consider this query:
    >
    > > select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode;
    >
    > > Here, we end up with a three-item range table: one for a, one for b,
    > > and one for the join. But the join is not an outer join, and does not
    > > appear in root->outer_join_rels. Therefore, I'm not sure we can rely
    > > on outer_join_rels in this scenario.
    
    > Plain (not-outer) joins will never be included in a relid set in the
    > first place.
    
    Exactly.  Non-outer joins wouldn't cause Vars to become null, so we
    never include them in the joinrel's relids.
    
    BTW, I'm wondering if we can take outer join relids into account in
    assert_join_preserves_scan_rtis(), which could make the check more
    useful.  A joinrel's relids consists of three parts: the outer plan's
    relids, the inner plan's relids, and the relids of outer joins that
    are calculated at this join.  We already have the first two.  If we
    can find a way to determine the third, we'd be able to assert that:
    
     outer_relids U inner_relids U outerjoin_relids == joinrel->relids
    
    Determining the third part can be tricky though, especially due to
    outer-join identity 3: the "outerjoin_relids" of one outer join might
    include more than one outer join relids.  But I think this is till
    doable.
    
    (This may not be useful for your overall goal in this patchset, so
    feel free to ignore it if it's not of interest.)
    
    - Richard
    
    
    
    
  22. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-08T13:56:18Z

    On Mon, Sep 8, 2025 at 5:51 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > > Plain (not-outer) joins will never be included in a relid set in the
    > > first place.
    >
    > Exactly.  Non-outer joins wouldn't cause Vars to become null, so we
    > never include them in the joinrel's relids.
    
    OK. I didn't understand what the rules were there.
    
    > BTW, I'm wondering if we can take outer join relids into account in
    > assert_join_preserves_scan_rtis(), which could make the check more
    > useful.  A joinrel's relids consists of three parts: the outer plan's
    > relids, the inner plan's relids, and the relids of outer joins that
    > are calculated at this join.  We already have the first two.  If we
    > can find a way to determine the third, we'd be able to assert that:
    >
    >  outer_relids U inner_relids U outerjoin_relids == joinrel->relids
    >
    > Determining the third part can be tricky though, especially due to
    > outer-join identity 3: the "outerjoin_relids" of one outer join might
    > include more than one outer join relids.  But I think this is till
    > doable.
    >
    > (This may not be useful for your overall goal in this patchset, so
    > feel free to ignore it if it's not of interest.)
    
    I don't mind doing the work if there's a reasonable and useful way of
    accomplishing the goal. However, one concern I have is that it seems
    pointless if we're computing outerjoin_relids by essentially redoing
    the same computation that set the join's relid set in the first place.
    In that case, the cross-check has no real probative value. All it
    would be demonstrating is that if you calculate outerjoin_relids twice
    using essentially the same methodology, you get the same answer. That
    seems like a waste of code to me. If there's a way to calculate
    outerjoin_relids using a different methodology than what we used when
    populating the joinrelids, that would be interesting. It would be
    similar to how the existing code recomputes the outer and inner relids
    in a way that can potentially find issues that otherwise would not
    have been spotted (such as the Result node case).
    
    Do you have a proposal?
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  23. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-09T02:22:40Z

    On Mon, Sep 8, 2025 at 10:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > On Mon, Sep 8, 2025 at 5:51 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > > BTW, I'm wondering if we can take outer join relids into account in
    > > assert_join_preserves_scan_rtis(), which could make the check more
    > > useful.  A joinrel's relids consists of three parts: the outer plan's
    > > relids, the inner plan's relids, and the relids of outer joins that
    > > are calculated at this join.  We already have the first two.  If we
    > > can find a way to determine the third, we'd be able to assert that:
    > >
    > >  outer_relids U inner_relids U outerjoin_relids == joinrel->relids
    > >
    > > Determining the third part can be tricky though, especially due to
    > > outer-join identity 3: the "outerjoin_relids" of one outer join might
    > > include more than one outer join relids.  But I think this is till
    > > doable.
    > >
    > > (This may not be useful for your overall goal in this patchset, so
    > > feel free to ignore it if it's not of interest.)
    
    > I don't mind doing the work if there's a reasonable and useful way of
    > accomplishing the goal. However, one concern I have is that it seems
    > pointless if we're computing outerjoin_relids by essentially redoing
    > the same computation that set the join's relid set in the first place.
    > In that case, the cross-check has no real probative value. All it
    > would be demonstrating is that if you calculate outerjoin_relids twice
    > using essentially the same methodology, you get the same answer. That
    > seems like a waste of code to me. If there's a way to calculate
    > outerjoin_relids using a different methodology than what we used when
    > populating the joinrelids, that would be interesting. It would be
    > similar to how the existing code recomputes the outer and inner relids
    > in a way that can potentially find issues that otherwise would not
    > have been spotted (such as the Result node case).
    >
    > Do you have a proposal?
    
    One idea (not fully thought through) is that we record the calculated
    outerjoin_relids for each outer join in its JoinPaths.  (We cannot
    store this in the joinrel's RelOptInfo because it varies depending on
    the join sequence we use.)  And then we could use the recorded
    outerjoin_relids for the assertion here:
    
     outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids
    
    The value of this approach, IMO, is that it could help verify the
    correctness of how we compute outer joins' outerjoin_relids, ie. the
    logic in add_outer_joins_to_relids(), which is quite complex due to
    outer-join identity 3.  If we miscalculate the outerjoin_relids for
    one certain outer join, this assertion could catch it effectively.
    
    However, this shouldn't be a requirement for committing your patches.
    Maybe we should discuss it in a separate thread.
    
    - Richard
    
    
    
    
  24. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-09T13:17:51Z

    First of all, as an administrative note, since both you and Alex seem
    to like 0001 and 0002 and no suggestions for improvement have been
    offered, I plan to commit those soon unless there are objections or
    additional review comments. I will likely do the same for 0003 as
    well, pending the results of the current conversation, but maybe not
    quite as quickly. I believe that 0004 still needs more review, and its
    effects will be more user-visible than 0001-0003, so I don't plan to
    move forward with that immediately, but I invite review comments.
    
    On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > One idea (not fully thought through) is that we record the calculated
    > outerjoin_relids for each outer join in its JoinPaths.  (We cannot
    > store this in the joinrel's RelOptInfo because it varies depending on
    > the join sequence we use.)  And then we could use the recorded
    > outerjoin_relids for the assertion here:
    >
    >  outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids
    >
    > The value of this approach, IMO, is that it could help verify the
    > correctness of how we compute outer joins' outerjoin_relids, ie. the
    > logic in add_outer_joins_to_relids(), which is quite complex due to
    > outer-join identity 3.  If we miscalculate the outerjoin_relids for
    > one certain outer join, this assertion could catch it effectively.
    >
    > However, this shouldn't be a requirement for committing your patches.
    > Maybe we should discuss it in a separate thread.
    
    I'm OK with moving the conversation to a separate thread, but can you
    clarify from where you believe that joinpath->ojrelids would be
    populated? It seems to me that the assertion couldn't pass unless
    every join path ended up with the same value of joinpath->ojrelids.
    That's because, for a given joinrel, there is only one value of
    joinrel->relids; and all of those RTIs must be either RTE_JOIN or
    non-RTE_JOIN. The non-RTE_JOIN RTIs will be found only in outer_relids
    U inner_relids, and the RTE_JOIN RTIs will be found only in
    joinpath->ojrelids. Therefore, it seems impossible for the assertion
    to pass unless the value is the same for all join paths. If that is
    correct, then I don't think we should store the value in the join
    path. Instead, if we want to cross-check it, we could calculate the
    value that would have been stored into joinpath->ojrelids at whatever
    earlier stage we had the information available to do so, and it should
    be equal to bms_intersect(joinrel->relids, root->outer_join_rels),
    which I think would have to be already initialized before we can think
    of building a join path.
    
    Please feel free to correct me if I am misunderstanding.
    
    Thanks,
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  25. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-09T15:12:51Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > First of all, as an administrative note, since both you and Alex seem
    > to like 0001 and 0002 and no suggestions for improvement have been
    > offered, I plan to commit those soon unless there are objections or
    > additional review comments.
    
    FWIW, I don't love the details of 0001: I think it's going in the
    right direction, but could use more polish.  In particular, you've
    defined Result.relids in a way that seems ambiguous.  There are two
    different meanings for NULL, and one of them is being interpreted as
    an "Aggregate" without a lot of principle behind that.  I think you
    need to store more data in order to make that less of a hack.
    
    So far as I can see from the regression-test changes, the "Aggregate"
    case only occurs when we've replaced a aggregate calculation with
    a MinMaxAggPath representing an index endpoint probe.  What I would
    like to see come out when that's the case is something like
    
    	Replaces: MIN or MAX aggregate over scan on tab1
    
    This means first that the Result.relids needs to include the relid of
    the table being scanned by the indexscan, and second that EXPLAIN will
    then need some other cue to help it distinguish this case from a
    case where it should just say "Replaces: Scan on tab1".  It's possible
    that you could intuit that by examining the initplans attached to the
    Result node, but I think what would make a ton more sense is to add
    an enum field to Result that explicitly identifies why it's there.
    We've got at least "apply one-time filter to subplan", "apply per-row
    gating filter to subplan", "represent a relation proven empty", and
    "place-hold for a MinMaxAgg InitPlan".  Tracing back from all the
    calls to make_result() might identify some more cases.  I'm not
    arguing that the user-visible EXPLAIN output should distinguish
    all of these (but probably overexplain should).  I think though
    that it'd be useful to have this recorded in the plan tree.
    
    > On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
    >> One idea (not fully thought through) is that we record the calculated
    >> outerjoin_relids for each outer join in its JoinPaths.
    
    > I'm OK with moving the conversation to a separate thread, but can you
    > clarify from where you believe that joinpath->ojrelids would be
    > populated? It seems to me that the assertion couldn't pass unless
    > every join path ended up with the same value of joinpath->ojrelids.
    
    What I have been intending to suggest is that you should add a field
    to join plan nodes that is zero if an inner join, but the relid of
    the outer join RTE if it's an outer join.  This is uniquely defined
    because any given join node implements a specific outer join, even
    though the planner's relids for the completed join are (intentionally)
    ambiguous about the order in which multiple joins were done.
    
    The reason I wanted to do this is that I think it'd become possible to
    tighten the assertions in setrefs.c about whether Vars' varnullingrels
    are correct, so that we can always assert that those relid sets are
    exactly thus-and-so and not have to settle for superset/subset tests.
    I've not worked through the details to be entirely sure that this is
    possible, so I didn't bring it up before.  But maybe labeling join
    nodes this way would also address Richard's concern.  In any case
    it fits into your overall goal of decorating plan trees with more
    information.
    
    			regards, tom lane
    
    
    
    
  26. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-09T16:00:26Z

    On Tue, Sep 9, 2025 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I think what would make a ton more sense is to add
    > an enum field to Result that explicitly identifies why it's there.
    > We've got at least "apply one-time filter to subplan", "apply per-row
    > gating filter to subplan", "represent a relation proven empty", and
    > "place-hold for a MinMaxAgg InitPlan".
    
    Thanks, I'll look into this.
    
    > What I have been intending to suggest is that you should add a field
    > to join plan nodes that is zero if an inner join, but the relid of
    > the outer join RTE if it's an outer join.  This is uniquely defined
    > because any given join node implements a specific outer join, even
    > though the planner's relids for the completed join are (intentionally)
    > ambiguous about the order in which multiple joins were done.
    >
    > The reason I wanted to do this is that I think it'd become possible to
    > tighten the assertions in setrefs.c about whether Vars' varnullingrels
    > are correct, so that we can always assert that those relid sets are
    > exactly thus-and-so and not have to settle for superset/subset tests.
    > I've not worked through the details to be entirely sure that this is
    > possible, so I didn't bring it up before.  But maybe labeling join
    > nodes this way would also address Richard's concern.  In any case
    > it fits into your overall goal of decorating plan trees with more
    > information.
    
    Oh, that seems quite elegant! Then we could reasonably expect to
    re-find all the relevant RTIs and no others with an appropriate tree
    traversal.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  27. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-09T19:43:12Z

    On Tue, Sep 9, 2025 at 12:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > On Tue, Sep 9, 2025 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > I think what would make a ton more sense is to add
    > > an enum field to Result that explicitly identifies why it's there.
    > > We've got at least "apply one-time filter to subplan", "apply per-row
    > > gating filter to subplan", "represent a relation proven empty", and
    > > "place-hold for a MinMaxAgg InitPlan".
    >
    > Thanks, I'll look into this.
    
    Just a random thought, but another idea that crossed my mind here at
    one point was to actually split the Result node up into Result nodes
    with subplans and Result nodes without subplans. We could call the
    version with a subplan "Project" and the version without a subplan
    "Result", for example. This seems a little silly because both variants
    would need to be able to handle resconstantqual, or alternatively we'd
    have to be OK with getting "Project" on top of "Result" in some cases
    where a single "Result" node currently does both jobs. On the other
    hand, only Project needs a subplan, and only Result needs relids.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  28. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-09T20:37:38Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > Just a random thought, but another idea that crossed my mind here at
    > one point was to actually split the Result node up into Result nodes
    > with subplans and Result nodes without subplans. We could call the
    > version with a subplan "Project" and the version without a subplan
    > "Result", for example. This seems a little silly because both variants
    > would need to be able to handle resconstantqual, or alternatively we'd
    > have to be OK with getting "Project" on top of "Result" in some cases
    > where a single "Result" node currently does both jobs. On the other
    > hand, only Project needs a subplan, and only Result needs relids.
    
    Maybe.  I kinda feel that actually redesigning plan trees is outside
    the scope of this project, but maybe we should consider it.
    
    I think though that you might be underestimating the amount of
    commonality.  For instance, we might need a projecting node on top of
    a subquery-in-FROM subplan, but that node would still have to bear
    a relid --- the relid of the RTE_SUBQUERY RTE in the upper query,
    not that of any RTE in the subquery, but nonetheless it's a relid.
    
    Another variant of this is that that RTE_SUBQUERY relid would normally
    be borne by a SubqueryScan plan node, but if we elide the SubqueryScan
    because it isn't doing anything useful, where shall we put that relid?
    If we don't store it anywhere then we will not be able to reconstruct
    correct join relids for the upper plan level.
    
    			regards, tom lane
    
    
    
    
  29. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-10T07:16:29Z

    On Tue, Sep 9, 2025 at 10:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > > One idea (not fully thought through) is that we record the calculated
    > > outerjoin_relids for each outer join in its JoinPaths.  (We cannot
    > > store this in the joinrel's RelOptInfo because it varies depending on
    > > the join sequence we use.)  And then we could use the recorded
    > > outerjoin_relids for the assertion here:
    > >
    > >  outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids
    
    > I'm OK with moving the conversation to a separate thread, but can you
    > clarify from where you believe that joinpath->ojrelids would be
    > populated? It seems to me that the assertion couldn't pass unless
    > every join path ended up with the same value of joinpath->ojrelids.
    > That's because, for a given joinrel, there is only one value of
    > joinrel->relids; and all of those RTIs must be either RTE_JOIN or
    > non-RTE_JOIN. The non-RTE_JOIN RTIs will be found only in outer_relids
    > U inner_relids, and the RTE_JOIN RTIs will be found only in
    > joinpath->ojrelids. Therefore, it seems impossible for the assertion
    > to pass unless the value is the same for all join paths.
    
    Hmm, this isn't quite what I had in mind.  What I was thinking is that
    the outer join relids included in joinrel->relids can also be found
    from its outer or inner.  For example, consider a query like:
    
      (A leftjoin B on (Pab)) leftjoin C on (Pbc)
    
    For the join with joinrel->relids being {1, 2, 3, 4, 5}, {1, 2, 3}
    comes from the outer side, {4} comes from the inner side, and {5} is
    the outer join being calculated at this join.  So the Assert I
    proposed earlier becomes:
    
      {1, 2, 3} U {4} U {5} == {1, 2, 3, 4, 5}
    
    However, if we have transformed it to:
    
      A leftjoin (B leftjoin C on (Pbc)) on (Pab)
    
    For this same join, {1} comes from the outer side, {2, 4} comes from
    the inner side, and {3, 5} are the outer joins being calculated at
    this join.  So the Assert becomes:
    
      {1} U {2, 4} U {3, 5} == {1, 2, 3, 4, 5}
    
    Either way, the assertion should always hold -- if it doesn't, there's
    likely a bug in how we're calculating the relids.
    
    As you can see, the set of outer joins calculated at the same join can
    vary depending on the join order.  What I suggested is to record this
    information in JoinPaths (or maybe also in Join plan nodes so that
    get_scanned_rtindexes can collect it) for the assertion.
    
    - Richard
    
    
    
    
  30. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-10T14:11:54Z

    On Wed, Sep 10, 2025 at 3:16 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > Hmm, this isn't quite what I had in mind.  What I was thinking is that
    > the outer join relids included in joinrel->relids can also be found
    > from its outer or inner.  For example, consider a query like:
    >
    >   (A leftjoin B on (Pab)) leftjoin C on (Pbc)
    >
    > For the join with joinrel->relids being {1, 2, 3, 4, 5}, {1, 2, 3}
    > comes from the outer side, {4} comes from the inner side, and {5} is
    > the outer join being calculated at this join.  So the Assert I
    > proposed earlier becomes:
    >
    >   {1, 2, 3} U {4} U {5} == {1, 2, 3, 4, 5}
    
    Makes sense.
    
    > However, if we have transformed it to:
    >
    >   A leftjoin (B leftjoin C on (Pbc)) on (Pab)
    >
    > For this same join, {1} comes from the outer side, {2, 4} comes from
    > the inner side, and {3, 5} are the outer joins being calculated at
    > this join.  So the Assert becomes:
    >
    >   {1} U {2, 4} U {3, 5} == {1, 2, 3, 4, 5}
    
    Hmm. As I understood it, Tom was proposing a single, optional ojrelid
    for each join, with all outer joins having a value and no inner join
    having one. What you are proposing seems to be a very similar concept,
    but as I understand it, you're saying that each join would carry a
    *set* of ojrelids, which might be empty and might contain more than
    one element.
    
    Experimenting with this example, it looks like you're correct and Tom,
    or my understanding of Tom, is incorrect. What I see is that I get a
    structure like this:
    
       {MERGEPATH
       :parent_relids (b 1 2 3 4 5)
       :jpath.outerjoinpath
          {PATH
          :parent_relids (b 1)
       :jpath.innerjoinpath
          {MERGEPATH
          :parent_relids (b 2 4)
          :jpath.outerjoinpath
             {PATH
             :parent_relids (b 2)
             }
          :jpath.innerjoinpath
             {PATH
             :parent_relids (b 4)
             }
          }
       }
    
    So, for the assertion to pass, the more deeply nested merge join would
    need to have ojrelids = {} and the less-deeply nested one would need
    ojrelids={3,5}.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  31. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-10T17:28:49Z

    On Tue, Sep 9, 2025 at 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Maybe.  I kinda feel that actually redesigning plan trees is outside
    > the scope of this project, but maybe we should consider it.
    
    Yeah, I would rather not go there, all things being equal. We could
    also consider displaying things differently even if the underlying
    node type is the same. ExplainNode already has examples where
    pname/sname are set to something other than the node tag, e.g. we set
    EXPLAIN designation based on (((ModifyTable *) plan)->operation) or
    agg->aggstrategy. I'm not sure if this is the right way to go, but
    it's worth a thought.
    
    > I think though that you might be underestimating the amount of
    > commonality.  For instance, we might need a projecting node on top of
    > a subquery-in-FROM subplan, but that node would still have to bear
    > a relid --- the relid of the RTE_SUBQUERY RTE in the upper query,
    > not that of any RTE in the subquery, but nonetheless it's a relid.
    >
    > Another variant of this is that that RTE_SUBQUERY relid would normally
    > be borne by a SubqueryScan plan node, but if we elide the SubqueryScan
    > because it isn't doing anything useful, where shall we put that relid?
    > If we don't store it anywhere then we will not be able to reconstruct
    > correct join relids for the upper plan level.
    
    I don't quite understand how these two scenarios are different, but I
    have found it critical to distinguish problems that happen at setrefs
    time from problems that happen during main planning. If we're talking
    about feeding information from one planning cycle forward to the next,
    we need be able to look at the plan tree and understand what the
    pre-setrefs state of affairs was, because when the next planning cycle
    happens, the decisions we want to influence are happening pre-setrefs.
    The later patches in this patch series deal with exactly this problem:
    0004 and 0005 make it possible to match up an RTI from the flattened
    range table that pops out of one planning cycle with a specific
    subroot and RTI relative to that subroot during the following cycle;
    and 0006 and 0007 arrange to leave a breadcrumb trail in cases where
    setrefs-time processing deletes plan nodes. The setrefs-time elision
    of SubqueryScan nodes is recorded by the mechanism in 0006.
    
    I went back and studied 0001 some more today in reference to your
    comments about classifying Result nodes. 0001 already loosely
    classifies Result nodes as either "gating" result nodes (that have a
    subplan) or "simple" result notes (that don't). "Gating" result notes
    happen for target-list projection and/or to apply a one-time filter. I
    don't think the reasons for gating nodes need to be recorded anywhere;
    either the tlist matches the underlying node or it doesn't, and either
    resconstantqual contains something or not. "Simple" result have more
    interesting reasons for existing:
    
    1. MinMaxAgg placeholders
    2. Degenerate grouping
    3. No-FROM-clause cases (these go through create_group_result_plan
    like the previous case, but are arguably distinct)
    4. Relations proven empty
    
    There's a sort of hybrid case when we want a gating result node but
    the underlying node is a simple result. In that case, the patch builds
    a new simple result that is similar to the existing one but with the
    gating result's target list and one-time filter. It seems OK to me to
    forget all about the gating result node and its reason for existence
    in this case and just consider ourselves to have updated the
    underlying Result node. Otherwise, you'd have to consider that a
    Result node might have multiple reasons for existence: whatever caused
    the "simple" Result note to get created, plus possibly projection or
    one-time filtering.
    
    Now, looking at (1)-(4) above, (3) is actually a special case of (4):
    
    robert.haas=# explain (range_table) select 1;
                    QUERY PLAN
    ------------------------------------------
     Result  (cost=0.00..0.01 rows=1 width=4)
       RTIs: 1
     RTI 1 (result):
       Eref: "*RESULT*" ()
    (4 rows)
    
    So the reason why the patch set feels justified in printing "Replaces:
    Aggregate" when there are no relids is because we must have case (1)
    or (2) from the above list. But it does seem fragile. Not only can we
    confuse (1) and (2), but also, only the top-level grouping rel
    necessarily has empty relids. We already have child grouping rels that
    have relid sets, and I suspect Richard's pending work on eager
    aggregation will introduce more of them. This patch won't be able to
    distinguish those from case (4).
    
    So maybe what we want for Result reasons is something like
    RESULT_REPLACES_BASEREL, RESULT_REPLACES_JOINREL,
    RESULT_REPLACES_GROUPING_REL, RESULT_IMPLEMENTS_MINMAX_AGGREGATE?
    That's a bit verbose; shorter alternatives welcome. The first two
    could be merged, since the cardinality of the relid set should
    distinguish them. Or it could be more like RESULT_REPLACES_SCAN,
    RESULT_REPLACES_JOIN, RESULT_REPLACES_AGGREGATE,
    RESULT_IMPLEMENTS_MINMAX_AGGREGATE, to more closely match what we
    would presumably show in the EXPLAIN output.
    
    Thoughts?
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  32. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-11T18:19:40Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > As you can see, the set of outer joins calculated at the same join can
    > vary depending on the join order.  What I suggested is to record this
    > information in JoinPaths (or maybe also in Join plan nodes so that
    > get_scanned_rtindexes can collect it) for the assertion.
    
    I do not think this is correct, or at least it's not the most useful
    way to think about it.  As I stated earlier, each join plan node that
    is doing a non-inner join has a unique corresponding outer-join RTE
    that describes what it's doing.  The point that you are making is
    that if we've applied identity 3 to swap the order of two outer
    joins, then we can't claim that the output of the lower plan node is a
    fully-correct representation of the semantics of the join that it's
    doing: there may be values that should have gone to NULL but won't
    until after the upper plan node processes the rows.
    
    If we're going to attach more labeling to the plan nodes, I'd
    prefer to do what I suggested and label the nodes with the specific
    outer join that they think they are implementing.  With Richard's
    proposal it will remain impossible to tell which node is doing what.
    
    While I've still not worked through the details, it might be that
    we can't implement my desire to make setrefs.c's nullingrels
    checks exact unless we *also* store the bitmap sets that Richard
    is proposing.  I don't know if it's worth carrying two new fields
    in order to make that work.
    
    I don't entirely buy that Richard's proposed assertion is worth
    doing: I think I agree with Robert's original opinion that it's
    redundant.  I do think that tightening the nullingrels checks
    would be useful.
    
    			regards, tom lane
    
    
    
    
  33. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-11T18:41:54Z

    On Thu, Sep 11, 2025 at 2:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I do not think this is correct, or at least it's not the most useful
    > way to think about it.  As I stated earlier, each join plan node that
    > is doing a non-inner join has a unique corresponding outer-join RTE
    > that describes what it's doing.  The point that you are making is
    > that if we've applied identity 3 to swap the order of two outer
    > joins, then we can't claim that the output of the lower plan node is a
    > fully-correct representation of the semantics of the join that it's
    > doing: there may be values that should have gone to NULL but won't
    > until after the upper plan node processes the rows.
    >
    > If we're going to attach more labeling to the plan nodes, I'd
    > prefer to do what I suggested and label the nodes with the specific
    > outer join that they think they are implementing.  With Richard's
    > proposal it will remain impossible to tell which node is doing what.
    
    Conceptually, I prefer your idea of one RTI per join node, but I don't
    understand how to make it work. Let's say that, as in Richard's
    example, the query is written as (A leftjoin B on (Pab)) leftjoin C on
    (Pbc) but we end up with a plan tree that looks like this:
    
    Something Join (RTIs: 1 2 3 4 5)
    -> Scan on A (RTI: 1)
    -> Whatever Join (RTIs: 2 4)
        -> Scan on B (RTI: 2)
        -> Scan on C (RTI: 4)
    
    RTE 3 is the join between A and B, and RTI 5 is the join between A-B
    and C. It makes plenty of sense to associate RTI 5 with the Something
    Join, so your model seems to require us to associate RTI 3 with the
    Whatever Join, because there's no place else for it to go. That seems
    to create two problems.
    
    First, RTI 3 is for an A-B join, and the Whatever Join is for a B-C
    join, and it sounds wrong to associate an RTI with a join when not all
    of the rels being joined are present at that level. Can we really say
    that the Whatever Join is implementing RTI 3 given that RTI 3 includes
    A?
    
    Second, even ignoring that problem, if we now try to assert that the
    RTIs of a joinrel are the union of the RTIs we see in the plan tree,
    the assertion is going to fail, because now the Whatever Join sees
    RTIs 2 and 4 through its children and RTI 3 through its own ojrelid,
    but the joinrel's RTIs are {2,4}, not {2,3,4}.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  34. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-11T20:07:03Z

    Here's a likely-doomed new version of just the first three patches.
    0002 is unchanged. 0001 has been reworked so that a Result node
    contains a result_type. This is as per Tom's suggestion, but it's
    unclear to me that he will like the details. 0003 has been reworked so
    that when we build a Join plan, we annotate it with the ojrelids
    completed at that level, which Tom said earlier that he thought was
    the wrong idea (but after I'd already written the code, and I've
    already replied to say I don't understand what the alternative is).
    
    Hence, I expect this version to crash and burn, but maybe it will do
    so in such a way that I have some idea what to propose instead.
    
    --
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  35. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T15:08:51Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > On Thu, Sep 11, 2025 at 2:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> If we're going to attach more labeling to the plan nodes, I'd
    >> prefer to do what I suggested and label the nodes with the specific
    >> outer join that they think they are implementing.  With Richard's
    >> proposal it will remain impossible to tell which node is doing what.
    
    > Conceptually, I prefer your idea of one RTI per join node, but I don't
    > understand how to make it work. Let's say that, as in Richard's
    > example, the query is written as (A leftjoin B on (Pab)) leftjoin C on
    > (Pbc) but we end up with a plan tree that looks like this:
    
    > Something Join (RTIs: 1 2 3 4 5)
    > -> Scan on A (RTI: 1)
    > -> Whatever Join (RTIs: 2 4)
    >     -> Scan on B (RTI: 2)
    >     -> Scan on C (RTI: 4)
    
    After thinking about this for awhile, I believe that Richard and I
    each had half of the right solution ;-).  Let me propose some new
    terminology in hopes of clarifying matters:
    
    * A join plan node "starts" an outer join if it performs the
    null-extension step corresponding to that OJ (specifically,
    if it is the first join doing null-extension over the minimum
    RHS of that OJ).
    
    * A join plan node "completes" an outer join if its output
    nulls all the values that that OJ should null when done
    according to syntactic order.
    
    In simple cases where we have not applied OJ identity 3, every
    outer-join plan node starts and completes a single OJ relid.
    But if we have applied identity 3 in the forward direction,
    as per your example above, it's different.  The physically
    lower join node starts OJ 5, but doesn't complete it.  The
    upper node starts OJ 3, and completes both 3 and 5.  I think
    that it's possible for the topmost join to complete more than
    two OJs, if we have a nest of multiple OJs that can all be
    re-ordered via identity 3.
    
    I was arguing for labeling plan nodes according to which OJ they
    start (always a unique relid).  Richard was arguing for labeling
    according to which OJ(s) they complete (zero, one, or more relids).
    But I now think it's probably worth doing both.  We need the
    completion bitmapsets if we want to cross-check Var nullingrels,
    because those correspond to the nullingrels that should get added
    at each join's output.  I think that we also want the start labels
    though.  For one thing, if the start nodes are not identified,
    it's impossible to understand how much of the tree is the "no
    man's land" where a C variable may or may not have gone to null
    on its way to becoming a C* variable.  But in general I think
    that we'll want to be able to identify an outer-join plan node
    even if it does not complete its OJ.
    
    			regards, tom lane
    
    
    
    
  36. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-12T15:54:28Z

    On Fri, Sep 12, 2025 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I was arguing for labeling plan nodes according to which OJ they
    > start (always a unique relid).  Richard was arguing for labeling
    > according to which OJ(s) they complete (zero, one, or more relids).
    
    I agree with everything in your reply up to and including this part.
    
    > But I now think it's probably worth doing both.  We need the
    > completion bitmapsets if we want to cross-check Var nullingrels,
    > because those correspond to the nullingrels that should get added
    > at each join's output.  I think that we also want the start labels
    > though.  For one thing, if the start nodes are not identified,
    > it's impossible to understand how much of the tree is the "no
    > man's land" where a C variable may or may not have gone to null
    > on its way to becoming a C* variable.  But in general I think
    > that we'll want to be able to identify an outer-join plan node
    > even if it does not complete its OJ.
    
    So, it looks to me like the way this works today is that
    join_is_legal() figures out the relevant SpecialJoinInfo and then
    add_outer_joins_to_relids() decides what to add to the joinrel's relid
    set. So I think, though I am not quite sure, that if somewhere around
    that point in the code we copied sjinfo->ojrelid into the RelOptInfo,
    and then propagated that through to the final plan, that might be what
    you're looking for here. However, that assumes that the choice of
    SpecialJoinInfo is fixed for all possible ways of constructing a given
    joinrel, which I think might not be true. In Richard's test case, one
    simply can't go wrong, because the lower join only draws from two
    baserels. But in a case like A LJ B ON A.x = B.x LJ C ON B.y = C.y LJ
    D ON B.z = D.z, the B-C-D joinrel could presumably be constructed by
    joining either B-D to C or B-C to D, and I'm guessing that will result
    in a different choice of SpecialJoinInfo in each case. That would mean
    that the ojrelid has to be per-Path rather than per-RelOptInfo.
    
    While in theory that's fine, it sounds expensive. I was hoping that we
    could piggyback mostly on existing calculations here, exposing data we
    already have instead of calculating new things. If we want to expose
    the starting-the-outer-join-RTI value that you want here, it seems
    like we're going to have to redo some of the join_is_legal() work and
    some of the add_outer_joins_to_relids() work for every path. I'm
    skeptical about expending those cycles. It's not clear to me that I
    need to care about outer join RTIs at all for what I'm trying to do --
    focusing where RTIs originating from baserels end up in the final plan
    tree is, as far as I can see, completely adequate. There might be
    other people who want to do other things that would benefit more from
    seeing that stuff, though. I'm not against exposing information that
    is easily calculated and might be useful to somebody, even if it's
    just for planner debugging. But it seems to me that what you're asking
    here might be going quite a bit further than that.
    
    So my counter-proposal is that this patch set should either (1) expose
    nothing at all about join RTIs because I don't have a need for them or
    (2) expose the join RTIs completed at a certain level because that's
    easily calculated from the data we already have; and if you want to
    later also expose the single join RTI started at a certain level for a
    varnullingrels cross-check or any other purpose, then you can propose
    a patch for that. Alternatively, if you want to edit my 0003 patch to
    work the way you think it should, cool.  Or if you can describe what
    you think it should do, I'm somewhat willing to try implementing that,
    but that's definitely not such a great choice from my perspective. I'm
    afraid that I'm getting pulled a bit further down the garden path than
    I really want to go trying to satisfy your desire to perform a
    cross-check that I don't really understand and/or expose information
    for which I don't see a clear need in my own work. What I need is for
    all the baserels that appear in the final Plan tree to be properly
    labelled.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  37. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T16:18:16Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > So, it looks to me like the way this works today is that
    > join_is_legal() figures out the relevant SpecialJoinInfo and then
    > add_outer_joins_to_relids() decides what to add to the joinrel's relid
    > set. So I think, though I am not quite sure, that if somewhere around
    > that point in the code we copied sjinfo->ojrelid into the RelOptInfo,
    > and then propagated that through to the final plan, that might be what
    > you're looking for here.
    
    Not the RelOptInfo, but the Path.  I have not looked at details, but
    it might be necessary to identify these labels at Path construction
    time rather than reconstructing them during createplan.c.  I'd rather
    not, because bloating Path nodes with hopefully-redundant information
    isn't attractive.  But if it turns out that createplan.c doesn't have
    enough information then we might have to.
    
    I'm also thinking that this notion of starting/completing OJs might be
    useful in its own right to clarify and even simplify some of the Path
    manipulations we do.  But that will require reviewing the code.
    
    > So my counter-proposal is that this patch set should either (1) expose
    > nothing at all about join RTIs because I don't have a need for them or
    > (2) expose the join RTIs completed at a certain level because that's
    > easily calculated from the data we already have; and if you want to
    > later also expose the single join RTI started at a certain level for a
    > varnullingrels cross-check or any other purpose, then you can propose
    > a patch for that.
    
    If what you want to do is only interested in baserel RTIs, then I
    think we should leave outer join RTIs out of the discussion for the
    present.  I was not looking to make you do work you aren't interested
    in.  I reserve the right to do said work later ...
    
    			regards, tom lane
    
    
    
    
  38. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-12T17:32:10Z

    On Fri, Sep 12, 2025 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Not the RelOptInfo, but the Path.  I have not looked at details, but
    > it might be necessary to identify these labels at Path construction
    > time rather than reconstructing them during createplan.c.  I'd rather
    > not, because bloating Path nodes with hopefully-redundant information
    > isn't attractive.  But if it turns out that createplan.c doesn't have
    > enough information then we might have to.
    
    My intuition (which might be wrong) is that there's enough information
    available to do it during createplan.c, but I'm not sure that there's
    enough information available to do it efficiently at that stage. You
    could grovel through the whole Plan tree and find all of the scan
    RTIs, and then from there you should be able to work out what joins
    were commuted, and then from there you should be able to work out
    which SpecialJoinInfo goes with each Join that appears in the final
    plan tree. That doesn't sound like a lot of fun, though. On the other
    hand, I'm not sure doing this at Path creation time is going to be a
    picnic either. Right now, we're sort of cheating by only caring about
    what OJs are finished at a certain level: that is consistent across
    all possible ways of forming a joinrel, but which OJs are started at a
    certain level is not, and I'm not currently seeing how to fix that
    without adding cycles.
    
    > I'm also thinking that this notion of starting/completing OJs might be
    > useful in its own right to clarify and even simplify some of the Path
    > manipulations we do.  But that will require reviewing the code.
    
    Makes sense.
    
    > If what you want to do is only interested in baserel RTIs, then I
    > think we should leave outer join RTIs out of the discussion for the
    > present.  I was not looking to make you do work you aren't interested
    > in.  I reserve the right to do said work later ...
    
    Absolutely. I'm more than happy to have you do that.
    
    We sort of got started down this path because, reviewing v4-0003,
    Richard commented that I might be able to sanity-check
    something-or-other about RTE_JOIN RTIs instead of just focusing on
    baserels. From there, this sub-thread has turned into a discussion of
    exactly what that sanity check should be. v5 exposes the
    completed-at-this-level OJs in the final plan tree, which is easy to
    compute and could be useful for somebody's plan introspection, but (1)
    I don't need it and (2) it just derives them from the joinrel's RTI
    set rather than in any independent per-path way that might lead to a
    more meaningful cross-check. Having done the work to create v5-0003, I
    find myself thinking it feels a little tidier than v4 and am somewhat
    inclined to prefer it; I think that it's very possible that you or
    Richard might find it a useful basis for future work to further
    strengthen the way things work in this area. However, on the other
    hand, maybe not, and going back to v4-0003 is also completely
    reasonable. I don't care much one way or the other as long as nobody's
    too mad when the dust settles.
    
    But, I also can't commit either v4-0003 or v5-0003 or any variant
    thereof until we agree on what to do about 0001, and you're the
    holdout there. v5-0001 adds a result_type field to the Result node in
    response to your previous review comments, so knowing whether that
    looks like what you want or whether you would prefer something else is
    the blocker for me as of this moment.
    
    Thanks,
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  39. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T17:44:32Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > But, I also can't commit either v4-0003 or v5-0003 or any variant
    > thereof until we agree on what to do about 0001, and you're the
    > holdout there.
    
    Yeah, I owe you a review, hope to get to it over the weekend.
    
    			regards, tom lane
    
    
    
    
  40. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-12T17:52:01Z

    On Fri, Sep 12, 2025 at 1:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    > > But, I also can't commit either v4-0003 or v5-0003 or any variant
    > > thereof until we agree on what to do about 0001, and you're the
    > > holdout there.
    >
    > Yeah, I owe you a review, hope to get to it over the weekend.
    
    Thanks. This is a good time to mention that I appreciate your
    engagement in this thread. Let me know if there's something that you'd
    like me to pay attention to in turn.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  41. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-13T08:46:09Z

    On Sat, Sep 13, 2025 at 12:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > After thinking about this for awhile, I believe that Richard and I
    > each had half of the right solution ;-).  Let me propose some new
    > terminology in hopes of clarifying matters:
    >
    > * A join plan node "starts" an outer join if it performs the
    > null-extension step corresponding to that OJ (specifically,
    > if it is the first join doing null-extension over the minimum
    > RHS of that OJ).
    >
    > * A join plan node "completes" an outer join if its output
    > nulls all the values that that OJ should null when done
    > according to syntactic order.
    
    This new notion makes a lot of sense to me.  I feel that it could help
    us optimize some existing logic, or at least make certain parts of it
    easier to understand.  It might be worth adding to the README, maybe
    under the section "Relation Identification and Qual Clause Placement",
    where we explain the idea behind pushed-down joins.
    
    - Richard
    
    
    
    
  42. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-13T09:02:04Z

    On Sat, Sep 13, 2025 at 2:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > We sort of got started down this path because, reviewing v4-0003,
    > Richard commented that I might be able to sanity-check
    > something-or-other about RTE_JOIN RTIs instead of just focusing on
    > baserels. From there, this sub-thread has turned into a discussion of
    > exactly what that sanity check should be.
    
    Yeah, when commenting on v4-0003 about including outer join relids in
    the assertion, I had a feeling that it might take the discussion off
    topic -- sorry that it did.  That's why I suggested moving this part
    of discussion to a separate thread.
    
    I still think that cross-checking outer join relids isn't a
    requirement for committing your patches, so I'm totally fine if you
    end up with a version that doesn't assert outer join relids.
    
    - Richard
    
    
    
    
  43. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-14T23:42:43Z

    I wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    >> But, I also can't commit either v4-0003 or v5-0003 or any variant
    >> thereof until we agree on what to do about 0001, and you're the
    >> holdout there.
    
    > Yeah, I owe you a review, hope to get to it over the weekend.
    
    I'm running out of weekend, but I found time to look at v5-0001,
    so here are some comments.  In general, it's in pretty good shape
    and these are nitpicks.
    
    * I think your placement of the show_result_replacement_info call
    site suffers from add-at-the-end syndrome.  It should certainly go
    before the show_instrumentation_count call: IMO we expect stuff
    added by EXPLAIN ANALYZE to appear after stuff that's there in plain
    EXPLAIN.  But I'd really argue that from a user's standpoint this
    information is part of the fundamental plan structure and so it
    deserves more prominence.  I'd lean to putting it first in the
    T_Result case, before the "One-Time Filter".  (Thought experiment:
    if we'd had this EXPLAIN field from day one, where do you think
    it would have been placed?)
    
    * Even more nitpicky:
    
    +			if (plan->lefttree == NULL)
    +				show_result_replacement_info(castNode(Result, plan), es);
    
    I think show_result_replacement_info should have responsibility for
    deciding whether to print anything in the lefttree == NULL case.
    (This will affect the Asserts in show_result_replacement_info,
    but those seem a little odd anyway.)
    
    +		case RESULT_TYPE_UPPER:
    +			/* a small white lie */
    +			replacement_type = "Aggregate";
    +			break;
    
    I find this unconvincing: is it really an aggregate?  It doesn't
    help that this case doesn't seem to be reached anywhere in the
    regression tests.
    
    In general I suspect that we'll have to refine RESULT_TYPE_UPPER
    in the future.  I don't think this fear needs to block committing
    of what you have though.
    
    +        /* Work out what reference name to use and added it the string. */
    
    The grammar police will be after you.
    
    +     * (Arguably, we should instead display the RTE name in some other way in
    +     * such cases, but in typical cases the RTE name is *RESULT* and printing
    +     * "Result on *RESULT*" or similar doesn't seem especially useful, so for
    +     * now we don't print anything at all.)
    
    Right offhand, I think that RTE_RESULT *always* has the name *RESULT*,
    so the "typical" bit seems misleading.  Personally I'd drop this
    para altogether.
    
    +    /*
    +     * We're replacing either a scan or a join, according to the number of
    +     * rels in the relids set.
    +     */
    +    if (nrels == 0)
    +        ExplainPropertyText("Replaces", replacement_type, es);
    +    else
    +    {
    +        char *s = psprintf("%s on %s", replacement_type, buf.data);
    +
    +        ExplainPropertyText("Replaces", s, es);
    +    }
    
    This comment seems to neither have anything to do with the logic,
    or to be adding anything.  But do you need nrels at all?  I'd
    be inclined to check for "buf.len > 0" to see if you want to
    insert the "on ..." bit.
    
    + * make_simple_result
    + *      Build a Result plan node that returns a single row (or possibly no rows,
    + *      if the one-time filtered defined by resconstantqual returns false)
    
    I don't love the name "make_simple_result", as the cases this handles
    are frequently far from simple.  I don't have a clearly-better idea
    offhand though.  Maybe "make_one_row_result"?  In any case, "one-time
    filtered" needs help.
    
    In general, I wonder if it'd be better for the callers of
    make_xxx_result to pass in the result_type to use.  Those
    functions have such a narrow view of the available info
    that I'm dubious that they can get it right.  Especially
    if/when we decide that RESULT_TYPE_UPPER needs subdivision.
    
    + * relids identifies the relation for which this Result node is generating the
    + * tuples. When subplan is not NULL, it should be empty: this node is not
    + * generating anything in that case, just acting on tuples generated by the
    + * subplan. Otherwise, it may contain a single RTI (as when this Result node
    + * is substituted for a scan); multiple RTIs (as when this Result node is
    + * substituted for a join); or no RTIs at all (as when this Result node is
    + * substituted for an upper rel).
    
    I doubt this claim that the relid set will be empty for an upper rel.
    I think it's more likely that it will include all the rels for the
    query.
    
    			regards, tom lane
    
    
    
    
  44. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-15T16:32:15Z

    Thanks for the review. Comments to which I don't respond below are duly noted.
    
    On Sun, Sep 14, 2025 at 7:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > * I think your placement of the show_result_replacement_info call
    > site suffers from add-at-the-end syndrome.  It should certainly go
    > before the show_instrumentation_count call: IMO we expect stuff
    > added by EXPLAIN ANALYZE to appear after stuff that's there in plain
    > EXPLAIN.  But I'd really argue that from a user's standpoint this
    > information is part of the fundamental plan structure and so it
    > deserves more prominence.  I'd lean to putting it first in the
    > T_Result case, before the "One-Time Filter".  (Thought experiment:
    > if we'd had this EXPLAIN field from day one, where do you think
    > it would have been placed?)
    
    Yes, I wondered if it should actually look more like this:
    
    Degenerate Scan on blah
    Degenerate Join on blah, blah, blah
    Degenerate Aggregate
    MinMaxAggregate Result
    
    So getting rid of Replaces: altogether. In fact, "Replaces" is really
    a complete misnomer in the case of a MinMaxAggregate; I'm just not
    sure exactly what to do instead.
    
    
    > +               case RESULT_TYPE_UPPER:
    > +                       /* a small white lie */
    > +                       replacement_type = "Aggregate";
    > +                       break;
    >
    > I find this unconvincing: is it really an aggregate?  It doesn't
    > help that this case doesn't seem to be reached anywhere in the
    > regression tests.
    
    This is the case I know about where that can be reached.
    
    robert.haas=# explain select 1 as a, 2 as b having false;
                    QUERY PLAN
    ------------------------------------------
     Result  (cost=0.00..0.01 rows=1 width=8)
       One-Time Filter: false
       Replaces: Aggregate
    (3 rows)
    
    > In general I suspect that we'll have to refine RESULT_TYPE_UPPER
    > in the future.  I don't think this fear needs to block committing
    > of what you have though.
    
    +1.
    
    > +     * (Arguably, we should instead display the RTE name in some other way in
    > +     * such cases, but in typical cases the RTE name is *RESULT* and printing
    > +     * "Result on *RESULT*" or similar doesn't seem especially useful, so for
    > +     * now we don't print anything at all.)
    >
    > Right offhand, I think that RTE_RESULT *always* has the name *RESULT*,
    > so the "typical" bit seems misleading.  Personally I'd drop this
    > para altogether.
    
    Counterexample:
    
    robert.haas=# explain verbose select * from pgbench_accounts where 0 = 1;
                    QUERY PLAN
    ------------------------------------------
     Result  (cost=0.00..0.00 rows=0 width=0)
       Output: aid, bid, abalance, filler
       One-Time Filter: false
       Replaces: Scan on pgbench_accounts
    (4 rows)
    
    debug_print_plan says:
    
          :alias <>
          :eref
             {ALIAS
             :aliasname pgbench_accounts
             :colnames ("aid" "bid" "abalance" "filler")
             }
    
    > In general, I wonder if it'd be better for the callers of
    > make_xxx_result to pass in the result_type to use.  Those
    > functions have such a narrow view of the available info
    > that I'm dubious that they can get it right.  Especially
    > if/when we decide that RESULT_TYPE_UPPER needs subdivision.
    
    That was my first thought, but after experimentation I think it sucks,
    especially because of this:
    
                    /*
                     * The only path for it is a trivial Result path.  We cheat a
                     * bit here by using a GroupResultPath, because that way we
                     * can just jam the quals into it without preprocessing them.
                     * (But, if you hold your head at the right angle, a FROM-less
                     * SELECT is a kind of degenerate-grouping case, so it's not
                     * that much of a cheat.)
                     */
    
    I would argue that if you hold your head at that angle, you need your
    head examined. Perhaps that wasn't the case when this comment was
    written, but from the viewpoint of this project, I think it's pretty
    clear. What this does is use create_group_result_path() not only for
    actual degenerate-grouping cases (where we're replacing an aggregate)
    but also for no-FROM-clause cases (where we're replacing a scan). We
    could adjust things so that the no-FROM-clause case doesn't take this
    code path, or passes down a flag, but it's clear from examination of
    the RelOptInfo. Likewise, for scans vs. joins, peaking at the
    reloptkind is a very easy way to tell whether we've got a baserel or a
    joinrel and having the caller go to extra trouble to pass it down just
    seems silly.
    
    > + * relids identifies the relation for which this Result node is generating the
    > + * tuples. When subplan is not NULL, it should be empty: this node is not
    > + * generating anything in that case, just acting on tuples generated by the
    > + * subplan. Otherwise, it may contain a single RTI (as when this Result node
    > + * is substituted for a scan); multiple RTIs (as when this Result node is
    > + * substituted for a join); or no RTIs at all (as when this Result node is
    > + * substituted for an upper rel).
    >
    > I doubt this claim that the relid set will be empty for an upper rel.
    > I think it's more likely that it will include all the rels for the
    > query.
    
    Upper rels are created by fetch_upper_rel(). The third argument
    becomes the relids set. Most call sites pass that argument as NULL:
    
    [robert.haas pgsql]$ git grep fetch_upper_rel src/backend/optimizer/
    src/backend/optimizer/path/allpaths.c:   sub_final_rel =
    fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL);
    src/backend/optimizer/path/costsize.c:   sub_final_rel =
    fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
    src/backend/optimizer/plan/planagg.c:    grouped_rel =
    fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
    src/backend/optimizer/plan/planner.c:    final_rel =
    fetch_upper_rel(root, UPPERREL_FINAL, NULL);
    src/backend/optimizer/plan/planner.c:    final_rel =
    fetch_upper_rel(root, UPPERREL_FINAL, NULL);
    src/backend/optimizer/plan/planner.c:    final_rel =
    fetch_upper_rel(root, UPPERREL_FINAL, NULL);
    src/backend/optimizer/plan/planner.c:        grouped_rel =
    fetch_upper_rel(root, UPPERREL_GROUP_AGG,
    src/backend/optimizer/plan/planner.c:        grouped_rel =
    fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
    src/backend/optimizer/plan/planner.c:    window_rel =
    fetch_upper_rel(root, UPPERREL_WINDOW, NULL);
    src/backend/optimizer/plan/planner.c:    distinct_rel =
    fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
    src/backend/optimizer/plan/planner.c:    partial_distinct_rel =
    fetch_upper_rel(root, UPPERREL_PARTIAL_DISTINCT,
    src/backend/optimizer/plan/planner.c:    ordered_rel =
    fetch_upper_rel(root, UPPERREL_ORDERED, NULL);
    src/backend/optimizer/plan/planner.c:    partially_grouped_rel =
    fetch_upper_rel(root,
    src/backend/optimizer/plan/setrefs.c:
    IS_DUMMY_REL(fetch_upper_rel(rel->subroot,
    src/backend/optimizer/plan/subselect.c:  final_rel =
    fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
    src/backend/optimizer/plan/subselect.c:          final_rel =
    fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
    src/backend/optimizer/plan/subselect.c:      final_rel =
    fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
    src/backend/optimizer/prep/prepunion.c:  result_rel =
    fetch_upper_rel(root, UPPERREL_SETOP,
    src/backend/optimizer/prep/prepunion.c:  final_rel =
    fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL);
    src/backend/optimizer/prep/prepunion.c:  result_rel =
    fetch_upper_rel(root, UPPERREL_SETOP, relids);
    src/backend/optimizer/prep/prepunion.c:  result_rel =
    fetch_upper_rel(root, UPPERREL_SETOP,
    src/backend/optimizer/util/relnode.c: * fetch_upper_rel
    src/backend/optimizer/util/relnode.c:fetch_upper_rel(PlannerInfo
    *root, UpperRelationKind kind, Relids relids)
    
    The exceptions are: make_grouping_rel() passes a relids set when
    IS_OTHER_REL(input_rel); create_partial_grouping_paths() passes a
    relids set when creating UPPERREL_PARTIAL_GROUP_AGG; and
    generate_union_paths() and generate_nonunion_paths() in prepunion.c
    bubble up the underlying relids. AFAICS, a non-parallel,
    non-partitionwise aggregate ends up with an empty relid set, and even
    those cases end up with an empty relid set for the topmost grouping
    rel, even if they create some other upper rels that do have non-empty
    relid sets.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  45. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-15T19:43:46Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > On Sun, Sep 14, 2025 at 7:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> ... But I'd really argue that from a user's standpoint this
    >> information is part of the fundamental plan structure and so it
    >> deserves more prominence.  I'd lean to putting it first in the
    >> T_Result case, before the "One-Time Filter".  (Thought experiment:
    >> if we'd had this EXPLAIN field from day one, where do you think
    >> it would have been placed?)
    
    > Yes, I wondered if it should actually look more like this:
    > Degenerate Scan on blah
    > Degenerate Join on blah, blah, blah
    > Degenerate Aggregate
    > MinMaxAggregate Result
    > So getting rid of Replaces: altogether.
    
    Yeah, that would be pretty tempting if we were working in a green
    field.  I think it might be too much change though.  Also, from a
    developer's standpoint it's better if what EXPLAIN prints agrees
    with what the node types are internally.
    
    >> +               case RESULT_TYPE_UPPER:
    >> +                       /* a small white lie */
    >> +                       replacement_type = "Aggregate";
    >> +                       break;
    >> 
    >> I find this unconvincing: is it really an aggregate?  It doesn't
    >> help that this case doesn't seem to be reached anywhere in the
    >> regression tests.
    
    > This is the case I know about where that can be reached.
    
    > robert.haas=# explain select 1 as a, 2 as b having false;
    >                 QUERY PLAN
    > ------------------------------------------
    >  Result  (cost=0.00..0.01 rows=1 width=8)
    >    One-Time Filter: false
    >    Replaces: Aggregate
    > (3 rows)
    
    Hmm, okay.  "Replaces: Aggregate" doesn't seem like a great
    explanation, but I don't have a better idea offhand.
    In any case I'm not sure this result_type value is going to
    have a long shelf-life, so arguing about how to spell it
    may not be a productive use of time.
    
    I do suggest adding the above as a regression test.
    
    >> Right offhand, I think that RTE_RESULT *always* has the name *RESULT*,
    >> so the "typical" bit seems misleading.  Personally I'd drop this
    >> para altogether.
    
    > Counterexample:
    > robert.haas=# explain verbose select * from pgbench_accounts where 0 = 1;
    >                 QUERY PLAN
    > ------------------------------------------
    >  Result  (cost=0.00..0.00 rows=0 width=0)
    >    Output: aid, bid, abalance, filler
    >    One-Time Filter: false
    >    Replaces: Scan on pgbench_accounts
    > (4 rows)
    
    Uh ... this example does not involve an RTE_RESULT does it?
    I see a regular RTE_RELATION RTE for pgbench_accounts, which
    your code duly prints.
    
    Looking at the code, there are just three places (all in
    prepjointree.c) that create RTE_RESULT RTEs.  Two of them
    are building the RTE from scratch, and they both do
    	rte->eref = makeAlias("*RESULT*", NIL);
    However the third one (pull_up_constant_function) is changing
    a pulled-up RTE_FUNCTION into RTE_RESULT, and it doesn't do
    anything to the RTE's eref.  While that makes your argument
    nominally true, I'd be inclined to argue that this was an oversight
    and it should have changed the alias/eref fields to look like other
    RTE_RESULTs.  (I've not investigated, but I wonder what your
    patch prints for such cases.)
    
    Bottom line remains that I don't think the comment para I quoted
    is adding any useful info.  The first half of that comment block
    was sufficient.
    
    >> In general, I wonder if it'd be better for the callers of
    >> make_xxx_result to pass in the result_type to use.
    
    > That was my first thought, but after experimentation I think it sucks,
    
    Hmph.  Okay, if you tried it and it's bad, I'll accept your opinion.
    
    >> I doubt this claim that the relid set will be empty for an upper rel.
    >> I think it's more likely that it will include all the rels for the
    >> query.
    
    > Upper rels are created by fetch_upper_rel(). The third argument
    > becomes the relids set. Most call sites pass that argument as NULL:
    
    Okay, but "most" is not "all".  I suggest that that comment is just
    too specific in the first place.  Even if it's 100% correct today,
    it's likely to be falsified in future and no one will remember to
    update it.  Something like this might have a longer shelf life:
    
    + * relids identifies the relation for which this Result node is generating the
    + * tuples. When subplan is not NULL, it should be empty: this node is not
    + * generating anything in that case, just acting on tuples generated by the
    + * subplan. Otherwise, it contains the relids of the planner relation that
    + * the Result represents.
    
    			regards, tom lane
    
    
    
    
  46. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-15T21:07:47Z

    On Mon, Sep 15, 2025 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > Yes, I wondered if it should actually look more like this:
    > > Degenerate Scan on blah
    > > Degenerate Join on blah, blah, blah
    > > Degenerate Aggregate
    > > MinMaxAggregate Result
    > > So getting rid of Replaces: altogether.
    >
    > Yeah, that would be pretty tempting if we were working in a green
    > field.  I think it might be too much change though.  Also, from a
    > developer's standpoint it's better if what EXPLAIN prints agrees
    > with what the node types are internally.
    
    Well, I was just bringing it up because you seemed to think that the
    Replaces: line was not necessarily where we would have put it in a
    green field. I am not sure about where we would have put it, but I
    think in a green field we wouldn't have it at all, and I actually
    think the above is worth considering. I think people would get used to
    it pretty fast and that it might be more clear than "Result", which
    doesn't really mean a whole lot. However, if you or others don't like
    that and you want to just move the Replaces line slightly higher in
    the output, I mean, I don't mind that, I just don't know that it's
    really a material difference.
    
    > I do suggest adding the above as a regression test.
    
    Makes sense.
    
    > Uh ... this example does not involve an RTE_RESULT does it?
    > I see a regular RTE_RELATION RTE for pgbench_accounts, which
    > your code duly prints.
    >
    > Looking at the code, there are just three places (all in
    > prepjointree.c) that create RTE_RESULT RTEs.  Two of them
    > are building the RTE from scratch, and they both do
    >         rte->eref = makeAlias("*RESULT*", NIL);
    > However the third one (pull_up_constant_function) is changing
    > a pulled-up RTE_FUNCTION into RTE_RESULT, and it doesn't do
    > anything to the RTE's eref.  While that makes your argument
    > nominally true, I'd be inclined to argue that this was an oversight
    > and it should have changed the alias/eref fields to look like other
    > RTE_RESULTs.  (I've not investigated, but I wonder what your
    > patch prints for such cases.)
    
    Will investigate.
    
    > + * relids identifies the relation for which this Result node is generating the
    > + * tuples. When subplan is not NULL, it should be empty: this node is not
    > + * generating anything in that case, just acting on tuples generated by the
    > + * subplan. Otherwise, it contains the relids of the planner relation that
    > + * the Result represents.
    
    OK.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  47. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-15T21:15:44Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > Well, I was just bringing it up because you seemed to think that the
    > Replaces: line was not necessarily where we would have put it in a
    > green field. I am not sure about where we would have put it, but I
    > think in a green field we wouldn't have it at all, and I actually
    > think the above is worth considering. I think people would get used to
    > it pretty fast and that it might be more clear than "Result", which
    > doesn't really mean a whole lot. However, if you or others don't like
    > that and you want to just move the Replaces line slightly higher in
    > the output, I mean, I don't mind that, I just don't know that it's
    > really a material difference.
    
    Yeah, I'm content with making it a "Replaces" attribute, I'm just
    complaining about the ordering.  I'd be the first to agree that
    this is purely cosmetic, but in my mind there is a pretty clear
    precedence ordering among the attributes that EXPLAIN prints.
    
    			regards, tom lane
    
    
    
    
  48. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-16T14:17:42Z

    On Mon, Sep 15, 2025 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > anything to the RTE's eref.  While that makes your argument
    > nominally true, I'd be inclined to argue that this was an oversight
    > and it should have changed the alias/eref fields to look like other
    > RTE_RESULTs.  (I've not investigated, but I wonder what your
    > patch prints for such cases.)
    
    It just prints "-> Result" and that's it, as in this example:
    
    robert.haas=# create or replace function absolutely_not() returns bool
    return false;
    CREATE FUNCTION
    robert.haas=# explain (costs off) select * from generate_series(1,3) g
    full join absolutely_not() n on true;
                    QUERY PLAN
    ------------------------------------------
     Merge Full Join
       ->  Function Scan on generate_series g
       ->  Materialize
             ->  Result
    (4 rows)
    
    robert.haas=# explain (costs off, range_table) select * from
    generate_series(1,3) g full join absolutely_not() n on true;
                    QUERY PLAN
    ------------------------------------------
     Merge Full Join
       ->  Function Scan on generate_series g
             Scan RTI: 1
       ->  Materialize
             ->  Result
                   RTIs: 2
     RTI 1 (function, in-from-clause):
       Alias: g ()
       Eref: g (g)
       WITH ORDINALITY: false
     RTI 2 (result, in-from-clause):
       Alias: n ()
       Eref: n (n)
     RTI 3 (join, in-from-clause):
       Eref: unnamed_join (g, n)
       Join Type: Full
    (16 rows)
    
    Here's a new patch set. My main questions are:
    
    1. Did I miss anything you wanted fixed in 0001?
    
    2. Should 0001 be combined with 0002 or kept separate?
    
    3. Do you have a preference between this version of 0003 and the older
    revision that just ignored outer-join relids?
    
    --
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  49. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-16T15:27:04Z

    CI is unhappy with v6 because:
    
    [14:37:37.742] In function ‘show_result_replacement_info’,
    [14:37:37.742]     inlined from ‘ExplainNode’ at explain.c:2240:4:
    [14:37:37.742] explain.c:4849:33: error: ‘replacement_type’ may be
    used uninitialized [-Werror=maybe-uninitialized]
    [14:37:37.742]  4849 |                 char       *s = psprintf("%s on
    %s", replacement_type, buf.data);
    [14:37:37.742]       |
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    [14:37:37.742] explain.c: In function ‘ExplainNode’:
    [14:37:37.742] explain.c:4769:21: note: ‘replacement_type’ was declared here
    [14:37:37.742]  4769 |         char       *replacement_type;
    [14:37:37.742]       |                     ^~~~~~~~~~~~~~~~
    
    So apparently an "enum" over every value of the switch is not good
    enough for the value to be assigned.
    
    I'm inclined to change the code like this to fix it:
    
        char       *replacement_type = "???";
    
    ...in the hopes of still producing a warning here if somebody adds
    another label to the enum.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  50. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-18T17:23:11Z

    On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > I'm inclined to change the code like this to fix it:
    >
    >     char       *replacement_type = "???";
    >
    > ...in the hopes of still producing a warning here if somebody adds
    > another label to the enum.
    
    Done in this version. I've also now gone back and rebased the rest of
    the patches as well, so this email includes all 7 patches instead of
    just the first 3. To recall, my goal for this CF was to get 1-4
    committed.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  51. Re: plan shape work

    Junwang Zhao <zhjwpku@gmail.com> — 2025-09-21T09:52:21Z

    Hi Robert,
    
    On Fri, Sep 19, 2025 at 1:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > > I'm inclined to change the code like this to fix it:
    > >
    > >     char       *replacement_type = "???";
    > >
    > > ...in the hopes of still producing a warning here if somebody adds
    > > another label to the enum.
    >
    > Done in this version. I've also now gone back and rebased the rest of
    > the patches as well, so this email includes all 7 patches instead of
    > just the first 3. To recall, my goal for this CF was to get 1-4
    > committed.
    >
    > --
    > Robert Haas
    > EDB: http://www.enterprisedb.com
    
    I have a question about the following changes:
    
    - splan = plan;
      if (IsA(plan, Result))
      {
      Result    *rplan = (Result *) plan;
    
    - if (rplan->plan.lefttree == NULL &&
    - rplan->resconstantqual == NULL)
    - splan = NULL;
    + gplan->plan.lefttree = NULL;
    + gplan->relids = rplan->relids;
    + gplan->result_type = rplan->result_type;
      }
    
    You set gplan->relids and gplan->result_type, but at the end you
    returned &gplan->plan, what's the point of setting these two fields?
    
    
    -- 
    Regards
    Junwang Zhao
    
    
    
    
  52. Re: plan shape work

    Junwang Zhao <zhjwpku@gmail.com> — 2025-09-21T10:34:49Z

    On Sun, Sep 21, 2025 at 5:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
    >
    > Hi Robert,
    >
    > On Fri, Sep 19, 2025 at 1:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > >
    > > On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > > > I'm inclined to change the code like this to fix it:
    > > >
    > > >     char       *replacement_type = "???";
    > > >
    > > > ...in the hopes of still producing a warning here if somebody adds
    > > > another label to the enum.
    > >
    > > Done in this version. I've also now gone back and rebased the rest of
    > > the patches as well, so this email includes all 7 patches instead of
    > > just the first 3. To recall, my goal for this CF was to get 1-4
    > > committed.
    > >
    > > --
    > > Robert Haas
    > > EDB: http://www.enterprisedb.com
    >
    > I have a question about the following changes:
    >
    > - splan = plan;
    >   if (IsA(plan, Result))
    >   {
    >   Result    *rplan = (Result *) plan;
    >
    > - if (rplan->plan.lefttree == NULL &&
    > - rplan->resconstantqual == NULL)
    > - splan = NULL;
    > + gplan->plan.lefttree = NULL;
    > + gplan->relids = rplan->relids;
    > + gplan->result_type = rplan->result_type;
    >   }
    >
    > You set gplan->relids and gplan->result_type, but at the end you
    > returned &gplan->plan, what's the point of setting these two fields?
    
    After further study, this should not be a problem, since the address of
    &gplan->plan is the same as gplan, but the code is a little bit confusing
    at first glance, I think *return (Plan *) gplan* is easier to understand
    but I don't insist ;)
    
    >
    >
    > --
    > Regards
    > Junwang Zhao
    
    
    
    -- 
    Regards
    Junwang Zhao
    
    
    
    
  53. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-22T00:26:00Z

    On Sun, Sep 21, 2025 at 6:35 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
    > After further study, this should not be a problem, since the address of
    > &gplan->plan is the same as gplan, but the code is a little bit confusing
    > at first glance, I think *return (Plan *) gplan* is easier to understand
    > but I don't insist ;)
    
    Thanks for looking into it. Stylistically, I prefer the style without
    the cast, but the effect is the same.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  54. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-22T18:15:22Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > Done in this version. I've also now gone back and rebased the rest of
    > the patches as well, so this email includes all 7 patches instead of
    > just the first 3. To recall, my goal for this CF was to get 1-4
    > committed.
    
    I found time finally to look through all of these.  Some notes:
    
    0001: committable, no further comments.
    
    0002: Yeah, this does seem like an improvement.  There is something
    faintly weird about output like
    
              ->  Result
    -               Output: i3
    +               Output: t3.i3
                    Replaces: Scan on t3
                    One-Time Filter: false
    
    because the whole point of this construct is that we're *not*
    scanning t3.  However, failing to supply the Var prefix is surely
    not better.  I like the fact that the output in upper plan
    levels is now just like what you'd see with a non-phony t3 scan.
    I think you could commit this too.  Merging it with 0001 is
    reasonable, but if you'd rather keep them separate that's okay too.
    
    0003: doesn't feel ready for commit.  In the first place, you've used
    the "completed an outer join" terminology in the commit message and
    a couple of comments, but defined it nowhere.  I think a para or
    two in optimizer/README to define "starting" and "completing"
    outer joins is essential.  (I'm also still quite bemused by marking
    nodes that complete outer joins but not those that start them.)
    In the second place, we should not need to add two hundred lines
    of new code to createplan.c to accomplish this.  Why not simply
    bms_difference the joinrel's relids from the union of the inputs'
    relids?  (And no, I do not believe that computing the value two
    different ways so you can assert they're the same is adding anything
    whatsoever except wasted cycles.)
    
    By and large, I don't believe that it's necessary for 0003 to depend
    on 0001 either.  We already have the information available from the
    paths' parent RelOptInfos.
    
    0004: commit msg is not very convincing about why this is a good idea.
    It really looks like change for the sake of change, so you need to
    make a better case for it.  Also, this output seems inconsistent:
    
      Function Scan on pg_catalog.generate_series x
    -   Output: ARRAY(SubPlan 1)
    +   Output: ARRAY(array_1)
        Function Call: generate_series(1, 3)
    -   SubPlan 1
    +   SubPlan array_1
    
    Why isn't it now "ARRAY(SubPlan array_1)"?  The previous notation was
    chosen to be not-confusable with a plain function call, but you've
    lost that.  Likewise for cases like
    -   Group Key: (InitPlan 1).col1
    +   Group Key: (minmax_1).col1
    which now looks exactly like an ordinary Var.
    
    nitpick: sublinktype_to_string() should return const char *.
    
    0005: I'm even less convinced about there being a need for this.
    It's not that hard to see which RTIs are in which subplan,
    especially after the other changes in this patchset.
    
    0006: I agree with the need for this, but the details seem messy,
    and it's not very clear that the proposed data structure would
    be convenient to use.  Do we really need to rely on plan_node_id?
    Why haven't you integrated record_elided_node into
    clean_up_removed_plan_level?
    
    An idea maybe worth thinking about is that instead of completely
    eliding the plan node, we could replace it with a "no-op" plan node
    that both EXPLAIN and the executor will look right through.
    That node could carry the relid(s) that we lost.  Not sure how
    messy this'd be to integrate, though.
    
    0007: not sure about this either.  Why not simply add those
    relid(s) to the surviving node's apprelids?  Again, I don't
    love the amount of code and data structure that's being added
    for a hypothetical use-case.  It's not even clear that this
    form of the data structure would be helpful to anyone.
    
    			regards, tom lane
    
    
    
    
  55. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-23T15:34:45Z

    On Mon, Sep 22, 2025 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I found time finally to look through all of these.  Some notes:
    
    Thanks. Committed 0001 and 0002 together. I agree with you that 0002
    is a bit weird, but I think it is less weird than the status quo ante,
    and it seems you agree. I think it's best to regard a Result node that
    replaces a scan or join as being a form of scan or join that just so
    happens to be able to be optimized to a degree not normally possible.
    There's a good argument for calling something like this a dummy
    scan/join, as I mentioned before. There's no compelling reason we have
    to make that terminology change, as we noted in the earlier
    discussion, but it's a useful and not incorrect way of thinking about
    it.
    
    > 0003: doesn't feel ready for commit.  In the first place, you've used
    > the "completed an outer join" terminology in the commit message and
    > a couple of comments, but defined it nowhere.  I think a para or
    > two in optimizer/README to define "starting" and "completing"
    > outer joins is essential.  (I'm also still quite bemused by marking
    > nodes that complete outer joins but not those that start them.)
    > In the second place, we should not need to add two hundred lines
    > of new code to createplan.c to accomplish this.  Why not simply
    > bms_difference the joinrel's relids from the union of the inputs'
    > relids?  (And no, I do not believe that computing the value two
    > different ways so you can assert they're the same is adding anything
    > whatsoever except wasted cycles.)
    
    So, let me just back up a minute here and talk about overall goals.
    What I originally wanted to do with this patch is ensure that the
    non-RTE_JOIN RTIs from the joinrel are all mentioned in the final
    plan. Without the changes to the Result node, that's not the case;
    with the changes to the Result node, that is, AFAICT, now the case. It
    can be argued that what has been numbered 0003 up to now is not really
    necessary at all, since all it does is make it less likely that we
    will introduce cases with similar problems to the Result-node case in
    the future, and that could be judged unlikely enough to make 0003 not
    worth committing. However, it seems like you might be proposing
    keeping the patch in some form but throwing out the part of the code
    that walks the plan tree to collect RTIs, and that doesn't make a
    whole lot of sense to me. I have some confidence that the joinrel's
    RTI set will include all of the RTIs from the input rels; what I'm
    worried about is whether those RTIs survive into the final Plan.
    
    On the other hand, I'm not sure that I'm interpreting your remarks
    correctly. When you say "bms_difference the joinrel's relids from the
    union of the inputs' relids" maybe you're specifically talking about
    the handling of the RTE_JOIN relids, and I don't care very much how we
    account for those. So I guess I need some clarification here as to
    what your thinking is.
    
    With regard to your other comments, I'm not opposed to updating the
    README, or alternatively, I'm also not opposed to adjusting the
    wording of the comments and commit message to avoid that particular
    terminology. As far as your parenthetical comment about not marking
    outer joins started, I don't actually care what we do, but as I said
    before, I don't see an efficient way to identify outer joins started
    at a particular level, and it isn't worth adding planner cycles for
    information for which I have no concrete need.
    
    > 0004: commit msg is not very convincing about why this is a good idea.
    > It really looks like change for the sake of change, so you need to
    > make a better case for it.
    
    You're right. That commit message presupposes that the goal is already
    understood, instead of explaining it. See the first message on this
    thread, in the paragraph that begins "Now let's talk about problem
    #2," for the justification. Quoting the most relevant part:
    
    Subqueries sort of have names right
    now, at least some of them, but it's an odd system: a CTE subquery,
    for example, has the name mentioned by the user, but other kinds of
    subplans just get names like "InitPlan 3" or "SubPlan 2". The real
    problem, though, is that those names are only assigned after we've
    FINISHED planned the subquery. If we begin planning our very first
    subquery, it might turn out to be InitPlan 1 or SubPlan 1, or if while
    planning it we recurse into some further subquery then *that* subquery
    might become InitPlan 1 or SubPlan 1 and OUR subquery might become
    InitPlan 2 or SubPlan 2 (or higher, if we find more subqueries and
    recurse into them too). Thus, being given some information about how
    the user wants, say, SubPlan 2 to be planned is completely useless
    because we won't know whether that is us until after we've done the
    planning that the user is trying to influence.
    
    I need to figure out some good way of explaining this (hopefully not
    too verbosely) in the commit message.
    
    > Also, this output seems inconsistent:
    >
    >   Function Scan on pg_catalog.generate_series x
    > -   Output: ARRAY(SubPlan 1)
    > +   Output: ARRAY(array_1)
    >     Function Call: generate_series(1, 3)
    > -   SubPlan 1
    > +   SubPlan array_1
    >
    > Why isn't it now "ARRAY(SubPlan array_1)"?  The previous notation was
    > chosen to be not-confusable with a plain function call, but you've
    > lost that.  Likewise for cases like
    > -   Group Key: (InitPlan 1).col1
    > +   Group Key: (minmax_1).col1
    > which now looks exactly like an ordinary Var.
    
    I don't think there's anything keeping me from making that print
    InitPlan/SubPlan there. I just thought it looked a little verbose that
    way. I thought that the "InitPlan 1" notation was probably due to the
    fact that "1" is not a proper name, rather than (as you suggest here)
    to avoid confusion with other syntax. Compare CTEs, which are also
    subplans, but for which we simply print the CTE's alias name, rather
    than "CTE alias_name".
    
    > nitpick: sublinktype_to_string() should return const char *.
    
    This bleeds into a bunch of other things. I'll poke at it and try to
    figure something out. I may need help from someone with superior const
    skills, but I'll give it a go.
    
    > 0005: I'm even less convinced about there being a need for this.
    > It's not that hard to see which RTIs are in which subplan,
    > especially after the other changes in this patchset.
    
    Let me just say that everything in this patch set is the result of
    experimentation -- trying to write code that attempts to interpret a
    Plan tree and discovering problems doing so along the way. I'm not
    altogether convinced that patches 0005-0008 attack the problems in the
    best possible way and I'm happy to hear suggestions for how to do it
    better, but I'm pretty confident that they're all trying to tackle
    real problems. The way to think about 0004 and 0005, IMHO, is this:
    suppose we look at the final Plan, and we see that RTI 17 was planned
    in such-and-such a way. If we want to reproduce that planning decision
    in the next cycle -- or change that planning decision in the next
    planning cycle -- we need to be able to figure out which subroot RTI
    17 came from and what RTI it had within that subroot. With 0004 and
    0005 applied, you can figure out that RTI 17 was originally RTI 5
    within a subroot that was called expr_1, or whatever, and when we
    replan the same query we will assign the name expr_1 to the same
    subroot before we begin planning it and RTI 5 within that subroot will
    refer to the same thing it did before. This assumes, of course, that
    the query is the same and that inlining decisions and so forth haven't
    changed and that no objects have been swapped out for objects with the
    same name; but the point is that even if none of that has happened we
    can't match things up in an automated way without this infrastructure,
    or at least I don't know how to do it. If you do, I'm all ears. I
    would much rather minimize the amount of information that we have to
    store in the Plan tree or PlannedStmt and just have code to do the
    necessary interpretation based on the data that's already available,
    but I could not see how to make it work.
    
    > 0006: I agree with the need for this, but the details seem messy,
    > and it's not very clear that the proposed data structure would
    > be convenient to use.  Do we really need to rely on plan_node_id?
    > Why haven't you integrated record_elided_node into
    > clean_up_removed_plan_level?
    >
    > An idea maybe worth thinking about is that instead of completely
    > eliding the plan node, we could replace it with a "no-op" plan node
    > that both EXPLAIN and the executor will look right through.
    > That node could carry the relid(s) that we lost.  Not sure how
    > messy this'd be to integrate, though.
    
    There's certainly nothing that requires us to use this particular
    scheme for identifying the locations where certain plan nodes were
    elided. For example, we could put an additional List * in each plan
    node and stash a list of elided nodes there rather than indexing by
    plan_node_id. The disadvantage of that is simply that it increases the
    size of the Plan struct for every node, and most nodes will end up
    with an empty list. Indexing by plan_node_id was just my way of
    getting the information that I wanted to store out of the tree that
    the executor examines. We could potentially also do as you say here
    and keep the Plan nodes around and then somehow arrange to ignore them
    at explain and execution time, but I don't have a good idea of how we
    would do that without adding overhead. In terms of the usability of
    the data structure, experimentation has shown it to be serviceable.
    There is obviously a risk that with many elided nodes, having to
    grovel through the list of nodes looking for a certain plan ID over
    and over again could be inefficient -- but large numbers of elided
    nodes don't seem all that likely, so maybe it's OK; and I suppose any
    code that uses this could always build a hash table if warranted. But
    I'm not saying it's a perfect solution.
    
    In terms of why record_elided_node is not integrated into
    clean_up_removed_plan_level, that was just a stylistic choice.
    clean_up_removed_plan_level() could instead call record_elided_node(),
    or record_elided_node() could cease to exist and the logic be inlined
    into clean_up_removed_plan_level().
    
    > 0007: not sure about this either.  Why not simply add those
    > relid(s) to the surviving node's apprelids?  Again, I don't
    > love the amount of code and data structure that's being added
    > for a hypothetical use-case.  It's not even clear that this
    > form of the data structure would be helpful to anyone.
    
    I'd say that is the patch I'm least sure about. Note that my goal for
    this commitfest was to get 0001-0004 committed, partly because I
    wasn't too sure whether the later patches might need some adjustment.
    My intuition is that flattening the relid sets together loses
    important information, but I don't have a test case proving that near
    at hand, so maybe it doesn't. However, I'm fairly certain that it is
    at least a lot more convenient to have the information in this form.
    In general, if we see an Append or MergeAppend node generated from a
    RelOptInfo with a single RTI, that's a partition-wise scan of a
    partitioned relation, and if we see an Append or MergeAppend node
    generated from a RelOptInfo with more than one RTI, that's a
    partition-wise join of all those relations. So, very naively, if we
    just combine the relid sets for a bunch of partitionwise scans into a
    single RTI set, it looks like we've got a partitionwise join, but we
    don't. Now, if it's only set operations that result in multiple levels
    of Append/MergeAppend nodes getting collapsed, it might be possible to
    disentangle what actually happened by partitioning the final set of
    RTIs by the subroot from which they originate. I'm not sure that's how
    it works, though. At any rate, I agree with you that this is not
    adequately motivated at present. Having said that, at ten thousand
    feet, the motivation here is to be able to figure out from the final
    plan tree where exactly we switched to partitionwise operation --
    whether that was done for some joinrel or only when we got down to the
    underlying baserel.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  56. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-23T21:27:37Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > On Mon, Sep 22, 2025 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> In the second place, we should not need to add two hundred lines
    >> of new code to createplan.c to accomplish this.  Why not simply
    >> bms_difference the joinrel's relids from the union of the inputs'
    >> relids?
    
    > ... On the other hand, I'm not sure that I'm interpreting your remarks
    > correctly. When you say "bms_difference the joinrel's relids from the
    > union of the inputs' relids" maybe you're specifically talking about
    > the handling of the RTE_JOIN relids, and I don't care very much how we
    > account for those. So I guess I need some clarification here as to
    > what your thinking is.
    
    What I'm saying is that I'd be much happier with 0003 if it looked
    about like the attached.  We do not need a heap of mechanism
    redundantly proving that the planner is getting these things right
    (and potentially containing its own bugs).
    
    > Note that my goal for
    > this commitfest was to get 0001-0004 committed, partly because I
    > wasn't too sure whether the later patches might need some adjustment.
    
    Fair enough.  I think we can reach agreement on that much pretty quickly.
    
    			regards, tom lane
    
    
  57. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-24T12:27:47Z

    On Tue, Sep 23, 2025 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > What I'm saying is that I'd be much happier with 0003 if it looked
    > about like the attached.  We do not need a heap of mechanism
    > redundantly proving that the planner is getting these things right
    > (and potentially containing its own bugs).
    
    Thanks for the demo. That doesn't actually Assert anything, so the
    commit message is a lie, but I get the point, and based on that, I
    think what we should do is just drop 0003 altogether for now. I don't
    need the ojrelids field for anything currently, and if I or someone
    else does later, we can always revisit this idea. Or, if it turns out
    that we later introduce more bugs that my version of 0003 would have
    caught, we can re-ask the question of whether we want to Assert
    something. I don't agree with your judgement that this is an
    unreasonable amount of mechanism for what it checks, but I'm entirely
    prepared to concede that 0003 is kind of clunky, and I also think that
    the rate at which we do new things in the planner is low enough that
    it could easily be decades or forever before we have another problem
    that this would have caught. Hence, I'm fine with dropping this patch.
    Let's call it "some code that Robert found useful for personal
    testing" and move on.
    
    > > Note that my goal for
    > > this commitfest was to get 0001-0004 committed, partly because I
    > > wasn't too sure whether the later patches might need some adjustment.
    >
    > Fair enough.  I think we can reach agreement on that much pretty quickly.
    
    Cool. Let's focus on 0004 then, and possibly 0005 since it's somewhat
    related and you seem to have an idea that there could be a better way
    of solving that problem. That's not necessarily to say that 0005 would
    get committed this CF, unless we happen to agree vigorously on
    something, but if there's a way to work around needing 0005 or if it
    needs to be redone in some other form, it would be good to have some
    idea around that sooner rather than later.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  58. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-24T14:34:00Z

    On Wed, Sep 24, 2025 at 8:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > Cool. Let's focus on 0004 then, and possibly 0005 since it's somewhat
    > related and you seem to have an idea that there could be a better way
    > of solving that problem. That's not necessarily to say that 0005 would
    > get committed this CF, unless we happen to agree vigorously on
    > something, but if there's a way to work around needing 0005 or if it
    > needs to be redone in some other form, it would be good to have some
    > idea around that sooner rather than later.
    
    Here's a new patch set. 0004 is now 0001 and similarly all other patch
    numbers are -3, since the old 0001 and 0002 were committed together
    and the 0003 is abandoned. I made the following changes to
    old-0004/new-0001:
    
    - I rewrote the commit message. I'm not really sure this is any
    clearer about the motivation for this patch, but I tried. Suggestions
    appreciated.
    
    - CI was complaining about a warning from sublinktype_to_string, which
    I've tried to suppress by adding a dummy return to the end of the
    function.
    
    - You (Tom) complained about the lack of const on
    sublinktype_to_string, so this version has been const-ified. The const
    bled into the arguments to choose_plan_name() and subquery_planner(),
    and into the plan_name structure members within PlannerInfo and
    SubPlan. I don't know if this is the right thing to do, so feel free
    to set me straight.
    
    - You (Tom) also asked why not print InitPlan/SubPlan wherever we
    refer to subplans, so this version restores that behavior. I did that
    by putting logic to print the InitPlan or SubPlan prefix in
    ruleutils.c, while not including InitPlan or SubPlan in the SubPlan's
    plan_name field any more. The reason for this is that the purpose of
    the patch set is to assign names before planning, and the decision as
    to whether something is an InitPlan or a SubPlan is made after
    planning, so keeping "InitPlan" or "SubPlan" in the actual plan name
    undermines the whole point of the patch. I would argue that this is
    actually better on philosophical grounds, because I think that the
    fact that something is an expression rather than an EXISTS clause or
    whatever is part of the identify of the resulting object, whereas I
    think whether something is an InitPlan or a SubPlan is mostly
    interesting as a performance characteristic rather than as a definer
    of identity. I don't necessarily expect this position to be accepted
    without debate, but I prefer the EXPLAIN output with the patch to the
    pre-patch output.
    
    The remaining patches are simply rebased and are only included here in
    case we want to discuss how they could be
    better/different/unnecessary, especially what's now 0002, rather than
    because I'm looking to get something committed right away.
    
    Thanks,
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  59. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-24T22:03:19Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > Here's a new patch set. 0004 is now 0001 and similarly all other patch
    > numbers are -3, since the old 0001 and 0002 were committed together
    > and the 0003 is abandoned. I made the following changes to
    > old-0004/new-0001:
    
    I've not looked at all of these patches, but here's a review of v9-0001.
    
    > - I rewrote the commit message. I'm not really sure this is any
    > clearer about the motivation for this patch, but I tried. Suggestions
    > appreciated.
    
    It's much better, thanks.
    
    > - You (Tom) complained about the lack of const on
    > sublinktype_to_string, so this version has been const-ified. The const
    > bled into the arguments to choose_plan_name() and subquery_planner(),
    > and into the plan_name structure members within PlannerInfo and
    > SubPlan. I don't know if this is the right thing to do, so feel free
    > to set me straight.
    
    I don't think so.  We do not have a nice story on marking Node fields
    const: it's very unclear for example what consequences that ought to
    have for copyObject().  Maybe somebody will tackle that issue someday,
    but it's not something to touch casually in a patch with other
    objectives.  So I don't think we can make the plan_name fields const.
    The best solution I think is to make choose_plan_name() take a const
    string and return a non-const one.  The attached v10-0001 is just like
    your v9-0001 except for doing the const stuff this way.  I chose to
    fix the impedance mismatch within choose_plan_name() by having it
    pstrdup when it wants to just return the "name" string, but you could
    make a case for holding your nose and just casting away const there.
    
    > - You (Tom) also asked why not print InitPlan/SubPlan wherever we
    > refer to subplans, so this version restores that behavior.
    
    Thanks.  I'm good with the output now (modulo the bug described
    below).  Someone could potentially argue that this exposes more
    of the internals than we really ought to, such as the difference
    between expr and multiexpr SubLinks, but I'm okay with that.
    
    Aside from the const issue, something I don't really like at the
    coding level is the use of an "allroots" list.  One reason is that
    it's partially redundant with the adjacent "subroots" list, but
    a bigger one is that we have transient roots that shouldn't be
    in there.  An example here is pull_up_simple_subquery: it builds
    a clone of the query's PlannerInfo to help it use various
    infrastructure along the way to flattening the subquery, but
    that clone is not referenced anymore after the function exits.
    You were putting that into allroots, which seems to me to be
    a fundamental error, even more so because it went in with the
    same plan_name as the root it was cloned from.
    
    I think a better idea is to keep a list of just the subplan
    names that we've assigned so far.  That has a far clearer
    charter, plus it can be updated immediately by choose_plan_name()
    instead of relying on the caller to do the right thing later.
    I coded this up, and was rather surprised to find that it changed
    some regression outputs.  On investigation, that's because
    build_minmax_path() was actually doing the wrong thing later:
    it was putting the wrong root into allroots, so that "minmax_1"
    never became assigned and could be re-used later.
    
    I also observed that SS_process_ctes() was not using
    choose_plan_name() but simply assigning the user-written CTE
    name.  I believe it's possible to use the same CTE name in
    different parts of a query tree, so this fails to achieve
    the stated purpose of making the names unique.
    
    I'm still a little bit uncomfortable about whether
    it's okay for pull_up_simple_subquery() to just do
    
    +	subroot->plan_name = root->plan_name;
    
    rather than giving some other name to the transient subroot.
    I think it's okay because we are not making any meaningful planning
    decisions during the life of the subroot, just seeing if we can
    transform the subquery into a form that allows it to be pulled up.
    But you might think differently.  Perhaps a potential compromise
    is to set the transient subroot's plan_name to NULL instead?
    
    Anyway, v10-0002 is a delta patch to use a list of subplan
    names instead of "allroots", and there are a couple of trivial
    cosmetic changes too.
    
    			regards, tom lane
    
    
  60. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-25T13:10:54Z

    On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I don't think so.  We do not have a nice story on marking Node fields
    > const: it's very unclear for example what consequences that ought to
    > have for copyObject().  Maybe somebody will tackle that issue someday,
    > but it's not something to touch casually in a patch with other
    > objectives.  So I don't think we can make the plan_name fields const.
    > The best solution I think is to make choose_plan_name() take a const
    > string and return a non-const one.  The attached v10-0001 is just like
    > your v9-0001 except for doing the const stuff this way.  I chose to
    > fix the impedance mismatch within choose_plan_name() by having it
    > pstrdup when it wants to just return the "name" string, but you could
    > make a case for holding your nose and just casting away const there.
    
    Yeah, these are the kinds of trade-offs I never know how to make. I
    have a little difficulty believing that it's worth a pstrdup() to have
    the benefit of marking things const, but maybe it is, and the
    difference is probably microscopic either way. It's not like we're
    going to be faced with many planning problems involving gigantic
    numbers of subqueries. If this were a hot code path we'd want to think
    harder.
    
    > > - You (Tom) also asked why not print InitPlan/SubPlan wherever we
    > > refer to subplans, so this version restores that behavior.
    >
    > Thanks.  I'm good with the output now (modulo the bug described
    > below).  Someone could potentially argue that this exposes more
    > of the internals than we really ought to, such as the difference
    > between expr and multiexpr SubLinks, but I'm okay with that.
    
    I actually thought it looked kind of nice exposing some of that
    detail, but it's certainly a judgement call.
    
    > Aside from the const issue, something I don't really like at the
    > coding level is the use of an "allroots" list.  One reason is that
    > it's partially redundant with the adjacent "subroots" list, but
    > a bigger one is that we have transient roots that shouldn't be
    > in there.  An example here is pull_up_simple_subquery: it builds
    > a clone of the query's PlannerInfo to help it use various
    > infrastructure along the way to flattening the subquery, but
    > that clone is not referenced anymore after the function exits.
    > You were putting that into allroots, which seems to me to be
    > a fundamental error, even more so because it went in with the
    > same plan_name as the root it was cloned from.
    >
    > I think a better idea is to keep a list of just the subplan
    > names that we've assigned so far.  That has a far clearer
    > charter, plus it can be updated immediately by choose_plan_name()
    > instead of relying on the caller to do the right thing later.
    > I coded this up, and was rather surprised to find that it changed
    > some regression outputs.  On investigation, that's because
    > build_minmax_path() was actually doing the wrong thing later:
    > it was putting the wrong root into allroots, so that "minmax_1"
    > never became assigned and could be re-used later.
    
    Ooph, that's embarrassing. I think the reason that I ended up making a
    list of the roots themselves rather than the strings is that I was
    thinking that everything in this data structure would need to be a
    node, and I didn't want to cons up String nodes for every list
    element. Then later I marked that structure member read_write_ignore
    and never stopped to think that maybe then we didn't need nodes at
    all. So, in short, I think this is a great solution and thanks a ton
    for putting in the legwork to figure it out.
    
    > I also observed that SS_process_ctes() was not using
    > choose_plan_name() but simply assigning the user-written CTE
    > name.  I believe it's possible to use the same CTE name in
    > different parts of a query tree, so this fails to achieve
    > the stated purpose of making the names unique.
    
    Ouch, that's another good catch. Also, even if the CTEs had to be
    unique among themselves, there could be a collision between a CTE name
    and a generated name.
    
    > I'm still a little bit uncomfortable about whether
    > it's okay for pull_up_simple_subquery() to just do
    >
    > +       subroot->plan_name = root->plan_name;
    >
    > rather than giving some other name to the transient subroot.
    > I think it's okay because we are not making any meaningful planning
    > decisions during the life of the subroot, just seeing if we can
    > transform the subquery into a form that allows it to be pulled up.
    > But you might think differently.  Perhaps a potential compromise
    > is to set the transient subroot's plan_name to NULL instead?
    
    I don't like NULL. Imagine that some meaningful planning decision does
    get made during the life of the subroot, and imagine further that some
    hook is called by means of which some extension can influence that
    decision. What plan_name do we want that extension to see? I think
    there's some argument for letting it see the plan name of the parent
    plan into which we're thinking about inlining the subquery, which I
    believe is the effect of the current coding, and there's perhaps also
    an argument for having a wholly new plan name there just to really
    identify that particular decision clearly, but if we put NULL, that's
    the name we use for the topmost query level. If we changed things so
    that the top query level gets called "main" or some other constant
    string, then it would make sense to use NULL as a sentinel value here
    to mean "undefined," but I kind of think we probably don't want to go
    there.
    
    > Anyway, v10-0002 is a delta patch to use a list of subplan
    > names instead of "allroots", and there are a couple of trivial
    > cosmetic changes too.
    
    OK, I'll try out these versions and let you know what I find out. Thanks much.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  61. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-25T13:43:24Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> I think a better idea is to keep a list of just the subplan
    >> names that we've assigned so far.  That has a far clearer
    >> charter, plus it can be updated immediately by choose_plan_name()
    >> instead of relying on the caller to do the right thing later.
    >> I coded this up, and was rather surprised to find that it changed
    >> some regression outputs.  On investigation, that's because
    >> build_minmax_path() was actually doing the wrong thing later:
    >> it was putting the wrong root into allroots, so that "minmax_1"
    >> never became assigned and could be re-used later.
    
    > Ooph, that's embarrassing. I think the reason that I ended up making a
    > list of the roots themselves rather than the strings is that I was
    > thinking that everything in this data structure would need to be a
    > node, and I didn't want to cons up String nodes for every list
    > element. Then later I marked that structure member read_write_ignore
    > and never stopped to think that maybe then we didn't need nodes at
    > all. So, in short, I think this is a great solution and thanks a ton
    > for putting in the legwork to figure it out.
    
    I think if we do decide later that the list-of-names needs to be
    Nodes, then converting them to String nodes is a perfectly fine
    solution.  As you remark elsewhere, none of this is going to be
    more than microscopic compared to the overall cost of planning
    a subplan.  But for right now, yeah, we don't need it.
    
    Anyway, it seems the only remaining issue is what name
    pull_up_simple_subquery should give to its transient root clone.
    I don't actually believe that it matters, so if you're content
    with re-using the parent name, let's just roll with that until
    someone complains.
    
    			regards, tom lane
    
    
    
    
  62. Re: plan shape work

    Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-26T01:20:24Z

    Hi there,
    
    I've tried v10-000{1,2}+v9-0002 and v9-000{1,2}. I was curious whether
    the names choose_plan_name() chose for subqueries match the Subquery
    Scan names in the EXPLAIN plan. My guess is that since the former is
    chosen before planning and the latter after planning, they might
    differ. I think it's probably ok to have different naming mechanisms
    as long as the names are unique within themselves. But in case anyone
    else cares about the naming inconsistency, here's an example that
    shows it.
    
    -- applied patches
    I've applied v10-0001, v10-0002 and v9-0002. I needed v9-0002 because
    I want to see the plan_names in the debug plan and in the
    EXPLAIN(RANGE_TABLE) plan with pg_overexplain.
    (Applying v9-000{1,2} instead should give the same results)
    
    -- setup
    CREATE TABLE r (a int, b int);
    CREATE TABLE s (c int, d int);
    LOAD 'pg_overexplain';
    SET debug_print_plan to on;
    SET client_min_messages to 'log';
    
    -- query
    EXPLAIN (range_table, costs off)
    SELECT *
    FROM
      (SELECT a FROM
         (SELECT a, b FROM r WHERE b > 42 ORDER BY a)
       UNION ALL
         (SELECT c FROM
    (SELECT c, d FROM s WHERE d > 24 ORDER BY d)));
    
    -- plan
                      QUERY PLAN
    ----------------------------------------------
     Append
       Append RTIs: 1
       ->  Subquery Scan on unnamed_subquery_1
             Scan RTI: 4
             ->  Sort
                   Sort Key: r.a
                   ->  Seq Scan on r
                         Filter: (b > 42)
                         Scan RTI: 6
       ->  Subquery Scan on unnamed_subquery_2
             Scan RTI: 5
             ->  Sort
                   Sort Key: s.d
                   ->  Seq Scan on s
                         Filter: (d > 24)
                         Scan RTI: 7
     RTI 1 (subquery, inherited, in-from-clause):
       Eref: unnamed_subquery (a)
     RTI 2 (subquery):
       Eref: unnamed_subquery (a)
     RTI 3 (subquery):
       Eref: unnamed_subquery (c)
     RTI 4 (subquery, in-from-clause):
       Eref: unnamed_subquery (a, b)
     RTI 5 (subquery, in-from-clause):
       Eref: unnamed_subquery (c, d)
     RTI 6 (relation, in-from-clause):
       Subplan: unnamed_subquery
       Eref: r (a, b)
       Relation: r
       Relation Kind: relation
       Relation Lock Mode: AccessShareLock
       Permission Info Index: 1
     RTI 7 (relation, in-from-clause):
       Subplan: unnamed_subquery_1
       Eref: s (c, d)
       Relation: s
       Relation Kind: relation
       Relation Lock Mode: AccessShareLock
       Permission Info Index: 2
     Unprunable RTIs: 6 7
    (41 rows)
    
    -- interesting part of the debug plan:
       :subplans <>
       :subrtinfos (
          {SUBPLANRTINFO
          :plan_name unnamed_subquery
          :rtoffset 5
          :dummy false
          }
          {SUBPLANRTINFO
          :plan_name unnamed_subquery_1
          :rtoffset 6
          :dummy false
          }
    
    It appears that in the EXPLAIN plan the subqueries are named
    "unnamed_subquery" (does not show up in the EXPLAIN output),
    "unnamed_subquery_1", and "unnamed_subquery_2"; whereas in the RTIs
    section from pg_overexplain, as well as in the debug plan's
    :subrtinfos section, the subplans are named "unnamed_subquery" and
    "unnamed_subquery_1".
    
    IIUC, the Subquery Scan names in the query plan, for example:
       ->  Subquery Scan on unnamed_subquery_2
    is the name assigned to this Subquery Scan node of RTI: 5, after
    planning.
    
    And the Subplan name of an RTI with pg_overexplain, for example:
     RTI 7 (relation, in-from-clause):
       Subplan: unnamed_subquery_1
       Eref: s (c, d)
    is the Subplan name chosen before planning. RTI 7 here maps to the
    SUBPLANRTINFO with ":rtoffset 6" in the debug plan, which means it
    belongs to the Subplan named "unnamed_subquery_1". This is what I
    think causes confusion, because from the query plan we see that RTI 7
    is under the Subquery Scan on "unnamed_subquery_2", not
    "unnamed_subquery_1".
    
    I think technically this is not a problem, since we can uniquely
    identify the Subplans using the names assigned before planning, and we
    can also uniquely identify the Subquery Scans in the EXPLAIN plan
    using the names assigned after planning. Still, I found it a bit
    confusing when looking at the EXPLAIN(RANGE_TABLE) output, where the
    same name "unnamed_subquery_1" not only doesn't mean the same plan
    node, but also not in the same branch of the plan tree.
    
    Thoughts?
    
    Best,
    Alex
    
  63. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-26T01:40:56Z

    On Thu, Sep 25, 2025 at 9:21 PM Alexandra Wang
    <alexandra.wang.oss@gmail.com> wrote:
    > I've tried v10-000{1,2}+v9-0002 and v9-000{1,2}. I was curious whether
    > the names choose_plan_name() chose for subqueries match the Subquery
    > Scan names in the EXPLAIN plan. My guess is that since the former is
    > chosen before planning and the latter after planning, they might
    > differ. I think it's probably ok to have different naming mechanisms
    > as long as the names are unique within themselves. But in case anyone
    > else cares about the naming inconsistency, here's an example that
    > shows it.
    
    Yeah. Technically, these are not the same names: one set of names is
    the names assigned to the subqueries, and the other is the set of
    names assigned to the relations that get scanned by subquery scans.
    This may seem like a technicality, but that's not entirely the case,
    because in each case the chosen names are unique. If, for example,
    there were a table named unnamed_subquery that we were
    subquery-scanning, then you couldn't also have a subquery scan of that
    table, but you could still have a subquery with that name.
    
    But maybe there's still some way to improve this. It would probably be
    hard to make it perfect because of the fact that EXPLAIN names are
    unique across all relations in the query, as noted above. However, we
    might be able to make it so the names match in the absence of name
    conflicts with user specified aliases or table names. Or maybe we
    should consider some larger change to the EXPLAIN format so that we
    display subquery names instead of relation names.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  64. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T01:53:30Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > But maybe there's still some way to improve this. It would probably be
    > hard to make it perfect because of the fact that EXPLAIN names are
    > unique across all relations in the query, as noted above. However, we
    > might be able to make it so the names match in the absence of name
    > conflicts with user specified aliases or table names. Or maybe we
    > should consider some larger change to the EXPLAIN format so that we
    > display subquery names instead of relation names.
    
    I'm kind of down on that last idea, because relation names/aliases
    are user-supplied or at least user-suppliable, whereas what
    choose_plan_name() generates is not under user control.  So if we
    try to make this match, it should be in the direction of using
    the parent query's subquery alias whenever possible.  But I fear
    that it'll be hard to make this match up exactly unless we choose
    to take the selection of unique relation aliases out of EXPLAIN
    altogether and make the planner do it.  Which seems like a bad
    idea, because then we're paying that cost all the time, and
    it's not small for big queries.
    
    			regards, tom lane
    
    
    
    
  65. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-26T02:26:16Z

    FWIW, I'm a bit concerned about the double for loop inside
    choose_plan_name(), especially since the outer loop runs with a true
    condition.  Maybe I'm just worrying over nothing, as we probably don't
    expect a large number of subroots in practice, but the nested loops
    still make me a little uneasy.
    
    - Richard
    
    
    
    
  66. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T02:37:20Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > FWIW, I'm a bit concerned about the double for loop inside
    > choose_plan_name(), especially since the outer loop runs with a true
    > condition.  Maybe I'm just worrying over nothing, as we probably don't
    > expect a large number of subroots in practice, but the nested loops
    > still make me a little uneasy.
    
    I really doubt that a query could have enough subplans to make
    that a problem.  But if I'm wrong, it's surely something we could
    improve in a localized way later.
    
    			regards, tom lane
    
    
    
    
  67. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-26T06:29:38Z

    On Fri, Sep 26, 2025 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Richard Guo <guofenglinux@gmail.com> writes:
    > > FWIW, I'm a bit concerned about the double for loop inside
    > > choose_plan_name(), especially since the outer loop runs with a true
    > > condition.  Maybe I'm just worrying over nothing, as we probably don't
    > > expect a large number of subroots in practice, but the nested loops
    > > still make me a little uneasy.
    
    > I really doubt that a query could have enough subplans to make
    > that a problem.  But if I'm wrong, it's surely something we could
    > improve in a localized way later.
    
    I'm concerned not only about the potential for a large number of
    subplans but also because if there happens to be a bug within the
    nested loops, the always-true condition in the outer loop could cause
    an infinite loop.  However, if you're confident that the code inside
    the loop is completely bug-free and will remain so through future
    changes, then this shouldn't be an issue.
    
    Looking at choose_plan_name(), IIUC, the nested loop is used to find
    the next unused suffix number for a given name.  I'm wondering why not
    simply iterate through glob->subplanNames once, check the suffix
    number for each name matching the given base name, determine the
    current maximum suffix, and then use "max_suffix + 1" as the next
    unused suffix.  This approach requires only a single pass through the
    list, and if there's a bug, the worst-case scenario would be a
    duplicate name rather than an infinite loop.  It seems to me that this
    approach is both more efficient and less risky.
    
    - Richard
    
    
    
    
  68. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T14:23:21Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > Looking at choose_plan_name(), IIUC, the nested loop is used to find
    > the next unused suffix number for a given name.  I'm wondering why not
    > simply iterate through glob->subplanNames once, check the suffix
    > number for each name matching the given base name, determine the
    > current maximum suffix, and then use "max_suffix + 1" as the next
    > unused suffix.  This approach requires only a single pass through the
    > list, and if there's a bug, the worst-case scenario would be a
    > duplicate name rather than an infinite loop.  It seems to me that this
    > approach is both more efficient and less risky.
    
    "simply" is perhaps not the right adjective there.  My guess is that
    this approach nets out to more code, more possibilities for bugs
    (especially in cases where one name is a prefix of another), and
    will be slower in typical cases with just a few subplan names.
    
    As an example of edge cases that your idea introduces, what happens
    if a user-written subquery name is "expr_999999999999999999999999"
    and then we need to generate a unique name based on "expr"?  Now
    we have an integer-overflow situation to worry about, with possibly
    platform-dependent results.
    
    But it's Robert's patch, so he gets to make the call.
    
    			regards, tom lane
    
    
    
    
  69. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-26T14:23:29Z

    On Fri, Sep 26, 2025 at 2:29 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > Looking at choose_plan_name(), IIUC, the nested loop is used to find
    > the next unused suffix number for a given name.  I'm wondering why not
    > simply iterate through glob->subplanNames once, check the suffix
    > number for each name matching the given base name, determine the
    > current maximum suffix, and then use "max_suffix + 1" as the next
    > unused suffix.  This approach requires only a single pass through the
    > list, and if there's a bug, the worst-case scenario would be a
    > duplicate name rather than an infinite loop.  It seems to me that this
    > approach is both more efficient and less risky.
    
    I feel like the current coding is more straightforward and this should
    be considered in terms of the chance of having bugs. Doing as you
    propose here would require starting with a prefix match of the string
    and then attempting a string-to-integer conversion on the remaining
    bytes. That's certainly doable but such things tend to be a bit
    fiddly: you have to make sure you do the right thing when you see a
    non-digit and when the value overflows or is zero. It wouldn't be that
    hard to get it right, but I think it would be a little trickier than
    we have now. If we find that the performance cost of this function is
    too high in some scenario, replacing it within an implementation along
    these lines would make sense to me, but I am not too worried about the
    current logic accidentally causing an infinite loop. I don't think
    that will happen but even if it does it should be a simple fix.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  70. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-29T01:47:50Z

    On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Richard Guo <guofenglinux@gmail.com> writes:
    > > Looking at choose_plan_name(), IIUC, the nested loop is used to find
    > > the next unused suffix number for a given name.  I'm wondering why not
    > > simply iterate through glob->subplanNames once, check the suffix
    > > number for each name matching the given base name, determine the
    > > current maximum suffix, and then use "max_suffix + 1" as the next
    > > unused suffix.  This approach requires only a single pass through the
    > > list, and if there's a bug, the worst-case scenario would be a
    > > duplicate name rather than an infinite loop.  It seems to me that this
    > > approach is both more efficient and less risky.
    
    > "simply" is perhaps not the right adjective there.  My guess is that
    > this approach nets out to more code, more possibilities for bugs
    > (especially in cases where one name is a prefix of another), and
    > will be slower in typical cases with just a few subplan names.
    >
    > As an example of edge cases that your idea introduces, what happens
    > if a user-written subquery name is "expr_999999999999999999999999"
    > and then we need to generate a unique name based on "expr"?  Now
    > we have an integer-overflow situation to worry about, with possibly
    > platform-dependent results.
    
    I'd argue that this hypothetical edge case can be resolved with a bit
    of canonicalization in how subplan names are represented internally.
    I think the issue you mentioned arises because there is no clearly
    distinction between the base name and the numeric suffix.  I haven't
    spent much time thinking about it, but an off-the-cuff idea is to
    require that all subplan names in glob->subplanNames end with a suffix
    of the form "_<number>".  (If no numeric suffix is required, we can
    use the suffix "_0".)  With this convention, we can simply split
    on the last underscore: everything before it is the base name, and
    everything after is the numeric suffix.
    
    The user-written subquery name "expr_999999999999999999999999" would
    be internally represented as "expr_999999999999999999999999_0".  Then,
    when we need to generate a unique name based on "expr", it won't match
    with the base name of that subquery name.
    
    With this canonicalization in place, my proposed approach is simply a
    matter of applying strrchr(name, '_') and tracking the maximum suffix
    number in a single pass over glob->subplanNames.  I think this can be
    handled with straightforward, basic C code.  It seems to me that
    this could also eliminate the need for the additional loop under the
    "if (!always_number)" branch in choose_plan_name().
    
    By replacing a pass over the subplanNames list plus a nested loop with
    a single pass, I doubt that this would be slower in typical cases.
    
    - Richard
    
    
    
    
  71. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-29T02:41:47Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> As an example of edge cases that your idea introduces, what happens
    >> if a user-written subquery name is "expr_999999999999999999999999"
    >> and then we need to generate a unique name based on "expr"?  Now
    >> we have an integer-overflow situation to worry about, with possibly
    >> platform-dependent results.
    
    > I'd argue that this hypothetical edge case can be resolved with a bit
    > of canonicalization in how subplan names are represented internally.
    
    [ raised eyebrow... ]  How did you get to that from the complaint
    that Robert's patch was not obviously bug-free?  (A complaint I
    thought was unmerited, but nevermind.)  This proposal is neither
    simple, nor obviously bug-free.  Moreover, in view of comments
    upthread, I think we should look with great suspicion on any
    proposal that involves changing user-supplied subquery aliases
    unnecessarily.
    
    			regards, tom lane
    
    
    
    
  72. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-29T08:52:18Z

    On Mon, Sep 29, 2025 at 11:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Richard Guo <guofenglinux@gmail.com> writes:
    > > On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > >> As an example of edge cases that your idea introduces, what happens
    > >> if a user-written subquery name is "expr_999999999999999999999999"
    > >> and then we need to generate a unique name based on "expr"?  Now
    > >> we have an integer-overflow situation to worry about, with possibly
    > >> platform-dependent results.
    
    > > I'd argue that this hypothetical edge case can be resolved with a bit
    > > of canonicalization in how subplan names are represented internally.
    
    > [ raised eyebrow... ]  How did you get to that from the complaint
    > that Robert's patch was not obviously bug-free?  (A complaint I
    > thought was unmerited, but nevermind.)
    
    I'm not sure I fully understand your point here.  Apologies if I got
    it wrong.
    
    Firstly, my intention in the previous email was merely to propose a
    solution for my approach to address the edge case you raised.  I don't
    see how this relates to my so-called "complaint" about Robert's patch
    not being obviously bug-free.  You raised a case where my approach
    won't work, and I provided a solution to address it.  That's all.
    
    Secondly, I don't think it's fair to characterize my concern as a
    complaint when I expressed that the nested loop with an always-true
    condition is vulnerable to bugs and could potentially cause an
    infinite loop if such a bug exists.
    
    In a nearby thread, I was asked whether I can guarantee my code is
    100% bug-free.  After some consideration, I think I cannot make such a
    guarantee, and I doubt that anyone realistically can.  Given this, I
    think it's important that we try our best to write code that minimizes
    the potential bad-effect should a bug occur.
    
    Therefore, upon observing a nested loop with an always-true condition
    in the patch, I expressed my concern and suggested a possible
    improvement.  However, I did not expect that concern to be treated as
    an unmerited complaint.
    
    > This proposal is neither
    > simple, nor obviously bug-free.  Moreover, in view of comments
    > upthread, I think we should look with great suspicion on any
    > proposal that involves changing user-supplied subquery aliases
    > unnecessarily.
    
    It seems no one has attempted to code up the approach I suggested, so
    I went ahead and did it; please see the attached PoC patch.  It's just
    a proof of concept to show what I have in mind, so please excuse the
    lack of comments and necessary assertions for now.
    
    I agree that this implementation cannot be guaranteed to be bug-free,
    but I'm not sure I agree that it's not simple.  I'm also not convinced
    that it would be slower in typical cases.
    
    BTW, a small nit I just noticed: I suggest explicitly initializing
    glob->subplanNames in standard_planner().  It may be argued that this
    is pointless, as makeNode() zeroes all fields by default.  But AFAICS
    subplanNames is the only field in PlannerGlobal that is not explicitly
    initialized.
    
    - Richard
    
  73. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-09-29T14:12:31Z

    On Mon, Sep 29, 2025 at 4:52 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > It seems no one has attempted to code up the approach I suggested, so
    > I went ahead and did it; please see the attached PoC patch.  It's just
    > a proof of concept to show what I have in mind, so please excuse the
    > lack of comments and necessary assertions for now.
    
    I think that if there are subqueries named expr_1 and expr_3, this
    will assign the name expr_4 next, whereas the previous patch assigns
    the name expr_2. That might not be a bug, but it's different and I
    don't like it as well.
    
    I also think that if there are subqueries named expr_1a and expr_2a,
    this will assign the name expr_3 next, whereas the previous patch
    assigns the name expr_1. I would consider that a clear bug.
    
    Your code will also see expr_01 and decide that the next name should
    be expr_2 rather than expr_1. I don't think that's right, either.
    
    I also think that it's a bug that your function sometimes returns a
    value different from the one it appends to canon_names.
    
    I don't really understand why you're so fixed on this point. I think
    that the code as I wrote it is quite a normal way to write code for
    that kind of thing. Sure, there are other things that we could do, but
    I wrote the code that way I did precisely in order to avoid behaviors
    like the ones I mention above. It would be possible to rearrange the
    code so that the termination condition for the loop was something like
    "!found", or to rewrite the loop as a do { ... } while (!found)
    construct as we do in set_rtable_names() for a very similar problem to
    the code that this is solving, but I think the generated machine code
    would be exactly the same and the code would not look as intuitive for
    a human to read. Somebody else might have had a different stylistic
    preference, but if you are going to object every time I write for (;;)
    or while (1), you're going to hate an awful lot of my code for, IMHO,
    very little reason. It's reasonable to be concerned about whether a
    loop will ever terminate, but the mere fact of putting the loop exit
    someplace other than the top of the loop isn't enough to say that
    there's a problem.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  74. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-29T14:27:57Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > I don't really understand why you're so fixed on this point. I think
    > that the code as I wrote it is quite a normal way to write code for
    > that kind of thing.
    
    Also, we have numerous other places that generate de-duplicated names
    in pretty much this way (ruleutils.c's set_rtable_names being a very
    closely related case).  I don't think we should go inventing some
    random new way to do that.
    
    If it turns out that Robert's code is too slow in practice, I would
    prefer to deal with that by using a hashtable to keep track of
    already-allocated names, not by changing the user-visible behavior.
    I'm content to wait for field complaints before building such logic
    though, because I really doubt that queries would ever have so
    many subplans as to be a problem.
    
    			regards, tom lane
    
    
    
    
  75. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-09-30T01:59:10Z

    On Mon, Sep 29, 2025 at 11:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > On Mon, Sep 29, 2025 at 4:52 AM Richard Guo <guofenglinux@gmail.com> wrote:
    > > It seems no one has attempted to code up the approach I suggested, so
    > > I went ahead and did it; please see the attached PoC patch.  It's just
    > > a proof of concept to show what I have in mind, so please excuse the
    > > lack of comments and necessary assertions for now.
    
    > I think that if there are subqueries named expr_1 and expr_3, this
    > will assign the name expr_4 next, whereas the previous patch assigns
    > the name expr_2. That might not be a bug, but it's different and I
    > don't like it as well.
    >
    > I also think that if there are subqueries named expr_1a and expr_2a,
    > this will assign the name expr_3 next, whereas the previous patch
    > assigns the name expr_1. I would consider that a clear bug.
    >
    > Your code will also see expr_01 and decide that the next name should
    > be expr_2 rather than expr_1. I don't think that's right, either.
    >
    > I also think that it's a bug that your function sometimes returns a
    > value different from the one it appends to canon_names.
    
    I don't think any of the bugs you described upthread exist in the PoC
    patch.  The patch ensures that all names stored in glob->subplanNames
    are canonicalized to the format "$basename_$suffixnum".  If no numeric
    suffix is required, the name is stored with a "_0" suffix.  This
    guarantees a clear distinction between the base name and the numeric
    suffix for all names stored in the list.  (Please note that this
    canonicalization applies only to how names are stored internally, not
    to user-visible names.  So no user-visible behavior change here.)
    
    I don't really see how uncanonicalized names like expr_1a, expr_2a, or
    expr_01 would appear in the subplanNames list to begin with.  Perhaps
    you're referring to user-supplied subquery aliases?  In that case, the
    patch deliberately avoids matching expr to those names, since it only
    compares the base name.  As a result, it would assign expr_1 as the
    next name, which is the expected behavior.
    
    This PoC patch passes all the regression tests, which at least, IMHO,
    suggests that it avoids such basic bugs.
    
    However, since both you and Tom feel this proposal doesn't make
    sense, I'll withdraw it.  Apologies for any trouble this has caused.
    
    - Richard
    
    
    
    
  76. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-10-06T19:13:59Z

    On Mon, Sep 29, 2025 at 9:59 PM Richard Guo <guofenglinux@gmail.com> wrote:
    > I don't think any of the bugs you described upthread exist in the PoC
    > patch.  The patch ensures that all names stored in glob->subplanNames
    > are canonicalized to the format "$basename_$suffixnum".  If no numeric
    > suffix is required, the name is stored with a "_0" suffix.  This
    > guarantees a clear distinction between the base name and the numeric
    > suffix for all names stored in the list.  (Please note that this
    > canonicalization applies only to how names are stored internally, not
    > to user-visible names.  So no user-visible behavior change here.)
    
    I studied your patch in more detail today and I find that you are
    correct. It doesn't do what I thought it did. However, it also doesn't
    guarantee uniqueness. Here is an example:
    
    robert.haas=# explain select (select random() limit 1), (select
    random() limit 2) from (select * from pg_class limit 1) expr_1;
    WARNING:  choose_plan_name for name "expr" returns "expr_1"
    WARNING:  choose_plan_name for name "expr" returns "expr_2"
    WARNING:  choose_plan_name for name "expr_1" returns "expr_1"
                                   QUERY PLAN
    -------------------------------------------------------------------------
     Subquery Scan on expr_1  (cost=0.03..0.08 rows=1 width=16)
       InitPlan expr_1
         ->  Limit  (cost=0.00..0.01 rows=1 width=8)
               ->  Result  (cost=0.00..0.01 rows=1 width=8)
       InitPlan expr_2
         ->  Limit  (cost=0.00..0.01 rows=1 width=8)
               ->  Result  (cost=0.00..0.01 rows=1 width=8)
       ->  Limit  (cost=0.00..0.04 rows=1 width=240)
             ->  Seq Scan on pg_class  (cost=0.00..18.16 rows=416 width=240)
    (9 rows)
    
    When I originally looked at the patch, I believed that the use of
    atoi() without sanity checking was going to cause the sorts of
    problems I was mentioning. That was wrong, since you never let a
    user-specified name leak directly into the list, but only names to
    which your code has added a numeric suffix. But that also means that
    you don't get conflicts between names that, in reality, should
    conflict.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  77. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-10-06T19:36:49Z

    I see that Richard's PoC last patch confused cfbot. Here's a new
    version of just the patch proposed for commit for CfBot testing.
    
  78. Re: plan shape work

    Richard Guo <guofenglinux@gmail.com> — 2025-10-09T02:24:49Z

    On Tue, Oct 7, 2025 at 4:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > I see that Richard's PoC last patch confused cfbot. Here's a new
    > version of just the patch proposed for commit for CfBot testing.
    
    Does it make sense to explicitly initialize glob->subplanNames in
    standard_planner()?  I understand this might seem pointless since
    makeNode() zeroes all fields by default, but subplanNames is currently
    the only field in PlannerGlobal that isn't explicitly initialized.  I
    previously committed a patch (2c0ed86d3) to ensure all PlannerGlobal
    fields are explicitly initialized, and I'd prefer to maintain that
    consistency.
    
    I actually suggested the same in [1] (the last paragraph), but it
    seems to have been overlooked.
    
    [1] https://postgr.es/m/CAMbWs4-ysLvZiWp=w5=+noCMdX9FHFrrc0Wuk-TcUz1RDmEbkQ@mail.gmail.com
    
    - Richard
    
    
    
    
  79. Re: plan shape work

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-09T03:04:37Z

    Richard Guo <guofenglinux@gmail.com> writes:
    > Does it make sense to explicitly initialize glob->subplanNames in
    > standard_planner()?  I understand this might seem pointless since
    > makeNode() zeroes all fields by default, but subplanNames is currently
    > the only field in PlannerGlobal that isn't explicitly initialized.  I
    > previously committed a patch (2c0ed86d3) to ensure all PlannerGlobal
    > fields are explicitly initialized, and I'd prefer to maintain that
    > consistency.
    
    We don't really have consensus on that point, I fear.  I like the
    initialize-em-all-explicitly approach, but some other senior hackers
    think it's useless verbiage.
    
    My argument for doing it explicitly is that when adding a new field
    to a struct, one frequently searches for existing references to a
    nearby field.  Without initialize-em-all, this risks missing places
    where you need to initialize your new field.  If you'd only set it
    to zero, then fine ... but what if that particular place needs some
    other initial value?  So I think omitting initializations-to-zero
    risks future bugs of omission.  Some other folk don't find that
    argument very compelling, though.
    
    			regards, tom lane
    
    
    
    
  80. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-10-09T12:34:56Z

    On Wed, Oct 8, 2025 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > We don't really have consensus on that point, I fear.  I like the
    > initialize-em-all-explicitly approach, but some other senior hackers
    > think it's useless verbiage.
    
    I'm in the "useless verbiage" camp.
    
    > My argument for doing it explicitly is that when adding a new field
    > to a struct, one frequently searches for existing references to a
    > nearby field.  Without initialize-em-all, this risks missing places
    > where you need to initialize your new field.  If you'd only set it
    > to zero, then fine ... but what if that particular place needs some
    > other initial value?  So I think omitting initializations-to-zero
    > risks future bugs of omission.  Some other folk don't find that
    > argument very compelling, though.
    
    This problem does not typically happen for me because it is my habit
    to start by grepping for makeNode(Whatever) -- or for relevant palloc
    calls, for non-Node types -- at the very start of my research into any
    topic, and only later to look into specific fields.
    
    There might be a consistency argument for doing this in this case, if
    other nearby fields are doing the same thing, and if one of you wants
    to go and make it so I have much better things to do than spend time
    complaining about it. But any code I write is likely to rely on
    makeNode() or palloc0() to zero fields wherever relying on such a
    thing is convenient, unless somebody forces me to do otherwise in a
    particular case.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  81. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-12-01T20:31:06Z

    On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Aside from the const issue, something I don't really like at the
    > coding level is the use of an "allroots" list.  One reason is that
    > it's partially redundant with the adjacent "subroots" list, but
    > a bigger one is that we have transient roots that shouldn't be
    > in there.  An example here is pull_up_simple_subquery: it builds
    > a clone of the query's PlannerInfo to help it use various
    > infrastructure along the way to flattening the subquery, but
    > that clone is not referenced anymore after the function exits.
    > You were putting that into allroots, which seems to me to be
    > a fundamental error, even more so because it went in with the
    > same plan_name as the root it was cloned from.
    >
    > I think a better idea is to keep a list of just the subplan
    > names that we've assigned so far.  That has a far clearer
    > charter, plus it can be updated immediately by choose_plan_name()
    > instead of relying on the caller to do the right thing later.
    > I coded this up, and was rather surprised to find that it changed
    > some regression outputs.  On investigation, that's because
    > build_minmax_path() was actually doing the wrong thing later:
    > it was putting the wrong root into allroots, so that "minmax_1"
    > never became assigned and could be re-used later.
    
    Today, I discovered a disadvantage of the change from allroots to
    subplanNames. The concern that you raise about transient roots still
    seems entirely valid to me. However, the allroots list - if correctly
    constructed to exclude such transient roots - seems to me to have
    potential utility for extensions that subplanNames doesn't. What I
    wanted to do was use planner_shutdown_hook to copy some information
    from each PlannerInfo to the extension_state field of the PlannedStmt
    and, without allroots, there's no easy way to find all of them.
    
    I do think there are other ways to solve that problem. For instance,
    we could add a hook that's called when we invoke subquery_planner,
    similar to the way that planner_hook can be used to intercept calls to
    planner(). The new hook could then do whatever it likes at the end of
    each call to subquery_planner(). That has the disadvantage of making
    each call to subquery_planner() a little more expensive, but it might
    turn out that such a hook has other utility.
    
    For what I was trying to do today, I can probably even solve it using
    set_join_pathlist_hook. I wanted to capture the relid sets of all
    baserels and joinrels where rel->unique_rel is non-NULL, and I think
    that I could do that by having set_join_pathlist_hook add notice when
    save_jointype is JOIN_UNIQUE_INNER or JOIN_UNIQUE_OUTER, and then add
    innerrel or outerrel as appropriate to a list hanging off an object
    attached using Get/SetPlannerGlobalExtensionState, making sure not to
    add the same one more than once. But that only works in this specific
    case, and it's pretty indirect even for that.
    
    So I'm wondering if we ought to step back and rethink a bit.
    subplanNames ensures that we assign a different name to every
    PlannerInfo, but it doesn't give you any help finding the PlannerInfo
    given the name, or enumerating all of the PlannerInfo objects that
    exist. If we went back to allroots, and just fixed the problem of
    temporary entries creeping into the list, the core code wouldn't
    really notice the difference, but I think extensions would have an
    easier time.
    
    Thoughts?
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  82. Re: plan shape work

    Robert Haas <robertmhaas@gmail.com> — 2025-12-08T19:06:35Z

    On Mon, Dec 1, 2025 at 3:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > So I'm wondering if we ought to step back and rethink a bit.
    > subplanNames ensures that we assign a different name to every
    > PlannerInfo, but it doesn't give you any help finding the PlannerInfo
    > given the name, or enumerating all of the PlannerInfo objects that
    > exist. If we went back to allroots, and just fixed the problem of
    > temporary entries creeping into the list, the core code wouldn't
    > really notice the difference, but I think extensions would have an
    > easier time.
    >
    > Thoughts?
    
    Here's a couple of patches to bring back allroots. The first one adds
    allroots, makes choose_plan_name() use it, and adds assertions that
    the contents of allroots and subplanNames correspond. The second one
    removes subplanNames. I imagine that these would be collapsed for
    commit, but testing 0001 without 0002 seems useful for peace of mind.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com