Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Assign each subquery a unique name prior to planning it.
- 8c49a484e8eb 19 (unreleased) landed
Keep track of what RTIs a Result node is scanning.
- f2bae51dfd5b 19 (unreleased) landed

plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-05-19T18:01:55Z

Hi,

A couple of people at pgconf.dev seemed to want to know more about my
ongoing plan shape work, so here are the patches I have currently.
This is a long way from something that actually looks like a usable
feature, but these are bits of infrastructure that I think will be
necessary to get to a usable feature. As a recap, my overall goal here
is to make it so that you can examine a finished plan, figure out what
decisions the planner made, and then somehow get the planner to make
those same decisions over again in a future planning cycle. Since
doing this for all types of planner decisions seems too difficult for
an initial goal, I'm focusing on scans and joins for now. A further
goal is that I want it to be possible for extensions to use this
infrastructure to implement a variety of different policies that they
might feel to be beneficial, so I'm looking to minimize the amount of
stuff that has to be done in core PostgreSQL or can only be used by
core PostgreSQL.

So far I've identified two main problems. First, you need to be able
to reconstruct the planner decisions from the final plan, which you
almost can do already but we're missing a few key pieces of
information in the final plan tree. Second, you need to be able to
write those decisions down in a way that can be correctly and
unambiguously reinterpreted during a future planning cycle for the
same query. For example, if we say that the planner chose a sequential
scan of table foo, what exactly does that mean? Table foo could appear
multiple times in the query, either in different subqueries or the
same one, and it could be a partitioned table with a different scan
method for each partition.

Let's start by talking about problem #1. I've found two subproblems in
this area so far. The first is that a Result node does not store the
relids of the scan or join that it replaces. Note that a Result note
whose job is to apply a one-time filter or a projection to some
subordinate node is not an issue here -- we can just look through the
Result node to whatever scan or join is beneath it. The concern here
is about the case where a scan or join is proven empty and entirely
replaced by a Result node that has "One-Time Filter: false". Patch
0001 adds that field, and patch 0002 teaches ExplainPreScanNodes about
it, which results in a number of regression test output changes that I
personally consider to be improvements -- with these patches, we
properly qualify some column references with a table alias as EXPLAIN
does in all other cases, as opposed to printing them as bare column
names with no alias. Patch 0003 checks that this is the only problem
of this type that is visible at the stage where we are constructing
join paths.

Still talking about problem #1, the second subproblem I've identified
is that during setrefs processing, we elide trivial SubqueryScan,
Append, and MergeAppend nodes from the final plan. So during planning
we might see, for example, that a certain join is between RTI 4 and
RTI 5 and it's, say, a hash join. But after setrefs processing, it may
appear that RTI was joined to, say, RTI 13, which might not even have
been part of the same subquery level. If we record that we want to see
RTI 4 joined to RTI 13 via a hash join, that will be completely
useless -- the join planning code will never see those two RTIs as
options to be joined to each other. What I've done in 0006 is made it
so that we keep a record of each node elided during setrefs
processing. This list of elided nodes is stored in the PlannedStmt
outside of the portion of the tree that actually gets executed, so
that code that is doing plan tree inspection can look at it but
execution doesn't get any slower (except possibly by having to copy a
slightly larger amount of data around when reading and writing
PlannedStmt objects, which seems like it should be negligible).

Now let's talk about problem #2. I believe that we do not actually
want to refer to what happened to RTI 4 and RTI 5 as I mooted in the
previous paragraph, but rather to refer to relations by some kind of
name. However, we can't use the names shown in the EXPLAIN output,
because those are not really present in the plan and are only assigned
on-the-fly by EXPLAIN; hence, they can't be known at plan time. Since
planning proceeds one subquery at a time, I think the right way to
approach this problem is to first name the subquery and then to name
the table within that subquery. Subqueries sort of have names right
now, at least some of them, but it's an odd system: a CTE subquery,
for example, has the name mentioned by the user, but other kinds of
subplans just get names like "InitPlan 3" or "SubPlan 2". The real
problem, though, is that those names are only assigned after we've
FINISHED planned the subquery. If we begin planning our very first
subquery, it might turn out to be InitPlan 1 or SubPlan 1, or if while
planning it we recurse into some further subquery then *that* subquery
might become InitPlan 1 or SubPlan 1 and OUR subquery might become
InitPlan 2 or SubPlan 2 (or higher, if we find more subqueries and
recurse into them too). Thus, being given some information about how
the user wants, say, SubPlan 2 to be planned is completely useless
because we won't know whether that is us until after we've done the
planning that the user is trying to influence.

To solve that problem, I decided to arrange for every subquery to have
a unique name that is assigned before we begin planning it. Patch 0004
does this. Then, 0005 records those names in the final plan. That's
enough that you can look at the scanrelid (or apprelids, etc.) of a
Plan node and relate that back to a named subquery and a particular
RTI within that subquery. There is still the problem of how to name
relations within a single subquery, since it's possible to reuse
aliases within the same subquery level (simple cases are rejected, but
there are at least two ways to bypass the error check, and they look
intentional, so we can't just block it). Then, references to a certain
alias name can be further multiplied by inheritance expansion. This is
all a bit hairy but I haven't really found any fundamental problems
here that keep you from deciding on a workable naming convention.

Hope you find this interesting. If you do, let me know what you think.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tomas Vondra <tomas@vondra.me> — 2025-05-20T10:05:05Z

On 5/19/25 20:01, Robert Haas wrote:
> Hi,
> 
> A couple of people at pgconf.dev seemed to want to know more about my
> ongoing plan shape work, so here are the patches I have currently.
> This is a long way from something that actually looks like a usable
> feature, but these are bits of infrastructure that I think will be
> necessary to get to a usable feature. As a recap, my overall goal here
> is to make it so that you can examine a finished plan, figure out what
> decisions the planner made, and then somehow get the planner to make
> those same decisions over again in a future planning cycle. Since
> doing this for all types of planner decisions seems too difficult for
> an initial goal, I'm focusing on scans and joins for now. A further
> goal is that I want it to be possible for extensions to use this
> infrastructure to implement a variety of different policies that they
> might feel to be beneficial, so I'm looking to minimize the amount of
> stuff that has to be done in core PostgreSQL or can only be used by
> core PostgreSQL.
> 
> ...

Thanks for the overview. I don't have any immediate feedback, but it
sounds like it might be related to the "making planner decisions clear"
session from the unconference ...

The basic premise of that session was about how to give users better
info about the planner decisions - why paths were selected/rejected,
etc. A simple example would be "why was the index not used", and the
possible answers include "dominated by cost by another path" or "does
not match the index keys" etc.

I wonder if this work might be useful for something like that.

regards

-- 
Tomas Vondra

Re: plan shape work

Maciek Sakrejda <maciek@pganalyze.com> — 2025-05-20T19:09:04Z

+1, this seems like it could be very useful. A somewhat related issue
is being able to tie plan nodes back to the query text: it can be hard
to understand the planner's decisions if it's not even clear what part
of the query it's making decisions about. I'm sure this is not an easy
problem in general, but I wonder if you think that could be improved
in the course of this work, or if you have other thoughts about it.

Thanks,
Maciek

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-05-21T13:58:51Z

On Tue, May 20, 2025 at 2:45 PM Tomas Vondra <tomas@vondra.me> wrote:
> Thanks for the overview. I don't have any immediate feedback, but it
> sounds like it might be related to the "making planner decisions clear"
> session from the unconference ...
>
> The basic premise of that session was about how to give users better
> info about the planner decisions - why paths were selected/rejected,
> etc. A simple example would be "why was the index not used", and the
> possible answers include "dominated by cost by another path" or "does
> not match the index keys" etc.
>
> I wonder if this work might be useful for something like that.

I've been wondering that, too. There's definitely some indirect ways
in which that might be the case. For example, I think this work would
lend itself to saying "hey, try planning this query, but for that
table over there, use an index scan on this table." Then, it either
still doesn't -- meaning the index isn't usable for some reason -- or
it does and you can see the resulting plan with presumably higher cost
and maybe infer why it didn't happen. That's better than today, where
we have only very crude tools that let us do things like disable an
entire scan type for the entire query, and I think it would make it a
lot easier and less frustrating for a knowledgeable user to figure out
why things are happening.

But even though I think that would be better than today, I'm not sure
it rises to the level of actually being good, because I think it still
requires a fairly knowledgeable operator to figure things out, and you
probably have to experiment a bunch to understand the situation
instead of, say, being able to just look at the EXPLAIN plan and see
the answer. I think being able to look at the EXPLAIN plan and see the
answer, without needing a bunch of poking around, would be the ideal
scenario here.

But in some sense this is the same problem as understanding how an AI
neural network is reasoning. The answer to "why did the planner pick
plan X" is always "X was the cheapest possible plan". Ideas like "we
chose a merge join because both tables are large enough that neither
would fit into a hash table conveniently" are human explanations of
why the math had the effect that it did; they are not how the planner
actually reasons. So it's not just a matter of exposing the actual
reasoning process to the user, because the computer is not reasoning
in a way that a human would. It would have to be a matter of exposing
some kind of other information that would allow the human being to
comprehend easily what led the machine's algorithm to a certain
conclusion; and it is not obvious how to get there.

I have a sense - possibly an incorrect one - that the core of the
problem here is that the planner considers lots of very similar
alternatives. A hypothetical feature that showed the second-cheapest
plan would be all but useless, because the second-cheapest plan would
just be a very minor variation of the cheapest plan in almost all
cases. One idea that crossed my mind was to display information in
EXPLAIN about what would have happened if we'd done something really
different. For instance, suppose that at a certain level of the plan
tree we actually chose a merge join, but we also show the estimated
cost of the cheapest hash join (if any) and the cheapest nested loop
(if any) that we considered at that level. The user might be able to
draw useful conclusions based on whether those numbers were altogether
absent (i.e. that join type was not viable at all) or whether the cost
was a little higher or a lot higher than that of the path actually
chosen. For scans, you could list which indexes were believed to be
usable and perhaps what the cost would have been for the cheapest one
not actually selected; and what the cost of a sequential scan would
have been if you hadn't picked one.

I'm not sure how useful this would be, so the whole idea might
actually suck, or maybe it's sort of the right idea but needs a bunch
of refinement to really be useful. I don't have a better idea right
now, though.

If there are any notes that were taken during that unconference
session, please point me in the right direction; I was in another
session at that time but would read any available notes with interest.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-05-21T14:28:47Z

On Tue, May 20, 2025 at 3:09 PM Maciek Sakrejda <maciek@pganalyze.com> wrote:
> +1, this seems like it could be very useful. A somewhat related issue
> is being able to tie plan nodes back to the query text: it can be hard
> to understand the planner's decisions if it's not even clear what part
> of the query it's making decisions about. I'm sure this is not an easy
> problem in general, but I wonder if you think that could be improved
> in the course of this work, or if you have other thoughts about it.

Thanks. I don't really have any ideas about the problem you mention,
perhaps partly because I haven't experienced it too much. I mean, I
have sometimes been confused about which parts of the query go with
which parts of the EXPLAIN, but I think in my experience so far that
is mostly because either (1) both the query and the EXPLAIN output are
super long and maybe also super-wide and therefore it's hard to
correlate things by eye or (2) somebody wrote a query where they use
the same table and/or table alias over and over again in different
parts of the query and so it's hard to tell which reference goes with
which. Neither of those problems seems all that exciting to me from a
dev perspective: if you're calling everything a or x or orders or
something, maybe don't do that, and if your query is 1500 characters
long, I guess you need to budget some time to align that with the
query plan. I don't really know how much we can do here. But maybe
there are cases that I haven't seen where something better is
possible, or perhaps you have some good idea that I haven't
considered.

(If I'm honest, I do have an idea that I think might very
significantly improve the readability of EXPLAIN output. I think it
would make it much less wide in normal cases without making it much
longer. This has been percolating in my brain for a few years now and
I have the vague intention of proposing it at some point, but not
until I'm good and ready to be flamed to a well-done crisp, because
I'm quite sure there will be more than one opinion on the merits.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Maciek Sakrejda <m.sakrejda@gmail.com> — 2025-05-21T16:02:13Z

On Wed, May 21, 2025 at 7:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 20, 2025 at 3:09 PM Maciek Sakrejda <maciek@pganalyze.com> wrote:
> > +1, this seems like it could be very useful. A somewhat related issue
> > is being able to tie plan nodes back to the query text: it can be hard
> > to understand the planner's decisions if it's not even clear what part
> > of the query it's making decisions about. I'm sure this is not an easy
> > problem in general, but I wonder if you think that could be improved
> > in the course of this work, or if you have other thoughts about it.
>
> Thanks. I don't really have any ideas about the problem you mention,
> perhaps partly because I haven't experienced it too much.

That may be due to your extensive experience with Postgres and EXPLAIN plans.

> I mean, I
> have sometimes been confused about which parts of the query go with
> which parts of the EXPLAIN, but I think in my experience so far that
> is mostly because either (1) both the query and the EXPLAIN output are
> super long and maybe also super-wide and therefore it's hard to
> correlate things by eye or (2) somebody wrote a query where they use
> the same table and/or table alias over and over again in different
> parts of the query and so it's hard to tell which reference goes with
> which. Neither of those problems seems all that exciting to me from a
> dev perspective: if you're calling everything a or x or orders or
> something, maybe don't do that, and if your query is 1500 characters
> long, I guess you need to budget some time to align that with the
> query plan.

Fair enough, although the people trying to make sense of EXPLAIN plans
are sometimes not the same ones who are writing the queries. And
sometimes the queries are not written by people at all but by ORMs
(or—heaven help us—vibe coded). "Don't do X" is a reasonable response
to "It hurts when I do X," but it doesn't really solve the user's
problem. That said, it's hard to argue with "We don't have any good
ideas on how to improve this right now, and it's not a total dumpster
fire, so we'll focus on other work."

> I don't really know how much we can do here. But maybe
> there are cases that I haven't seen where something better is
> possible, or perhaps you have some good idea that I haven't
> considered.

No great ideas here. I thought initially that a good solution would be
to have structured EXPLAIN output include something like "Query Text
Start Index" and "Query Text End Index" fields for each node, but I
realized that this doesn't really work for multiple joins (and
probably other cases). Maybe "Query Text Indices", as a list of pairs?
But from the little I know about the planner, that seems like any sort
of tracking back to the source would be hard to implement. And it only
really solves the problem for external EXPLAIN viewers, and only ones
that put in the work to support this. I'm not sure if the problem can
be meaningfully addressed for text format, but maybe that's another
reason not to spend time on it in core.

> (If I'm honest, I do have an idea that I think might very
> significantly improve the readability of EXPLAIN output. I think it
> would make it much less wide in normal cases without making it much
> longer. This has been percolating in my brain for a few years now and
> I have the vague intention of proposing it at some point, but not
> until I'm good and ready to be flamed to a well-done crisp, because
> I'm quite sure there will be more than one opinion on the merits.)

I'm intrigued, and happy to stand by with an extinguisher. The road to
great ideas is paved with bad ideas.

Thanks,
Maciek

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-05-21T16:27:32Z

On Wed, May 21, 2025 at 12:03 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
> That may be due to your extensive experience with Postgres and EXPLAIN plans.

Yes, that is very possible. All things being equal, it helps to have
done something a lot of times.

> Fair enough, although the people trying to make sense of EXPLAIN plans
> are sometimes not the same ones who are writing the queries. And
> sometimes the queries are not written by people at all but by ORMs
> (or—heaven help us—vibe coded). "Don't do X" is a reasonable response
> to "It hurts when I do X," but it doesn't really solve the user's
> problem. That said, it's hard to argue with "We don't have any good
> ideas on how to improve this right now, and it's not a total dumpster
> fire, so we'll focus on other work."

+1 to all of that.

> No great ideas here. I thought initially that a good solution would be
> to have structured EXPLAIN output include something like "Query Text
> Start Index" and "Query Text End Index" fields for each node, but I
> realized that this doesn't really work for multiple joins (and
> probably other cases). Maybe "Query Text Indices", as a list of pairs?
> But from the little I know about the planner, that seems like any sort
> of tracking back to the source would be hard to implement. And it only
> really solves the problem for external EXPLAIN viewers, and only ones
> that put in the work to support this. I'm not sure if the problem can
> be meaningfully addressed for text format, but maybe that's another
> reason not to spend time on it in core.

I'm not gonna say you couldn't make something like that work, but it
sounds like a lot of effort for a hypothetical piece of external
visualization software that might or might not produce satisfying
results. My advice to anyone wanting to pursue this idea would be:
make a totally fake POC first. Get a sample query with at least a
moderately complex plan, get the EXPLAIN output, manually generate
whatever data you think PostgreSQL ought to be able to spit out, and
do a mock-up of an external viewer. When you're happy with the
results, show it to some other people and see if they also like it. We
can have the discussion about whether to include anything in core and
what it should be after that. I definitely would not rule out the
possibility that something like this could turn out to be really cool
-- maybe hovering over stuff and having the corresponding part of the
plan get highlighted will turn out to be awesome. But I think it might
also turn out that there are things where it's not quite clear what
you can or should usefully highlight, like target-list items, or for
example a case where the query says that a.x = b.x and b.x = c.x but
in the actual plan we use evaluate a.x = c.x, an expression not
appearing anywhere in the query text. The legwork of sorting some of
that kind of stuff out should really happen before making a feature
proposal.

> I'm intrigued, and happy to stand by with an extinguisher. The road to
> great ideas is paved with bad ideas.

Thanks. That proposal is a task for another day, but I appreciate the sentiment.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Andy Fan <zhihuifan1213@163.com> — 2025-05-22T00:54:49Z

Robert Haas <robertmhaas@gmail.com> writes:

> Hi,
>
> ... As a recap, my overall goal here
> is to make it so that you can examine a finished plan, figure out what
> decisions the planner made, and then somehow get the planner to make
> those same decisions over again in a future planning cycle.

I am feeling that this is similar with Oracle's outline feature, where
the final plan is examined and then a series of hints are stored and
then during the replanning of the the same query, these hints will be
applied to planner.  If one of the hints is not appliable any more, like
the index is unusable, it is just ignored.

> This list of elided nodes is stored in the PlannedStmt

I did a quick check on the attached patches and I can see some more
information is added into PlannedStmt. then my question are the
PlannedStmt is not avaiable during the future planning cycle, then how
does these information would be helpful on the feture planning? and I'm
strange that there are little changes on the optimizer part. Does this
patchset just a preparation work for reconstructing a same plan in
future? 

> outside of the portion of the tree that actually gets executed, so
> that code that is doing plan tree inspection can look at it but
> execution doesn't get any slower.

Thank you for sharing this!

-- 
Best Regards
Andy Fan

Re: plan shape work

Andy Fan <zhihuifan1213@163.com> — 2025-05-22T08:14:23Z

Andy Fan <zhihuifan1213@163.com> writes:

>
>> This list of elided nodes is stored in the PlannedStmt
>
> I did a quick check on the attached patches and I can see some more
> information is added into PlannedStmt. then my question are the
> PlannedStmt is not avaiable during the future planning cycle, then how
> does these information would be helpful on the feture planning? and I'm
> strange that there are little changes on the optimizer part. Does this
> patchset just a preparation work for reconstructing a same plan in
> future? 

I'm sure it is a preliminary work for reconstructing a same plan in
future, sorry for this noise. 

-- 
Best Regards
Andy Fan

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-08-26T14:58:33Z

On Mon, May 19, 2025 at 2:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> A couple of people at pgconf.dev seemed to want to know more about my
> ongoing plan shape work, so here are the patches I have currently.

Here's an updated patch set. My goal for the September CommitFest is
to get patches 0001-0004 committed. Of course, if there are too many
objections or too little review, that might not happen, but that's my
goal.

This patch set is basically unchanged from the previous patch set,
except that I've added one new patch. 0007 records information about
Append node consolidation into the final plan tree. Without this, when
we build an AppendPath or MergeAppendPath and pull up the subpaths
from a similar underlying node, we can lose the RTIs from the
subordinate node, making it very difficult to analyze the plan after
the fact.

Just to remark a bit further on the structure of the patch set,
0001-0003 are closely related. The only one I really need committed in
order to move forward is 0001, but I think the others are a good idea.
There is probably room for some bikeshedding on the output produced by
0002. Then after that, 0004 stands alone as an incredibly important
and foundational patch: without it, there's no way to know what the
name of a subplan will be until after it's already been planned. I am
fairly confident in the approach that I've taken here, but it does
cause user-visible changes in EXPLAIN output about which people might
conceivably have strong opinions. Getting agreement either on what
I've done here or some variant of the approach is essential for me to
be able to move forward. Then, 0005-0007 all have to do with
preserving in the final plan various details that today would be
discarded at the end of planning. While I'm happy to have comments on
these now, I'm still not completely confident that I've found all
issues in this area or handled them perfectly; hence, I'm not in a
hurry to move forward with those just yet.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Bruce Momjian <bruce@momjian.us> — 2025-08-26T20:16:34Z

On Tue, Aug 26, 2025 at 10:58:33AM -0400, Robert Haas wrote:
> During planning, there is one range table per subquery; at the end if
> planning, those separate range tables are flattened into a single
> range table. Prior to this change, it was impractical for code
> examining the final plan to understand which parts of the flattened
> range table came from which subquery's range table.
> 
> If the only consumer of the final plan is the executor, that is
> completely fine. However, if some code wants to examine the final
> plan, or what happens when we execute it, and extract information from
> it that be used in future planning cycles, it's inconvenient.

I am very interested in how plans can be used for future planning.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.

Re: plan shape work

Andrei Lepikhov <lepihov@gmail.com> — 2025-08-27T09:41:00Z

On 19/5/2025 20:01, Robert Haas wrote:
> Hope you find this interesting. If you do, let me know what you think.Thanks for looking in this direction!

Since 2017, we designed features that should 'memorise' the experience
of previous executions, checking the PlanState tree and instrumentation
after execution.
The first dumb prototype you should know - AQO. The following, more
robust code is 'replan' [1]. These two features utilise the core patch,
and I hope this patch can be removed after a few adjustments to the core.
In fact, 'replan' is used to stop execution in the middle (if the
time/memory usage limit is achieved), pass through the state, collect
valuable data and execute again. That's why using data for the
subsequent execution needs a matching query tree, and that data may be
irrelevant for a different set of constants.

The lessons learned during design and after some usage in real life:
1. Subplans should be uniquely identified during the planning. Like you
mentioned, two similar subplans on the same query level may be executed
differently and provide different row numbers to the 'knowledge base'. I
used the subquery_planner hack that generated a unique ID for each subplan.
2. Each node should be identified - I used a kind of signature at each
RelOptInfo node - just a hash generated by restrictinfos and underlying
signatures. This approach needs a create_plan hook to copy the signature
to the Plan nodes.
3. A great source of fluctuations is 'never executed' nodes - because
depending on the constant, the subtree may never be executed or produce
tons of tuples - I resolved it by just ignoring 'never executed'
results, no in-core changes needed.
4. A tree of paths may implement a single RelOptInfo. I have saved the
signature at the top of the Plan node for the corresponding RelOptInfo
and have never encountered any problems yet. It limits the use of
gathered data on cardinality/group number/peak memory consumption
somewhat, but not significantly.

I think the node extension fields and hooks, debated in the thread [2],
may be enough to enable extensions to implement such a feature.

What's more, I personally prefer to have a hook that allows extension
checking of a condition during execution - it may be done inside the
ExecProcNode() by calling the node-specific hook, which an extension may
initialise in the PlanState structure before the start of execution.
Additionally, it would be beneficial to have a hook for error processing
at the top of the portal execution code - this is a key point for
managing query resources. If we need to stop and re-execute the query,
it is the only place where we can release all the resources assigned.
One more helpful thing - an option to postpone receiver sending data out
of the instance-side, even result headings. We may want to decide on the
query plan after some tuples are produced, and if something has already
been sent to the user, we can't just stop and rebuild the plan.

[1] https://postgrespro.com/docs/enterprise/16/realtime-query-replanning
[2]
https://www.postgresql.org/message-id/flat/CA%2BTgmoYxfg90rw13%2BJcYwn4dwSC%2Bagw7o8-A%2BfA3M0fh96pg8w%40mail.gmail.com

--
regards, Andrei Lepikhov

Re: plan shape work

Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-28T17:22:06Z

Hi Robert,

On Tue, Aug 26, 2025 at 7:59 AM Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, May 19, 2025 at 2:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > A couple of people at pgconf.dev seemed to want to know more about my
> > ongoing plan shape work, so here are the patches I have currently.
>
> Here's an updated patch set. My goal for the September CommitFest is
> to get patches 0001-0004 committed. Of course, if there are too many
> objections or too little review, that might not happen, but that's my
> goal.
>

Thanks for the patches!

I don’t know enough about the history in this area to object to your
approach or suggest an alternative design. That said, I’ve reviewed
patches 0001-0004, and as individual patches they make sense to me.

Below are some more detailed comments, which would only be relevant if
you decide to proceed in this direction.

0002:

                         QUERY PLAN
 ----------------------------------------------------------
  Nested Loop Left Join
-   Output: t1.i, (1), t2.i2, i3, t4.i4
+   Output: t1.i, (1), t2.i2, t3.i3, t4.i4
    ->  Nested Loop Left Join
-         Output: t1.i, t2.i2, (1), i3
+         Output: t1.i, t2.i2, (1), t3.i3
          Join Filter: false
          ->  Hash Left Join
                Output: t1.i, t2.i2, (1)

These plan changes introduced by 0002, which adds schema qualifiers,
make me very happy. I think it’s a nice improvement on its own.

In reply to 0002's commit message:
> XXX. I have broken this out as a separate commit for now; however,
> it could be merged with the commit to add 'relids' to 'Result'; or
> the patch series could even be rejiggered to present this as the
> primary benefit of that change, leaving the EXPLAIN changes as a
> secondary benefit, instead of the current organization, which does
> the reverse.

I’m fine with the current organization. I agree that if we just
compare the EXPLAIN changes in 0001, which add additional “Replaces:”
information for the simple Result node, with the EXPLAIN changes in
0002, the changes in 0002 are arguably more attractive. However, I
think the EXPLAIN changes in 0001 are a more direct reflection of what
the rest of 0001 is trying to achieve: keeping track of the RTIs a
Result node is scanning. The changes in 0002 feel more like a side
benefit.

With that said, this is just my personal preference. If other
reviewers feel differently, I won’t object.

0003:

In get_scanned_rtindexes():
+       case T_NestLoop:
+           {
+               Bitmapset  *outer_scanrelids;
+               Bitmapset  *inner_scanrelids;
+               Bitmapset  *combined_scanrelids;
+
+               outer_scanrelids =
+                   get_scanned_rtindexes(root, plan->lefttree);
+               inner_scanrelids =
+                   get_scanned_rtindexes(root, plan->righttree);
+               combined_scanrelids =
+                   bms_union(outer_scanrelids, inner_scanrelids);
+               inner_scanrelids = remove_join_rtis(root, inner_scanrelids);
+
+               return remove_join_rtis(root, combined_scanrelids);
+               break;
+           }

It looks like there is an redundant assignment to inner_scanrelids:
+               inner_scanrelids = remove_join_rtis(root, inner_scanrelids);

0004:

There is a compiler warning reported in the CommitFest build:
https://cirrus-ci.com/task/6248981396717568

[23:03:57.811] subselect.c: In function ‘sublinktype_to_string’:
[23:03:57.811] subselect.c:3232:1: error: control reaches end of non-void
function [-Werror=return-type]
[23:03:57.811] 3232 | }
[23:03:57.811] | ^
[23:03:57.811] cc1: all warnings being treated as errors
[23:03:57.812] make[4]: *** [<builtin>: subselect.o] Error 1
[23:03:57.812] make[3]: *** [../../../src/backend/common.mk:37:
plan-recursive] Error 2
[23:03:57.812] make[2]: *** [common.mk:37: optimizer-recursive] Error 2
[23:03:57.812] make[2]: *** Waiting for unfinished jobs....
[23:04:05.513] make[1]: *** [Makefile:42: all-backend-recurse] Error 2
[23:04:05.514] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2

You might want to add a return to get rid of the warning.

Still in 0004:
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -339,6 +340,8 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo
*mminfo,
    memcpy(subroot, root, sizeof(PlannerInfo));
    subroot->query_level++;
    subroot->parent_root = root;
+   subroot->plan_name = choose_plan_name(root->glob, "minmax", true);
+
    /* reset subplan-related stuff */
    subroot->plan_params = NIL;
    subroot->outer_params = NULL;
@@ -359,6 +362,9 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo
*mminfo,
    /* and we haven't created PlaceHolderInfos, either */
    Assert(subroot->placeholder_list == NIL);

+   /* Add this to list of all PlannerInfo objects. */
+   root->glob->allroots = lappend(root->glob->allroots, root);
+

In the last diff, it should add "subroot" instead of "root" to the
list of all PlannerInfos. Currently, if there are multiple minmax
expressions, we end up with the following plan showing duplicate
names:

test=# explain (costs off) SELECT MIN(value), MAX(value) FROM test_minmax;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Result
   Replaces: Aggregate
   InitPlan minmax_1
     ->  Limit
           ->  Index Only Scan using test_minmax_value_idx on test_minmax
                 Index Cond: (value IS NOT NULL)
   InitPlan minmax_1
     ->  Limit
           ->  Index Only Scan Backward using test_minmax_value_idx on
test_minmax test_minmax_1
                 Index Cond: (value IS NOT NULL)
(10 rows)

Best,
Alex

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-02T20:07:17Z

On Thu, Aug 28, 2025 at 1:22 PM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
> Thanks for the patches!

Thanks for the review. Responding just briefly to avoid quoting too much text:

- I'm glad to hear that you like 0002 and consider it an improvement
independent of what follows.
- I'm glad to hear that you're OK with the current split between 0001 and 0002.
- I would like opinions on those topics from more people.
- I have attempted to fix all of the other mistakes that you pointed
out in the attached v3, which is also rebased.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-04T08:21:10Z

On Wed, Sep 3, 2025 at 5:07 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Thanks for the review. Responding just briefly to avoid quoting too much text:
>
> - I'm glad to hear that you like 0002 and consider it an improvement
> independent of what follows.
> - I'm glad to hear that you're OK with the current split between 0001 and 0002.
> - I would like opinions on those topics from more people.
> - I have attempted to fix all of the other mistakes that you pointed
> out in the attached v3, which is also rebased.

I've reviewed 0001 to 0003, and they all make sense to me.  The
changes to the EXPLAIN output in 0001 and 0002 are very nice
improvements.

I found some issues with 0003 though.  It seems get_scanned_rtindexes
is intended to return RTI sets with outer join relids excluded.  For
some node types, such as Append and MergeAppend, it fails to do so,
which can cause the assertion in assert_join_preserves_scan_rtis to
fail.  For example:

create table p (a int, b int) partition by range(a);
create table p1 partition of p for values from (0) to (10);
create table p2 partition of p for values from (10) to (20);

set enable_partitionwise_join to on;

explain (costs off)
select * from p t1
    left join p t2 on t1.a = t2.a
    left join p t3 on t2.b = t3.b;
server closed the connection unexpectedly

Besides, to exclude outer join relids, it iterates over the RTI sets,
checks each RTE for type RTE_JOIN, and bms_del_member it if found (cf.
remove_join_rtis).  I think a simpler approach would be to leverage
PlannerInfo.outer_join_rels:

    scanrelids = bms_difference(scanrelids, root->outer_join_rels);

Therefore, I suggest that we don't try to remove the join RTIs in
get_scanned_rtindexes.  Instead, we do that in
assert_join_preserves_scan_rtis -- before comparing the RTIs from the
outer and inner subplans with the join's RTIs -- by leveraging
PlannerInfo.outer_join_rels.  And remove_join_rtis can be retired.

- Richard

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-05T15:51:07Z

On Thu, Sep 4, 2025 at 4:21 AM Richard Guo <guofenglinux@gmail.com> wrote:
> I found some issues with 0003 though.  It seems get_scanned_rtindexes
> is intended to return RTI sets with outer join relids excluded.  For
> some node types, such as Append and MergeAppend, it fails to do so,
> which can cause the assertion in assert_join_preserves_scan_rtis to
> fail.  For example:
>
> create table p (a int, b int) partition by range(a);
> create table p1 partition of p for values from (0) to (10);
> create table p2 partition of p for values from (10) to (20);
>
> set enable_partitionwise_join to on;
>
> explain (costs off)
> select * from p t1
>     left join p t2 on t1.a = t2.a
>     left join p t3 on t2.b = t3.b;
> server closed the connection unexpectedly

Ouch. Good catch.

> Besides, to exclude outer join relids, it iterates over the RTI sets,
> checks each RTE for type RTE_JOIN, and bms_del_member it if found (cf.
> remove_join_rtis).  I think a simpler approach would be to leverage
> PlannerInfo.outer_join_rels:
>
>     scanrelids = bms_difference(scanrelids, root->outer_join_rels);

I was not aware of outer_join_rels, so thank you for pointing it out.
However, consider this query:

select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode;

Here, we end up with a three-item range table: one for a, one for b,
and one for the join. But the join is not an outer join, and does not
appear in root->outer_join_rels. Therefore, I'm not sure we can rely
on outer_join_rels in this scenario.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-05T16:00:55Z

Robert Haas <robertmhaas@gmail.com> writes:
> I was not aware of outer_join_rels, so thank you for pointing it out.
> However, consider this query:

> select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode;

> Here, we end up with a three-item range table: one for a, one for b,
> and one for the join. But the join is not an outer join, and does not
> appear in root->outer_join_rels. Therefore, I'm not sure we can rely
> on outer_join_rels in this scenario.

Plain (not-outer) joins will never be included in a relid set in the
first place.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-05T16:40:22Z

On Fri, Sep 5, 2025 at 12:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Plain (not-outer) joins will never be included in a relid set in the
> first place.

Ah, well then Richard's idea might work! Let me try it and see what happens...

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-05T17:38:43Z

On Fri, Sep 5, 2025 at 12:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Ah, well then Richard's idea might work! Let me try it and see what happens...

Here's the result. Seem to work OK.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Dilip Kumar <dilipbalaut@gmail.com> — 2025-09-08T05:05:32Z

On Wed, May 21, 2025 at 7:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 20, 2025 at 2:45 PM Tomas Vondra <tomas@vondra.me> wrote:

> I have a sense - possibly an incorrect one - that the core of the
> problem here is that the planner considers lots of very similar
> alternatives. A hypothetical feature that showed the second-cheapest
> plan would be all but useless, because the second-cheapest plan would
> just be a very minor variation of the cheapest plan in almost all
> cases. One idea that crossed my mind was to display information in
> EXPLAIN about what would have happened if we'd done something really
> different. For instance, suppose that at a certain level of the plan
> tree we actually chose a merge join, but we also show the estimated
> cost of the cheapest hash join (if any) and the cheapest nested loop
> (if any) that we considered at that level. The user might be able to
> draw useful conclusions based on whether those numbers were altogether
> absent (i.e. that join type was not viable at all) or whether the cost
> was a little higher or a lot higher than that of the path actually
> chosen. For scans, you could list which indexes were believed to be
> usable and perhaps what the cost would have been for the cheapest one
> not actually selected; and what the cost of a sequential scan would
> have been if you hadn't picked one.
>
> I'm not sure how useful this would be, so the whole idea might
> actually suck, or maybe it's sort of the right idea but needs a bunch
> of refinement to really be useful. I don't have a better idea right
> now, though.

Having detailed information on the costs of alternative join
methods/scan method, even when a different method is chosen, would be
valuable information. For example, if a merge join is selected for
tables t1 and t2 in a subquery, showing the estimated costs for both a
hash join and a nested loop join would provide a more complete picture
of the planner's decision-making process.

And I believe, this information would be particularly useful if the
cost of a non-selected plan, such as a nested loop join, is very close
to the cost of the chosen merge join. In such cases, a database
administrator or query optimizer could use this insight to manually
override the planner's choice and opt for the nested loop join for
specific tables in a subquery. This level of detail would empower
users to fine-tune query performance and explore alternative execution
strategies.

IIUC, one of the goal of this work is where operator can say I want to
use this scan method while scanning a particular table in a particular
subquery, that means if the planner can give the information about non
selected paths as well then it would be really helpful in making this
process more smooth otherwise without much information on what path
got rejected its very hard to provide hints.

-- 
Regards,
Dilip Kumar
Google

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-08T09:51:33Z

On Sat, Sep 6, 2025 at 1:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I was not aware of outer_join_rels, so thank you for pointing it out.
> > However, consider this query:
>
> > select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode;
>
> > Here, we end up with a three-item range table: one for a, one for b,
> > and one for the join. But the join is not an outer join, and does not
> > appear in root->outer_join_rels. Therefore, I'm not sure we can rely
> > on outer_join_rels in this scenario.

> Plain (not-outer) joins will never be included in a relid set in the
> first place.

Exactly.  Non-outer joins wouldn't cause Vars to become null, so we
never include them in the joinrel's relids.

BTW, I'm wondering if we can take outer join relids into account in
assert_join_preserves_scan_rtis(), which could make the check more
useful.  A joinrel's relids consists of three parts: the outer plan's
relids, the inner plan's relids, and the relids of outer joins that
are calculated at this join.  We already have the first two.  If we
can find a way to determine the third, we'd be able to assert that:

 outer_relids U inner_relids U outerjoin_relids == joinrel->relids

Determining the third part can be tricky though, especially due to
outer-join identity 3: the "outerjoin_relids" of one outer join might
include more than one outer join relids.  But I think this is till
doable.

(This may not be useful for your overall goal in this patchset, so
feel free to ignore it if it's not of interest.)

- Richard

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-08T13:56:18Z

On Mon, Sep 8, 2025 at 5:51 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > Plain (not-outer) joins will never be included in a relid set in the
> > first place.
>
> Exactly.  Non-outer joins wouldn't cause Vars to become null, so we
> never include them in the joinrel's relids.

OK. I didn't understand what the rules were there.

> BTW, I'm wondering if we can take outer join relids into account in
> assert_join_preserves_scan_rtis(), which could make the check more
> useful.  A joinrel's relids consists of three parts: the outer plan's
> relids, the inner plan's relids, and the relids of outer joins that
> are calculated at this join.  We already have the first two.  If we
> can find a way to determine the third, we'd be able to assert that:
>
>  outer_relids U inner_relids U outerjoin_relids == joinrel->relids
>
> Determining the third part can be tricky though, especially due to
> outer-join identity 3: the "outerjoin_relids" of one outer join might
> include more than one outer join relids.  But I think this is till
> doable.
>
> (This may not be useful for your overall goal in this patchset, so
> feel free to ignore it if it's not of interest.)

I don't mind doing the work if there's a reasonable and useful way of
accomplishing the goal. However, one concern I have is that it seems
pointless if we're computing outerjoin_relids by essentially redoing
the same computation that set the join's relid set in the first place.
In that case, the cross-check has no real probative value. All it
would be demonstrating is that if you calculate outerjoin_relids twice
using essentially the same methodology, you get the same answer. That
seems like a waste of code to me. If there's a way to calculate
outerjoin_relids using a different methodology than what we used when
populating the joinrelids, that would be interesting. It would be
similar to how the existing code recomputes the outer and inner relids
in a way that can potentially find issues that otherwise would not
have been spotted (such as the Result node case).

Do you have a proposal?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-09T02:22:40Z

On Mon, Sep 8, 2025 at 10:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Sep 8, 2025 at 5:51 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > BTW, I'm wondering if we can take outer join relids into account in
> > assert_join_preserves_scan_rtis(), which could make the check more
> > useful.  A joinrel's relids consists of three parts: the outer plan's
> > relids, the inner plan's relids, and the relids of outer joins that
> > are calculated at this join.  We already have the first two.  If we
> > can find a way to determine the third, we'd be able to assert that:
> >
> >  outer_relids U inner_relids U outerjoin_relids == joinrel->relids
> >
> > Determining the third part can be tricky though, especially due to
> > outer-join identity 3: the "outerjoin_relids" of one outer join might
> > include more than one outer join relids.  But I think this is till
> > doable.
> >
> > (This may not be useful for your overall goal in this patchset, so
> > feel free to ignore it if it's not of interest.)

> I don't mind doing the work if there's a reasonable and useful way of
> accomplishing the goal. However, one concern I have is that it seems
> pointless if we're computing outerjoin_relids by essentially redoing
> the same computation that set the join's relid set in the first place.
> In that case, the cross-check has no real probative value. All it
> would be demonstrating is that if you calculate outerjoin_relids twice
> using essentially the same methodology, you get the same answer. That
> seems like a waste of code to me. If there's a way to calculate
> outerjoin_relids using a different methodology than what we used when
> populating the joinrelids, that would be interesting. It would be
> similar to how the existing code recomputes the outer and inner relids
> in a way that can potentially find issues that otherwise would not
> have been spotted (such as the Result node case).
>
> Do you have a proposal?

One idea (not fully thought through) is that we record the calculated
outerjoin_relids for each outer join in its JoinPaths.  (We cannot
store this in the joinrel's RelOptInfo because it varies depending on
the join sequence we use.)  And then we could use the recorded
outerjoin_relids for the assertion here:

 outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids

The value of this approach, IMO, is that it could help verify the
correctness of how we compute outer joins' outerjoin_relids, ie. the
logic in add_outer_joins_to_relids(), which is quite complex due to
outer-join identity 3.  If we miscalculate the outerjoin_relids for
one certain outer join, this assertion could catch it effectively.

However, this shouldn't be a requirement for committing your patches.
Maybe we should discuss it in a separate thread.

- Richard

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-09T13:17:51Z

First of all, as an administrative note, since both you and Alex seem
to like 0001 and 0002 and no suggestions for improvement have been
offered, I plan to commit those soon unless there are objections or
additional review comments. I will likely do the same for 0003 as
well, pending the results of the current conversation, but maybe not
quite as quickly. I believe that 0004 still needs more review, and its
effects will be more user-visible than 0001-0003, so I don't plan to
move forward with that immediately, but I invite review comments.

On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
> One idea (not fully thought through) is that we record the calculated
> outerjoin_relids for each outer join in its JoinPaths.  (We cannot
> store this in the joinrel's RelOptInfo because it varies depending on
> the join sequence we use.)  And then we could use the recorded
> outerjoin_relids for the assertion here:
>
>  outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids
>
> The value of this approach, IMO, is that it could help verify the
> correctness of how we compute outer joins' outerjoin_relids, ie. the
> logic in add_outer_joins_to_relids(), which is quite complex due to
> outer-join identity 3.  If we miscalculate the outerjoin_relids for
> one certain outer join, this assertion could catch it effectively.
>
> However, this shouldn't be a requirement for committing your patches.
> Maybe we should discuss it in a separate thread.

I'm OK with moving the conversation to a separate thread, but can you
clarify from where you believe that joinpath->ojrelids would be
populated? It seems to me that the assertion couldn't pass unless
every join path ended up with the same value of joinpath->ojrelids.
That's because, for a given joinrel, there is only one value of
joinrel->relids; and all of those RTIs must be either RTE_JOIN or
non-RTE_JOIN. The non-RTE_JOIN RTIs will be found only in outer_relids
U inner_relids, and the RTE_JOIN RTIs will be found only in
joinpath->ojrelids. Therefore, it seems impossible for the assertion
to pass unless the value is the same for all join paths. If that is
correct, then I don't think we should store the value in the join
path. Instead, if we want to cross-check it, we could calculate the
value that would have been stored into joinpath->ojrelids at whatever
earlier stage we had the information available to do so, and it should
be equal to bms_intersect(joinrel->relids, root->outer_join_rels),
which I think would have to be already initialized before we can think
of building a join path.

Please feel free to correct me if I am misunderstanding.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-09T15:12:51Z

Robert Haas <robertmhaas@gmail.com> writes:
> First of all, as an administrative note, since both you and Alex seem
> to like 0001 and 0002 and no suggestions for improvement have been
> offered, I plan to commit those soon unless there are objections or
> additional review comments.

FWIW, I don't love the details of 0001: I think it's going in the
right direction, but could use more polish.  In particular, you've
defined Result.relids in a way that seems ambiguous.  There are two
different meanings for NULL, and one of them is being interpreted as
an "Aggregate" without a lot of principle behind that.  I think you
need to store more data in order to make that less of a hack.

So far as I can see from the regression-test changes, the "Aggregate"
case only occurs when we've replaced a aggregate calculation with
a MinMaxAggPath representing an index endpoint probe.  What I would
like to see come out when that's the case is something like

	Replaces: MIN or MAX aggregate over scan on tab1

This means first that the Result.relids needs to include the relid of
the table being scanned by the indexscan, and second that EXPLAIN will
then need some other cue to help it distinguish this case from a
case where it should just say "Replaces: Scan on tab1".  It's possible
that you could intuit that by examining the initplans attached to the
Result node, but I think what would make a ton more sense is to add
an enum field to Result that explicitly identifies why it's there.
We've got at least "apply one-time filter to subplan", "apply per-row
gating filter to subplan", "represent a relation proven empty", and
"place-hold for a MinMaxAgg InitPlan".  Tracing back from all the
calls to make_result() might identify some more cases.  I'm not
arguing that the user-visible EXPLAIN output should distinguish
all of these (but probably overexplain should).  I think though
that it'd be useful to have this recorded in the plan tree.

> On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
>> One idea (not fully thought through) is that we record the calculated
>> outerjoin_relids for each outer join in its JoinPaths.

> I'm OK with moving the conversation to a separate thread, but can you
> clarify from where you believe that joinpath->ojrelids would be
> populated? It seems to me that the assertion couldn't pass unless
> every join path ended up with the same value of joinpath->ojrelids.

What I have been intending to suggest is that you should add a field
to join plan nodes that is zero if an inner join, but the relid of
the outer join RTE if it's an outer join.  This is uniquely defined
because any given join node implements a specific outer join, even
though the planner's relids for the completed join are (intentionally)
ambiguous about the order in which multiple joins were done.

The reason I wanted to do this is that I think it'd become possible to
tighten the assertions in setrefs.c about whether Vars' varnullingrels
are correct, so that we can always assert that those relid sets are
exactly thus-and-so and not have to settle for superset/subset tests.
I've not worked through the details to be entirely sure that this is
possible, so I didn't bring it up before.  But maybe labeling join
nodes this way would also address Richard's concern.  In any case
it fits into your overall goal of decorating plan trees with more
information.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-09T16:00:26Z

On Tue, Sep 9, 2025 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think what would make a ton more sense is to add
> an enum field to Result that explicitly identifies why it's there.
> We've got at least "apply one-time filter to subplan", "apply per-row
> gating filter to subplan", "represent a relation proven empty", and
> "place-hold for a MinMaxAgg InitPlan".

Thanks, I'll look into this.

> What I have been intending to suggest is that you should add a field
> to join plan nodes that is zero if an inner join, but the relid of
> the outer join RTE if it's an outer join.  This is uniquely defined
> because any given join node implements a specific outer join, even
> though the planner's relids for the completed join are (intentionally)
> ambiguous about the order in which multiple joins were done.
>
> The reason I wanted to do this is that I think it'd become possible to
> tighten the assertions in setrefs.c about whether Vars' varnullingrels
> are correct, so that we can always assert that those relid sets are
> exactly thus-and-so and not have to settle for superset/subset tests.
> I've not worked through the details to be entirely sure that this is
> possible, so I didn't bring it up before.  But maybe labeling join
> nodes this way would also address Richard's concern.  In any case
> it fits into your overall goal of decorating plan trees with more
> information.

Oh, that seems quite elegant! Then we could reasonably expect to
re-find all the relevant RTIs and no others with an appropriate tree
traversal.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-09T19:43:12Z

On Tue, Sep 9, 2025 at 12:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 9, 2025 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I think what would make a ton more sense is to add
> > an enum field to Result that explicitly identifies why it's there.
> > We've got at least "apply one-time filter to subplan", "apply per-row
> > gating filter to subplan", "represent a relation proven empty", and
> > "place-hold for a MinMaxAgg InitPlan".
>
> Thanks, I'll look into this.

Just a random thought, but another idea that crossed my mind here at
one point was to actually split the Result node up into Result nodes
with subplans and Result nodes without subplans. We could call the
version with a subplan "Project" and the version without a subplan
"Result", for example. This seems a little silly because both variants
would need to be able to handle resconstantqual, or alternatively we'd
have to be OK with getting "Project" on top of "Result" in some cases
where a single "Result" node currently does both jobs. On the other
hand, only Project needs a subplan, and only Result needs relids.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-09T20:37:38Z

Robert Haas <robertmhaas@gmail.com> writes:
> Just a random thought, but another idea that crossed my mind here at
> one point was to actually split the Result node up into Result nodes
> with subplans and Result nodes without subplans. We could call the
> version with a subplan "Project" and the version without a subplan
> "Result", for example. This seems a little silly because both variants
> would need to be able to handle resconstantqual, or alternatively we'd
> have to be OK with getting "Project" on top of "Result" in some cases
> where a single "Result" node currently does both jobs. On the other
> hand, only Project needs a subplan, and only Result needs relids.

Maybe.  I kinda feel that actually redesigning plan trees is outside
the scope of this project, but maybe we should consider it.

I think though that you might be underestimating the amount of
commonality.  For instance, we might need a projecting node on top of
a subquery-in-FROM subplan, but that node would still have to bear
a relid --- the relid of the RTE_SUBQUERY RTE in the upper query,
not that of any RTE in the subquery, but nonetheless it's a relid.

Another variant of this is that that RTE_SUBQUERY relid would normally
be borne by a SubqueryScan plan node, but if we elide the SubqueryScan
because it isn't doing anything useful, where shall we put that relid?
If we don't store it anywhere then we will not be able to reconstruct
correct join relids for the upper plan level.

			regards, tom lane

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-10T07:16:29Z

On Tue, Sep 9, 2025 at 10:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
> > One idea (not fully thought through) is that we record the calculated
> > outerjoin_relids for each outer join in its JoinPaths.  (We cannot
> > store this in the joinrel's RelOptInfo because it varies depending on
> > the join sequence we use.)  And then we could use the recorded
> > outerjoin_relids for the assertion here:
> >
> >  outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids

> I'm OK with moving the conversation to a separate thread, but can you
> clarify from where you believe that joinpath->ojrelids would be
> populated? It seems to me that the assertion couldn't pass unless
> every join path ended up with the same value of joinpath->ojrelids.
> That's because, for a given joinrel, there is only one value of
> joinrel->relids; and all of those RTIs must be either RTE_JOIN or
> non-RTE_JOIN. The non-RTE_JOIN RTIs will be found only in outer_relids
> U inner_relids, and the RTE_JOIN RTIs will be found only in
> joinpath->ojrelids. Therefore, it seems impossible for the assertion
> to pass unless the value is the same for all join paths.

Hmm, this isn't quite what I had in mind.  What I was thinking is that
the outer join relids included in joinrel->relids can also be found
from its outer or inner.  For example, consider a query like:

  (A leftjoin B on (Pab)) leftjoin C on (Pbc)

For the join with joinrel->relids being {1, 2, 3, 4, 5}, {1, 2, 3}
comes from the outer side, {4} comes from the inner side, and {5} is
the outer join being calculated at this join.  So the Assert I
proposed earlier becomes:

  {1, 2, 3} U {4} U {5} == {1, 2, 3, 4, 5}

However, if we have transformed it to:

  A leftjoin (B leftjoin C on (Pbc)) on (Pab)

For this same join, {1} comes from the outer side, {2, 4} comes from
the inner side, and {3, 5} are the outer joins being calculated at
this join.  So the Assert becomes:

  {1} U {2, 4} U {3, 5} == {1, 2, 3, 4, 5}

Either way, the assertion should always hold -- if it doesn't, there's
likely a bug in how we're calculating the relids.

As you can see, the set of outer joins calculated at the same join can
vary depending on the join order.  What I suggested is to record this
information in JoinPaths (or maybe also in Join plan nodes so that
get_scanned_rtindexes can collect it) for the assertion.

- Richard

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-10T14:11:54Z

On Wed, Sep 10, 2025 at 3:16 AM Richard Guo <guofenglinux@gmail.com> wrote:
> Hmm, this isn't quite what I had in mind.  What I was thinking is that
> the outer join relids included in joinrel->relids can also be found
> from its outer or inner.  For example, consider a query like:
>
>   (A leftjoin B on (Pab)) leftjoin C on (Pbc)
>
> For the join with joinrel->relids being {1, 2, 3, 4, 5}, {1, 2, 3}
> comes from the outer side, {4} comes from the inner side, and {5} is
> the outer join being calculated at this join.  So the Assert I
> proposed earlier becomes:
>
>   {1, 2, 3} U {4} U {5} == {1, 2, 3, 4, 5}

Makes sense.

> However, if we have transformed it to:
>
>   A leftjoin (B leftjoin C on (Pbc)) on (Pab)
>
> For this same join, {1} comes from the outer side, {2, 4} comes from
> the inner side, and {3, 5} are the outer joins being calculated at
> this join.  So the Assert becomes:
>
>   {1} U {2, 4} U {3, 5} == {1, 2, 3, 4, 5}

Hmm. As I understood it, Tom was proposing a single, optional ojrelid
for each join, with all outer joins having a value and no inner join
having one. What you are proposing seems to be a very similar concept,
but as I understand it, you're saying that each join would carry a
*set* of ojrelids, which might be empty and might contain more than
one element.

Experimenting with this example, it looks like you're correct and Tom,
or my understanding of Tom, is incorrect. What I see is that I get a
structure like this:

   {MERGEPATH
   :parent_relids (b 1 2 3 4 5)
   :jpath.outerjoinpath
      {PATH
      :parent_relids (b 1)
   :jpath.innerjoinpath
      {MERGEPATH
      :parent_relids (b 2 4)
      :jpath.outerjoinpath
         {PATH
         :parent_relids (b 2)
         }
      :jpath.innerjoinpath
         {PATH
         :parent_relids (b 4)
         }
      }
   }

So, for the assertion to pass, the more deeply nested merge join would
need to have ojrelids = {} and the less-deeply nested one would need
ojrelids={3,5}.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-10T17:28:49Z

On Tue, Sep 9, 2025 at 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe.  I kinda feel that actually redesigning plan trees is outside
> the scope of this project, but maybe we should consider it.

Yeah, I would rather not go there, all things being equal. We could
also consider displaying things differently even if the underlying
node type is the same. ExplainNode already has examples where
pname/sname are set to something other than the node tag, e.g. we set
EXPLAIN designation based on (((ModifyTable *) plan)->operation) or
agg->aggstrategy. I'm not sure if this is the right way to go, but
it's worth a thought.

> I think though that you might be underestimating the amount of
> commonality.  For instance, we might need a projecting node on top of
> a subquery-in-FROM subplan, but that node would still have to bear
> a relid --- the relid of the RTE_SUBQUERY RTE in the upper query,
> not that of any RTE in the subquery, but nonetheless it's a relid.
>
> Another variant of this is that that RTE_SUBQUERY relid would normally
> be borne by a SubqueryScan plan node, but if we elide the SubqueryScan
> because it isn't doing anything useful, where shall we put that relid?
> If we don't store it anywhere then we will not be able to reconstruct
> correct join relids for the upper plan level.

I don't quite understand how these two scenarios are different, but I
have found it critical to distinguish problems that happen at setrefs
time from problems that happen during main planning. If we're talking
about feeding information from one planning cycle forward to the next,
we need be able to look at the plan tree and understand what the
pre-setrefs state of affairs was, because when the next planning cycle
happens, the decisions we want to influence are happening pre-setrefs.
The later patches in this patch series deal with exactly this problem:
0004 and 0005 make it possible to match up an RTI from the flattened
range table that pops out of one planning cycle with a specific
subroot and RTI relative to that subroot during the following cycle;
and 0006 and 0007 arrange to leave a breadcrumb trail in cases where
setrefs-time processing deletes plan nodes. The setrefs-time elision
of SubqueryScan nodes is recorded by the mechanism in 0006.

I went back and studied 0001 some more today in reference to your
comments about classifying Result nodes. 0001 already loosely
classifies Result nodes as either "gating" result nodes (that have a
subplan) or "simple" result notes (that don't). "Gating" result notes
happen for target-list projection and/or to apply a one-time filter. I
don't think the reasons for gating nodes need to be recorded anywhere;
either the tlist matches the underlying node or it doesn't, and either
resconstantqual contains something or not. "Simple" result have more
interesting reasons for existing:

1. MinMaxAgg placeholders
2. Degenerate grouping
3. No-FROM-clause cases (these go through create_group_result_plan
like the previous case, but are arguably distinct)
4. Relations proven empty

There's a sort of hybrid case when we want a gating result node but
the underlying node is a simple result. In that case, the patch builds
a new simple result that is similar to the existing one but with the
gating result's target list and one-time filter. It seems OK to me to
forget all about the gating result node and its reason for existence
in this case and just consider ourselves to have updated the
underlying Result node. Otherwise, you'd have to consider that a
Result node might have multiple reasons for existence: whatever caused
the "simple" Result note to get created, plus possibly projection or
one-time filtering.

Now, looking at (1)-(4) above, (3) is actually a special case of (4):

robert.haas=# explain (range_table) select 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   RTIs: 1
 RTI 1 (result):
   Eref: "*RESULT*" ()
(4 rows)

So the reason why the patch set feels justified in printing "Replaces:
Aggregate" when there are no relids is because we must have case (1)
or (2) from the above list. But it does seem fragile. Not only can we
confuse (1) and (2), but also, only the top-level grouping rel
necessarily has empty relids. We already have child grouping rels that
have relid sets, and I suspect Richard's pending work on eager
aggregation will introduce more of them. This patch won't be able to
distinguish those from case (4).

So maybe what we want for Result reasons is something like
RESULT_REPLACES_BASEREL, RESULT_REPLACES_JOINREL,
RESULT_REPLACES_GROUPING_REL, RESULT_IMPLEMENTS_MINMAX_AGGREGATE?
That's a bit verbose; shorter alternatives welcome. The first two
could be merged, since the cardinality of the relid set should
distinguish them. Or it could be more like RESULT_REPLACES_SCAN,
RESULT_REPLACES_JOIN, RESULT_REPLACES_AGGREGATE,
RESULT_IMPLEMENTS_MINMAX_AGGREGATE, to more closely match what we
would presumably show in the EXPLAIN output.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-11T18:19:40Z

Richard Guo <guofenglinux@gmail.com> writes:
> As you can see, the set of outer joins calculated at the same join can
> vary depending on the join order.  What I suggested is to record this
> information in JoinPaths (or maybe also in Join plan nodes so that
> get_scanned_rtindexes can collect it) for the assertion.

I do not think this is correct, or at least it's not the most useful
way to think about it.  As I stated earlier, each join plan node that
is doing a non-inner join has a unique corresponding outer-join RTE
that describes what it's doing.  The point that you are making is
that if we've applied identity 3 to swap the order of two outer
joins, then we can't claim that the output of the lower plan node is a
fully-correct representation of the semantics of the join that it's
doing: there may be values that should have gone to NULL but won't
until after the upper plan node processes the rows.

If we're going to attach more labeling to the plan nodes, I'd
prefer to do what I suggested and label the nodes with the specific
outer join that they think they are implementing.  With Richard's
proposal it will remain impossible to tell which node is doing what.

While I've still not worked through the details, it might be that
we can't implement my desire to make setrefs.c's nullingrels
checks exact unless we *also* store the bitmap sets that Richard
is proposing.  I don't know if it's worth carrying two new fields
in order to make that work.

I don't entirely buy that Richard's proposed assertion is worth
doing: I think I agree with Robert's original opinion that it's
redundant.  I do think that tightening the nullingrels checks
would be useful.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-11T18:41:54Z

On Thu, Sep 11, 2025 at 2:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do not think this is correct, or at least it's not the most useful
> way to think about it.  As I stated earlier, each join plan node that
> is doing a non-inner join has a unique corresponding outer-join RTE
> that describes what it's doing.  The point that you are making is
> that if we've applied identity 3 to swap the order of two outer
> joins, then we can't claim that the output of the lower plan node is a
> fully-correct representation of the semantics of the join that it's
> doing: there may be values that should have gone to NULL but won't
> until after the upper plan node processes the rows.
>
> If we're going to attach more labeling to the plan nodes, I'd
> prefer to do what I suggested and label the nodes with the specific
> outer join that they think they are implementing.  With Richard's
> proposal it will remain impossible to tell which node is doing what.

Conceptually, I prefer your idea of one RTI per join node, but I don't
understand how to make it work. Let's say that, as in Richard's
example, the query is written as (A leftjoin B on (Pab)) leftjoin C on
(Pbc) but we end up with a plan tree that looks like this:

Something Join (RTIs: 1 2 3 4 5)
-> Scan on A (RTI: 1)
-> Whatever Join (RTIs: 2 4)
    -> Scan on B (RTI: 2)
    -> Scan on C (RTI: 4)

RTE 3 is the join between A and B, and RTI 5 is the join between A-B
and C. It makes plenty of sense to associate RTI 5 with the Something
Join, so your model seems to require us to associate RTI 3 with the
Whatever Join, because there's no place else for it to go. That seems
to create two problems.

First, RTI 3 is for an A-B join, and the Whatever Join is for a B-C
join, and it sounds wrong to associate an RTI with a join when not all
of the rels being joined are present at that level. Can we really say
that the Whatever Join is implementing RTI 3 given that RTI 3 includes
A?

Second, even ignoring that problem, if we now try to assert that the
RTIs of a joinrel are the union of the RTIs we see in the plan tree,
the assertion is going to fail, because now the Whatever Join sees
RTIs 2 and 4 through its children and RTI 3 through its own ojrelid,
but the joinrel's RTIs are {2,4}, not {2,3,4}.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-11T20:07:03Z

Here's a likely-doomed new version of just the first three patches.
0002 is unchanged. 0001 has been reworked so that a Result node
contains a result_type. This is as per Tom's suggestion, but it's
unclear to me that he will like the details. 0003 has been reworked so
that when we build a Join plan, we annotate it with the ojrelids
completed at that level, which Tom said earlier that he thought was
the wrong idea (but after I'd already written the code, and I've
already replied to say I don't understand what the alternative is).

Hence, I expect this version to crash and burn, but maybe it will do
so in such a way that I have some idea what to propose instead.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T15:08:51Z

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 11, 2025 at 2:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If we're going to attach more labeling to the plan nodes, I'd
>> prefer to do what I suggested and label the nodes with the specific
>> outer join that they think they are implementing.  With Richard's
>> proposal it will remain impossible to tell which node is doing what.

> Conceptually, I prefer your idea of one RTI per join node, but I don't
> understand how to make it work. Let's say that, as in Richard's
> example, the query is written as (A leftjoin B on (Pab)) leftjoin C on
> (Pbc) but we end up with a plan tree that looks like this:

> Something Join (RTIs: 1 2 3 4 5)
> -> Scan on A (RTI: 1)
> -> Whatever Join (RTIs: 2 4)
>     -> Scan on B (RTI: 2)
>     -> Scan on C (RTI: 4)

After thinking about this for awhile, I believe that Richard and I
each had half of the right solution ;-).  Let me propose some new
terminology in hopes of clarifying matters:

* A join plan node "starts" an outer join if it performs the
null-extension step corresponding to that OJ (specifically,
if it is the first join doing null-extension over the minimum
RHS of that OJ).

* A join plan node "completes" an outer join if its output
nulls all the values that that OJ should null when done
according to syntactic order.

In simple cases where we have not applied OJ identity 3, every
outer-join plan node starts and completes a single OJ relid.
But if we have applied identity 3 in the forward direction,
as per your example above, it's different.  The physically
lower join node starts OJ 5, but doesn't complete it.  The
upper node starts OJ 3, and completes both 3 and 5.  I think
that it's possible for the topmost join to complete more than
two OJs, if we have a nest of multiple OJs that can all be
re-ordered via identity 3.

I was arguing for labeling plan nodes according to which OJ they
start (always a unique relid).  Richard was arguing for labeling
according to which OJ(s) they complete (zero, one, or more relids).
But I now think it's probably worth doing both.  We need the
completion bitmapsets if we want to cross-check Var nullingrels,
because those correspond to the nullingrels that should get added
at each join's output.  I think that we also want the start labels
though.  For one thing, if the start nodes are not identified,
it's impossible to understand how much of the tree is the "no
man's land" where a C variable may or may not have gone to null
on its way to becoming a C* variable.  But in general I think
that we'll want to be able to identify an outer-join plan node
even if it does not complete its OJ.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-12T15:54:28Z

On Fri, Sep 12, 2025 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I was arguing for labeling plan nodes according to which OJ they
> start (always a unique relid).  Richard was arguing for labeling
> according to which OJ(s) they complete (zero, one, or more relids).

I agree with everything in your reply up to and including this part.

> But I now think it's probably worth doing both.  We need the
> completion bitmapsets if we want to cross-check Var nullingrels,
> because those correspond to the nullingrels that should get added
> at each join's output.  I think that we also want the start labels
> though.  For one thing, if the start nodes are not identified,
> it's impossible to understand how much of the tree is the "no
> man's land" where a C variable may or may not have gone to null
> on its way to becoming a C* variable.  But in general I think
> that we'll want to be able to identify an outer-join plan node
> even if it does not complete its OJ.

So, it looks to me like the way this works today is that
join_is_legal() figures out the relevant SpecialJoinInfo and then
add_outer_joins_to_relids() decides what to add to the joinrel's relid
set. So I think, though I am not quite sure, that if somewhere around
that point in the code we copied sjinfo->ojrelid into the RelOptInfo,
and then propagated that through to the final plan, that might be what
you're looking for here. However, that assumes that the choice of
SpecialJoinInfo is fixed for all possible ways of constructing a given
joinrel, which I think might not be true. In Richard's test case, one
simply can't go wrong, because the lower join only draws from two
baserels. But in a case like A LJ B ON A.x = B.x LJ C ON B.y = C.y LJ
D ON B.z = D.z, the B-C-D joinrel could presumably be constructed by
joining either B-D to C or B-C to D, and I'm guessing that will result
in a different choice of SpecialJoinInfo in each case. That would mean
that the ojrelid has to be per-Path rather than per-RelOptInfo.

While in theory that's fine, it sounds expensive. I was hoping that we
could piggyback mostly on existing calculations here, exposing data we
already have instead of calculating new things. If we want to expose
the starting-the-outer-join-RTI value that you want here, it seems
like we're going to have to redo some of the join_is_legal() work and
some of the add_outer_joins_to_relids() work for every path. I'm
skeptical about expending those cycles. It's not clear to me that I
need to care about outer join RTIs at all for what I'm trying to do --
focusing where RTIs originating from baserels end up in the final plan
tree is, as far as I can see, completely adequate. There might be
other people who want to do other things that would benefit more from
seeing that stuff, though. I'm not against exposing information that
is easily calculated and might be useful to somebody, even if it's
just for planner debugging. But it seems to me that what you're asking
here might be going quite a bit further than that.

So my counter-proposal is that this patch set should either (1) expose
nothing at all about join RTIs because I don't have a need for them or
(2) expose the join RTIs completed at a certain level because that's
easily calculated from the data we already have; and if you want to
later also expose the single join RTI started at a certain level for a
varnullingrels cross-check or any other purpose, then you can propose
a patch for that. Alternatively, if you want to edit my 0003 patch to
work the way you think it should, cool.  Or if you can describe what
you think it should do, I'm somewhat willing to try implementing that,
but that's definitely not such a great choice from my perspective. I'm
afraid that I'm getting pulled a bit further down the garden path than
I really want to go trying to satisfy your desire to perform a
cross-check that I don't really understand and/or expose information
for which I don't see a clear need in my own work. What I need is for
all the baserels that appear in the final Plan tree to be properly
labelled.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T16:18:16Z

Robert Haas <robertmhaas@gmail.com> writes:
> So, it looks to me like the way this works today is that
> join_is_legal() figures out the relevant SpecialJoinInfo and then
> add_outer_joins_to_relids() decides what to add to the joinrel's relid
> set. So I think, though I am not quite sure, that if somewhere around
> that point in the code we copied sjinfo->ojrelid into the RelOptInfo,
> and then propagated that through to the final plan, that might be what
> you're looking for here.

Not the RelOptInfo, but the Path.  I have not looked at details, but
it might be necessary to identify these labels at Path construction
time rather than reconstructing them during createplan.c.  I'd rather
not, because bloating Path nodes with hopefully-redundant information
isn't attractive.  But if it turns out that createplan.c doesn't have
enough information then we might have to.

I'm also thinking that this notion of starting/completing OJs might be
useful in its own right to clarify and even simplify some of the Path
manipulations we do.  But that will require reviewing the code.

> So my counter-proposal is that this patch set should either (1) expose
> nothing at all about join RTIs because I don't have a need for them or
> (2) expose the join RTIs completed at a certain level because that's
> easily calculated from the data we already have; and if you want to
> later also expose the single join RTI started at a certain level for a
> varnullingrels cross-check or any other purpose, then you can propose
> a patch for that.

If what you want to do is only interested in baserel RTIs, then I
think we should leave outer join RTIs out of the discussion for the
present.  I was not looking to make you do work you aren't interested
in.  I reserve the right to do said work later ...

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-12T17:32:10Z

On Fri, Sep 12, 2025 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Not the RelOptInfo, but the Path.  I have not looked at details, but
> it might be necessary to identify these labels at Path construction
> time rather than reconstructing them during createplan.c.  I'd rather
> not, because bloating Path nodes with hopefully-redundant information
> isn't attractive.  But if it turns out that createplan.c doesn't have
> enough information then we might have to.

My intuition (which might be wrong) is that there's enough information
available to do it during createplan.c, but I'm not sure that there's
enough information available to do it efficiently at that stage. You
could grovel through the whole Plan tree and find all of the scan
RTIs, and then from there you should be able to work out what joins
were commuted, and then from there you should be able to work out
which SpecialJoinInfo goes with each Join that appears in the final
plan tree. That doesn't sound like a lot of fun, though. On the other
hand, I'm not sure doing this at Path creation time is going to be a
picnic either. Right now, we're sort of cheating by only caring about
what OJs are finished at a certain level: that is consistent across
all possible ways of forming a joinrel, but which OJs are started at a
certain level is not, and I'm not currently seeing how to fix that
without adding cycles.

> I'm also thinking that this notion of starting/completing OJs might be
> useful in its own right to clarify and even simplify some of the Path
> manipulations we do.  But that will require reviewing the code.

Makes sense.

> If what you want to do is only interested in baserel RTIs, then I
> think we should leave outer join RTIs out of the discussion for the
> present.  I was not looking to make you do work you aren't interested
> in.  I reserve the right to do said work later ...

Absolutely. I'm more than happy to have you do that.

We sort of got started down this path because, reviewing v4-0003,
Richard commented that I might be able to sanity-check
something-or-other about RTE_JOIN RTIs instead of just focusing on
baserels. From there, this sub-thread has turned into a discussion of
exactly what that sanity check should be. v5 exposes the
completed-at-this-level OJs in the final plan tree, which is easy to
compute and could be useful for somebody's plan introspection, but (1)
I don't need it and (2) it just derives them from the joinrel's RTI
set rather than in any independent per-path way that might lead to a
more meaningful cross-check. Having done the work to create v5-0003, I
find myself thinking it feels a little tidier than v4 and am somewhat
inclined to prefer it; I think that it's very possible that you or
Richard might find it a useful basis for future work to further
strengthen the way things work in this area. However, on the other
hand, maybe not, and going back to v4-0003 is also completely
reasonable. I don't care much one way or the other as long as nobody's
too mad when the dust settles.

But, I also can't commit either v4-0003 or v5-0003 or any variant
thereof until we agree on what to do about 0001, and you're the
holdout there. v5-0001 adds a result_type field to the Result node in
response to your previous review comments, so knowing whether that
looks like what you want or whether you would prefer something else is
the blocker for me as of this moment.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T17:44:32Z

Robert Haas <robertmhaas@gmail.com> writes:
> But, I also can't commit either v4-0003 or v5-0003 or any variant
> thereof until we agree on what to do about 0001, and you're the
> holdout there.

Yeah, I owe you a review, hope to get to it over the weekend.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-12T17:52:01Z

On Fri, Sep 12, 2025 at 1:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > But, I also can't commit either v4-0003 or v5-0003 or any variant
> > thereof until we agree on what to do about 0001, and you're the
> > holdout there.
>
> Yeah, I owe you a review, hope to get to it over the weekend.

Thanks. This is a good time to mention that I appreciate your
engagement in this thread. Let me know if there's something that you'd
like me to pay attention to in turn.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-13T08:46:09Z

On Sat, Sep 13, 2025 at 12:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After thinking about this for awhile, I believe that Richard and I
> each had half of the right solution ;-).  Let me propose some new
> terminology in hopes of clarifying matters:
>
> * A join plan node "starts" an outer join if it performs the
> null-extension step corresponding to that OJ (specifically,
> if it is the first join doing null-extension over the minimum
> RHS of that OJ).
>
> * A join plan node "completes" an outer join if its output
> nulls all the values that that OJ should null when done
> according to syntactic order.

This new notion makes a lot of sense to me.  I feel that it could help
us optimize some existing logic, or at least make certain parts of it
easier to understand.  It might be worth adding to the README, maybe
under the section "Relation Identification and Qual Clause Placement",
where we explain the idea behind pushed-down joins.

- Richard

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-13T09:02:04Z

On Sat, Sep 13, 2025 at 2:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
> We sort of got started down this path because, reviewing v4-0003,
> Richard commented that I might be able to sanity-check
> something-or-other about RTE_JOIN RTIs instead of just focusing on
> baserels. From there, this sub-thread has turned into a discussion of
> exactly what that sanity check should be.

Yeah, when commenting on v4-0003 about including outer join relids in
the assertion, I had a feeling that it might take the discussion off
topic -- sorry that it did.  That's why I suggested moving this part
of discussion to a separate thread.

I still think that cross-checking outer join relids isn't a
requirement for committing your patches, so I'm totally fine if you
end up with a version that doesn't assert outer join relids.

- Richard

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-14T23:42:43Z

I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> But, I also can't commit either v4-0003 or v5-0003 or any variant
>> thereof until we agree on what to do about 0001, and you're the
>> holdout there.

> Yeah, I owe you a review, hope to get to it over the weekend.

I'm running out of weekend, but I found time to look at v5-0001,
so here are some comments.  In general, it's in pretty good shape
and these are nitpicks.

* I think your placement of the show_result_replacement_info call
site suffers from add-at-the-end syndrome.  It should certainly go
before the show_instrumentation_count call: IMO we expect stuff
added by EXPLAIN ANALYZE to appear after stuff that's there in plain
EXPLAIN.  But I'd really argue that from a user's standpoint this
information is part of the fundamental plan structure and so it
deserves more prominence.  I'd lean to putting it first in the
T_Result case, before the "One-Time Filter".  (Thought experiment:
if we'd had this EXPLAIN field from day one, where do you think
it would have been placed?)

* Even more nitpicky:

+			if (plan->lefttree == NULL)
+				show_result_replacement_info(castNode(Result, plan), es);

I think show_result_replacement_info should have responsibility for
deciding whether to print anything in the lefttree == NULL case.
(This will affect the Asserts in show_result_replacement_info,
but those seem a little odd anyway.)

+		case RESULT_TYPE_UPPER:
+			/* a small white lie */
+			replacement_type = "Aggregate";
+			break;

I find this unconvincing: is it really an aggregate?  It doesn't
help that this case doesn't seem to be reached anywhere in the
regression tests.

In general I suspect that we'll have to refine RESULT_TYPE_UPPER
in the future.  I don't think this fear needs to block committing
of what you have though.

+        /* Work out what reference name to use and added it the string. */

The grammar police will be after you.

+     * (Arguably, we should instead display the RTE name in some other way in
+     * such cases, but in typical cases the RTE name is *RESULT* and printing
+     * "Result on *RESULT*" or similar doesn't seem especially useful, so for
+     * now we don't print anything at all.)

Right offhand, I think that RTE_RESULT *always* has the name *RESULT*,
so the "typical" bit seems misleading.  Personally I'd drop this
para altogether.

+    /*
+     * We're replacing either a scan or a join, according to the number of
+     * rels in the relids set.
+     */
+    if (nrels == 0)
+        ExplainPropertyText("Replaces", replacement_type, es);
+    else
+    {
+        char *s = psprintf("%s on %s", replacement_type, buf.data);
+
+        ExplainPropertyText("Replaces", s, es);
+    }

This comment seems to neither have anything to do with the logic,
or to be adding anything.  But do you need nrels at all?  I'd
be inclined to check for "buf.len > 0" to see if you want to
insert the "on ..." bit.

+ * make_simple_result
+ *      Build a Result plan node that returns a single row (or possibly no rows,
+ *      if the one-time filtered defined by resconstantqual returns false)

I don't love the name "make_simple_result", as the cases this handles
are frequently far from simple.  I don't have a clearly-better idea
offhand though.  Maybe "make_one_row_result"?  In any case, "one-time
filtered" needs help.

In general, I wonder if it'd be better for the callers of
make_xxx_result to pass in the result_type to use.  Those
functions have such a narrow view of the available info
that I'm dubious that they can get it right.  Especially
if/when we decide that RESULT_TYPE_UPPER needs subdivision.

+ * relids identifies the relation for which this Result node is generating the
+ * tuples. When subplan is not NULL, it should be empty: this node is not
+ * generating anything in that case, just acting on tuples generated by the
+ * subplan. Otherwise, it may contain a single RTI (as when this Result node
+ * is substituted for a scan); multiple RTIs (as when this Result node is
+ * substituted for a join); or no RTIs at all (as when this Result node is
+ * substituted for an upper rel).

I doubt this claim that the relid set will be empty for an upper rel.
I think it's more likely that it will include all the rels for the
query.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-15T16:32:15Z

Thanks for the review. Comments to which I don't respond below are duly noted.

On Sun, Sep 14, 2025 at 7:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> * I think your placement of the show_result_replacement_info call
> site suffers from add-at-the-end syndrome.  It should certainly go
> before the show_instrumentation_count call: IMO we expect stuff
> added by EXPLAIN ANALYZE to appear after stuff that's there in plain
> EXPLAIN.  But I'd really argue that from a user's standpoint this
> information is part of the fundamental plan structure and so it
> deserves more prominence.  I'd lean to putting it first in the
> T_Result case, before the "One-Time Filter".  (Thought experiment:
> if we'd had this EXPLAIN field from day one, where do you think
> it would have been placed?)

Yes, I wondered if it should actually look more like this:

Degenerate Scan on blah
Degenerate Join on blah, blah, blah
Degenerate Aggregate
MinMaxAggregate Result

So getting rid of Replaces: altogether. In fact, "Replaces" is really
a complete misnomer in the case of a MinMaxAggregate; I'm just not
sure exactly what to do instead.


> +               case RESULT_TYPE_UPPER:
> +                       /* a small white lie */
> +                       replacement_type = "Aggregate";
> +                       break;
>
> I find this unconvincing: is it really an aggregate?  It doesn't
> help that this case doesn't seem to be reached anywhere in the
> regression tests.

This is the case I know about where that can be reached.

robert.haas=# explain select 1 as a, 2 as b having false;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=8)
   One-Time Filter: false
   Replaces: Aggregate
(3 rows)

> In general I suspect that we'll have to refine RESULT_TYPE_UPPER
> in the future.  I don't think this fear needs to block committing
> of what you have though.

+1.

> +     * (Arguably, we should instead display the RTE name in some other way in
> +     * such cases, but in typical cases the RTE name is *RESULT* and printing
> +     * "Result on *RESULT*" or similar doesn't seem especially useful, so for
> +     * now we don't print anything at all.)
>
> Right offhand, I think that RTE_RESULT *always* has the name *RESULT*,
> so the "typical" bit seems misleading.  Personally I'd drop this
> para altogether.

Counterexample:

robert.haas=# explain verbose select * from pgbench_accounts where 0 = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   Output: aid, bid, abalance, filler
   One-Time Filter: false
   Replaces: Scan on pgbench_accounts
(4 rows)

debug_print_plan says:

      :alias <>
      :eref
         {ALIAS
         :aliasname pgbench_accounts
         :colnames ("aid" "bid" "abalance" "filler")
         }

> In general, I wonder if it'd be better for the callers of
> make_xxx_result to pass in the result_type to use.  Those
> functions have such a narrow view of the available info
> that I'm dubious that they can get it right.  Especially
> if/when we decide that RESULT_TYPE_UPPER needs subdivision.

That was my first thought, but after experimentation I think it sucks,
especially because of this:

                /*
                 * The only path for it is a trivial Result path.  We cheat a
                 * bit here by using a GroupResultPath, because that way we
                 * can just jam the quals into it without preprocessing them.
                 * (But, if you hold your head at the right angle, a FROM-less
                 * SELECT is a kind of degenerate-grouping case, so it's not
                 * that much of a cheat.)
                 */

I would argue that if you hold your head at that angle, you need your
head examined. Perhaps that wasn't the case when this comment was
written, but from the viewpoint of this project, I think it's pretty
clear. What this does is use create_group_result_path() not only for
actual degenerate-grouping cases (where we're replacing an aggregate)
but also for no-FROM-clause cases (where we're replacing a scan). We
could adjust things so that the no-FROM-clause case doesn't take this
code path, or passes down a flag, but it's clear from examination of
the RelOptInfo. Likewise, for scans vs. joins, peaking at the
reloptkind is a very easy way to tell whether we've got a baserel or a
joinrel and having the caller go to extra trouble to pass it down just
seems silly.

> + * relids identifies the relation for which this Result node is generating the
> + * tuples. When subplan is not NULL, it should be empty: this node is not
> + * generating anything in that case, just acting on tuples generated by the
> + * subplan. Otherwise, it may contain a single RTI (as when this Result node
> + * is substituted for a scan); multiple RTIs (as when this Result node is
> + * substituted for a join); or no RTIs at all (as when this Result node is
> + * substituted for an upper rel).
>
> I doubt this claim that the relid set will be empty for an upper rel.
> I think it's more likely that it will include all the rels for the
> query.

Upper rels are created by fetch_upper_rel(). The third argument
becomes the relids set. Most call sites pass that argument as NULL:

[robert.haas pgsql]$ git grep fetch_upper_rel src/backend/optimizer/
src/backend/optimizer/path/allpaths.c:   sub_final_rel =
fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL);
src/backend/optimizer/path/costsize.c:   sub_final_rel =
fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
src/backend/optimizer/plan/planagg.c:    grouped_rel =
fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
src/backend/optimizer/plan/planner.c:    final_rel =
fetch_upper_rel(root, UPPERREL_FINAL, NULL);
src/backend/optimizer/plan/planner.c:    final_rel =
fetch_upper_rel(root, UPPERREL_FINAL, NULL);
src/backend/optimizer/plan/planner.c:    final_rel =
fetch_upper_rel(root, UPPERREL_FINAL, NULL);
src/backend/optimizer/plan/planner.c:        grouped_rel =
fetch_upper_rel(root, UPPERREL_GROUP_AGG,
src/backend/optimizer/plan/planner.c:        grouped_rel =
fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
src/backend/optimizer/plan/planner.c:    window_rel =
fetch_upper_rel(root, UPPERREL_WINDOW, NULL);
src/backend/optimizer/plan/planner.c:    distinct_rel =
fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
src/backend/optimizer/plan/planner.c:    partial_distinct_rel =
fetch_upper_rel(root, UPPERREL_PARTIAL_DISTINCT,
src/backend/optimizer/plan/planner.c:    ordered_rel =
fetch_upper_rel(root, UPPERREL_ORDERED, NULL);
src/backend/optimizer/plan/planner.c:    partially_grouped_rel =
fetch_upper_rel(root,
src/backend/optimizer/plan/setrefs.c:
IS_DUMMY_REL(fetch_upper_rel(rel->subroot,
src/backend/optimizer/plan/subselect.c:  final_rel =
fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
src/backend/optimizer/plan/subselect.c:          final_rel =
fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
src/backend/optimizer/plan/subselect.c:      final_rel =
fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
src/backend/optimizer/prep/prepunion.c:  result_rel =
fetch_upper_rel(root, UPPERREL_SETOP,
src/backend/optimizer/prep/prepunion.c:  final_rel =
fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL);
src/backend/optimizer/prep/prepunion.c:  result_rel =
fetch_upper_rel(root, UPPERREL_SETOP, relids);
src/backend/optimizer/prep/prepunion.c:  result_rel =
fetch_upper_rel(root, UPPERREL_SETOP,
src/backend/optimizer/util/relnode.c: * fetch_upper_rel
src/backend/optimizer/util/relnode.c:fetch_upper_rel(PlannerInfo
*root, UpperRelationKind kind, Relids relids)

The exceptions are: make_grouping_rel() passes a relids set when
IS_OTHER_REL(input_rel); create_partial_grouping_paths() passes a
relids set when creating UPPERREL_PARTIAL_GROUP_AGG; and
generate_union_paths() and generate_nonunion_paths() in prepunion.c
bubble up the underlying relids. AFAICS, a non-parallel,
non-partitionwise aggregate ends up with an empty relid set, and even
those cases end up with an empty relid set for the topmost grouping
rel, even if they create some other upper rels that do have non-empty
relid sets.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-15T19:43:46Z

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Sep 14, 2025 at 7:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... But I'd really argue that from a user's standpoint this
>> information is part of the fundamental plan structure and so it
>> deserves more prominence.  I'd lean to putting it first in the
>> T_Result case, before the "One-Time Filter".  (Thought experiment:
>> if we'd had this EXPLAIN field from day one, where do you think
>> it would have been placed?)

> Yes, I wondered if it should actually look more like this:
> Degenerate Scan on blah
> Degenerate Join on blah, blah, blah
> Degenerate Aggregate
> MinMaxAggregate Result
> So getting rid of Replaces: altogether.

Yeah, that would be pretty tempting if we were working in a green
field.  I think it might be too much change though.  Also, from a
developer's standpoint it's better if what EXPLAIN prints agrees
with what the node types are internally.

>> +               case RESULT_TYPE_UPPER:
>> +                       /* a small white lie */
>> +                       replacement_type = "Aggregate";
>> +                       break;
>> 
>> I find this unconvincing: is it really an aggregate?  It doesn't
>> help that this case doesn't seem to be reached anywhere in the
>> regression tests.

> This is the case I know about where that can be reached.

> robert.haas=# explain select 1 as a, 2 as b having false;
>                 QUERY PLAN
> ------------------------------------------
>  Result  (cost=0.00..0.01 rows=1 width=8)
>    One-Time Filter: false
>    Replaces: Aggregate
> (3 rows)

Hmm, okay.  "Replaces: Aggregate" doesn't seem like a great
explanation, but I don't have a better idea offhand.
In any case I'm not sure this result_type value is going to
have a long shelf-life, so arguing about how to spell it
may not be a productive use of time.

I do suggest adding the above as a regression test.

>> Right offhand, I think that RTE_RESULT *always* has the name *RESULT*,
>> so the "typical" bit seems misleading.  Personally I'd drop this
>> para altogether.

> Counterexample:
> robert.haas=# explain verbose select * from pgbench_accounts where 0 = 1;
>                 QUERY PLAN
> ------------------------------------------
>  Result  (cost=0.00..0.00 rows=0 width=0)
>    Output: aid, bid, abalance, filler
>    One-Time Filter: false
>    Replaces: Scan on pgbench_accounts
> (4 rows)

Uh ... this example does not involve an RTE_RESULT does it?
I see a regular RTE_RELATION RTE for pgbench_accounts, which
your code duly prints.

Looking at the code, there are just three places (all in
prepjointree.c) that create RTE_RESULT RTEs.  Two of them
are building the RTE from scratch, and they both do
	rte->eref = makeAlias("*RESULT*", NIL);
However the third one (pull_up_constant_function) is changing
a pulled-up RTE_FUNCTION into RTE_RESULT, and it doesn't do
anything to the RTE's eref.  While that makes your argument
nominally true, I'd be inclined to argue that this was an oversight
and it should have changed the alias/eref fields to look like other
RTE_RESULTs.  (I've not investigated, but I wonder what your
patch prints for such cases.)

Bottom line remains that I don't think the comment para I quoted
is adding any useful info.  The first half of that comment block
was sufficient.

>> In general, I wonder if it'd be better for the callers of
>> make_xxx_result to pass in the result_type to use.

> That was my first thought, but after experimentation I think it sucks,

Hmph.  Okay, if you tried it and it's bad, I'll accept your opinion.

>> I doubt this claim that the relid set will be empty for an upper rel.
>> I think it's more likely that it will include all the rels for the
>> query.

> Upper rels are created by fetch_upper_rel(). The third argument
> becomes the relids set. Most call sites pass that argument as NULL:

Okay, but "most" is not "all".  I suggest that that comment is just
too specific in the first place.  Even if it's 100% correct today,
it's likely to be falsified in future and no one will remember to
update it.  Something like this might have a longer shelf life:

+ * relids identifies the relation for which this Result node is generating the
+ * tuples. When subplan is not NULL, it should be empty: this node is not
+ * generating anything in that case, just acting on tuples generated by the
+ * subplan. Otherwise, it contains the relids of the planner relation that
+ * the Result represents.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-15T21:07:47Z

On Mon, Sep 15, 2025 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Yes, I wondered if it should actually look more like this:
> > Degenerate Scan on blah
> > Degenerate Join on blah, blah, blah
> > Degenerate Aggregate
> > MinMaxAggregate Result
> > So getting rid of Replaces: altogether.
>
> Yeah, that would be pretty tempting if we were working in a green
> field.  I think it might be too much change though.  Also, from a
> developer's standpoint it's better if what EXPLAIN prints agrees
> with what the node types are internally.

Well, I was just bringing it up because you seemed to think that the
Replaces: line was not necessarily where we would have put it in a
green field. I am not sure about where we would have put it, but I
think in a green field we wouldn't have it at all, and I actually
think the above is worth considering. I think people would get used to
it pretty fast and that it might be more clear than "Result", which
doesn't really mean a whole lot. However, if you or others don't like
that and you want to just move the Replaces line slightly higher in
the output, I mean, I don't mind that, I just don't know that it's
really a material difference.

> I do suggest adding the above as a regression test.

Makes sense.

> Uh ... this example does not involve an RTE_RESULT does it?
> I see a regular RTE_RELATION RTE for pgbench_accounts, which
> your code duly prints.
>
> Looking at the code, there are just three places (all in
> prepjointree.c) that create RTE_RESULT RTEs.  Two of them
> are building the RTE from scratch, and they both do
>         rte->eref = makeAlias("*RESULT*", NIL);
> However the third one (pull_up_constant_function) is changing
> a pulled-up RTE_FUNCTION into RTE_RESULT, and it doesn't do
> anything to the RTE's eref.  While that makes your argument
> nominally true, I'd be inclined to argue that this was an oversight
> and it should have changed the alias/eref fields to look like other
> RTE_RESULTs.  (I've not investigated, but I wonder what your
> patch prints for such cases.)

Will investigate.

> + * relids identifies the relation for which this Result node is generating the
> + * tuples. When subplan is not NULL, it should be empty: this node is not
> + * generating anything in that case, just acting on tuples generated by the
> + * subplan. Otherwise, it contains the relids of the planner relation that
> + * the Result represents.

OK.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-15T21:15:44Z

Robert Haas <robertmhaas@gmail.com> writes:
> Well, I was just bringing it up because you seemed to think that the
> Replaces: line was not necessarily where we would have put it in a
> green field. I am not sure about where we would have put it, but I
> think in a green field we wouldn't have it at all, and I actually
> think the above is worth considering. I think people would get used to
> it pretty fast and that it might be more clear than "Result", which
> doesn't really mean a whole lot. However, if you or others don't like
> that and you want to just move the Replaces line slightly higher in
> the output, I mean, I don't mind that, I just don't know that it's
> really a material difference.

Yeah, I'm content with making it a "Replaces" attribute, I'm just
complaining about the ordering.  I'd be the first to agree that
this is purely cosmetic, but in my mind there is a pretty clear
precedence ordering among the attributes that EXPLAIN prints.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-16T14:17:42Z

On Mon, Sep 15, 2025 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> anything to the RTE's eref.  While that makes your argument
> nominally true, I'd be inclined to argue that this was an oversight
> and it should have changed the alias/eref fields to look like other
> RTE_RESULTs.  (I've not investigated, but I wonder what your
> patch prints for such cases.)

It just prints "-> Result" and that's it, as in this example:

robert.haas=# create or replace function absolutely_not() returns bool
return false;
CREATE FUNCTION
robert.haas=# explain (costs off) select * from generate_series(1,3) g
full join absolutely_not() n on true;
                QUERY PLAN
------------------------------------------
 Merge Full Join
   ->  Function Scan on generate_series g
   ->  Materialize
         ->  Result
(4 rows)

robert.haas=# explain (costs off, range_table) select * from
generate_series(1,3) g full join absolutely_not() n on true;
                QUERY PLAN
------------------------------------------
 Merge Full Join
   ->  Function Scan on generate_series g
         Scan RTI: 1
   ->  Materialize
         ->  Result
               RTIs: 2
 RTI 1 (function, in-from-clause):
   Alias: g ()
   Eref: g (g)
   WITH ORDINALITY: false
 RTI 2 (result, in-from-clause):
   Alias: n ()
   Eref: n (n)
 RTI 3 (join, in-from-clause):
   Eref: unnamed_join (g, n)
   Join Type: Full
(16 rows)

Here's a new patch set. My main questions are:

1. Did I miss anything you wanted fixed in 0001?

2. Should 0001 be combined with 0002 or kept separate?

3. Do you have a preference between this version of 0003 and the older
revision that just ignored outer-join relids?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-16T15:27:04Z

CI is unhappy with v6 because:

[14:37:37.742] In function ‘show_result_replacement_info’,
[14:37:37.742]     inlined from ‘ExplainNode’ at explain.c:2240:4:
[14:37:37.742] explain.c:4849:33: error: ‘replacement_type’ may be
used uninitialized [-Werror=maybe-uninitialized]
[14:37:37.742]  4849 |                 char       *s = psprintf("%s on
%s", replacement_type, buf.data);
[14:37:37.742]       |
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[14:37:37.742] explain.c: In function ‘ExplainNode’:
[14:37:37.742] explain.c:4769:21: note: ‘replacement_type’ was declared here
[14:37:37.742]  4769 |         char       *replacement_type;
[14:37:37.742]       |                     ^~~~~~~~~~~~~~~~

So apparently an "enum" over every value of the switch is not good
enough for the value to be assigned.

I'm inclined to change the code like this to fix it:

    char       *replacement_type = "???";

...in the hopes of still producing a warning here if somebody adds
another label to the enum.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-18T17:23:11Z

On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I'm inclined to change the code like this to fix it:
>
>     char       *replacement_type = "???";
>
> ...in the hopes of still producing a warning here if somebody adds
> another label to the enum.

Done in this version. I've also now gone back and rebased the rest of
the patches as well, so this email includes all 7 patches instead of
just the first 3. To recall, my goal for this CF was to get 1-4
committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Junwang Zhao <zhjwpku@gmail.com> — 2025-09-21T09:52:21Z

Hi Robert,

On Fri, Sep 19, 2025 at 1:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > I'm inclined to change the code like this to fix it:
> >
> >     char       *replacement_type = "???";
> >
> > ...in the hopes of still producing a warning here if somebody adds
> > another label to the enum.
>
> Done in this version. I've also now gone back and rebased the rest of
> the patches as well, so this email includes all 7 patches instead of
> just the first 3. To recall, my goal for this CF was to get 1-4
> committed.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

I have a question about the following changes:

- splan = plan;
  if (IsA(plan, Result))
  {
  Result    *rplan = (Result *) plan;

- if (rplan->plan.lefttree == NULL &&
- rplan->resconstantqual == NULL)
- splan = NULL;
+ gplan->plan.lefttree = NULL;
+ gplan->relids = rplan->relids;
+ gplan->result_type = rplan->result_type;
  }

You set gplan->relids and gplan->result_type, but at the end you
returned &gplan->plan, what's the point of setting these two fields?


-- 
Regards
Junwang Zhao

Re: plan shape work

Junwang Zhao <zhjwpku@gmail.com> — 2025-09-21T10:34:49Z

On Sun, Sep 21, 2025 at 5:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> Hi Robert,
>
> On Fri, Sep 19, 2025 at 1:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > > I'm inclined to change the code like this to fix it:
> > >
> > >     char       *replacement_type = "???";
> > >
> > > ...in the hopes of still producing a warning here if somebody adds
> > > another label to the enum.
> >
> > Done in this version. I've also now gone back and rebased the rest of
> > the patches as well, so this email includes all 7 patches instead of
> > just the first 3. To recall, my goal for this CF was to get 1-4
> > committed.
> >
> > --
> > Robert Haas
> > EDB: http://www.enterprisedb.com
>
> I have a question about the following changes:
>
> - splan = plan;
>   if (IsA(plan, Result))
>   {
>   Result    *rplan = (Result *) plan;
>
> - if (rplan->plan.lefttree == NULL &&
> - rplan->resconstantqual == NULL)
> - splan = NULL;
> + gplan->plan.lefttree = NULL;
> + gplan->relids = rplan->relids;
> + gplan->result_type = rplan->result_type;
>   }
>
> You set gplan->relids and gplan->result_type, but at the end you
> returned &gplan->plan, what's the point of setting these two fields?

After further study, this should not be a problem, since the address of
&gplan->plan is the same as gplan, but the code is a little bit confusing
at first glance, I think *return (Plan *) gplan* is easier to understand
but I don't insist ;)

>
>
> --
> Regards
> Junwang Zhao



-- 
Regards
Junwang Zhao

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-22T00:26:00Z

On Sun, Sep 21, 2025 at 6:35 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
> After further study, this should not be a problem, since the address of
> &gplan->plan is the same as gplan, but the code is a little bit confusing
> at first glance, I think *return (Plan *) gplan* is easier to understand
> but I don't insist ;)

Thanks for looking into it. Stylistically, I prefer the style without
the cast, but the effect is the same.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-22T18:15:22Z

Robert Haas <robertmhaas@gmail.com> writes:
> Done in this version. I've also now gone back and rebased the rest of
> the patches as well, so this email includes all 7 patches instead of
> just the first 3. To recall, my goal for this CF was to get 1-4
> committed.

I found time finally to look through all of these.  Some notes:

0001: committable, no further comments.

0002: Yeah, this does seem like an improvement.  There is something
faintly weird about output like

          ->  Result
-               Output: i3
+               Output: t3.i3
                Replaces: Scan on t3
                One-Time Filter: false

because the whole point of this construct is that we're *not*
scanning t3.  However, failing to supply the Var prefix is surely
not better.  I like the fact that the output in upper plan
levels is now just like what you'd see with a non-phony t3 scan.
I think you could commit this too.  Merging it with 0001 is
reasonable, but if you'd rather keep them separate that's okay too.

0003: doesn't feel ready for commit.  In the first place, you've used
the "completed an outer join" terminology in the commit message and
a couple of comments, but defined it nowhere.  I think a para or
two in optimizer/README to define "starting" and "completing"
outer joins is essential.  (I'm also still quite bemused by marking
nodes that complete outer joins but not those that start them.)
In the second place, we should not need to add two hundred lines
of new code to createplan.c to accomplish this.  Why not simply
bms_difference the joinrel's relids from the union of the inputs'
relids?  (And no, I do not believe that computing the value two
different ways so you can assert they're the same is adding anything
whatsoever except wasted cycles.)

By and large, I don't believe that it's necessary for 0003 to depend
on 0001 either.  We already have the information available from the
paths' parent RelOptInfos.

0004: commit msg is not very convincing about why this is a good idea.
It really looks like change for the sake of change, so you need to
make a better case for it.  Also, this output seems inconsistent:

  Function Scan on pg_catalog.generate_series x
-   Output: ARRAY(SubPlan 1)
+   Output: ARRAY(array_1)
    Function Call: generate_series(1, 3)
-   SubPlan 1
+   SubPlan array_1

Why isn't it now "ARRAY(SubPlan array_1)"?  The previous notation was
chosen to be not-confusable with a plain function call, but you've
lost that.  Likewise for cases like
-   Group Key: (InitPlan 1).col1
+   Group Key: (minmax_1).col1
which now looks exactly like an ordinary Var.

nitpick: sublinktype_to_string() should return const char *.

0005: I'm even less convinced about there being a need for this.
It's not that hard to see which RTIs are in which subplan,
especially after the other changes in this patchset.

0006: I agree with the need for this, but the details seem messy,
and it's not very clear that the proposed data structure would
be convenient to use.  Do we really need to rely on plan_node_id?
Why haven't you integrated record_elided_node into
clean_up_removed_plan_level?

An idea maybe worth thinking about is that instead of completely
eliding the plan node, we could replace it with a "no-op" plan node
that both EXPLAIN and the executor will look right through.
That node could carry the relid(s) that we lost.  Not sure how
messy this'd be to integrate, though.

0007: not sure about this either.  Why not simply add those
relid(s) to the surviving node's apprelids?  Again, I don't
love the amount of code and data structure that's being added
for a hypothetical use-case.  It's not even clear that this
form of the data structure would be helpful to anyone.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-23T15:34:45Z

On Mon, Sep 22, 2025 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I found time finally to look through all of these.  Some notes:

Thanks. Committed 0001 and 0002 together. I agree with you that 0002
is a bit weird, but I think it is less weird than the status quo ante,
and it seems you agree. I think it's best to regard a Result node that
replaces a scan or join as being a form of scan or join that just so
happens to be able to be optimized to a degree not normally possible.
There's a good argument for calling something like this a dummy
scan/join, as I mentioned before. There's no compelling reason we have
to make that terminology change, as we noted in the earlier
discussion, but it's a useful and not incorrect way of thinking about
it.

> 0003: doesn't feel ready for commit.  In the first place, you've used
> the "completed an outer join" terminology in the commit message and
> a couple of comments, but defined it nowhere.  I think a para or
> two in optimizer/README to define "starting" and "completing"
> outer joins is essential.  (I'm also still quite bemused by marking
> nodes that complete outer joins but not those that start them.)
> In the second place, we should not need to add two hundred lines
> of new code to createplan.c to accomplish this.  Why not simply
> bms_difference the joinrel's relids from the union of the inputs'
> relids?  (And no, I do not believe that computing the value two
> different ways so you can assert they're the same is adding anything
> whatsoever except wasted cycles.)

So, let me just back up a minute here and talk about overall goals.
What I originally wanted to do with this patch is ensure that the
non-RTE_JOIN RTIs from the joinrel are all mentioned in the final
plan. Without the changes to the Result node, that's not the case;
with the changes to the Result node, that is, AFAICT, now the case. It
can be argued that what has been numbered 0003 up to now is not really
necessary at all, since all it does is make it less likely that we
will introduce cases with similar problems to the Result-node case in
the future, and that could be judged unlikely enough to make 0003 not
worth committing. However, it seems like you might be proposing
keeping the patch in some form but throwing out the part of the code
that walks the plan tree to collect RTIs, and that doesn't make a
whole lot of sense to me. I have some confidence that the joinrel's
RTI set will include all of the RTIs from the input rels; what I'm
worried about is whether those RTIs survive into the final Plan.

On the other hand, I'm not sure that I'm interpreting your remarks
correctly. When you say "bms_difference the joinrel's relids from the
union of the inputs' relids" maybe you're specifically talking about
the handling of the RTE_JOIN relids, and I don't care very much how we
account for those. So I guess I need some clarification here as to
what your thinking is.

With regard to your other comments, I'm not opposed to updating the
README, or alternatively, I'm also not opposed to adjusting the
wording of the comments and commit message to avoid that particular
terminology. As far as your parenthetical comment about not marking
outer joins started, I don't actually care what we do, but as I said
before, I don't see an efficient way to identify outer joins started
at a particular level, and it isn't worth adding planner cycles for
information for which I have no concrete need.

> 0004: commit msg is not very convincing about why this is a good idea.
> It really looks like change for the sake of change, so you need to
> make a better case for it.

You're right. That commit message presupposes that the goal is already
understood, instead of explaining it. See the first message on this
thread, in the paragraph that begins "Now let's talk about problem
#2," for the justification. Quoting the most relevant part:

Subqueries sort of have names right
now, at least some of them, but it's an odd system: a CTE subquery,
for example, has the name mentioned by the user, but other kinds of
subplans just get names like "InitPlan 3" or "SubPlan 2". The real
problem, though, is that those names are only assigned after we've
FINISHED planned the subquery. If we begin planning our very first
subquery, it might turn out to be InitPlan 1 or SubPlan 1, or if while
planning it we recurse into some further subquery then *that* subquery
might become InitPlan 1 or SubPlan 1 and OUR subquery might become
InitPlan 2 or SubPlan 2 (or higher, if we find more subqueries and
recurse into them too). Thus, being given some information about how
the user wants, say, SubPlan 2 to be planned is completely useless
because we won't know whether that is us until after we've done the
planning that the user is trying to influence.

I need to figure out some good way of explaining this (hopefully not
too verbosely) in the commit message.

> Also, this output seems inconsistent:
>
>   Function Scan on pg_catalog.generate_series x
> -   Output: ARRAY(SubPlan 1)
> +   Output: ARRAY(array_1)
>     Function Call: generate_series(1, 3)
> -   SubPlan 1
> +   SubPlan array_1
>
> Why isn't it now "ARRAY(SubPlan array_1)"?  The previous notation was
> chosen to be not-confusable with a plain function call, but you've
> lost that.  Likewise for cases like
> -   Group Key: (InitPlan 1).col1
> +   Group Key: (minmax_1).col1
> which now looks exactly like an ordinary Var.

I don't think there's anything keeping me from making that print
InitPlan/SubPlan there. I just thought it looked a little verbose that
way. I thought that the "InitPlan 1" notation was probably due to the
fact that "1" is not a proper name, rather than (as you suggest here)
to avoid confusion with other syntax. Compare CTEs, which are also
subplans, but for which we simply print the CTE's alias name, rather
than "CTE alias_name".

> nitpick: sublinktype_to_string() should return const char *.

This bleeds into a bunch of other things. I'll poke at it and try to
figure something out. I may need help from someone with superior const
skills, but I'll give it a go.

> 0005: I'm even less convinced about there being a need for this.
> It's not that hard to see which RTIs are in which subplan,
> especially after the other changes in this patchset.

Let me just say that everything in this patch set is the result of
experimentation -- trying to write code that attempts to interpret a
Plan tree and discovering problems doing so along the way. I'm not
altogether convinced that patches 0005-0008 attack the problems in the
best possible way and I'm happy to hear suggestions for how to do it
better, but I'm pretty confident that they're all trying to tackle
real problems. The way to think about 0004 and 0005, IMHO, is this:
suppose we look at the final Plan, and we see that RTI 17 was planned
in such-and-such a way. If we want to reproduce that planning decision
in the next cycle -- or change that planning decision in the next
planning cycle -- we need to be able to figure out which subroot RTI
17 came from and what RTI it had within that subroot. With 0004 and
0005 applied, you can figure out that RTI 17 was originally RTI 5
within a subroot that was called expr_1, or whatever, and when we
replan the same query we will assign the name expr_1 to the same
subroot before we begin planning it and RTI 5 within that subroot will
refer to the same thing it did before. This assumes, of course, that
the query is the same and that inlining decisions and so forth haven't
changed and that no objects have been swapped out for objects with the
same name; but the point is that even if none of that has happened we
can't match things up in an automated way without this infrastructure,
or at least I don't know how to do it. If you do, I'm all ears. I
would much rather minimize the amount of information that we have to
store in the Plan tree or PlannedStmt and just have code to do the
necessary interpretation based on the data that's already available,
but I could not see how to make it work.

> 0006: I agree with the need for this, but the details seem messy,
> and it's not very clear that the proposed data structure would
> be convenient to use.  Do we really need to rely on plan_node_id?
> Why haven't you integrated record_elided_node into
> clean_up_removed_plan_level?
>
> An idea maybe worth thinking about is that instead of completely
> eliding the plan node, we could replace it with a "no-op" plan node
> that both EXPLAIN and the executor will look right through.
> That node could carry the relid(s) that we lost.  Not sure how
> messy this'd be to integrate, though.

There's certainly nothing that requires us to use this particular
scheme for identifying the locations where certain plan nodes were
elided. For example, we could put an additional List * in each plan
node and stash a list of elided nodes there rather than indexing by
plan_node_id. The disadvantage of that is simply that it increases the
size of the Plan struct for every node, and most nodes will end up
with an empty list. Indexing by plan_node_id was just my way of
getting the information that I wanted to store out of the tree that
the executor examines. We could potentially also do as you say here
and keep the Plan nodes around and then somehow arrange to ignore them
at explain and execution time, but I don't have a good idea of how we
would do that without adding overhead. In terms of the usability of
the data structure, experimentation has shown it to be serviceable.
There is obviously a risk that with many elided nodes, having to
grovel through the list of nodes looking for a certain plan ID over
and over again could be inefficient -- but large numbers of elided
nodes don't seem all that likely, so maybe it's OK; and I suppose any
code that uses this could always build a hash table if warranted. But
I'm not saying it's a perfect solution.

In terms of why record_elided_node is not integrated into
clean_up_removed_plan_level, that was just a stylistic choice.
clean_up_removed_plan_level() could instead call record_elided_node(),
or record_elided_node() could cease to exist and the logic be inlined
into clean_up_removed_plan_level().

> 0007: not sure about this either.  Why not simply add those
> relid(s) to the surviving node's apprelids?  Again, I don't
> love the amount of code and data structure that's being added
> for a hypothetical use-case.  It's not even clear that this
> form of the data structure would be helpful to anyone.

I'd say that is the patch I'm least sure about. Note that my goal for
this commitfest was to get 0001-0004 committed, partly because I
wasn't too sure whether the later patches might need some adjustment.
My intuition is that flattening the relid sets together loses
important information, but I don't have a test case proving that near
at hand, so maybe it doesn't. However, I'm fairly certain that it is
at least a lot more convenient to have the information in this form.
In general, if we see an Append or MergeAppend node generated from a
RelOptInfo with a single RTI, that's a partition-wise scan of a
partitioned relation, and if we see an Append or MergeAppend node
generated from a RelOptInfo with more than one RTI, that's a
partition-wise join of all those relations. So, very naively, if we
just combine the relid sets for a bunch of partitionwise scans into a
single RTI set, it looks like we've got a partitionwise join, but we
don't. Now, if it's only set operations that result in multiple levels
of Append/MergeAppend nodes getting collapsed, it might be possible to
disentangle what actually happened by partitioning the final set of
RTIs by the subroot from which they originate. I'm not sure that's how
it works, though. At any rate, I agree with you that this is not
adequately motivated at present. Having said that, at ten thousand
feet, the motivation here is to be able to figure out from the final
plan tree where exactly we switched to partitionwise operation --
whether that was done for some joinrel or only when we got down to the
underlying baserel.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-23T21:27:37Z

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Sep 22, 2025 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In the second place, we should not need to add two hundred lines
>> of new code to createplan.c to accomplish this.  Why not simply
>> bms_difference the joinrel's relids from the union of the inputs'
>> relids?

> ... On the other hand, I'm not sure that I'm interpreting your remarks
> correctly. When you say "bms_difference the joinrel's relids from the
> union of the inputs' relids" maybe you're specifically talking about
> the handling of the RTE_JOIN relids, and I don't care very much how we
> account for those. So I guess I need some clarification here as to
> what your thinking is.

What I'm saying is that I'd be much happier with 0003 if it looked
about like the attached.  We do not need a heap of mechanism
redundantly proving that the planner is getting these things right
(and potentially containing its own bugs).

> Note that my goal for
> this commitfest was to get 0001-0004 committed, partly because I
> wasn't too sure whether the later patches might need some adjustment.

Fair enough.  I think we can reach agreement on that much pretty quickly.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-24T12:27:47Z

On Tue, Sep 23, 2025 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What I'm saying is that I'd be much happier with 0003 if it looked
> about like the attached.  We do not need a heap of mechanism
> redundantly proving that the planner is getting these things right
> (and potentially containing its own bugs).

Thanks for the demo. That doesn't actually Assert anything, so the
commit message is a lie, but I get the point, and based on that, I
think what we should do is just drop 0003 altogether for now. I don't
need the ojrelids field for anything currently, and if I or someone
else does later, we can always revisit this idea. Or, if it turns out
that we later introduce more bugs that my version of 0003 would have
caught, we can re-ask the question of whether we want to Assert
something. I don't agree with your judgement that this is an
unreasonable amount of mechanism for what it checks, but I'm entirely
prepared to concede that 0003 is kind of clunky, and I also think that
the rate at which we do new things in the planner is low enough that
it could easily be decades or forever before we have another problem
that this would have caught. Hence, I'm fine with dropping this patch.
Let's call it "some code that Robert found useful for personal
testing" and move on.

> > Note that my goal for
> > this commitfest was to get 0001-0004 committed, partly because I
> > wasn't too sure whether the later patches might need some adjustment.
>
> Fair enough.  I think we can reach agreement on that much pretty quickly.

Cool. Let's focus on 0004 then, and possibly 0005 since it's somewhat
related and you seem to have an idea that there could be a better way
of solving that problem. That's not necessarily to say that 0005 would
get committed this CF, unless we happen to agree vigorously on
something, but if there's a way to work around needing 0005 or if it
needs to be redone in some other form, it would be good to have some
idea around that sooner rather than later.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-24T14:34:00Z

On Wed, Sep 24, 2025 at 8:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Cool. Let's focus on 0004 then, and possibly 0005 since it's somewhat
> related and you seem to have an idea that there could be a better way
> of solving that problem. That's not necessarily to say that 0005 would
> get committed this CF, unless we happen to agree vigorously on
> something, but if there's a way to work around needing 0005 or if it
> needs to be redone in some other form, it would be good to have some
> idea around that sooner rather than later.

Here's a new patch set. 0004 is now 0001 and similarly all other patch
numbers are -3, since the old 0001 and 0002 were committed together
and the 0003 is abandoned. I made the following changes to
old-0004/new-0001:

- I rewrote the commit message. I'm not really sure this is any
clearer about the motivation for this patch, but I tried. Suggestions
appreciated.

- CI was complaining about a warning from sublinktype_to_string, which
I've tried to suppress by adding a dummy return to the end of the
function.

- You (Tom) complained about the lack of const on
sublinktype_to_string, so this version has been const-ified. The const
bled into the arguments to choose_plan_name() and subquery_planner(),
and into the plan_name structure members within PlannerInfo and
SubPlan. I don't know if this is the right thing to do, so feel free
to set me straight.

- You (Tom) also asked why not print InitPlan/SubPlan wherever we
refer to subplans, so this version restores that behavior. I did that
by putting logic to print the InitPlan or SubPlan prefix in
ruleutils.c, while not including InitPlan or SubPlan in the SubPlan's
plan_name field any more. The reason for this is that the purpose of
the patch set is to assign names before planning, and the decision as
to whether something is an InitPlan or a SubPlan is made after
planning, so keeping "InitPlan" or "SubPlan" in the actual plan name
undermines the whole point of the patch. I would argue that this is
actually better on philosophical grounds, because I think that the
fact that something is an expression rather than an EXISTS clause or
whatever is part of the identify of the resulting object, whereas I
think whether something is an InitPlan or a SubPlan is mostly
interesting as a performance characteristic rather than as a definer
of identity. I don't necessarily expect this position to be accepted
without debate, but I prefer the EXPLAIN output with the patch to the
pre-patch output.

The remaining patches are simply rebased and are only included here in
case we want to discuss how they could be
better/different/unnecessary, especially what's now 0002, rather than
because I'm looking to get something committed right away.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-24T22:03:19Z

Robert Haas <robertmhaas@gmail.com> writes:
> Here's a new patch set. 0004 is now 0001 and similarly all other patch
> numbers are -3, since the old 0001 and 0002 were committed together
> and the 0003 is abandoned. I made the following changes to
> old-0004/new-0001:

I've not looked at all of these patches, but here's a review of v9-0001.

> - I rewrote the commit message. I'm not really sure this is any
> clearer about the motivation for this patch, but I tried. Suggestions
> appreciated.

It's much better, thanks.

> - You (Tom) complained about the lack of const on
> sublinktype_to_string, so this version has been const-ified. The const
> bled into the arguments to choose_plan_name() and subquery_planner(),
> and into the plan_name structure members within PlannerInfo and
> SubPlan. I don't know if this is the right thing to do, so feel free
> to set me straight.

I don't think so.  We do not have a nice story on marking Node fields
const: it's very unclear for example what consequences that ought to
have for copyObject().  Maybe somebody will tackle that issue someday,
but it's not something to touch casually in a patch with other
objectives.  So I don't think we can make the plan_name fields const.
The best solution I think is to make choose_plan_name() take a const
string and return a non-const one.  The attached v10-0001 is just like
your v9-0001 except for doing the const stuff this way.  I chose to
fix the impedance mismatch within choose_plan_name() by having it
pstrdup when it wants to just return the "name" string, but you could
make a case for holding your nose and just casting away const there.

> - You (Tom) also asked why not print InitPlan/SubPlan wherever we
> refer to subplans, so this version restores that behavior.

Thanks.  I'm good with the output now (modulo the bug described
below).  Someone could potentially argue that this exposes more
of the internals than we really ought to, such as the difference
between expr and multiexpr SubLinks, but I'm okay with that.

Aside from the const issue, something I don't really like at the
coding level is the use of an "allroots" list.  One reason is that
it's partially redundant with the adjacent "subroots" list, but
a bigger one is that we have transient roots that shouldn't be
in there.  An example here is pull_up_simple_subquery: it builds
a clone of the query's PlannerInfo to help it use various
infrastructure along the way to flattening the subquery, but
that clone is not referenced anymore after the function exits.
You were putting that into allroots, which seems to me to be
a fundamental error, even more so because it went in with the
same plan_name as the root it was cloned from.

I think a better idea is to keep a list of just the subplan
names that we've assigned so far.  That has a far clearer
charter, plus it can be updated immediately by choose_plan_name()
instead of relying on the caller to do the right thing later.
I coded this up, and was rather surprised to find that it changed
some regression outputs.  On investigation, that's because
build_minmax_path() was actually doing the wrong thing later:
it was putting the wrong root into allroots, so that "minmax_1"
never became assigned and could be re-used later.

I also observed that SS_process_ctes() was not using
choose_plan_name() but simply assigning the user-written CTE
name.  I believe it's possible to use the same CTE name in
different parts of a query tree, so this fails to achieve
the stated purpose of making the names unique.

I'm still a little bit uncomfortable about whether
it's okay for pull_up_simple_subquery() to just do

+	subroot->plan_name = root->plan_name;

rather than giving some other name to the transient subroot.
I think it's okay because we are not making any meaningful planning
decisions during the life of the subroot, just seeing if we can
transform the subquery into a form that allows it to be pulled up.
But you might think differently.  Perhaps a potential compromise
is to set the transient subroot's plan_name to NULL instead?

Anyway, v10-0002 is a delta patch to use a list of subplan
names instead of "allroots", and there are a couple of trivial
cosmetic changes too.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-25T13:10:54Z

On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't think so.  We do not have a nice story on marking Node fields
> const: it's very unclear for example what consequences that ought to
> have for copyObject().  Maybe somebody will tackle that issue someday,
> but it's not something to touch casually in a patch with other
> objectives.  So I don't think we can make the plan_name fields const.
> The best solution I think is to make choose_plan_name() take a const
> string and return a non-const one.  The attached v10-0001 is just like
> your v9-0001 except for doing the const stuff this way.  I chose to
> fix the impedance mismatch within choose_plan_name() by having it
> pstrdup when it wants to just return the "name" string, but you could
> make a case for holding your nose and just casting away const there.

Yeah, these are the kinds of trade-offs I never know how to make. I
have a little difficulty believing that it's worth a pstrdup() to have
the benefit of marking things const, but maybe it is, and the
difference is probably microscopic either way. It's not like we're
going to be faced with many planning problems involving gigantic
numbers of subqueries. If this were a hot code path we'd want to think
harder.

> > - You (Tom) also asked why not print InitPlan/SubPlan wherever we
> > refer to subplans, so this version restores that behavior.
>
> Thanks.  I'm good with the output now (modulo the bug described
> below).  Someone could potentially argue that this exposes more
> of the internals than we really ought to, such as the difference
> between expr and multiexpr SubLinks, but I'm okay with that.

I actually thought it looked kind of nice exposing some of that
detail, but it's certainly a judgement call.

> Aside from the const issue, something I don't really like at the
> coding level is the use of an "allroots" list.  One reason is that
> it's partially redundant with the adjacent "subroots" list, but
> a bigger one is that we have transient roots that shouldn't be
> in there.  An example here is pull_up_simple_subquery: it builds
> a clone of the query's PlannerInfo to help it use various
> infrastructure along the way to flattening the subquery, but
> that clone is not referenced anymore after the function exits.
> You were putting that into allroots, which seems to me to be
> a fundamental error, even more so because it went in with the
> same plan_name as the root it was cloned from.
>
> I think a better idea is to keep a list of just the subplan
> names that we've assigned so far.  That has a far clearer
> charter, plus it can be updated immediately by choose_plan_name()
> instead of relying on the caller to do the right thing later.
> I coded this up, and was rather surprised to find that it changed
> some regression outputs.  On investigation, that's because
> build_minmax_path() was actually doing the wrong thing later:
> it was putting the wrong root into allroots, so that "minmax_1"
> never became assigned and could be re-used later.

Ooph, that's embarrassing. I think the reason that I ended up making a
list of the roots themselves rather than the strings is that I was
thinking that everything in this data structure would need to be a
node, and I didn't want to cons up String nodes for every list
element. Then later I marked that structure member read_write_ignore
and never stopped to think that maybe then we didn't need nodes at
all. So, in short, I think this is a great solution and thanks a ton
for putting in the legwork to figure it out.

> I also observed that SS_process_ctes() was not using
> choose_plan_name() but simply assigning the user-written CTE
> name.  I believe it's possible to use the same CTE name in
> different parts of a query tree, so this fails to achieve
> the stated purpose of making the names unique.

Ouch, that's another good catch. Also, even if the CTEs had to be
unique among themselves, there could be a collision between a CTE name
and a generated name.

> I'm still a little bit uncomfortable about whether
> it's okay for pull_up_simple_subquery() to just do
>
> +       subroot->plan_name = root->plan_name;
>
> rather than giving some other name to the transient subroot.
> I think it's okay because we are not making any meaningful planning
> decisions during the life of the subroot, just seeing if we can
> transform the subquery into a form that allows it to be pulled up.
> But you might think differently.  Perhaps a potential compromise
> is to set the transient subroot's plan_name to NULL instead?

I don't like NULL. Imagine that some meaningful planning decision does
get made during the life of the subroot, and imagine further that some
hook is called by means of which some extension can influence that
decision. What plan_name do we want that extension to see? I think
there's some argument for letting it see the plan name of the parent
plan into which we're thinking about inlining the subquery, which I
believe is the effect of the current coding, and there's perhaps also
an argument for having a wholly new plan name there just to really
identify that particular decision clearly, but if we put NULL, that's
the name we use for the topmost query level. If we changed things so
that the top query level gets called "main" or some other constant
string, then it would make sense to use NULL as a sentinel value here
to mean "undefined," but I kind of think we probably don't want to go
there.

> Anyway, v10-0002 is a delta patch to use a list of subplan
> names instead of "allroots", and there are a couple of trivial
> cosmetic changes too.

OK, I'll try out these versions and let you know what I find out. Thanks much.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-25T13:43:24Z

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think a better idea is to keep a list of just the subplan
>> names that we've assigned so far.  That has a far clearer
>> charter, plus it can be updated immediately by choose_plan_name()
>> instead of relying on the caller to do the right thing later.
>> I coded this up, and was rather surprised to find that it changed
>> some regression outputs.  On investigation, that's because
>> build_minmax_path() was actually doing the wrong thing later:
>> it was putting the wrong root into allroots, so that "minmax_1"
>> never became assigned and could be re-used later.

> Ooph, that's embarrassing. I think the reason that I ended up making a
> list of the roots themselves rather than the strings is that I was
> thinking that everything in this data structure would need to be a
> node, and I didn't want to cons up String nodes for every list
> element. Then later I marked that structure member read_write_ignore
> and never stopped to think that maybe then we didn't need nodes at
> all. So, in short, I think this is a great solution and thanks a ton
> for putting in the legwork to figure it out.

I think if we do decide later that the list-of-names needs to be
Nodes, then converting them to String nodes is a perfectly fine
solution.  As you remark elsewhere, none of this is going to be
more than microscopic compared to the overall cost of planning
a subplan.  But for right now, yeah, we don't need it.

Anyway, it seems the only remaining issue is what name
pull_up_simple_subquery should give to its transient root clone.
I don't actually believe that it matters, so if you're content
with re-using the parent name, let's just roll with that until
someone complains.

			regards, tom lane

Re: plan shape work

Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-26T01:20:24Z

Hi there,

I've tried v10-000{1,2}+v9-0002 and v9-000{1,2}. I was curious whether
the names choose_plan_name() chose for subqueries match the Subquery
Scan names in the EXPLAIN plan. My guess is that since the former is
chosen before planning and the latter after planning, they might
differ. I think it's probably ok to have different naming mechanisms
as long as the names are unique within themselves. But in case anyone
else cares about the naming inconsistency, here's an example that
shows it.

-- applied patches
I've applied v10-0001, v10-0002 and v9-0002. I needed v9-0002 because
I want to see the plan_names in the debug plan and in the
EXPLAIN(RANGE_TABLE) plan with pg_overexplain.
(Applying v9-000{1,2} instead should give the same results)

-- setup
CREATE TABLE r (a int, b int);
CREATE TABLE s (c int, d int);
LOAD 'pg_overexplain';
SET debug_print_plan to on;
SET client_min_messages to 'log';

-- query
EXPLAIN (range_table, costs off)
SELECT *
FROM
  (SELECT a FROM
     (SELECT a, b FROM r WHERE b > 42 ORDER BY a)
   UNION ALL
     (SELECT c FROM
(SELECT c, d FROM s WHERE d > 24 ORDER BY d)));

-- plan
                  QUERY PLAN
----------------------------------------------
 Append
   Append RTIs: 1
   ->  Subquery Scan on unnamed_subquery_1
         Scan RTI: 4
         ->  Sort
               Sort Key: r.a
               ->  Seq Scan on r
                     Filter: (b > 42)
                     Scan RTI: 6
   ->  Subquery Scan on unnamed_subquery_2
         Scan RTI: 5
         ->  Sort
               Sort Key: s.d
               ->  Seq Scan on s
                     Filter: (d > 24)
                     Scan RTI: 7
 RTI 1 (subquery, inherited, in-from-clause):
   Eref: unnamed_subquery (a)
 RTI 2 (subquery):
   Eref: unnamed_subquery (a)
 RTI 3 (subquery):
   Eref: unnamed_subquery (c)
 RTI 4 (subquery, in-from-clause):
   Eref: unnamed_subquery (a, b)
 RTI 5 (subquery, in-from-clause):
   Eref: unnamed_subquery (c, d)
 RTI 6 (relation, in-from-clause):
   Subplan: unnamed_subquery
   Eref: r (a, b)
   Relation: r
   Relation Kind: relation
   Relation Lock Mode: AccessShareLock
   Permission Info Index: 1
 RTI 7 (relation, in-from-clause):
   Subplan: unnamed_subquery_1
   Eref: s (c, d)
   Relation: s
   Relation Kind: relation
   Relation Lock Mode: AccessShareLock
   Permission Info Index: 2
 Unprunable RTIs: 6 7
(41 rows)

-- interesting part of the debug plan:
   :subplans <>
   :subrtinfos (
      {SUBPLANRTINFO
      :plan_name unnamed_subquery
      :rtoffset 5
      :dummy false
      }
      {SUBPLANRTINFO
      :plan_name unnamed_subquery_1
      :rtoffset 6
      :dummy false
      }

It appears that in the EXPLAIN plan the subqueries are named
"unnamed_subquery" (does not show up in the EXPLAIN output),
"unnamed_subquery_1", and "unnamed_subquery_2"; whereas in the RTIs
section from pg_overexplain, as well as in the debug plan's
:subrtinfos section, the subplans are named "unnamed_subquery" and
"unnamed_subquery_1".

IIUC, the Subquery Scan names in the query plan, for example:
   ->  Subquery Scan on unnamed_subquery_2
is the name assigned to this Subquery Scan node of RTI: 5, after
planning.

And the Subplan name of an RTI with pg_overexplain, for example:
 RTI 7 (relation, in-from-clause):
   Subplan: unnamed_subquery_1
   Eref: s (c, d)
is the Subplan name chosen before planning. RTI 7 here maps to the
SUBPLANRTINFO with ":rtoffset 6" in the debug plan, which means it
belongs to the Subplan named "unnamed_subquery_1". This is what I
think causes confusion, because from the query plan we see that RTI 7
is under the Subquery Scan on "unnamed_subquery_2", not
"unnamed_subquery_1".

I think technically this is not a problem, since we can uniquely
identify the Subplans using the names assigned before planning, and we
can also uniquely identify the Subquery Scans in the EXPLAIN plan
using the names assigned after planning. Still, I found it a bit
confusing when looking at the EXPLAIN(RANGE_TABLE) output, where the
same name "unnamed_subquery_1" not only doesn't mean the same plan
node, but also not in the same branch of the plan tree.

Thoughts?

Best,
Alex

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-26T01:40:56Z

On Thu, Sep 25, 2025 at 9:21 PM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
> I've tried v10-000{1,2}+v9-0002 and v9-000{1,2}. I was curious whether
> the names choose_plan_name() chose for subqueries match the Subquery
> Scan names in the EXPLAIN plan. My guess is that since the former is
> chosen before planning and the latter after planning, they might
> differ. I think it's probably ok to have different naming mechanisms
> as long as the names are unique within themselves. But in case anyone
> else cares about the naming inconsistency, here's an example that
> shows it.

Yeah. Technically, these are not the same names: one set of names is
the names assigned to the subqueries, and the other is the set of
names assigned to the relations that get scanned by subquery scans.
This may seem like a technicality, but that's not entirely the case,
because in each case the chosen names are unique. If, for example,
there were a table named unnamed_subquery that we were
subquery-scanning, then you couldn't also have a subquery scan of that
table, but you could still have a subquery with that name.

But maybe there's still some way to improve this. It would probably be
hard to make it perfect because of the fact that EXPLAIN names are
unique across all relations in the query, as noted above. However, we
might be able to make it so the names match in the absence of name
conflicts with user specified aliases or table names. Or maybe we
should consider some larger change to the EXPLAIN format so that we
display subquery names instead of relation names.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T01:53:30Z

Robert Haas <robertmhaas@gmail.com> writes:
> But maybe there's still some way to improve this. It would probably be
> hard to make it perfect because of the fact that EXPLAIN names are
> unique across all relations in the query, as noted above. However, we
> might be able to make it so the names match in the absence of name
> conflicts with user specified aliases or table names. Or maybe we
> should consider some larger change to the EXPLAIN format so that we
> display subquery names instead of relation names.

I'm kind of down on that last idea, because relation names/aliases
are user-supplied or at least user-suppliable, whereas what
choose_plan_name() generates is not under user control.  So if we
try to make this match, it should be in the direction of using
the parent query's subquery alias whenever possible.  But I fear
that it'll be hard to make this match up exactly unless we choose
to take the selection of unique relation aliases out of EXPLAIN
altogether and make the planner do it.  Which seems like a bad
idea, because then we're paying that cost all the time, and
it's not small for big queries.

			regards, tom lane

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-26T02:26:16Z

FWIW, I'm a bit concerned about the double for loop inside
choose_plan_name(), especially since the outer loop runs with a true
condition.  Maybe I'm just worrying over nothing, as we probably don't
expect a large number of subroots in practice, but the nested loops
still make me a little uneasy.

- Richard

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T02:37:20Z

Richard Guo <guofenglinux@gmail.com> writes:
> FWIW, I'm a bit concerned about the double for loop inside
> choose_plan_name(), especially since the outer loop runs with a true
> condition.  Maybe I'm just worrying over nothing, as we probably don't
> expect a large number of subroots in practice, but the nested loops
> still make me a little uneasy.

I really doubt that a query could have enough subplans to make
that a problem.  But if I'm wrong, it's surely something we could
improve in a localized way later.

			regards, tom lane

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-26T06:29:38Z

On Fri, Sep 26, 2025 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
> > FWIW, I'm a bit concerned about the double for loop inside
> > choose_plan_name(), especially since the outer loop runs with a true
> > condition.  Maybe I'm just worrying over nothing, as we probably don't
> > expect a large number of subroots in practice, but the nested loops
> > still make me a little uneasy.

> I really doubt that a query could have enough subplans to make
> that a problem.  But if I'm wrong, it's surely something we could
> improve in a localized way later.

I'm concerned not only about the potential for a large number of
subplans but also because if there happens to be a bug within the
nested loops, the always-true condition in the outer loop could cause
an infinite loop.  However, if you're confident that the code inside
the loop is completely bug-free and will remain so through future
changes, then this shouldn't be an issue.

Looking at choose_plan_name(), IIUC, the nested loop is used to find
the next unused suffix number for a given name.  I'm wondering why not
simply iterate through glob->subplanNames once, check the suffix
number for each name matching the given base name, determine the
current maximum suffix, and then use "max_suffix + 1" as the next
unused suffix.  This approach requires only a single pass through the
list, and if there's a bug, the worst-case scenario would be a
duplicate name rather than an infinite loop.  It seems to me that this
approach is both more efficient and less risky.

- Richard

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T14:23:21Z

Richard Guo <guofenglinux@gmail.com> writes:
> Looking at choose_plan_name(), IIUC, the nested loop is used to find
> the next unused suffix number for a given name.  I'm wondering why not
> simply iterate through glob->subplanNames once, check the suffix
> number for each name matching the given base name, determine the
> current maximum suffix, and then use "max_suffix + 1" as the next
> unused suffix.  This approach requires only a single pass through the
> list, and if there's a bug, the worst-case scenario would be a
> duplicate name rather than an infinite loop.  It seems to me that this
> approach is both more efficient and less risky.

"simply" is perhaps not the right adjective there.  My guess is that
this approach nets out to more code, more possibilities for bugs
(especially in cases where one name is a prefix of another), and
will be slower in typical cases with just a few subplan names.

As an example of edge cases that your idea introduces, what happens
if a user-written subquery name is "expr_999999999999999999999999"
and then we need to generate a unique name based on "expr"?  Now
we have an integer-overflow situation to worry about, with possibly
platform-dependent results.

But it's Robert's patch, so he gets to make the call.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-26T14:23:29Z

On Fri, Sep 26, 2025 at 2:29 AM Richard Guo <guofenglinux@gmail.com> wrote:
> Looking at choose_plan_name(), IIUC, the nested loop is used to find
> the next unused suffix number for a given name.  I'm wondering why not
> simply iterate through glob->subplanNames once, check the suffix
> number for each name matching the given base name, determine the
> current maximum suffix, and then use "max_suffix + 1" as the next
> unused suffix.  This approach requires only a single pass through the
> list, and if there's a bug, the worst-case scenario would be a
> duplicate name rather than an infinite loop.  It seems to me that this
> approach is both more efficient and less risky.

I feel like the current coding is more straightforward and this should
be considered in terms of the chance of having bugs. Doing as you
propose here would require starting with a prefix match of the string
and then attempting a string-to-integer conversion on the remaining
bytes. That's certainly doable but such things tend to be a bit
fiddly: you have to make sure you do the right thing when you see a
non-digit and when the value overflows or is zero. It wouldn't be that
hard to get it right, but I think it would be a little trickier than
we have now. If we find that the performance cost of this function is
too high in some scenario, replacing it within an implementation along
these lines would make sense to me, but I am not too worried about the
current logic accidentally causing an infinite loop. I don't think
that will happen but even if it does it should be a simple fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-29T01:47:50Z

On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
> > Looking at choose_plan_name(), IIUC, the nested loop is used to find
> > the next unused suffix number for a given name.  I'm wondering why not
> > simply iterate through glob->subplanNames once, check the suffix
> > number for each name matching the given base name, determine the
> > current maximum suffix, and then use "max_suffix + 1" as the next
> > unused suffix.  This approach requires only a single pass through the
> > list, and if there's a bug, the worst-case scenario would be a
> > duplicate name rather than an infinite loop.  It seems to me that this
> > approach is both more efficient and less risky.

> "simply" is perhaps not the right adjective there.  My guess is that
> this approach nets out to more code, more possibilities for bugs
> (especially in cases where one name is a prefix of another), and
> will be slower in typical cases with just a few subplan names.
>
> As an example of edge cases that your idea introduces, what happens
> if a user-written subquery name is "expr_999999999999999999999999"
> and then we need to generate a unique name based on "expr"?  Now
> we have an integer-overflow situation to worry about, with possibly
> platform-dependent results.

I'd argue that this hypothetical edge case can be resolved with a bit
of canonicalization in how subplan names are represented internally.
I think the issue you mentioned arises because there is no clearly
distinction between the base name and the numeric suffix.  I haven't
spent much time thinking about it, but an off-the-cuff idea is to
require that all subplan names in glob->subplanNames end with a suffix
of the form "_<number>".  (If no numeric suffix is required, we can
use the suffix "_0".)  With this convention, we can simply split
on the last underscore: everything before it is the base name, and
everything after is the numeric suffix.

The user-written subquery name "expr_999999999999999999999999" would
be internally represented as "expr_999999999999999999999999_0".  Then,
when we need to generate a unique name based on "expr", it won't match
with the base name of that subquery name.

With this canonicalization in place, my proposed approach is simply a
matter of applying strrchr(name, '_') and tracking the maximum suffix
number in a single pass over glob->subplanNames.  I think this can be
handled with straightforward, basic C code.  It seems to me that
this could also eliminate the need for the additional loop under the
"if (!always_number)" branch in choose_plan_name().

By replacing a pass over the subplanNames list plus a nested loop with
a single pass, I doubt that this would be slower in typical cases.

- Richard

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-29T02:41:47Z

Richard Guo <guofenglinux@gmail.com> writes:
> On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> As an example of edge cases that your idea introduces, what happens
>> if a user-written subquery name is "expr_999999999999999999999999"
>> and then we need to generate a unique name based on "expr"?  Now
>> we have an integer-overflow situation to worry about, with possibly
>> platform-dependent results.

> I'd argue that this hypothetical edge case can be resolved with a bit
> of canonicalization in how subplan names are represented internally.

[ raised eyebrow... ]  How did you get to that from the complaint
that Robert's patch was not obviously bug-free?  (A complaint I
thought was unmerited, but nevermind.)  This proposal is neither
simple, nor obviously bug-free.  Moreover, in view of comments
upthread, I think we should look with great suspicion on any
proposal that involves changing user-supplied subquery aliases
unnecessarily.

			regards, tom lane

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-29T08:52:18Z

On Mon, Sep 29, 2025 at 11:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
> > On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> As an example of edge cases that your idea introduces, what happens
> >> if a user-written subquery name is "expr_999999999999999999999999"
> >> and then we need to generate a unique name based on "expr"?  Now
> >> we have an integer-overflow situation to worry about, with possibly
> >> platform-dependent results.

> > I'd argue that this hypothetical edge case can be resolved with a bit
> > of canonicalization in how subplan names are represented internally.

> [ raised eyebrow... ]  How did you get to that from the complaint
> that Robert's patch was not obviously bug-free?  (A complaint I
> thought was unmerited, but nevermind.)

I'm not sure I fully understand your point here.  Apologies if I got
it wrong.

Firstly, my intention in the previous email was merely to propose a
solution for my approach to address the edge case you raised.  I don't
see how this relates to my so-called "complaint" about Robert's patch
not being obviously bug-free.  You raised a case where my approach
won't work, and I provided a solution to address it.  That's all.

Secondly, I don't think it's fair to characterize my concern as a
complaint when I expressed that the nested loop with an always-true
condition is vulnerable to bugs and could potentially cause an
infinite loop if such a bug exists.

In a nearby thread, I was asked whether I can guarantee my code is
100% bug-free.  After some consideration, I think I cannot make such a
guarantee, and I doubt that anyone realistically can.  Given this, I
think it's important that we try our best to write code that minimizes
the potential bad-effect should a bug occur.

Therefore, upon observing a nested loop with an always-true condition
in the patch, I expressed my concern and suggested a possible
improvement.  However, I did not expect that concern to be treated as
an unmerited complaint.

> This proposal is neither
> simple, nor obviously bug-free.  Moreover, in view of comments
> upthread, I think we should look with great suspicion on any
> proposal that involves changing user-supplied subquery aliases
> unnecessarily.

It seems no one has attempted to code up the approach I suggested, so
I went ahead and did it; please see the attached PoC patch.  It's just
a proof of concept to show what I have in mind, so please excuse the
lack of comments and necessary assertions for now.

I agree that this implementation cannot be guaranteed to be bug-free,
but I'm not sure I agree that it's not simple.  I'm also not convinced
that it would be slower in typical cases.

BTW, a small nit I just noticed: I suggest explicitly initializing
glob->subplanNames in standard_planner().  It may be argued that this
is pointless, as makeNode() zeroes all fields by default.  But AFAICS
subplanNames is the only field in PlannerGlobal that is not explicitly
initialized.

- Richard

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-09-29T14:12:31Z

On Mon, Sep 29, 2025 at 4:52 AM Richard Guo <guofenglinux@gmail.com> wrote:
> It seems no one has attempted to code up the approach I suggested, so
> I went ahead and did it; please see the attached PoC patch.  It's just
> a proof of concept to show what I have in mind, so please excuse the
> lack of comments and necessary assertions for now.

I think that if there are subqueries named expr_1 and expr_3, this
will assign the name expr_4 next, whereas the previous patch assigns
the name expr_2. That might not be a bug, but it's different and I
don't like it as well.

I also think that if there are subqueries named expr_1a and expr_2a,
this will assign the name expr_3 next, whereas the previous patch
assigns the name expr_1. I would consider that a clear bug.

Your code will also see expr_01 and decide that the next name should
be expr_2 rather than expr_1. I don't think that's right, either.

I also think that it's a bug that your function sometimes returns a
value different from the one it appends to canon_names.

I don't really understand why you're so fixed on this point. I think
that the code as I wrote it is quite a normal way to write code for
that kind of thing. Sure, there are other things that we could do, but
I wrote the code that way I did precisely in order to avoid behaviors
like the ones I mention above. It would be possible to rearrange the
code so that the termination condition for the loop was something like
"!found", or to rewrite the loop as a do { ... } while (!found)
construct as we do in set_rtable_names() for a very similar problem to
the code that this is solving, but I think the generated machine code
would be exactly the same and the code would not look as intuitive for
a human to read. Somebody else might have had a different stylistic
preference, but if you are going to object every time I write for (;;)
or while (1), you're going to hate an awful lot of my code for, IMHO,
very little reason. It's reasonable to be concerned about whether a
loop will ever terminate, but the mere fact of putting the loop exit
someplace other than the top of the loop isn't enough to say that
there's a problem.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-29T14:27:57Z

Robert Haas <robertmhaas@gmail.com> writes:
> I don't really understand why you're so fixed on this point. I think
> that the code as I wrote it is quite a normal way to write code for
> that kind of thing.

Also, we have numerous other places that generate de-duplicated names
in pretty much this way (ruleutils.c's set_rtable_names being a very
closely related case).  I don't think we should go inventing some
random new way to do that.

If it turns out that Robert's code is too slow in practice, I would
prefer to deal with that by using a hashtable to keep track of
already-allocated names, not by changing the user-visible behavior.
I'm content to wait for field complaints before building such logic
though, because I really doubt that queries would ever have so
many subplans as to be a problem.

			regards, tom lane

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-09-30T01:59:10Z

On Mon, Sep 29, 2025 at 11:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Sep 29, 2025 at 4:52 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > It seems no one has attempted to code up the approach I suggested, so
> > I went ahead and did it; please see the attached PoC patch.  It's just
> > a proof of concept to show what I have in mind, so please excuse the
> > lack of comments and necessary assertions for now.

> I think that if there are subqueries named expr_1 and expr_3, this
> will assign the name expr_4 next, whereas the previous patch assigns
> the name expr_2. That might not be a bug, but it's different and I
> don't like it as well.
>
> I also think that if there are subqueries named expr_1a and expr_2a,
> this will assign the name expr_3 next, whereas the previous patch
> assigns the name expr_1. I would consider that a clear bug.
>
> Your code will also see expr_01 and decide that the next name should
> be expr_2 rather than expr_1. I don't think that's right, either.
>
> I also think that it's a bug that your function sometimes returns a
> value different from the one it appends to canon_names.

I don't think any of the bugs you described upthread exist in the PoC
patch.  The patch ensures that all names stored in glob->subplanNames
are canonicalized to the format "$basename_$suffixnum".  If no numeric
suffix is required, the name is stored with a "_0" suffix.  This
guarantees a clear distinction between the base name and the numeric
suffix for all names stored in the list.  (Please note that this
canonicalization applies only to how names are stored internally, not
to user-visible names.  So no user-visible behavior change here.)

I don't really see how uncanonicalized names like expr_1a, expr_2a, or
expr_01 would appear in the subplanNames list to begin with.  Perhaps
you're referring to user-supplied subquery aliases?  In that case, the
patch deliberately avoids matching expr to those names, since it only
compares the base name.  As a result, it would assign expr_1 as the
next name, which is the expected behavior.

This PoC patch passes all the regression tests, which at least, IMHO,
suggests that it avoids such basic bugs.

However, since both you and Tom feel this proposal doesn't make
sense, I'll withdraw it.  Apologies for any trouble this has caused.

- Richard

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-10-06T19:13:59Z

On Mon, Sep 29, 2025 at 9:59 PM Richard Guo <guofenglinux@gmail.com> wrote:
> I don't think any of the bugs you described upthread exist in the PoC
> patch.  The patch ensures that all names stored in glob->subplanNames
> are canonicalized to the format "$basename_$suffixnum".  If no numeric
> suffix is required, the name is stored with a "_0" suffix.  This
> guarantees a clear distinction between the base name and the numeric
> suffix for all names stored in the list.  (Please note that this
> canonicalization applies only to how names are stored internally, not
> to user-visible names.  So no user-visible behavior change here.)

I studied your patch in more detail today and I find that you are
correct. It doesn't do what I thought it did. However, it also doesn't
guarantee uniqueness. Here is an example:

robert.haas=# explain select (select random() limit 1), (select
random() limit 2) from (select * from pg_class limit 1) expr_1;
WARNING:  choose_plan_name for name "expr" returns "expr_1"
WARNING:  choose_plan_name for name "expr" returns "expr_2"
WARNING:  choose_plan_name for name "expr_1" returns "expr_1"
                               QUERY PLAN
-------------------------------------------------------------------------
 Subquery Scan on expr_1  (cost=0.03..0.08 rows=1 width=16)
   InitPlan expr_1
     ->  Limit  (cost=0.00..0.01 rows=1 width=8)
           ->  Result  (cost=0.00..0.01 rows=1 width=8)
   InitPlan expr_2
     ->  Limit  (cost=0.00..0.01 rows=1 width=8)
           ->  Result  (cost=0.00..0.01 rows=1 width=8)
   ->  Limit  (cost=0.00..0.04 rows=1 width=240)
         ->  Seq Scan on pg_class  (cost=0.00..18.16 rows=416 width=240)
(9 rows)

When I originally looked at the patch, I believed that the use of
atoi() without sanity checking was going to cause the sorts of
problems I was mentioning. That was wrong, since you never let a
user-specified name leak directly into the list, but only names to
which your code has added a numeric suffix. But that also means that
you don't get conflicts between names that, in reality, should
conflict.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-10-06T19:36:49Z

I see that Richard's PoC last patch confused cfbot. Here's a new
version of just the patch proposed for commit for CfBot testing.

Re: plan shape work

Richard Guo <guofenglinux@gmail.com> — 2025-10-09T02:24:49Z

On Tue, Oct 7, 2025 at 4:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I see that Richard's PoC last patch confused cfbot. Here's a new
> version of just the patch proposed for commit for CfBot testing.

Does it make sense to explicitly initialize glob->subplanNames in
standard_planner()?  I understand this might seem pointless since
makeNode() zeroes all fields by default, but subplanNames is currently
the only field in PlannerGlobal that isn't explicitly initialized.  I
previously committed a patch (2c0ed86d3) to ensure all PlannerGlobal
fields are explicitly initialized, and I'd prefer to maintain that
consistency.

I actually suggested the same in [1] (the last paragraph), but it
seems to have been overlooked.

[1] https://postgr.es/m/CAMbWs4-ysLvZiWp=w5=+noCMdX9FHFrrc0Wuk-TcUz1RDmEbkQ@mail.gmail.com

- Richard

Re: plan shape work

Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-09T03:04:37Z

Richard Guo <guofenglinux@gmail.com> writes:
> Does it make sense to explicitly initialize glob->subplanNames in
> standard_planner()?  I understand this might seem pointless since
> makeNode() zeroes all fields by default, but subplanNames is currently
> the only field in PlannerGlobal that isn't explicitly initialized.  I
> previously committed a patch (2c0ed86d3) to ensure all PlannerGlobal
> fields are explicitly initialized, and I'd prefer to maintain that
> consistency.

We don't really have consensus on that point, I fear.  I like the
initialize-em-all-explicitly approach, but some other senior hackers
think it's useless verbiage.

My argument for doing it explicitly is that when adding a new field
to a struct, one frequently searches for existing references to a
nearby field.  Without initialize-em-all, this risks missing places
where you need to initialize your new field.  If you'd only set it
to zero, then fine ... but what if that particular place needs some
other initial value?  So I think omitting initializations-to-zero
risks future bugs of omission.  Some other folk don't find that
argument very compelling, though.

			regards, tom lane

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-10-09T12:34:56Z

On Wed, Oct 8, 2025 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We don't really have consensus on that point, I fear.  I like the
> initialize-em-all-explicitly approach, but some other senior hackers
> think it's useless verbiage.

I'm in the "useless verbiage" camp.

> My argument for doing it explicitly is that when adding a new field
> to a struct, one frequently searches for existing references to a
> nearby field.  Without initialize-em-all, this risks missing places
> where you need to initialize your new field.  If you'd only set it
> to zero, then fine ... but what if that particular place needs some
> other initial value?  So I think omitting initializations-to-zero
> risks future bugs of omission.  Some other folk don't find that
> argument very compelling, though.

This problem does not typically happen for me because it is my habit
to start by grepping for makeNode(Whatever) -- or for relevant palloc
calls, for non-Node types -- at the very start of my research into any
topic, and only later to look into specific fields.

There might be a consistency argument for doing this in this case, if
other nearby fields are doing the same thing, and if one of you wants
to go and make it so I have much better things to do than spend time
complaining about it. But any code I write is likely to rely on
makeNode() or palloc0() to zero fields wherever relying on such a
thing is convenient, unless somebody forces me to do otherwise in a
particular case.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-12-01T20:31:06Z

On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aside from the const issue, something I don't really like at the
> coding level is the use of an "allroots" list.  One reason is that
> it's partially redundant with the adjacent "subroots" list, but
> a bigger one is that we have transient roots that shouldn't be
> in there.  An example here is pull_up_simple_subquery: it builds
> a clone of the query's PlannerInfo to help it use various
> infrastructure along the way to flattening the subquery, but
> that clone is not referenced anymore after the function exits.
> You were putting that into allroots, which seems to me to be
> a fundamental error, even more so because it went in with the
> same plan_name as the root it was cloned from.
>
> I think a better idea is to keep a list of just the subplan
> names that we've assigned so far.  That has a far clearer
> charter, plus it can be updated immediately by choose_plan_name()
> instead of relying on the caller to do the right thing later.
> I coded this up, and was rather surprised to find that it changed
> some regression outputs.  On investigation, that's because
> build_minmax_path() was actually doing the wrong thing later:
> it was putting the wrong root into allroots, so that "minmax_1"
> never became assigned and could be re-used later.

Today, I discovered a disadvantage of the change from allroots to
subplanNames. The concern that you raise about transient roots still
seems entirely valid to me. However, the allroots list - if correctly
constructed to exclude such transient roots - seems to me to have
potential utility for extensions that subplanNames doesn't. What I
wanted to do was use planner_shutdown_hook to copy some information
from each PlannerInfo to the extension_state field of the PlannedStmt
and, without allroots, there's no easy way to find all of them.

I do think there are other ways to solve that problem. For instance,
we could add a hook that's called when we invoke subquery_planner,
similar to the way that planner_hook can be used to intercept calls to
planner(). The new hook could then do whatever it likes at the end of
each call to subquery_planner(). That has the disadvantage of making
each call to subquery_planner() a little more expensive, but it might
turn out that such a hook has other utility.

For what I was trying to do today, I can probably even solve it using
set_join_pathlist_hook. I wanted to capture the relid sets of all
baserels and joinrels where rel->unique_rel is non-NULL, and I think
that I could do that by having set_join_pathlist_hook add notice when
save_jointype is JOIN_UNIQUE_INNER or JOIN_UNIQUE_OUTER, and then add
innerrel or outerrel as appropriate to a list hanging off an object
attached using Get/SetPlannerGlobalExtensionState, making sure not to
add the same one more than once. But that only works in this specific
case, and it's pretty indirect even for that.

So I'm wondering if we ought to step back and rethink a bit.
subplanNames ensures that we assign a different name to every
PlannerInfo, but it doesn't give you any help finding the PlannerInfo
given the name, or enumerating all of the PlannerInfo objects that
exist. If we went back to allroots, and just fixed the problem of
temporary entries creeping into the list, the core code wouldn't
really notice the difference, but I think extensions would have an
easier time.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: plan shape work

Robert Haas <robertmhaas@gmail.com> — 2025-12-08T19:06:35Z

On Mon, Dec 1, 2025 at 3:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
> So I'm wondering if we ought to step back and rethink a bit.
> subplanNames ensures that we assign a different name to every
> PlannerInfo, but it doesn't give you any help finding the PlannerInfo
> given the name, or enumerating all of the PlannerInfo objects that
> exist. If we went back to allroots, and just fixed the problem of
> temporary entries creeping into the list, the core code wouldn't
> really notice the difference, but I think extensions would have an
> easier time.
>
> Thoughts?

Here's a couple of patches to bring back allroots. The first one adds
allroots, makes choose_plan_name() use it, and adds assertions that
the contents of allroots and subplanNames correspond. The second one
removes subplanNames. I imagine that these would be collapsed for
commit, but testing 0001 without 0002 seems useful for peace of mind.

-- 
Robert Haas
EDB: http://www.enterprisedb.com