Thread
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Assign each subquery a unique name prior to planning it.
- 8c49a484e8eb 19 (unreleased) landed
-
Keep track of what RTIs a Result node is scanning.
- f2bae51dfd5b 19 (unreleased) landed
-
plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-05-19T18:01:55Z
Hi, A couple of people at pgconf.dev seemed to want to know more about my ongoing plan shape work, so here are the patches I have currently. This is a long way from something that actually looks like a usable feature, but these are bits of infrastructure that I think will be necessary to get to a usable feature. As a recap, my overall goal here is to make it so that you can examine a finished plan, figure out what decisions the planner made, and then somehow get the planner to make those same decisions over again in a future planning cycle. Since doing this for all types of planner decisions seems too difficult for an initial goal, I'm focusing on scans and joins for now. A further goal is that I want it to be possible for extensions to use this infrastructure to implement a variety of different policies that they might feel to be beneficial, so I'm looking to minimize the amount of stuff that has to be done in core PostgreSQL or can only be used by core PostgreSQL. So far I've identified two main problems. First, you need to be able to reconstruct the planner decisions from the final plan, which you almost can do already but we're missing a few key pieces of information in the final plan tree. Second, you need to be able to write those decisions down in a way that can be correctly and unambiguously reinterpreted during a future planning cycle for the same query. For example, if we say that the planner chose a sequential scan of table foo, what exactly does that mean? Table foo could appear multiple times in the query, either in different subqueries or the same one, and it could be a partitioned table with a different scan method for each partition. Let's start by talking about problem #1. I've found two subproblems in this area so far. The first is that a Result node does not store the relids of the scan or join that it replaces. Note that a Result note whose job is to apply a one-time filter or a projection to some subordinate node is not an issue here -- we can just look through the Result node to whatever scan or join is beneath it. The concern here is about the case where a scan or join is proven empty and entirely replaced by a Result node that has "One-Time Filter: false". Patch 0001 adds that field, and patch 0002 teaches ExplainPreScanNodes about it, which results in a number of regression test output changes that I personally consider to be improvements -- with these patches, we properly qualify some column references with a table alias as EXPLAIN does in all other cases, as opposed to printing them as bare column names with no alias. Patch 0003 checks that this is the only problem of this type that is visible at the stage where we are constructing join paths. Still talking about problem #1, the second subproblem I've identified is that during setrefs processing, we elide trivial SubqueryScan, Append, and MergeAppend nodes from the final plan. So during planning we might see, for example, that a certain join is between RTI 4 and RTI 5 and it's, say, a hash join. But after setrefs processing, it may appear that RTI was joined to, say, RTI 13, which might not even have been part of the same subquery level. If we record that we want to see RTI 4 joined to RTI 13 via a hash join, that will be completely useless -- the join planning code will never see those two RTIs as options to be joined to each other. What I've done in 0006 is made it so that we keep a record of each node elided during setrefs processing. This list of elided nodes is stored in the PlannedStmt outside of the portion of the tree that actually gets executed, so that code that is doing plan tree inspection can look at it but execution doesn't get any slower (except possibly by having to copy a slightly larger amount of data around when reading and writing PlannedStmt objects, which seems like it should be negligible). Now let's talk about problem #2. I believe that we do not actually want to refer to what happened to RTI 4 and RTI 5 as I mooted in the previous paragraph, but rather to refer to relations by some kind of name. However, we can't use the names shown in the EXPLAIN output, because those are not really present in the plan and are only assigned on-the-fly by EXPLAIN; hence, they can't be known at plan time. Since planning proceeds one subquery at a time, I think the right way to approach this problem is to first name the subquery and then to name the table within that subquery. Subqueries sort of have names right now, at least some of them, but it's an odd system: a CTE subquery, for example, has the name mentioned by the user, but other kinds of subplans just get names like "InitPlan 3" or "SubPlan 2". The real problem, though, is that those names are only assigned after we've FINISHED planned the subquery. If we begin planning our very first subquery, it might turn out to be InitPlan 1 or SubPlan 1, or if while planning it we recurse into some further subquery then *that* subquery might become InitPlan 1 or SubPlan 1 and OUR subquery might become InitPlan 2 or SubPlan 2 (or higher, if we find more subqueries and recurse into them too). Thus, being given some information about how the user wants, say, SubPlan 2 to be planned is completely useless because we won't know whether that is us until after we've done the planning that the user is trying to influence. To solve that problem, I decided to arrange for every subquery to have a unique name that is assigned before we begin planning it. Patch 0004 does this. Then, 0005 records those names in the final plan. That's enough that you can look at the scanrelid (or apprelids, etc.) of a Plan node and relate that back to a named subquery and a particular RTI within that subquery. There is still the problem of how to name relations within a single subquery, since it's possible to reuse aliases within the same subquery level (simple cases are rejected, but there are at least two ways to bypass the error check, and they look intentional, so we can't just block it). Then, references to a certain alias name can be further multiplied by inheritance expansion. This is all a bit hairy but I haven't really found any fundamental problems here that keep you from deciding on a workable naming convention. Hope you find this interesting. If you do, let me know what you think. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tomas Vondra <tomas@vondra.me> — 2025-05-20T10:05:05Z
On 5/19/25 20:01, Robert Haas wrote: > Hi, > > A couple of people at pgconf.dev seemed to want to know more about my > ongoing plan shape work, so here are the patches I have currently. > This is a long way from something that actually looks like a usable > feature, but these are bits of infrastructure that I think will be > necessary to get to a usable feature. As a recap, my overall goal here > is to make it so that you can examine a finished plan, figure out what > decisions the planner made, and then somehow get the planner to make > those same decisions over again in a future planning cycle. Since > doing this for all types of planner decisions seems too difficult for > an initial goal, I'm focusing on scans and joins for now. A further > goal is that I want it to be possible for extensions to use this > infrastructure to implement a variety of different policies that they > might feel to be beneficial, so I'm looking to minimize the amount of > stuff that has to be done in core PostgreSQL or can only be used by > core PostgreSQL. > > ... Thanks for the overview. I don't have any immediate feedback, but it sounds like it might be related to the "making planner decisions clear" session from the unconference ... The basic premise of that session was about how to give users better info about the planner decisions - why paths were selected/rejected, etc. A simple example would be "why was the index not used", and the possible answers include "dominated by cost by another path" or "does not match the index keys" etc. I wonder if this work might be useful for something like that. regards -- Tomas Vondra
-
Re: plan shape work
Maciek Sakrejda <maciek@pganalyze.com> — 2025-05-20T19:09:04Z
+1, this seems like it could be very useful. A somewhat related issue is being able to tie plan nodes back to the query text: it can be hard to understand the planner's decisions if it's not even clear what part of the query it's making decisions about. I'm sure this is not an easy problem in general, but I wonder if you think that could be improved in the course of this work, or if you have other thoughts about it. Thanks, Maciek
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-05-21T13:58:51Z
On Tue, May 20, 2025 at 2:45 PM Tomas Vondra <tomas@vondra.me> wrote: > Thanks for the overview. I don't have any immediate feedback, but it > sounds like it might be related to the "making planner decisions clear" > session from the unconference ... > > The basic premise of that session was about how to give users better > info about the planner decisions - why paths were selected/rejected, > etc. A simple example would be "why was the index not used", and the > possible answers include "dominated by cost by another path" or "does > not match the index keys" etc. > > I wonder if this work might be useful for something like that. I've been wondering that, too. There's definitely some indirect ways in which that might be the case. For example, I think this work would lend itself to saying "hey, try planning this query, but for that table over there, use an index scan on this table." Then, it either still doesn't -- meaning the index isn't usable for some reason -- or it does and you can see the resulting plan with presumably higher cost and maybe infer why it didn't happen. That's better than today, where we have only very crude tools that let us do things like disable an entire scan type for the entire query, and I think it would make it a lot easier and less frustrating for a knowledgeable user to figure out why things are happening. But even though I think that would be better than today, I'm not sure it rises to the level of actually being good, because I think it still requires a fairly knowledgeable operator to figure things out, and you probably have to experiment a bunch to understand the situation instead of, say, being able to just look at the EXPLAIN plan and see the answer. I think being able to look at the EXPLAIN plan and see the answer, without needing a bunch of poking around, would be the ideal scenario here. But in some sense this is the same problem as understanding how an AI neural network is reasoning. The answer to "why did the planner pick plan X" is always "X was the cheapest possible plan". Ideas like "we chose a merge join because both tables are large enough that neither would fit into a hash table conveniently" are human explanations of why the math had the effect that it did; they are not how the planner actually reasons. So it's not just a matter of exposing the actual reasoning process to the user, because the computer is not reasoning in a way that a human would. It would have to be a matter of exposing some kind of other information that would allow the human being to comprehend easily what led the machine's algorithm to a certain conclusion; and it is not obvious how to get there. I have a sense - possibly an incorrect one - that the core of the problem here is that the planner considers lots of very similar alternatives. A hypothetical feature that showed the second-cheapest plan would be all but useless, because the second-cheapest plan would just be a very minor variation of the cheapest plan in almost all cases. One idea that crossed my mind was to display information in EXPLAIN about what would have happened if we'd done something really different. For instance, suppose that at a certain level of the plan tree we actually chose a merge join, but we also show the estimated cost of the cheapest hash join (if any) and the cheapest nested loop (if any) that we considered at that level. The user might be able to draw useful conclusions based on whether those numbers were altogether absent (i.e. that join type was not viable at all) or whether the cost was a little higher or a lot higher than that of the path actually chosen. For scans, you could list which indexes were believed to be usable and perhaps what the cost would have been for the cheapest one not actually selected; and what the cost of a sequential scan would have been if you hadn't picked one. I'm not sure how useful this would be, so the whole idea might actually suck, or maybe it's sort of the right idea but needs a bunch of refinement to really be useful. I don't have a better idea right now, though. If there are any notes that were taken during that unconference session, please point me in the right direction; I was in another session at that time but would read any available notes with interest. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-05-21T14:28:47Z
On Tue, May 20, 2025 at 3:09 PM Maciek Sakrejda <maciek@pganalyze.com> wrote: > +1, this seems like it could be very useful. A somewhat related issue > is being able to tie plan nodes back to the query text: it can be hard > to understand the planner's decisions if it's not even clear what part > of the query it's making decisions about. I'm sure this is not an easy > problem in general, but I wonder if you think that could be improved > in the course of this work, or if you have other thoughts about it. Thanks. I don't really have any ideas about the problem you mention, perhaps partly because I haven't experienced it too much. I mean, I have sometimes been confused about which parts of the query go with which parts of the EXPLAIN, but I think in my experience so far that is mostly because either (1) both the query and the EXPLAIN output are super long and maybe also super-wide and therefore it's hard to correlate things by eye or (2) somebody wrote a query where they use the same table and/or table alias over and over again in different parts of the query and so it's hard to tell which reference goes with which. Neither of those problems seems all that exciting to me from a dev perspective: if you're calling everything a or x or orders or something, maybe don't do that, and if your query is 1500 characters long, I guess you need to budget some time to align that with the query plan. I don't really know how much we can do here. But maybe there are cases that I haven't seen where something better is possible, or perhaps you have some good idea that I haven't considered. (If I'm honest, I do have an idea that I think might very significantly improve the readability of EXPLAIN output. I think it would make it much less wide in normal cases without making it much longer. This has been percolating in my brain for a few years now and I have the vague intention of proposing it at some point, but not until I'm good and ready to be flamed to a well-done crisp, because I'm quite sure there will be more than one opinion on the merits.) -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Maciek Sakrejda <m.sakrejda@gmail.com> — 2025-05-21T16:02:13Z
On Wed, May 21, 2025 at 7:29 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, May 20, 2025 at 3:09 PM Maciek Sakrejda <maciek@pganalyze.com> wrote: > > +1, this seems like it could be very useful. A somewhat related issue > > is being able to tie plan nodes back to the query text: it can be hard > > to understand the planner's decisions if it's not even clear what part > > of the query it's making decisions about. I'm sure this is not an easy > > problem in general, but I wonder if you think that could be improved > > in the course of this work, or if you have other thoughts about it. > > Thanks. I don't really have any ideas about the problem you mention, > perhaps partly because I haven't experienced it too much. That may be due to your extensive experience with Postgres and EXPLAIN plans. > I mean, I > have sometimes been confused about which parts of the query go with > which parts of the EXPLAIN, but I think in my experience so far that > is mostly because either (1) both the query and the EXPLAIN output are > super long and maybe also super-wide and therefore it's hard to > correlate things by eye or (2) somebody wrote a query where they use > the same table and/or table alias over and over again in different > parts of the query and so it's hard to tell which reference goes with > which. Neither of those problems seems all that exciting to me from a > dev perspective: if you're calling everything a or x or orders or > something, maybe don't do that, and if your query is 1500 characters > long, I guess you need to budget some time to align that with the > query plan. Fair enough, although the people trying to make sense of EXPLAIN plans are sometimes not the same ones who are writing the queries. And sometimes the queries are not written by people at all but by ORMs (or—heaven help us—vibe coded). "Don't do X" is a reasonable response to "It hurts when I do X," but it doesn't really solve the user's problem. That said, it's hard to argue with "We don't have any good ideas on how to improve this right now, and it's not a total dumpster fire, so we'll focus on other work." > I don't really know how much we can do here. But maybe > there are cases that I haven't seen where something better is > possible, or perhaps you have some good idea that I haven't > considered. No great ideas here. I thought initially that a good solution would be to have structured EXPLAIN output include something like "Query Text Start Index" and "Query Text End Index" fields for each node, but I realized that this doesn't really work for multiple joins (and probably other cases). Maybe "Query Text Indices", as a list of pairs? But from the little I know about the planner, that seems like any sort of tracking back to the source would be hard to implement. And it only really solves the problem for external EXPLAIN viewers, and only ones that put in the work to support this. I'm not sure if the problem can be meaningfully addressed for text format, but maybe that's another reason not to spend time on it in core. > (If I'm honest, I do have an idea that I think might very > significantly improve the readability of EXPLAIN output. I think it > would make it much less wide in normal cases without making it much > longer. This has been percolating in my brain for a few years now and > I have the vague intention of proposing it at some point, but not > until I'm good and ready to be flamed to a well-done crisp, because > I'm quite sure there will be more than one opinion on the merits.) I'm intrigued, and happy to stand by with an extinguisher. The road to great ideas is paved with bad ideas. Thanks, Maciek
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-05-21T16:27:32Z
On Wed, May 21, 2025 at 12:03 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote: > That may be due to your extensive experience with Postgres and EXPLAIN plans. Yes, that is very possible. All things being equal, it helps to have done something a lot of times. > Fair enough, although the people trying to make sense of EXPLAIN plans > are sometimes not the same ones who are writing the queries. And > sometimes the queries are not written by people at all but by ORMs > (or—heaven help us—vibe coded). "Don't do X" is a reasonable response > to "It hurts when I do X," but it doesn't really solve the user's > problem. That said, it's hard to argue with "We don't have any good > ideas on how to improve this right now, and it's not a total dumpster > fire, so we'll focus on other work." +1 to all of that. > No great ideas here. I thought initially that a good solution would be > to have structured EXPLAIN output include something like "Query Text > Start Index" and "Query Text End Index" fields for each node, but I > realized that this doesn't really work for multiple joins (and > probably other cases). Maybe "Query Text Indices", as a list of pairs? > But from the little I know about the planner, that seems like any sort > of tracking back to the source would be hard to implement. And it only > really solves the problem for external EXPLAIN viewers, and only ones > that put in the work to support this. I'm not sure if the problem can > be meaningfully addressed for text format, but maybe that's another > reason not to spend time on it in core. I'm not gonna say you couldn't make something like that work, but it sounds like a lot of effort for a hypothetical piece of external visualization software that might or might not produce satisfying results. My advice to anyone wanting to pursue this idea would be: make a totally fake POC first. Get a sample query with at least a moderately complex plan, get the EXPLAIN output, manually generate whatever data you think PostgreSQL ought to be able to spit out, and do a mock-up of an external viewer. When you're happy with the results, show it to some other people and see if they also like it. We can have the discussion about whether to include anything in core and what it should be after that. I definitely would not rule out the possibility that something like this could turn out to be really cool -- maybe hovering over stuff and having the corresponding part of the plan get highlighted will turn out to be awesome. But I think it might also turn out that there are things where it's not quite clear what you can or should usefully highlight, like target-list items, or for example a case where the query says that a.x = b.x and b.x = c.x but in the actual plan we use evaluate a.x = c.x, an expression not appearing anywhere in the query text. The legwork of sorting some of that kind of stuff out should really happen before making a feature proposal. > I'm intrigued, and happy to stand by with an extinguisher. The road to > great ideas is paved with bad ideas. Thanks. That proposal is a task for another day, but I appreciate the sentiment. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Andy Fan <zhihuifan1213@163.com> — 2025-05-22T00:54:49Z
Robert Haas <robertmhaas@gmail.com> writes: > Hi, > > ... As a recap, my overall goal here > is to make it so that you can examine a finished plan, figure out what > decisions the planner made, and then somehow get the planner to make > those same decisions over again in a future planning cycle. I am feeling that this is similar with Oracle's outline feature, where the final plan is examined and then a series of hints are stored and then during the replanning of the the same query, these hints will be applied to planner. If one of the hints is not appliable any more, like the index is unusable, it is just ignored. > This list of elided nodes is stored in the PlannedStmt I did a quick check on the attached patches and I can see some more information is added into PlannedStmt. then my question are the PlannedStmt is not avaiable during the future planning cycle, then how does these information would be helpful on the feture planning? and I'm strange that there are little changes on the optimizer part. Does this patchset just a preparation work for reconstructing a same plan in future? > outside of the portion of the tree that actually gets executed, so > that code that is doing plan tree inspection can look at it but > execution doesn't get any slower. Thank you for sharing this! -- Best Regards Andy Fan
-
Re: plan shape work
Andy Fan <zhihuifan1213@163.com> — 2025-05-22T08:14:23Z
Andy Fan <zhihuifan1213@163.com> writes: > >> This list of elided nodes is stored in the PlannedStmt > > I did a quick check on the attached patches and I can see some more > information is added into PlannedStmt. then my question are the > PlannedStmt is not avaiable during the future planning cycle, then how > does these information would be helpful on the feture planning? and I'm > strange that there are little changes on the optimizer part. Does this > patchset just a preparation work for reconstructing a same plan in > future? I'm sure it is a preliminary work for reconstructing a same plan in future, sorry for this noise. -- Best Regards Andy Fan
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-08-26T14:58:33Z
On Mon, May 19, 2025 at 2:01 PM Robert Haas <robertmhaas@gmail.com> wrote: > A couple of people at pgconf.dev seemed to want to know more about my > ongoing plan shape work, so here are the patches I have currently. Here's an updated patch set. My goal for the September CommitFest is to get patches 0001-0004 committed. Of course, if there are too many objections or too little review, that might not happen, but that's my goal. This patch set is basically unchanged from the previous patch set, except that I've added one new patch. 0007 records information about Append node consolidation into the final plan tree. Without this, when we build an AppendPath or MergeAppendPath and pull up the subpaths from a similar underlying node, we can lose the RTIs from the subordinate node, making it very difficult to analyze the plan after the fact. Just to remark a bit further on the structure of the patch set, 0001-0003 are closely related. The only one I really need committed in order to move forward is 0001, but I think the others are a good idea. There is probably room for some bikeshedding on the output produced by 0002. Then after that, 0004 stands alone as an incredibly important and foundational patch: without it, there's no way to know what the name of a subplan will be until after it's already been planned. I am fairly confident in the approach that I've taken here, but it does cause user-visible changes in EXPLAIN output about which people might conceivably have strong opinions. Getting agreement either on what I've done here or some variant of the approach is essential for me to be able to move forward. Then, 0005-0007 all have to do with preserving in the final plan various details that today would be discarded at the end of planning. While I'm happy to have comments on these now, I'm still not completely confident that I've found all issues in this area or handled them perfectly; hence, I'm not in a hurry to move forward with those just yet. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Bruce Momjian <bruce@momjian.us> — 2025-08-26T20:16:34Z
On Tue, Aug 26, 2025 at 10:58:33AM -0400, Robert Haas wrote: > During planning, there is one range table per subquery; at the end if > planning, those separate range tables are flattened into a single > range table. Prior to this change, it was impractical for code > examining the final plan to understand which parts of the flattened > range table came from which subquery's range table. > > If the only consumer of the final plan is the executor, that is > completely fine. However, if some code wants to examine the final > plan, or what happens when we execute it, and extract information from > it that be used in future planning cycles, it's inconvenient. I am very interested in how plans can be used for future planning. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
-
Re: plan shape work
Andrei Lepikhov <lepihov@gmail.com> — 2025-08-27T09:41:00Z
On 19/5/2025 20:01, Robert Haas wrote: > Hope you find this interesting. If you do, let me know what you think.Thanks for looking in this direction! Since 2017, we designed features that should 'memorise' the experience of previous executions, checking the PlanState tree and instrumentation after execution. The first dumb prototype you should know - AQO. The following, more robust code is 'replan' [1]. These two features utilise the core patch, and I hope this patch can be removed after a few adjustments to the core. In fact, 'replan' is used to stop execution in the middle (if the time/memory usage limit is achieved), pass through the state, collect valuable data and execute again. That's why using data for the subsequent execution needs a matching query tree, and that data may be irrelevant for a different set of constants. The lessons learned during design and after some usage in real life: 1. Subplans should be uniquely identified during the planning. Like you mentioned, two similar subplans on the same query level may be executed differently and provide different row numbers to the 'knowledge base'. I used the subquery_planner hack that generated a unique ID for each subplan. 2. Each node should be identified - I used a kind of signature at each RelOptInfo node - just a hash generated by restrictinfos and underlying signatures. This approach needs a create_plan hook to copy the signature to the Plan nodes. 3. A great source of fluctuations is 'never executed' nodes - because depending on the constant, the subtree may never be executed or produce tons of tuples - I resolved it by just ignoring 'never executed' results, no in-core changes needed. 4. A tree of paths may implement a single RelOptInfo. I have saved the signature at the top of the Plan node for the corresponding RelOptInfo and have never encountered any problems yet. It limits the use of gathered data on cardinality/group number/peak memory consumption somewhat, but not significantly. I think the node extension fields and hooks, debated in the thread [2], may be enough to enable extensions to implement such a feature. What's more, I personally prefer to have a hook that allows extension checking of a condition during execution - it may be done inside the ExecProcNode() by calling the node-specific hook, which an extension may initialise in the PlanState structure before the start of execution. Additionally, it would be beneficial to have a hook for error processing at the top of the portal execution code - this is a key point for managing query resources. If we need to stop and re-execute the query, it is the only place where we can release all the resources assigned. One more helpful thing - an option to postpone receiver sending data out of the instance-side, even result headings. We may want to decide on the query plan after some tuples are produced, and if something has already been sent to the user, we can't just stop and rebuild the plan. [1] https://postgrespro.com/docs/enterprise/16/realtime-query-replanning [2] https://www.postgresql.org/message-id/flat/CA%2BTgmoYxfg90rw13%2BJcYwn4dwSC%2Bagw7o8-A%2BfA3M0fh96pg8w%40mail.gmail.com -- regards, Andrei Lepikhov
-
Re: plan shape work
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-28T17:22:06Z
Hi Robert, On Tue, Aug 26, 2025 at 7:59 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, May 19, 2025 at 2:01 PM Robert Haas <robertmhaas@gmail.com> wrote: > > A couple of people at pgconf.dev seemed to want to know more about my > > ongoing plan shape work, so here are the patches I have currently. > > Here's an updated patch set. My goal for the September CommitFest is > to get patches 0001-0004 committed. Of course, if there are too many > objections or too little review, that might not happen, but that's my > goal. > Thanks for the patches! I don’t know enough about the history in this area to object to your approach or suggest an alternative design. That said, I’ve reviewed patches 0001-0004, and as individual patches they make sense to me. Below are some more detailed comments, which would only be relevant if you decide to proceed in this direction. 0002: QUERY PLAN ---------------------------------------------------------- Nested Loop Left Join - Output: t1.i, (1), t2.i2, i3, t4.i4 + Output: t1.i, (1), t2.i2, t3.i3, t4.i4 -> Nested Loop Left Join - Output: t1.i, t2.i2, (1), i3 + Output: t1.i, t2.i2, (1), t3.i3 Join Filter: false -> Hash Left Join Output: t1.i, t2.i2, (1) These plan changes introduced by 0002, which adds schema qualifiers, make me very happy. I think it’s a nice improvement on its own. In reply to 0002's commit message: > XXX. I have broken this out as a separate commit for now; however, > it could be merged with the commit to add 'relids' to 'Result'; or > the patch series could even be rejiggered to present this as the > primary benefit of that change, leaving the EXPLAIN changes as a > secondary benefit, instead of the current organization, which does > the reverse. I’m fine with the current organization. I agree that if we just compare the EXPLAIN changes in 0001, which add additional “Replaces:” information for the simple Result node, with the EXPLAIN changes in 0002, the changes in 0002 are arguably more attractive. However, I think the EXPLAIN changes in 0001 are a more direct reflection of what the rest of 0001 is trying to achieve: keeping track of the RTIs a Result node is scanning. The changes in 0002 feel more like a side benefit. With that said, this is just my personal preference. If other reviewers feel differently, I won’t object. 0003: In get_scanned_rtindexes(): + case T_NestLoop: + { + Bitmapset *outer_scanrelids; + Bitmapset *inner_scanrelids; + Bitmapset *combined_scanrelids; + + outer_scanrelids = + get_scanned_rtindexes(root, plan->lefttree); + inner_scanrelids = + get_scanned_rtindexes(root, plan->righttree); + combined_scanrelids = + bms_union(outer_scanrelids, inner_scanrelids); + inner_scanrelids = remove_join_rtis(root, inner_scanrelids); + + return remove_join_rtis(root, combined_scanrelids); + break; + } It looks like there is an redundant assignment to inner_scanrelids: + inner_scanrelids = remove_join_rtis(root, inner_scanrelids); 0004: There is a compiler warning reported in the CommitFest build: https://cirrus-ci.com/task/6248981396717568 [23:03:57.811] subselect.c: In function ‘sublinktype_to_string’: [23:03:57.811] subselect.c:3232:1: error: control reaches end of non-void function [-Werror=return-type] [23:03:57.811] 3232 | } [23:03:57.811] | ^ [23:03:57.811] cc1: all warnings being treated as errors [23:03:57.812] make[4]: *** [<builtin>: subselect.o] Error 1 [23:03:57.812] make[3]: *** [../../../src/backend/common.mk:37: plan-recursive] Error 2 [23:03:57.812] make[2]: *** [common.mk:37: optimizer-recursive] Error 2 [23:03:57.812] make[2]: *** Waiting for unfinished jobs.... [23:04:05.513] make[1]: *** [Makefile:42: all-backend-recurse] Error 2 [23:04:05.514] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2 You might want to add a return to get rid of the warning. Still in 0004: --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -339,6 +340,8 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo, memcpy(subroot, root, sizeof(PlannerInfo)); subroot->query_level++; subroot->parent_root = root; + subroot->plan_name = choose_plan_name(root->glob, "minmax", true); + /* reset subplan-related stuff */ subroot->plan_params = NIL; subroot->outer_params = NULL; @@ -359,6 +362,9 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo, /* and we haven't created PlaceHolderInfos, either */ Assert(subroot->placeholder_list == NIL); + /* Add this to list of all PlannerInfo objects. */ + root->glob->allroots = lappend(root->glob->allroots, root); + In the last diff, it should add "subroot" instead of "root" to the list of all PlannerInfos. Currently, if there are multiple minmax expressions, we end up with the following plan showing duplicate names: test=# explain (costs off) SELECT MIN(value), MAX(value) FROM test_minmax; QUERY PLAN ------------------------------------------------------------------------------------------------- Result Replaces: Aggregate InitPlan minmax_1 -> Limit -> Index Only Scan using test_minmax_value_idx on test_minmax Index Cond: (value IS NOT NULL) InitPlan minmax_1 -> Limit -> Index Only Scan Backward using test_minmax_value_idx on test_minmax test_minmax_1 Index Cond: (value IS NOT NULL) (10 rows) Best, Alex -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-02T20:07:17Z
On Thu, Aug 28, 2025 at 1:22 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > Thanks for the patches! Thanks for the review. Responding just briefly to avoid quoting too much text: - I'm glad to hear that you like 0002 and consider it an improvement independent of what follows. - I'm glad to hear that you're OK with the current split between 0001 and 0002. - I would like opinions on those topics from more people. - I have attempted to fix all of the other mistakes that you pointed out in the attached v3, which is also rebased. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-04T08:21:10Z
On Wed, Sep 3, 2025 at 5:07 AM Robert Haas <robertmhaas@gmail.com> wrote: > Thanks for the review. Responding just briefly to avoid quoting too much text: > > - I'm glad to hear that you like 0002 and consider it an improvement > independent of what follows. > - I'm glad to hear that you're OK with the current split between 0001 and 0002. > - I would like opinions on those topics from more people. > - I have attempted to fix all of the other mistakes that you pointed > out in the attached v3, which is also rebased. I've reviewed 0001 to 0003, and they all make sense to me. The changes to the EXPLAIN output in 0001 and 0002 are very nice improvements. I found some issues with 0003 though. It seems get_scanned_rtindexes is intended to return RTI sets with outer join relids excluded. For some node types, such as Append and MergeAppend, it fails to do so, which can cause the assertion in assert_join_preserves_scan_rtis to fail. For example: create table p (a int, b int) partition by range(a); create table p1 partition of p for values from (0) to (10); create table p2 partition of p for values from (10) to (20); set enable_partitionwise_join to on; explain (costs off) select * from p t1 left join p t2 on t1.a = t2.a left join p t3 on t2.b = t3.b; server closed the connection unexpectedly Besides, to exclude outer join relids, it iterates over the RTI sets, checks each RTE for type RTE_JOIN, and bms_del_member it if found (cf. remove_join_rtis). I think a simpler approach would be to leverage PlannerInfo.outer_join_rels: scanrelids = bms_difference(scanrelids, root->outer_join_rels); Therefore, I suggest that we don't try to remove the join RTIs in get_scanned_rtindexes. Instead, we do that in assert_join_preserves_scan_rtis -- before comparing the RTIs from the outer and inner subplans with the join's RTIs -- by leveraging PlannerInfo.outer_join_rels. And remove_join_rtis can be retired. - Richard -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-05T15:51:07Z
On Thu, Sep 4, 2025 at 4:21 AM Richard Guo <guofenglinux@gmail.com> wrote: > I found some issues with 0003 though. It seems get_scanned_rtindexes > is intended to return RTI sets with outer join relids excluded. For > some node types, such as Append and MergeAppend, it fails to do so, > which can cause the assertion in assert_join_preserves_scan_rtis to > fail. For example: > > create table p (a int, b int) partition by range(a); > create table p1 partition of p for values from (0) to (10); > create table p2 partition of p for values from (10) to (20); > > set enable_partitionwise_join to on; > > explain (costs off) > select * from p t1 > left join p t2 on t1.a = t2.a > left join p t3 on t2.b = t3.b; > server closed the connection unexpectedly Ouch. Good catch. > Besides, to exclude outer join relids, it iterates over the RTI sets, > checks each RTE for type RTE_JOIN, and bms_del_member it if found (cf. > remove_join_rtis). I think a simpler approach would be to leverage > PlannerInfo.outer_join_rels: > > scanrelids = bms_difference(scanrelids, root->outer_join_rels); I was not aware of outer_join_rels, so thank you for pointing it out. However, consider this query: select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode; Here, we end up with a three-item range table: one for a, one for b, and one for the join. But the join is not an outer join, and does not appear in root->outer_join_rels. Therefore, I'm not sure we can rely on outer_join_rels in this scenario. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-05T16:00:55Z
Robert Haas <robertmhaas@gmail.com> writes: > I was not aware of outer_join_rels, so thank you for pointing it out. > However, consider this query: > select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode; > Here, we end up with a three-item range table: one for a, one for b, > and one for the join. But the join is not an outer join, and does not > appear in root->outer_join_rels. Therefore, I'm not sure we can rely > on outer_join_rels in this scenario. Plain (not-outer) joins will never be included in a relid set in the first place. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-05T16:40:22Z
On Fri, Sep 5, 2025 at 12:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Plain (not-outer) joins will never be included in a relid set in the > first place. Ah, well then Richard's idea might work! Let me try it and see what happens... Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-05T17:38:43Z
On Fri, Sep 5, 2025 at 12:40 PM Robert Haas <robertmhaas@gmail.com> wrote: > Ah, well then Richard's idea might work! Let me try it and see what happens... Here's the result. Seem to work OK. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Dilip Kumar <dilipbalaut@gmail.com> — 2025-09-08T05:05:32Z
On Wed, May 21, 2025 at 7:29 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, May 20, 2025 at 2:45 PM Tomas Vondra <tomas@vondra.me> wrote: > I have a sense - possibly an incorrect one - that the core of the > problem here is that the planner considers lots of very similar > alternatives. A hypothetical feature that showed the second-cheapest > plan would be all but useless, because the second-cheapest plan would > just be a very minor variation of the cheapest plan in almost all > cases. One idea that crossed my mind was to display information in > EXPLAIN about what would have happened if we'd done something really > different. For instance, suppose that at a certain level of the plan > tree we actually chose a merge join, but we also show the estimated > cost of the cheapest hash join (if any) and the cheapest nested loop > (if any) that we considered at that level. The user might be able to > draw useful conclusions based on whether those numbers were altogether > absent (i.e. that join type was not viable at all) or whether the cost > was a little higher or a lot higher than that of the path actually > chosen. For scans, you could list which indexes were believed to be > usable and perhaps what the cost would have been for the cheapest one > not actually selected; and what the cost of a sequential scan would > have been if you hadn't picked one. > > I'm not sure how useful this would be, so the whole idea might > actually suck, or maybe it's sort of the right idea but needs a bunch > of refinement to really be useful. I don't have a better idea right > now, though. Having detailed information on the costs of alternative join methods/scan method, even when a different method is chosen, would be valuable information. For example, if a merge join is selected for tables t1 and t2 in a subquery, showing the estimated costs for both a hash join and a nested loop join would provide a more complete picture of the planner's decision-making process. And I believe, this information would be particularly useful if the cost of a non-selected plan, such as a nested loop join, is very close to the cost of the chosen merge join. In such cases, a database administrator or query optimizer could use this insight to manually override the planner's choice and opt for the nested loop join for specific tables in a subquery. This level of detail would empower users to fine-tune query performance and explore alternative execution strategies. IIUC, one of the goal of this work is where operator can say I want to use this scan method while scanning a particular table in a particular subquery, that means if the planner can give the information about non selected paths as well then it would be really helpful in making this process more smooth otherwise without much information on what path got rejected its very hard to provide hints. -- Regards, Dilip Kumar Google
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-08T09:51:33Z
On Sat, Sep 6, 2025 at 1:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > I was not aware of outer_join_rels, so thank you for pointing it out. > > However, consider this query: > > > select 1 from pg_class a inner join pg_class b on a.relfilenode = b.relfilenode; > > > Here, we end up with a three-item range table: one for a, one for b, > > and one for the join. But the join is not an outer join, and does not > > appear in root->outer_join_rels. Therefore, I'm not sure we can rely > > on outer_join_rels in this scenario. > Plain (not-outer) joins will never be included in a relid set in the > first place. Exactly. Non-outer joins wouldn't cause Vars to become null, so we never include them in the joinrel's relids. BTW, I'm wondering if we can take outer join relids into account in assert_join_preserves_scan_rtis(), which could make the check more useful. A joinrel's relids consists of three parts: the outer plan's relids, the inner plan's relids, and the relids of outer joins that are calculated at this join. We already have the first two. If we can find a way to determine the third, we'd be able to assert that: outer_relids U inner_relids U outerjoin_relids == joinrel->relids Determining the third part can be tricky though, especially due to outer-join identity 3: the "outerjoin_relids" of one outer join might include more than one outer join relids. But I think this is till doable. (This may not be useful for your overall goal in this patchset, so feel free to ignore it if it's not of interest.) - Richard
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-08T13:56:18Z
On Mon, Sep 8, 2025 at 5:51 AM Richard Guo <guofenglinux@gmail.com> wrote: > > Plain (not-outer) joins will never be included in a relid set in the > > first place. > > Exactly. Non-outer joins wouldn't cause Vars to become null, so we > never include them in the joinrel's relids. OK. I didn't understand what the rules were there. > BTW, I'm wondering if we can take outer join relids into account in > assert_join_preserves_scan_rtis(), which could make the check more > useful. A joinrel's relids consists of three parts: the outer plan's > relids, the inner plan's relids, and the relids of outer joins that > are calculated at this join. We already have the first two. If we > can find a way to determine the third, we'd be able to assert that: > > outer_relids U inner_relids U outerjoin_relids == joinrel->relids > > Determining the third part can be tricky though, especially due to > outer-join identity 3: the "outerjoin_relids" of one outer join might > include more than one outer join relids. But I think this is till > doable. > > (This may not be useful for your overall goal in this patchset, so > feel free to ignore it if it's not of interest.) I don't mind doing the work if there's a reasonable and useful way of accomplishing the goal. However, one concern I have is that it seems pointless if we're computing outerjoin_relids by essentially redoing the same computation that set the join's relid set in the first place. In that case, the cross-check has no real probative value. All it would be demonstrating is that if you calculate outerjoin_relids twice using essentially the same methodology, you get the same answer. That seems like a waste of code to me. If there's a way to calculate outerjoin_relids using a different methodology than what we used when populating the joinrelids, that would be interesting. It would be similar to how the existing code recomputes the outer and inner relids in a way that can potentially find issues that otherwise would not have been spotted (such as the Result node case). Do you have a proposal? -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-09T02:22:40Z
On Mon, Sep 8, 2025 at 10:56 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Sep 8, 2025 at 5:51 AM Richard Guo <guofenglinux@gmail.com> wrote: > > BTW, I'm wondering if we can take outer join relids into account in > > assert_join_preserves_scan_rtis(), which could make the check more > > useful. A joinrel's relids consists of three parts: the outer plan's > > relids, the inner plan's relids, and the relids of outer joins that > > are calculated at this join. We already have the first two. If we > > can find a way to determine the third, we'd be able to assert that: > > > > outer_relids U inner_relids U outerjoin_relids == joinrel->relids > > > > Determining the third part can be tricky though, especially due to > > outer-join identity 3: the "outerjoin_relids" of one outer join might > > include more than one outer join relids. But I think this is till > > doable. > > > > (This may not be useful for your overall goal in this patchset, so > > feel free to ignore it if it's not of interest.) > I don't mind doing the work if there's a reasonable and useful way of > accomplishing the goal. However, one concern I have is that it seems > pointless if we're computing outerjoin_relids by essentially redoing > the same computation that set the join's relid set in the first place. > In that case, the cross-check has no real probative value. All it > would be demonstrating is that if you calculate outerjoin_relids twice > using essentially the same methodology, you get the same answer. That > seems like a waste of code to me. If there's a way to calculate > outerjoin_relids using a different methodology than what we used when > populating the joinrelids, that would be interesting. It would be > similar to how the existing code recomputes the outer and inner relids > in a way that can potentially find issues that otherwise would not > have been spotted (such as the Result node case). > > Do you have a proposal? One idea (not fully thought through) is that we record the calculated outerjoin_relids for each outer join in its JoinPaths. (We cannot store this in the joinrel's RelOptInfo because it varies depending on the join sequence we use.) And then we could use the recorded outerjoin_relids for the assertion here: outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids The value of this approach, IMO, is that it could help verify the correctness of how we compute outer joins' outerjoin_relids, ie. the logic in add_outer_joins_to_relids(), which is quite complex due to outer-join identity 3. If we miscalculate the outerjoin_relids for one certain outer join, this assertion could catch it effectively. However, this shouldn't be a requirement for committing your patches. Maybe we should discuss it in a separate thread. - Richard
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-09T13:17:51Z
First of all, as an administrative note, since both you and Alex seem to like 0001 and 0002 and no suggestions for improvement have been offered, I plan to commit those soon unless there are objections or additional review comments. I will likely do the same for 0003 as well, pending the results of the current conversation, but maybe not quite as quickly. I believe that 0004 still needs more review, and its effects will be more user-visible than 0001-0003, so I don't plan to move forward with that immediately, but I invite review comments. On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote: > One idea (not fully thought through) is that we record the calculated > outerjoin_relids for each outer join in its JoinPaths. (We cannot > store this in the joinrel's RelOptInfo because it varies depending on > the join sequence we use.) And then we could use the recorded > outerjoin_relids for the assertion here: > > outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids > > The value of this approach, IMO, is that it could help verify the > correctness of how we compute outer joins' outerjoin_relids, ie. the > logic in add_outer_joins_to_relids(), which is quite complex due to > outer-join identity 3. If we miscalculate the outerjoin_relids for > one certain outer join, this assertion could catch it effectively. > > However, this shouldn't be a requirement for committing your patches. > Maybe we should discuss it in a separate thread. I'm OK with moving the conversation to a separate thread, but can you clarify from where you believe that joinpath->ojrelids would be populated? It seems to me that the assertion couldn't pass unless every join path ended up with the same value of joinpath->ojrelids. That's because, for a given joinrel, there is only one value of joinrel->relids; and all of those RTIs must be either RTE_JOIN or non-RTE_JOIN. The non-RTE_JOIN RTIs will be found only in outer_relids U inner_relids, and the RTE_JOIN RTIs will be found only in joinpath->ojrelids. Therefore, it seems impossible for the assertion to pass unless the value is the same for all join paths. If that is correct, then I don't think we should store the value in the join path. Instead, if we want to cross-check it, we could calculate the value that would have been stored into joinpath->ojrelids at whatever earlier stage we had the information available to do so, and it should be equal to bms_intersect(joinrel->relids, root->outer_join_rels), which I think would have to be already initialized before we can think of building a join path. Please feel free to correct me if I am misunderstanding. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-09T15:12:51Z
Robert Haas <robertmhaas@gmail.com> writes: > First of all, as an administrative note, since both you and Alex seem > to like 0001 and 0002 and no suggestions for improvement have been > offered, I plan to commit those soon unless there are objections or > additional review comments. FWIW, I don't love the details of 0001: I think it's going in the right direction, but could use more polish. In particular, you've defined Result.relids in a way that seems ambiguous. There are two different meanings for NULL, and one of them is being interpreted as an "Aggregate" without a lot of principle behind that. I think you need to store more data in order to make that less of a hack. So far as I can see from the regression-test changes, the "Aggregate" case only occurs when we've replaced a aggregate calculation with a MinMaxAggPath representing an index endpoint probe. What I would like to see come out when that's the case is something like Replaces: MIN or MAX aggregate over scan on tab1 This means first that the Result.relids needs to include the relid of the table being scanned by the indexscan, and second that EXPLAIN will then need some other cue to help it distinguish this case from a case where it should just say "Replaces: Scan on tab1". It's possible that you could intuit that by examining the initplans attached to the Result node, but I think what would make a ton more sense is to add an enum field to Result that explicitly identifies why it's there. We've got at least "apply one-time filter to subplan", "apply per-row gating filter to subplan", "represent a relation proven empty", and "place-hold for a MinMaxAgg InitPlan". Tracing back from all the calls to make_result() might identify some more cases. I'm not arguing that the user-visible EXPLAIN output should distinguish all of these (but probably overexplain should). I think though that it'd be useful to have this recorded in the plan tree. > On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote: >> One idea (not fully thought through) is that we record the calculated >> outerjoin_relids for each outer join in its JoinPaths. > I'm OK with moving the conversation to a separate thread, but can you > clarify from where you believe that joinpath->ojrelids would be > populated? It seems to me that the assertion couldn't pass unless > every join path ended up with the same value of joinpath->ojrelids. What I have been intending to suggest is that you should add a field to join plan nodes that is zero if an inner join, but the relid of the outer join RTE if it's an outer join. This is uniquely defined because any given join node implements a specific outer join, even though the planner's relids for the completed join are (intentionally) ambiguous about the order in which multiple joins were done. The reason I wanted to do this is that I think it'd become possible to tighten the assertions in setrefs.c about whether Vars' varnullingrels are correct, so that we can always assert that those relid sets are exactly thus-and-so and not have to settle for superset/subset tests. I've not worked through the details to be entirely sure that this is possible, so I didn't bring it up before. But maybe labeling join nodes this way would also address Richard's concern. In any case it fits into your overall goal of decorating plan trees with more information. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-09T16:00:26Z
On Tue, Sep 9, 2025 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think what would make a ton more sense is to add > an enum field to Result that explicitly identifies why it's there. > We've got at least "apply one-time filter to subplan", "apply per-row > gating filter to subplan", "represent a relation proven empty", and > "place-hold for a MinMaxAgg InitPlan". Thanks, I'll look into this. > What I have been intending to suggest is that you should add a field > to join plan nodes that is zero if an inner join, but the relid of > the outer join RTE if it's an outer join. This is uniquely defined > because any given join node implements a specific outer join, even > though the planner's relids for the completed join are (intentionally) > ambiguous about the order in which multiple joins were done. > > The reason I wanted to do this is that I think it'd become possible to > tighten the assertions in setrefs.c about whether Vars' varnullingrels > are correct, so that we can always assert that those relid sets are > exactly thus-and-so and not have to settle for superset/subset tests. > I've not worked through the details to be entirely sure that this is > possible, so I didn't bring it up before. But maybe labeling join > nodes this way would also address Richard's concern. In any case > it fits into your overall goal of decorating plan trees with more > information. Oh, that seems quite elegant! Then we could reasonably expect to re-find all the relevant RTIs and no others with an appropriate tree traversal. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-09T19:43:12Z
On Tue, Sep 9, 2025 at 12:00 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Sep 9, 2025 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I think what would make a ton more sense is to add > > an enum field to Result that explicitly identifies why it's there. > > We've got at least "apply one-time filter to subplan", "apply per-row > > gating filter to subplan", "represent a relation proven empty", and > > "place-hold for a MinMaxAgg InitPlan". > > Thanks, I'll look into this. Just a random thought, but another idea that crossed my mind here at one point was to actually split the Result node up into Result nodes with subplans and Result nodes without subplans. We could call the version with a subplan "Project" and the version without a subplan "Result", for example. This seems a little silly because both variants would need to be able to handle resconstantqual, or alternatively we'd have to be OK with getting "Project" on top of "Result" in some cases where a single "Result" node currently does both jobs. On the other hand, only Project needs a subplan, and only Result needs relids. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-09T20:37:38Z
Robert Haas <robertmhaas@gmail.com> writes: > Just a random thought, but another idea that crossed my mind here at > one point was to actually split the Result node up into Result nodes > with subplans and Result nodes without subplans. We could call the > version with a subplan "Project" and the version without a subplan > "Result", for example. This seems a little silly because both variants > would need to be able to handle resconstantqual, or alternatively we'd > have to be OK with getting "Project" on top of "Result" in some cases > where a single "Result" node currently does both jobs. On the other > hand, only Project needs a subplan, and only Result needs relids. Maybe. I kinda feel that actually redesigning plan trees is outside the scope of this project, but maybe we should consider it. I think though that you might be underestimating the amount of commonality. For instance, we might need a projecting node on top of a subquery-in-FROM subplan, but that node would still have to bear a relid --- the relid of the RTE_SUBQUERY RTE in the upper query, not that of any RTE in the subquery, but nonetheless it's a relid. Another variant of this is that that RTE_SUBQUERY relid would normally be borne by a SubqueryScan plan node, but if we elide the SubqueryScan because it isn't doing anything useful, where shall we put that relid? If we don't store it anywhere then we will not be able to reconstruct correct join relids for the upper plan level. regards, tom lane
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-10T07:16:29Z
On Tue, Sep 9, 2025 at 10:18 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote: > > One idea (not fully thought through) is that we record the calculated > > outerjoin_relids for each outer join in its JoinPaths. (We cannot > > store this in the joinrel's RelOptInfo because it varies depending on > > the join sequence we use.) And then we could use the recorded > > outerjoin_relids for the assertion here: > > > > outer_relids U inner_relids U joinpath->ojrelids == joinrel->relids > I'm OK with moving the conversation to a separate thread, but can you > clarify from where you believe that joinpath->ojrelids would be > populated? It seems to me that the assertion couldn't pass unless > every join path ended up with the same value of joinpath->ojrelids. > That's because, for a given joinrel, there is only one value of > joinrel->relids; and all of those RTIs must be either RTE_JOIN or > non-RTE_JOIN. The non-RTE_JOIN RTIs will be found only in outer_relids > U inner_relids, and the RTE_JOIN RTIs will be found only in > joinpath->ojrelids. Therefore, it seems impossible for the assertion > to pass unless the value is the same for all join paths. Hmm, this isn't quite what I had in mind. What I was thinking is that the outer join relids included in joinrel->relids can also be found from its outer or inner. For example, consider a query like: (A leftjoin B on (Pab)) leftjoin C on (Pbc) For the join with joinrel->relids being {1, 2, 3, 4, 5}, {1, 2, 3} comes from the outer side, {4} comes from the inner side, and {5} is the outer join being calculated at this join. So the Assert I proposed earlier becomes: {1, 2, 3} U {4} U {5} == {1, 2, 3, 4, 5} However, if we have transformed it to: A leftjoin (B leftjoin C on (Pbc)) on (Pab) For this same join, {1} comes from the outer side, {2, 4} comes from the inner side, and {3, 5} are the outer joins being calculated at this join. So the Assert becomes: {1} U {2, 4} U {3, 5} == {1, 2, 3, 4, 5} Either way, the assertion should always hold -- if it doesn't, there's likely a bug in how we're calculating the relids. As you can see, the set of outer joins calculated at the same join can vary depending on the join order. What I suggested is to record this information in JoinPaths (or maybe also in Join plan nodes so that get_scanned_rtindexes can collect it) for the assertion. - Richard -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-10T14:11:54Z
On Wed, Sep 10, 2025 at 3:16 AM Richard Guo <guofenglinux@gmail.com> wrote: > Hmm, this isn't quite what I had in mind. What I was thinking is that > the outer join relids included in joinrel->relids can also be found > from its outer or inner. For example, consider a query like: > > (A leftjoin B on (Pab)) leftjoin C on (Pbc) > > For the join with joinrel->relids being {1, 2, 3, 4, 5}, {1, 2, 3} > comes from the outer side, {4} comes from the inner side, and {5} is > the outer join being calculated at this join. So the Assert I > proposed earlier becomes: > > {1, 2, 3} U {4} U {5} == {1, 2, 3, 4, 5} Makes sense. > However, if we have transformed it to: > > A leftjoin (B leftjoin C on (Pbc)) on (Pab) > > For this same join, {1} comes from the outer side, {2, 4} comes from > the inner side, and {3, 5} are the outer joins being calculated at > this join. So the Assert becomes: > > {1} U {2, 4} U {3, 5} == {1, 2, 3, 4, 5} Hmm. As I understood it, Tom was proposing a single, optional ojrelid for each join, with all outer joins having a value and no inner join having one. What you are proposing seems to be a very similar concept, but as I understand it, you're saying that each join would carry a *set* of ojrelids, which might be empty and might contain more than one element. Experimenting with this example, it looks like you're correct and Tom, or my understanding of Tom, is incorrect. What I see is that I get a structure like this: {MERGEPATH :parent_relids (b 1 2 3 4 5) :jpath.outerjoinpath {PATH :parent_relids (b 1) :jpath.innerjoinpath {MERGEPATH :parent_relids (b 2 4) :jpath.outerjoinpath {PATH :parent_relids (b 2) } :jpath.innerjoinpath {PATH :parent_relids (b 4) } } } So, for the assertion to pass, the more deeply nested merge join would need to have ojrelids = {} and the less-deeply nested one would need ojrelids={3,5}. -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-10T17:28:49Z
On Tue, Sep 9, 2025 at 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maybe. I kinda feel that actually redesigning plan trees is outside > the scope of this project, but maybe we should consider it. Yeah, I would rather not go there, all things being equal. We could also consider displaying things differently even if the underlying node type is the same. ExplainNode already has examples where pname/sname are set to something other than the node tag, e.g. we set EXPLAIN designation based on (((ModifyTable *) plan)->operation) or agg->aggstrategy. I'm not sure if this is the right way to go, but it's worth a thought. > I think though that you might be underestimating the amount of > commonality. For instance, we might need a projecting node on top of > a subquery-in-FROM subplan, but that node would still have to bear > a relid --- the relid of the RTE_SUBQUERY RTE in the upper query, > not that of any RTE in the subquery, but nonetheless it's a relid. > > Another variant of this is that that RTE_SUBQUERY relid would normally > be borne by a SubqueryScan plan node, but if we elide the SubqueryScan > because it isn't doing anything useful, where shall we put that relid? > If we don't store it anywhere then we will not be able to reconstruct > correct join relids for the upper plan level. I don't quite understand how these two scenarios are different, but I have found it critical to distinguish problems that happen at setrefs time from problems that happen during main planning. If we're talking about feeding information from one planning cycle forward to the next, we need be able to look at the plan tree and understand what the pre-setrefs state of affairs was, because when the next planning cycle happens, the decisions we want to influence are happening pre-setrefs. The later patches in this patch series deal with exactly this problem: 0004 and 0005 make it possible to match up an RTI from the flattened range table that pops out of one planning cycle with a specific subroot and RTI relative to that subroot during the following cycle; and 0006 and 0007 arrange to leave a breadcrumb trail in cases where setrefs-time processing deletes plan nodes. The setrefs-time elision of SubqueryScan nodes is recorded by the mechanism in 0006. I went back and studied 0001 some more today in reference to your comments about classifying Result nodes. 0001 already loosely classifies Result nodes as either "gating" result nodes (that have a subplan) or "simple" result notes (that don't). "Gating" result notes happen for target-list projection and/or to apply a one-time filter. I don't think the reasons for gating nodes need to be recorded anywhere; either the tlist matches the underlying node or it doesn't, and either resconstantqual contains something or not. "Simple" result have more interesting reasons for existing: 1. MinMaxAgg placeholders 2. Degenerate grouping 3. No-FROM-clause cases (these go through create_group_result_plan like the previous case, but are arguably distinct) 4. Relations proven empty There's a sort of hybrid case when we want a gating result node but the underlying node is a simple result. In that case, the patch builds a new simple result that is similar to the existing one but with the gating result's target list and one-time filter. It seems OK to me to forget all about the gating result node and its reason for existence in this case and just consider ourselves to have updated the underlying Result node. Otherwise, you'd have to consider that a Result node might have multiple reasons for existence: whatever caused the "simple" Result note to get created, plus possibly projection or one-time filtering. Now, looking at (1)-(4) above, (3) is actually a special case of (4): robert.haas=# explain (range_table) select 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=4) RTIs: 1 RTI 1 (result): Eref: "*RESULT*" () (4 rows) So the reason why the patch set feels justified in printing "Replaces: Aggregate" when there are no relids is because we must have case (1) or (2) from the above list. But it does seem fragile. Not only can we confuse (1) and (2), but also, only the top-level grouping rel necessarily has empty relids. We already have child grouping rels that have relid sets, and I suspect Richard's pending work on eager aggregation will introduce more of them. This patch won't be able to distinguish those from case (4). So maybe what we want for Result reasons is something like RESULT_REPLACES_BASEREL, RESULT_REPLACES_JOINREL, RESULT_REPLACES_GROUPING_REL, RESULT_IMPLEMENTS_MINMAX_AGGREGATE? That's a bit verbose; shorter alternatives welcome. The first two could be merged, since the cardinality of the relid set should distinguish them. Or it could be more like RESULT_REPLACES_SCAN, RESULT_REPLACES_JOIN, RESULT_REPLACES_AGGREGATE, RESULT_IMPLEMENTS_MINMAX_AGGREGATE, to more closely match what we would presumably show in the EXPLAIN output. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-11T18:19:40Z
Richard Guo <guofenglinux@gmail.com> writes: > As you can see, the set of outer joins calculated at the same join can > vary depending on the join order. What I suggested is to record this > information in JoinPaths (or maybe also in Join plan nodes so that > get_scanned_rtindexes can collect it) for the assertion. I do not think this is correct, or at least it's not the most useful way to think about it. As I stated earlier, each join plan node that is doing a non-inner join has a unique corresponding outer-join RTE that describes what it's doing. The point that you are making is that if we've applied identity 3 to swap the order of two outer joins, then we can't claim that the output of the lower plan node is a fully-correct representation of the semantics of the join that it's doing: there may be values that should have gone to NULL but won't until after the upper plan node processes the rows. If we're going to attach more labeling to the plan nodes, I'd prefer to do what I suggested and label the nodes with the specific outer join that they think they are implementing. With Richard's proposal it will remain impossible to tell which node is doing what. While I've still not worked through the details, it might be that we can't implement my desire to make setrefs.c's nullingrels checks exact unless we *also* store the bitmap sets that Richard is proposing. I don't know if it's worth carrying two new fields in order to make that work. I don't entirely buy that Richard's proposed assertion is worth doing: I think I agree with Robert's original opinion that it's redundant. I do think that tightening the nullingrels checks would be useful. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-11T18:41:54Z
On Thu, Sep 11, 2025 at 2:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I do not think this is correct, or at least it's not the most useful > way to think about it. As I stated earlier, each join plan node that > is doing a non-inner join has a unique corresponding outer-join RTE > that describes what it's doing. The point that you are making is > that if we've applied identity 3 to swap the order of two outer > joins, then we can't claim that the output of the lower plan node is a > fully-correct representation of the semantics of the join that it's > doing: there may be values that should have gone to NULL but won't > until after the upper plan node processes the rows. > > If we're going to attach more labeling to the plan nodes, I'd > prefer to do what I suggested and label the nodes with the specific > outer join that they think they are implementing. With Richard's > proposal it will remain impossible to tell which node is doing what. Conceptually, I prefer your idea of one RTI per join node, but I don't understand how to make it work. Let's say that, as in Richard's example, the query is written as (A leftjoin B on (Pab)) leftjoin C on (Pbc) but we end up with a plan tree that looks like this: Something Join (RTIs: 1 2 3 4 5) -> Scan on A (RTI: 1) -> Whatever Join (RTIs: 2 4) -> Scan on B (RTI: 2) -> Scan on C (RTI: 4) RTE 3 is the join between A and B, and RTI 5 is the join between A-B and C. It makes plenty of sense to associate RTI 5 with the Something Join, so your model seems to require us to associate RTI 3 with the Whatever Join, because there's no place else for it to go. That seems to create two problems. First, RTI 3 is for an A-B join, and the Whatever Join is for a B-C join, and it sounds wrong to associate an RTI with a join when not all of the rels being joined are present at that level. Can we really say that the Whatever Join is implementing RTI 3 given that RTI 3 includes A? Second, even ignoring that problem, if we now try to assert that the RTIs of a joinrel are the union of the RTIs we see in the plan tree, the assertion is going to fail, because now the Whatever Join sees RTIs 2 and 4 through its children and RTI 3 through its own ojrelid, but the joinrel's RTIs are {2,4}, not {2,3,4}. -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-11T20:07:03Z
Here's a likely-doomed new version of just the first three patches. 0002 is unchanged. 0001 has been reworked so that a Result node contains a result_type. This is as per Tom's suggestion, but it's unclear to me that he will like the details. 0003 has been reworked so that when we build a Join plan, we annotate it with the ojrelids completed at that level, which Tom said earlier that he thought was the wrong idea (but after I'd already written the code, and I've already replied to say I don't understand what the alternative is). Hence, I expect this version to crash and burn, but maybe it will do so in such a way that I have some idea what to propose instead. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T15:08:51Z
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Sep 11, 2025 at 2:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If we're going to attach more labeling to the plan nodes, I'd >> prefer to do what I suggested and label the nodes with the specific >> outer join that they think they are implementing. With Richard's >> proposal it will remain impossible to tell which node is doing what. > Conceptually, I prefer your idea of one RTI per join node, but I don't > understand how to make it work. Let's say that, as in Richard's > example, the query is written as (A leftjoin B on (Pab)) leftjoin C on > (Pbc) but we end up with a plan tree that looks like this: > Something Join (RTIs: 1 2 3 4 5) > -> Scan on A (RTI: 1) > -> Whatever Join (RTIs: 2 4) > -> Scan on B (RTI: 2) > -> Scan on C (RTI: 4) After thinking about this for awhile, I believe that Richard and I each had half of the right solution ;-). Let me propose some new terminology in hopes of clarifying matters: * A join plan node "starts" an outer join if it performs the null-extension step corresponding to that OJ (specifically, if it is the first join doing null-extension over the minimum RHS of that OJ). * A join plan node "completes" an outer join if its output nulls all the values that that OJ should null when done according to syntactic order. In simple cases where we have not applied OJ identity 3, every outer-join plan node starts and completes a single OJ relid. But if we have applied identity 3 in the forward direction, as per your example above, it's different. The physically lower join node starts OJ 5, but doesn't complete it. The upper node starts OJ 3, and completes both 3 and 5. I think that it's possible for the topmost join to complete more than two OJs, if we have a nest of multiple OJs that can all be re-ordered via identity 3. I was arguing for labeling plan nodes according to which OJ they start (always a unique relid). Richard was arguing for labeling according to which OJ(s) they complete (zero, one, or more relids). But I now think it's probably worth doing both. We need the completion bitmapsets if we want to cross-check Var nullingrels, because those correspond to the nullingrels that should get added at each join's output. I think that we also want the start labels though. For one thing, if the start nodes are not identified, it's impossible to understand how much of the tree is the "no man's land" where a C variable may or may not have gone to null on its way to becoming a C* variable. But in general I think that we'll want to be able to identify an outer-join plan node even if it does not complete its OJ. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-12T15:54:28Z
On Fri, Sep 12, 2025 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I was arguing for labeling plan nodes according to which OJ they > start (always a unique relid). Richard was arguing for labeling > according to which OJ(s) they complete (zero, one, or more relids). I agree with everything in your reply up to and including this part. > But I now think it's probably worth doing both. We need the > completion bitmapsets if we want to cross-check Var nullingrels, > because those correspond to the nullingrels that should get added > at each join's output. I think that we also want the start labels > though. For one thing, if the start nodes are not identified, > it's impossible to understand how much of the tree is the "no > man's land" where a C variable may or may not have gone to null > on its way to becoming a C* variable. But in general I think > that we'll want to be able to identify an outer-join plan node > even if it does not complete its OJ. So, it looks to me like the way this works today is that join_is_legal() figures out the relevant SpecialJoinInfo and then add_outer_joins_to_relids() decides what to add to the joinrel's relid set. So I think, though I am not quite sure, that if somewhere around that point in the code we copied sjinfo->ojrelid into the RelOptInfo, and then propagated that through to the final plan, that might be what you're looking for here. However, that assumes that the choice of SpecialJoinInfo is fixed for all possible ways of constructing a given joinrel, which I think might not be true. In Richard's test case, one simply can't go wrong, because the lower join only draws from two baserels. But in a case like A LJ B ON A.x = B.x LJ C ON B.y = C.y LJ D ON B.z = D.z, the B-C-D joinrel could presumably be constructed by joining either B-D to C or B-C to D, and I'm guessing that will result in a different choice of SpecialJoinInfo in each case. That would mean that the ojrelid has to be per-Path rather than per-RelOptInfo. While in theory that's fine, it sounds expensive. I was hoping that we could piggyback mostly on existing calculations here, exposing data we already have instead of calculating new things. If we want to expose the starting-the-outer-join-RTI value that you want here, it seems like we're going to have to redo some of the join_is_legal() work and some of the add_outer_joins_to_relids() work for every path. I'm skeptical about expending those cycles. It's not clear to me that I need to care about outer join RTIs at all for what I'm trying to do -- focusing where RTIs originating from baserels end up in the final plan tree is, as far as I can see, completely adequate. There might be other people who want to do other things that would benefit more from seeing that stuff, though. I'm not against exposing information that is easily calculated and might be useful to somebody, even if it's just for planner debugging. But it seems to me that what you're asking here might be going quite a bit further than that. So my counter-proposal is that this patch set should either (1) expose nothing at all about join RTIs because I don't have a need for them or (2) expose the join RTIs completed at a certain level because that's easily calculated from the data we already have; and if you want to later also expose the single join RTI started at a certain level for a varnullingrels cross-check or any other purpose, then you can propose a patch for that. Alternatively, if you want to edit my 0003 patch to work the way you think it should, cool. Or if you can describe what you think it should do, I'm somewhat willing to try implementing that, but that's definitely not such a great choice from my perspective. I'm afraid that I'm getting pulled a bit further down the garden path than I really want to go trying to satisfy your desire to perform a cross-check that I don't really understand and/or expose information for which I don't see a clear need in my own work. What I need is for all the baserels that appear in the final Plan tree to be properly labelled. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T16:18:16Z
Robert Haas <robertmhaas@gmail.com> writes: > So, it looks to me like the way this works today is that > join_is_legal() figures out the relevant SpecialJoinInfo and then > add_outer_joins_to_relids() decides what to add to the joinrel's relid > set. So I think, though I am not quite sure, that if somewhere around > that point in the code we copied sjinfo->ojrelid into the RelOptInfo, > and then propagated that through to the final plan, that might be what > you're looking for here. Not the RelOptInfo, but the Path. I have not looked at details, but it might be necessary to identify these labels at Path construction time rather than reconstructing them during createplan.c. I'd rather not, because bloating Path nodes with hopefully-redundant information isn't attractive. But if it turns out that createplan.c doesn't have enough information then we might have to. I'm also thinking that this notion of starting/completing OJs might be useful in its own right to clarify and even simplify some of the Path manipulations we do. But that will require reviewing the code. > So my counter-proposal is that this patch set should either (1) expose > nothing at all about join RTIs because I don't have a need for them or > (2) expose the join RTIs completed at a certain level because that's > easily calculated from the data we already have; and if you want to > later also expose the single join RTI started at a certain level for a > varnullingrels cross-check or any other purpose, then you can propose > a patch for that. If what you want to do is only interested in baserel RTIs, then I think we should leave outer join RTIs out of the discussion for the present. I was not looking to make you do work you aren't interested in. I reserve the right to do said work later ... regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-12T17:32:10Z
On Fri, Sep 12, 2025 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Not the RelOptInfo, but the Path. I have not looked at details, but > it might be necessary to identify these labels at Path construction > time rather than reconstructing them during createplan.c. I'd rather > not, because bloating Path nodes with hopefully-redundant information > isn't attractive. But if it turns out that createplan.c doesn't have > enough information then we might have to. My intuition (which might be wrong) is that there's enough information available to do it during createplan.c, but I'm not sure that there's enough information available to do it efficiently at that stage. You could grovel through the whole Plan tree and find all of the scan RTIs, and then from there you should be able to work out what joins were commuted, and then from there you should be able to work out which SpecialJoinInfo goes with each Join that appears in the final plan tree. That doesn't sound like a lot of fun, though. On the other hand, I'm not sure doing this at Path creation time is going to be a picnic either. Right now, we're sort of cheating by only caring about what OJs are finished at a certain level: that is consistent across all possible ways of forming a joinrel, but which OJs are started at a certain level is not, and I'm not currently seeing how to fix that without adding cycles. > I'm also thinking that this notion of starting/completing OJs might be > useful in its own right to clarify and even simplify some of the Path > manipulations we do. But that will require reviewing the code. Makes sense. > If what you want to do is only interested in baserel RTIs, then I > think we should leave outer join RTIs out of the discussion for the > present. I was not looking to make you do work you aren't interested > in. I reserve the right to do said work later ... Absolutely. I'm more than happy to have you do that. We sort of got started down this path because, reviewing v4-0003, Richard commented that I might be able to sanity-check something-or-other about RTE_JOIN RTIs instead of just focusing on baserels. From there, this sub-thread has turned into a discussion of exactly what that sanity check should be. v5 exposes the completed-at-this-level OJs in the final plan tree, which is easy to compute and could be useful for somebody's plan introspection, but (1) I don't need it and (2) it just derives them from the joinrel's RTI set rather than in any independent per-path way that might lead to a more meaningful cross-check. Having done the work to create v5-0003, I find myself thinking it feels a little tidier than v4 and am somewhat inclined to prefer it; I think that it's very possible that you or Richard might find it a useful basis for future work to further strengthen the way things work in this area. However, on the other hand, maybe not, and going back to v4-0003 is also completely reasonable. I don't care much one way or the other as long as nobody's too mad when the dust settles. But, I also can't commit either v4-0003 or v5-0003 or any variant thereof until we agree on what to do about 0001, and you're the holdout there. v5-0001 adds a result_type field to the Result node in response to your previous review comments, so knowing whether that looks like what you want or whether you would prefer something else is the blocker for me as of this moment. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-12T17:44:32Z
Robert Haas <robertmhaas@gmail.com> writes: > But, I also can't commit either v4-0003 or v5-0003 or any variant > thereof until we agree on what to do about 0001, and you're the > holdout there. Yeah, I owe you a review, hope to get to it over the weekend. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-12T17:52:01Z
On Fri, Sep 12, 2025 at 1:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > But, I also can't commit either v4-0003 or v5-0003 or any variant > > thereof until we agree on what to do about 0001, and you're the > > holdout there. > > Yeah, I owe you a review, hope to get to it over the weekend. Thanks. This is a good time to mention that I appreciate your engagement in this thread. Let me know if there's something that you'd like me to pay attention to in turn. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-13T08:46:09Z
On Sat, Sep 13, 2025 at 12:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > After thinking about this for awhile, I believe that Richard and I > each had half of the right solution ;-). Let me propose some new > terminology in hopes of clarifying matters: > > * A join plan node "starts" an outer join if it performs the > null-extension step corresponding to that OJ (specifically, > if it is the first join doing null-extension over the minimum > RHS of that OJ). > > * A join plan node "completes" an outer join if its output > nulls all the values that that OJ should null when done > according to syntactic order. This new notion makes a lot of sense to me. I feel that it could help us optimize some existing logic, or at least make certain parts of it easier to understand. It might be worth adding to the README, maybe under the section "Relation Identification and Qual Clause Placement", where we explain the idea behind pushed-down joins. - Richard
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-13T09:02:04Z
On Sat, Sep 13, 2025 at 2:32 AM Robert Haas <robertmhaas@gmail.com> wrote: > We sort of got started down this path because, reviewing v4-0003, > Richard commented that I might be able to sanity-check > something-or-other about RTE_JOIN RTIs instead of just focusing on > baserels. From there, this sub-thread has turned into a discussion of > exactly what that sanity check should be. Yeah, when commenting on v4-0003 about including outer join relids in the assertion, I had a feeling that it might take the discussion off topic -- sorry that it did. That's why I suggested moving this part of discussion to a separate thread. I still think that cross-checking outer join relids isn't a requirement for committing your patches, so I'm totally fine if you end up with a version that doesn't assert outer join relids. - Richard
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-14T23:42:43Z
I wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> But, I also can't commit either v4-0003 or v5-0003 or any variant >> thereof until we agree on what to do about 0001, and you're the >> holdout there. > Yeah, I owe you a review, hope to get to it over the weekend. I'm running out of weekend, but I found time to look at v5-0001, so here are some comments. In general, it's in pretty good shape and these are nitpicks. * I think your placement of the show_result_replacement_info call site suffers from add-at-the-end syndrome. It should certainly go before the show_instrumentation_count call: IMO we expect stuff added by EXPLAIN ANALYZE to appear after stuff that's there in plain EXPLAIN. But I'd really argue that from a user's standpoint this information is part of the fundamental plan structure and so it deserves more prominence. I'd lean to putting it first in the T_Result case, before the "One-Time Filter". (Thought experiment: if we'd had this EXPLAIN field from day one, where do you think it would have been placed?) * Even more nitpicky: + if (plan->lefttree == NULL) + show_result_replacement_info(castNode(Result, plan), es); I think show_result_replacement_info should have responsibility for deciding whether to print anything in the lefttree == NULL case. (This will affect the Asserts in show_result_replacement_info, but those seem a little odd anyway.) + case RESULT_TYPE_UPPER: + /* a small white lie */ + replacement_type = "Aggregate"; + break; I find this unconvincing: is it really an aggregate? It doesn't help that this case doesn't seem to be reached anywhere in the regression tests. In general I suspect that we'll have to refine RESULT_TYPE_UPPER in the future. I don't think this fear needs to block committing of what you have though. + /* Work out what reference name to use and added it the string. */ The grammar police will be after you. + * (Arguably, we should instead display the RTE name in some other way in + * such cases, but in typical cases the RTE name is *RESULT* and printing + * "Result on *RESULT*" or similar doesn't seem especially useful, so for + * now we don't print anything at all.) Right offhand, I think that RTE_RESULT *always* has the name *RESULT*, so the "typical" bit seems misleading. Personally I'd drop this para altogether. + /* + * We're replacing either a scan or a join, according to the number of + * rels in the relids set. + */ + if (nrels == 0) + ExplainPropertyText("Replaces", replacement_type, es); + else + { + char *s = psprintf("%s on %s", replacement_type, buf.data); + + ExplainPropertyText("Replaces", s, es); + } This comment seems to neither have anything to do with the logic, or to be adding anything. But do you need nrels at all? I'd be inclined to check for "buf.len > 0" to see if you want to insert the "on ..." bit. + * make_simple_result + * Build a Result plan node that returns a single row (or possibly no rows, + * if the one-time filtered defined by resconstantqual returns false) I don't love the name "make_simple_result", as the cases this handles are frequently far from simple. I don't have a clearly-better idea offhand though. Maybe "make_one_row_result"? In any case, "one-time filtered" needs help. In general, I wonder if it'd be better for the callers of make_xxx_result to pass in the result_type to use. Those functions have such a narrow view of the available info that I'm dubious that they can get it right. Especially if/when we decide that RESULT_TYPE_UPPER needs subdivision. + * relids identifies the relation for which this Result node is generating the + * tuples. When subplan is not NULL, it should be empty: this node is not + * generating anything in that case, just acting on tuples generated by the + * subplan. Otherwise, it may contain a single RTI (as when this Result node + * is substituted for a scan); multiple RTIs (as when this Result node is + * substituted for a join); or no RTIs at all (as when this Result node is + * substituted for an upper rel). I doubt this claim that the relid set will be empty for an upper rel. I think it's more likely that it will include all the rels for the query. regards, tom lane -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-15T16:32:15Z
Thanks for the review. Comments to which I don't respond below are duly noted. On Sun, Sep 14, 2025 at 7:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > * I think your placement of the show_result_replacement_info call > site suffers from add-at-the-end syndrome. It should certainly go > before the show_instrumentation_count call: IMO we expect stuff > added by EXPLAIN ANALYZE to appear after stuff that's there in plain > EXPLAIN. But I'd really argue that from a user's standpoint this > information is part of the fundamental plan structure and so it > deserves more prominence. I'd lean to putting it first in the > T_Result case, before the "One-Time Filter". (Thought experiment: > if we'd had this EXPLAIN field from day one, where do you think > it would have been placed?) Yes, I wondered if it should actually look more like this: Degenerate Scan on blah Degenerate Join on blah, blah, blah Degenerate Aggregate MinMaxAggregate Result So getting rid of Replaces: altogether. In fact, "Replaces" is really a complete misnomer in the case of a MinMaxAggregate; I'm just not sure exactly what to do instead. > + case RESULT_TYPE_UPPER: > + /* a small white lie */ > + replacement_type = "Aggregate"; > + break; > > I find this unconvincing: is it really an aggregate? It doesn't > help that this case doesn't seem to be reached anywhere in the > regression tests. This is the case I know about where that can be reached. robert.haas=# explain select 1 as a, 2 as b having false; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=8) One-Time Filter: false Replaces: Aggregate (3 rows) > In general I suspect that we'll have to refine RESULT_TYPE_UPPER > in the future. I don't think this fear needs to block committing > of what you have though. +1. > + * (Arguably, we should instead display the RTE name in some other way in > + * such cases, but in typical cases the RTE name is *RESULT* and printing > + * "Result on *RESULT*" or similar doesn't seem especially useful, so for > + * now we don't print anything at all.) > > Right offhand, I think that RTE_RESULT *always* has the name *RESULT*, > so the "typical" bit seems misleading. Personally I'd drop this > para altogether. Counterexample: robert.haas=# explain verbose select * from pgbench_accounts where 0 = 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) Output: aid, bid, abalance, filler One-Time Filter: false Replaces: Scan on pgbench_accounts (4 rows) debug_print_plan says: :alias <> :eref {ALIAS :aliasname pgbench_accounts :colnames ("aid" "bid" "abalance" "filler") } > In general, I wonder if it'd be better for the callers of > make_xxx_result to pass in the result_type to use. Those > functions have such a narrow view of the available info > that I'm dubious that they can get it right. Especially > if/when we decide that RESULT_TYPE_UPPER needs subdivision. That was my first thought, but after experimentation I think it sucks, especially because of this: /* * The only path for it is a trivial Result path. We cheat a * bit here by using a GroupResultPath, because that way we * can just jam the quals into it without preprocessing them. * (But, if you hold your head at the right angle, a FROM-less * SELECT is a kind of degenerate-grouping case, so it's not * that much of a cheat.) */ I would argue that if you hold your head at that angle, you need your head examined. Perhaps that wasn't the case when this comment was written, but from the viewpoint of this project, I think it's pretty clear. What this does is use create_group_result_path() not only for actual degenerate-grouping cases (where we're replacing an aggregate) but also for no-FROM-clause cases (where we're replacing a scan). We could adjust things so that the no-FROM-clause case doesn't take this code path, or passes down a flag, but it's clear from examination of the RelOptInfo. Likewise, for scans vs. joins, peaking at the reloptkind is a very easy way to tell whether we've got a baserel or a joinrel and having the caller go to extra trouble to pass it down just seems silly. > + * relids identifies the relation for which this Result node is generating the > + * tuples. When subplan is not NULL, it should be empty: this node is not > + * generating anything in that case, just acting on tuples generated by the > + * subplan. Otherwise, it may contain a single RTI (as when this Result node > + * is substituted for a scan); multiple RTIs (as when this Result node is > + * substituted for a join); or no RTIs at all (as when this Result node is > + * substituted for an upper rel). > > I doubt this claim that the relid set will be empty for an upper rel. > I think it's more likely that it will include all the rels for the > query. Upper rels are created by fetch_upper_rel(). The third argument becomes the relids set. Most call sites pass that argument as NULL: [robert.haas pgsql]$ git grep fetch_upper_rel src/backend/optimizer/ src/backend/optimizer/path/allpaths.c: sub_final_rel = fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL); src/backend/optimizer/path/costsize.c: sub_final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL); src/backend/optimizer/plan/planagg.c: grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL); src/backend/optimizer/plan/planner.c: final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL); src/backend/optimizer/plan/planner.c: final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL); src/backend/optimizer/plan/planner.c: final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL); src/backend/optimizer/plan/planner.c: grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, src/backend/optimizer/plan/planner.c: grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL); src/backend/optimizer/plan/planner.c: window_rel = fetch_upper_rel(root, UPPERREL_WINDOW, NULL); src/backend/optimizer/plan/planner.c: distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL); src/backend/optimizer/plan/planner.c: partial_distinct_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_DISTINCT, src/backend/optimizer/plan/planner.c: ordered_rel = fetch_upper_rel(root, UPPERREL_ORDERED, NULL); src/backend/optimizer/plan/planner.c: partially_grouped_rel = fetch_upper_rel(root, src/backend/optimizer/plan/setrefs.c: IS_DUMMY_REL(fetch_upper_rel(rel->subroot, src/backend/optimizer/plan/subselect.c: final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL); src/backend/optimizer/plan/subselect.c: final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL); src/backend/optimizer/plan/subselect.c: final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL); src/backend/optimizer/prep/prepunion.c: result_rel = fetch_upper_rel(root, UPPERREL_SETOP, src/backend/optimizer/prep/prepunion.c: final_rel = fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL); src/backend/optimizer/prep/prepunion.c: result_rel = fetch_upper_rel(root, UPPERREL_SETOP, relids); src/backend/optimizer/prep/prepunion.c: result_rel = fetch_upper_rel(root, UPPERREL_SETOP, src/backend/optimizer/util/relnode.c: * fetch_upper_rel src/backend/optimizer/util/relnode.c:fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) The exceptions are: make_grouping_rel() passes a relids set when IS_OTHER_REL(input_rel); create_partial_grouping_paths() passes a relids set when creating UPPERREL_PARTIAL_GROUP_AGG; and generate_union_paths() and generate_nonunion_paths() in prepunion.c bubble up the underlying relids. AFAICS, a non-parallel, non-partitionwise aggregate ends up with an empty relid set, and even those cases end up with an empty relid set for the topmost grouping rel, even if they create some other upper rels that do have non-empty relid sets. -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-15T19:43:46Z
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Sep 14, 2025 at 7:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... But I'd really argue that from a user's standpoint this >> information is part of the fundamental plan structure and so it >> deserves more prominence. I'd lean to putting it first in the >> T_Result case, before the "One-Time Filter". (Thought experiment: >> if we'd had this EXPLAIN field from day one, where do you think >> it would have been placed?) > Yes, I wondered if it should actually look more like this: > Degenerate Scan on blah > Degenerate Join on blah, blah, blah > Degenerate Aggregate > MinMaxAggregate Result > So getting rid of Replaces: altogether. Yeah, that would be pretty tempting if we were working in a green field. I think it might be too much change though. Also, from a developer's standpoint it's better if what EXPLAIN prints agrees with what the node types are internally. >> + case RESULT_TYPE_UPPER: >> + /* a small white lie */ >> + replacement_type = "Aggregate"; >> + break; >> >> I find this unconvincing: is it really an aggregate? It doesn't >> help that this case doesn't seem to be reached anywhere in the >> regression tests. > This is the case I know about where that can be reached. > robert.haas=# explain select 1 as a, 2 as b having false; > QUERY PLAN > ------------------------------------------ > Result (cost=0.00..0.01 rows=1 width=8) > One-Time Filter: false > Replaces: Aggregate > (3 rows) Hmm, okay. "Replaces: Aggregate" doesn't seem like a great explanation, but I don't have a better idea offhand. In any case I'm not sure this result_type value is going to have a long shelf-life, so arguing about how to spell it may not be a productive use of time. I do suggest adding the above as a regression test. >> Right offhand, I think that RTE_RESULT *always* has the name *RESULT*, >> so the "typical" bit seems misleading. Personally I'd drop this >> para altogether. > Counterexample: > robert.haas=# explain verbose select * from pgbench_accounts where 0 = 1; > QUERY PLAN > ------------------------------------------ > Result (cost=0.00..0.00 rows=0 width=0) > Output: aid, bid, abalance, filler > One-Time Filter: false > Replaces: Scan on pgbench_accounts > (4 rows) Uh ... this example does not involve an RTE_RESULT does it? I see a regular RTE_RELATION RTE for pgbench_accounts, which your code duly prints. Looking at the code, there are just three places (all in prepjointree.c) that create RTE_RESULT RTEs. Two of them are building the RTE from scratch, and they both do rte->eref = makeAlias("*RESULT*", NIL); However the third one (pull_up_constant_function) is changing a pulled-up RTE_FUNCTION into RTE_RESULT, and it doesn't do anything to the RTE's eref. While that makes your argument nominally true, I'd be inclined to argue that this was an oversight and it should have changed the alias/eref fields to look like other RTE_RESULTs. (I've not investigated, but I wonder what your patch prints for such cases.) Bottom line remains that I don't think the comment para I quoted is adding any useful info. The first half of that comment block was sufficient. >> In general, I wonder if it'd be better for the callers of >> make_xxx_result to pass in the result_type to use. > That was my first thought, but after experimentation I think it sucks, Hmph. Okay, if you tried it and it's bad, I'll accept your opinion. >> I doubt this claim that the relid set will be empty for an upper rel. >> I think it's more likely that it will include all the rels for the >> query. > Upper rels are created by fetch_upper_rel(). The third argument > becomes the relids set. Most call sites pass that argument as NULL: Okay, but "most" is not "all". I suggest that that comment is just too specific in the first place. Even if it's 100% correct today, it's likely to be falsified in future and no one will remember to update it. Something like this might have a longer shelf life: + * relids identifies the relation for which this Result node is generating the + * tuples. When subplan is not NULL, it should be empty: this node is not + * generating anything in that case, just acting on tuples generated by the + * subplan. Otherwise, it contains the relids of the planner relation that + * the Result represents. regards, tom lane -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-15T21:07:47Z
On Mon, Sep 15, 2025 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Yes, I wondered if it should actually look more like this: > > Degenerate Scan on blah > > Degenerate Join on blah, blah, blah > > Degenerate Aggregate > > MinMaxAggregate Result > > So getting rid of Replaces: altogether. > > Yeah, that would be pretty tempting if we were working in a green > field. I think it might be too much change though. Also, from a > developer's standpoint it's better if what EXPLAIN prints agrees > with what the node types are internally. Well, I was just bringing it up because you seemed to think that the Replaces: line was not necessarily where we would have put it in a green field. I am not sure about where we would have put it, but I think in a green field we wouldn't have it at all, and I actually think the above is worth considering. I think people would get used to it pretty fast and that it might be more clear than "Result", which doesn't really mean a whole lot. However, if you or others don't like that and you want to just move the Replaces line slightly higher in the output, I mean, I don't mind that, I just don't know that it's really a material difference. > I do suggest adding the above as a regression test. Makes sense. > Uh ... this example does not involve an RTE_RESULT does it? > I see a regular RTE_RELATION RTE for pgbench_accounts, which > your code duly prints. > > Looking at the code, there are just three places (all in > prepjointree.c) that create RTE_RESULT RTEs. Two of them > are building the RTE from scratch, and they both do > rte->eref = makeAlias("*RESULT*", NIL); > However the third one (pull_up_constant_function) is changing > a pulled-up RTE_FUNCTION into RTE_RESULT, and it doesn't do > anything to the RTE's eref. While that makes your argument > nominally true, I'd be inclined to argue that this was an oversight > and it should have changed the alias/eref fields to look like other > RTE_RESULTs. (I've not investigated, but I wonder what your > patch prints for such cases.) Will investigate. > + * relids identifies the relation for which this Result node is generating the > + * tuples. When subplan is not NULL, it should be empty: this node is not > + * generating anything in that case, just acting on tuples generated by the > + * subplan. Otherwise, it contains the relids of the planner relation that > + * the Result represents. OK. -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-15T21:15:44Z
Robert Haas <robertmhaas@gmail.com> writes: > Well, I was just bringing it up because you seemed to think that the > Replaces: line was not necessarily where we would have put it in a > green field. I am not sure about where we would have put it, but I > think in a green field we wouldn't have it at all, and I actually > think the above is worth considering. I think people would get used to > it pretty fast and that it might be more clear than "Result", which > doesn't really mean a whole lot. However, if you or others don't like > that and you want to just move the Replaces line slightly higher in > the output, I mean, I don't mind that, I just don't know that it's > really a material difference. Yeah, I'm content with making it a "Replaces" attribute, I'm just complaining about the ordering. I'd be the first to agree that this is purely cosmetic, but in my mind there is a pretty clear precedence ordering among the attributes that EXPLAIN prints. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-16T14:17:42Z
On Mon, Sep 15, 2025 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > anything to the RTE's eref. While that makes your argument > nominally true, I'd be inclined to argue that this was an oversight > and it should have changed the alias/eref fields to look like other > RTE_RESULTs. (I've not investigated, but I wonder what your > patch prints for such cases.) It just prints "-> Result" and that's it, as in this example: robert.haas=# create or replace function absolutely_not() returns bool return false; CREATE FUNCTION robert.haas=# explain (costs off) select * from generate_series(1,3) g full join absolutely_not() n on true; QUERY PLAN ------------------------------------------ Merge Full Join -> Function Scan on generate_series g -> Materialize -> Result (4 rows) robert.haas=# explain (costs off, range_table) select * from generate_series(1,3) g full join absolutely_not() n on true; QUERY PLAN ------------------------------------------ Merge Full Join -> Function Scan on generate_series g Scan RTI: 1 -> Materialize -> Result RTIs: 2 RTI 1 (function, in-from-clause): Alias: g () Eref: g (g) WITH ORDINALITY: false RTI 2 (result, in-from-clause): Alias: n () Eref: n (n) RTI 3 (join, in-from-clause): Eref: unnamed_join (g, n) Join Type: Full (16 rows) Here's a new patch set. My main questions are: 1. Did I miss anything you wanted fixed in 0001? 2. Should 0001 be combined with 0002 or kept separate? 3. Do you have a preference between this version of 0003 and the older revision that just ignored outer-join relids? -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-16T15:27:04Z
CI is unhappy with v6 because: [14:37:37.742] In function ‘show_result_replacement_info’, [14:37:37.742] inlined from ‘ExplainNode’ at explain.c:2240:4: [14:37:37.742] explain.c:4849:33: error: ‘replacement_type’ may be used uninitialized [-Werror=maybe-uninitialized] [14:37:37.742] 4849 | char *s = psprintf("%s on %s", replacement_type, buf.data); [14:37:37.742] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [14:37:37.742] explain.c: In function ‘ExplainNode’: [14:37:37.742] explain.c:4769:21: note: ‘replacement_type’ was declared here [14:37:37.742] 4769 | char *replacement_type; [14:37:37.742] | ^~~~~~~~~~~~~~~~ So apparently an "enum" over every value of the switch is not good enough for the value to be assigned. I'm inclined to change the code like this to fix it: char *replacement_type = "???"; ...in the hopes of still producing a warning here if somebody adds another label to the enum. -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-18T17:23:11Z
On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote: > I'm inclined to change the code like this to fix it: > > char *replacement_type = "???"; > > ...in the hopes of still producing a warning here if somebody adds > another label to the enum. Done in this version. I've also now gone back and rebased the rest of the patches as well, so this email includes all 7 patches instead of just the first 3. To recall, my goal for this CF was to get 1-4 committed. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Junwang Zhao <zhjwpku@gmail.com> — 2025-09-21T09:52:21Z
Hi Robert, On Fri, Sep 19, 2025 at 1:23 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote: > > I'm inclined to change the code like this to fix it: > > > > char *replacement_type = "???"; > > > > ...in the hopes of still producing a warning here if somebody adds > > another label to the enum. > > Done in this version. I've also now gone back and rebased the rest of > the patches as well, so this email includes all 7 patches instead of > just the first 3. To recall, my goal for this CF was to get 1-4 > committed. > > -- > Robert Haas > EDB: http://www.enterprisedb.com I have a question about the following changes: - splan = plan; if (IsA(plan, Result)) { Result *rplan = (Result *) plan; - if (rplan->plan.lefttree == NULL && - rplan->resconstantqual == NULL) - splan = NULL; + gplan->plan.lefttree = NULL; + gplan->relids = rplan->relids; + gplan->result_type = rplan->result_type; } You set gplan->relids and gplan->result_type, but at the end you returned &gplan->plan, what's the point of setting these two fields? -- Regards Junwang Zhao -
Re: plan shape work
Junwang Zhao <zhjwpku@gmail.com> — 2025-09-21T10:34:49Z
On Sun, Sep 21, 2025 at 5:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > Hi Robert, > > On Fri, Sep 19, 2025 at 1:23 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Tue, Sep 16, 2025 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > I'm inclined to change the code like this to fix it: > > > > > > char *replacement_type = "???"; > > > > > > ...in the hopes of still producing a warning here if somebody adds > > > another label to the enum. > > > > Done in this version. I've also now gone back and rebased the rest of > > the patches as well, so this email includes all 7 patches instead of > > just the first 3. To recall, my goal for this CF was to get 1-4 > > committed. > > > > -- > > Robert Haas > > EDB: http://www.enterprisedb.com > > I have a question about the following changes: > > - splan = plan; > if (IsA(plan, Result)) > { > Result *rplan = (Result *) plan; > > - if (rplan->plan.lefttree == NULL && > - rplan->resconstantqual == NULL) > - splan = NULL; > + gplan->plan.lefttree = NULL; > + gplan->relids = rplan->relids; > + gplan->result_type = rplan->result_type; > } > > You set gplan->relids and gplan->result_type, but at the end you > returned &gplan->plan, what's the point of setting these two fields? After further study, this should not be a problem, since the address of &gplan->plan is the same as gplan, but the code is a little bit confusing at first glance, I think *return (Plan *) gplan* is easier to understand but I don't insist ;) > > > -- > Regards > Junwang Zhao -- Regards Junwang Zhao -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-22T00:26:00Z
On Sun, Sep 21, 2025 at 6:35 AM Junwang Zhao <zhjwpku@gmail.com> wrote: > After further study, this should not be a problem, since the address of > &gplan->plan is the same as gplan, but the code is a little bit confusing > at first glance, I think *return (Plan *) gplan* is easier to understand > but I don't insist ;) Thanks for looking into it. Stylistically, I prefer the style without the cast, but the effect is the same. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-22T18:15:22Z
Robert Haas <robertmhaas@gmail.com> writes: > Done in this version. I've also now gone back and rebased the rest of > the patches as well, so this email includes all 7 patches instead of > just the first 3. To recall, my goal for this CF was to get 1-4 > committed. I found time finally to look through all of these. Some notes: 0001: committable, no further comments. 0002: Yeah, this does seem like an improvement. There is something faintly weird about output like -> Result - Output: i3 + Output: t3.i3 Replaces: Scan on t3 One-Time Filter: false because the whole point of this construct is that we're *not* scanning t3. However, failing to supply the Var prefix is surely not better. I like the fact that the output in upper plan levels is now just like what you'd see with a non-phony t3 scan. I think you could commit this too. Merging it with 0001 is reasonable, but if you'd rather keep them separate that's okay too. 0003: doesn't feel ready for commit. In the first place, you've used the "completed an outer join" terminology in the commit message and a couple of comments, but defined it nowhere. I think a para or two in optimizer/README to define "starting" and "completing" outer joins is essential. (I'm also still quite bemused by marking nodes that complete outer joins but not those that start them.) In the second place, we should not need to add two hundred lines of new code to createplan.c to accomplish this. Why not simply bms_difference the joinrel's relids from the union of the inputs' relids? (And no, I do not believe that computing the value two different ways so you can assert they're the same is adding anything whatsoever except wasted cycles.) By and large, I don't believe that it's necessary for 0003 to depend on 0001 either. We already have the information available from the paths' parent RelOptInfos. 0004: commit msg is not very convincing about why this is a good idea. It really looks like change for the sake of change, so you need to make a better case for it. Also, this output seems inconsistent: Function Scan on pg_catalog.generate_series x - Output: ARRAY(SubPlan 1) + Output: ARRAY(array_1) Function Call: generate_series(1, 3) - SubPlan 1 + SubPlan array_1 Why isn't it now "ARRAY(SubPlan array_1)"? The previous notation was chosen to be not-confusable with a plain function call, but you've lost that. Likewise for cases like - Group Key: (InitPlan 1).col1 + Group Key: (minmax_1).col1 which now looks exactly like an ordinary Var. nitpick: sublinktype_to_string() should return const char *. 0005: I'm even less convinced about there being a need for this. It's not that hard to see which RTIs are in which subplan, especially after the other changes in this patchset. 0006: I agree with the need for this, but the details seem messy, and it's not very clear that the proposed data structure would be convenient to use. Do we really need to rely on plan_node_id? Why haven't you integrated record_elided_node into clean_up_removed_plan_level? An idea maybe worth thinking about is that instead of completely eliding the plan node, we could replace it with a "no-op" plan node that both EXPLAIN and the executor will look right through. That node could carry the relid(s) that we lost. Not sure how messy this'd be to integrate, though. 0007: not sure about this either. Why not simply add those relid(s) to the surviving node's apprelids? Again, I don't love the amount of code and data structure that's being added for a hypothetical use-case. It's not even clear that this form of the data structure would be helpful to anyone. regards, tom lane -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-23T15:34:45Z
On Mon, Sep 22, 2025 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I found time finally to look through all of these. Some notes: Thanks. Committed 0001 and 0002 together. I agree with you that 0002 is a bit weird, but I think it is less weird than the status quo ante, and it seems you agree. I think it's best to regard a Result node that replaces a scan or join as being a form of scan or join that just so happens to be able to be optimized to a degree not normally possible. There's a good argument for calling something like this a dummy scan/join, as I mentioned before. There's no compelling reason we have to make that terminology change, as we noted in the earlier discussion, but it's a useful and not incorrect way of thinking about it. > 0003: doesn't feel ready for commit. In the first place, you've used > the "completed an outer join" terminology in the commit message and > a couple of comments, but defined it nowhere. I think a para or > two in optimizer/README to define "starting" and "completing" > outer joins is essential. (I'm also still quite bemused by marking > nodes that complete outer joins but not those that start them.) > In the second place, we should not need to add two hundred lines > of new code to createplan.c to accomplish this. Why not simply > bms_difference the joinrel's relids from the union of the inputs' > relids? (And no, I do not believe that computing the value two > different ways so you can assert they're the same is adding anything > whatsoever except wasted cycles.) So, let me just back up a minute here and talk about overall goals. What I originally wanted to do with this patch is ensure that the non-RTE_JOIN RTIs from the joinrel are all mentioned in the final plan. Without the changes to the Result node, that's not the case; with the changes to the Result node, that is, AFAICT, now the case. It can be argued that what has been numbered 0003 up to now is not really necessary at all, since all it does is make it less likely that we will introduce cases with similar problems to the Result-node case in the future, and that could be judged unlikely enough to make 0003 not worth committing. However, it seems like you might be proposing keeping the patch in some form but throwing out the part of the code that walks the plan tree to collect RTIs, and that doesn't make a whole lot of sense to me. I have some confidence that the joinrel's RTI set will include all of the RTIs from the input rels; what I'm worried about is whether those RTIs survive into the final Plan. On the other hand, I'm not sure that I'm interpreting your remarks correctly. When you say "bms_difference the joinrel's relids from the union of the inputs' relids" maybe you're specifically talking about the handling of the RTE_JOIN relids, and I don't care very much how we account for those. So I guess I need some clarification here as to what your thinking is. With regard to your other comments, I'm not opposed to updating the README, or alternatively, I'm also not opposed to adjusting the wording of the comments and commit message to avoid that particular terminology. As far as your parenthetical comment about not marking outer joins started, I don't actually care what we do, but as I said before, I don't see an efficient way to identify outer joins started at a particular level, and it isn't worth adding planner cycles for information for which I have no concrete need. > 0004: commit msg is not very convincing about why this is a good idea. > It really looks like change for the sake of change, so you need to > make a better case for it. You're right. That commit message presupposes that the goal is already understood, instead of explaining it. See the first message on this thread, in the paragraph that begins "Now let's talk about problem #2," for the justification. Quoting the most relevant part: Subqueries sort of have names right now, at least some of them, but it's an odd system: a CTE subquery, for example, has the name mentioned by the user, but other kinds of subplans just get names like "InitPlan 3" or "SubPlan 2". The real problem, though, is that those names are only assigned after we've FINISHED planned the subquery. If we begin planning our very first subquery, it might turn out to be InitPlan 1 or SubPlan 1, or if while planning it we recurse into some further subquery then *that* subquery might become InitPlan 1 or SubPlan 1 and OUR subquery might become InitPlan 2 or SubPlan 2 (or higher, if we find more subqueries and recurse into them too). Thus, being given some information about how the user wants, say, SubPlan 2 to be planned is completely useless because we won't know whether that is us until after we've done the planning that the user is trying to influence. I need to figure out some good way of explaining this (hopefully not too verbosely) in the commit message. > Also, this output seems inconsistent: > > Function Scan on pg_catalog.generate_series x > - Output: ARRAY(SubPlan 1) > + Output: ARRAY(array_1) > Function Call: generate_series(1, 3) > - SubPlan 1 > + SubPlan array_1 > > Why isn't it now "ARRAY(SubPlan array_1)"? The previous notation was > chosen to be not-confusable with a plain function call, but you've > lost that. Likewise for cases like > - Group Key: (InitPlan 1).col1 > + Group Key: (minmax_1).col1 > which now looks exactly like an ordinary Var. I don't think there's anything keeping me from making that print InitPlan/SubPlan there. I just thought it looked a little verbose that way. I thought that the "InitPlan 1" notation was probably due to the fact that "1" is not a proper name, rather than (as you suggest here) to avoid confusion with other syntax. Compare CTEs, which are also subplans, but for which we simply print the CTE's alias name, rather than "CTE alias_name". > nitpick: sublinktype_to_string() should return const char *. This bleeds into a bunch of other things. I'll poke at it and try to figure something out. I may need help from someone with superior const skills, but I'll give it a go. > 0005: I'm even less convinced about there being a need for this. > It's not that hard to see which RTIs are in which subplan, > especially after the other changes in this patchset. Let me just say that everything in this patch set is the result of experimentation -- trying to write code that attempts to interpret a Plan tree and discovering problems doing so along the way. I'm not altogether convinced that patches 0005-0008 attack the problems in the best possible way and I'm happy to hear suggestions for how to do it better, but I'm pretty confident that they're all trying to tackle real problems. The way to think about 0004 and 0005, IMHO, is this: suppose we look at the final Plan, and we see that RTI 17 was planned in such-and-such a way. If we want to reproduce that planning decision in the next cycle -- or change that planning decision in the next planning cycle -- we need to be able to figure out which subroot RTI 17 came from and what RTI it had within that subroot. With 0004 and 0005 applied, you can figure out that RTI 17 was originally RTI 5 within a subroot that was called expr_1, or whatever, and when we replan the same query we will assign the name expr_1 to the same subroot before we begin planning it and RTI 5 within that subroot will refer to the same thing it did before. This assumes, of course, that the query is the same and that inlining decisions and so forth haven't changed and that no objects have been swapped out for objects with the same name; but the point is that even if none of that has happened we can't match things up in an automated way without this infrastructure, or at least I don't know how to do it. If you do, I'm all ears. I would much rather minimize the amount of information that we have to store in the Plan tree or PlannedStmt and just have code to do the necessary interpretation based on the data that's already available, but I could not see how to make it work. > 0006: I agree with the need for this, but the details seem messy, > and it's not very clear that the proposed data structure would > be convenient to use. Do we really need to rely on plan_node_id? > Why haven't you integrated record_elided_node into > clean_up_removed_plan_level? > > An idea maybe worth thinking about is that instead of completely > eliding the plan node, we could replace it with a "no-op" plan node > that both EXPLAIN and the executor will look right through. > That node could carry the relid(s) that we lost. Not sure how > messy this'd be to integrate, though. There's certainly nothing that requires us to use this particular scheme for identifying the locations where certain plan nodes were elided. For example, we could put an additional List * in each plan node and stash a list of elided nodes there rather than indexing by plan_node_id. The disadvantage of that is simply that it increases the size of the Plan struct for every node, and most nodes will end up with an empty list. Indexing by plan_node_id was just my way of getting the information that I wanted to store out of the tree that the executor examines. We could potentially also do as you say here and keep the Plan nodes around and then somehow arrange to ignore them at explain and execution time, but I don't have a good idea of how we would do that without adding overhead. In terms of the usability of the data structure, experimentation has shown it to be serviceable. There is obviously a risk that with many elided nodes, having to grovel through the list of nodes looking for a certain plan ID over and over again could be inefficient -- but large numbers of elided nodes don't seem all that likely, so maybe it's OK; and I suppose any code that uses this could always build a hash table if warranted. But I'm not saying it's a perfect solution. In terms of why record_elided_node is not integrated into clean_up_removed_plan_level, that was just a stylistic choice. clean_up_removed_plan_level() could instead call record_elided_node(), or record_elided_node() could cease to exist and the logic be inlined into clean_up_removed_plan_level(). > 0007: not sure about this either. Why not simply add those > relid(s) to the surviving node's apprelids? Again, I don't > love the amount of code and data structure that's being added > for a hypothetical use-case. It's not even clear that this > form of the data structure would be helpful to anyone. I'd say that is the patch I'm least sure about. Note that my goal for this commitfest was to get 0001-0004 committed, partly because I wasn't too sure whether the later patches might need some adjustment. My intuition is that flattening the relid sets together loses important information, but I don't have a test case proving that near at hand, so maybe it doesn't. However, I'm fairly certain that it is at least a lot more convenient to have the information in this form. In general, if we see an Append or MergeAppend node generated from a RelOptInfo with a single RTI, that's a partition-wise scan of a partitioned relation, and if we see an Append or MergeAppend node generated from a RelOptInfo with more than one RTI, that's a partition-wise join of all those relations. So, very naively, if we just combine the relid sets for a bunch of partitionwise scans into a single RTI set, it looks like we've got a partitionwise join, but we don't. Now, if it's only set operations that result in multiple levels of Append/MergeAppend nodes getting collapsed, it might be possible to disentangle what actually happened by partitioning the final set of RTIs by the subroot from which they originate. I'm not sure that's how it works, though. At any rate, I agree with you that this is not adequately motivated at present. Having said that, at ten thousand feet, the motivation here is to be able to figure out from the final plan tree where exactly we switched to partitionwise operation -- whether that was done for some joinrel or only when we got down to the underlying baserel. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-23T21:27:37Z
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Sep 22, 2025 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In the second place, we should not need to add two hundred lines >> of new code to createplan.c to accomplish this. Why not simply >> bms_difference the joinrel's relids from the union of the inputs' >> relids? > ... On the other hand, I'm not sure that I'm interpreting your remarks > correctly. When you say "bms_difference the joinrel's relids from the > union of the inputs' relids" maybe you're specifically talking about > the handling of the RTE_JOIN relids, and I don't care very much how we > account for those. So I guess I need some clarification here as to > what your thinking is. What I'm saying is that I'd be much happier with 0003 if it looked about like the attached. We do not need a heap of mechanism redundantly proving that the planner is getting these things right (and potentially containing its own bugs). > Note that my goal for > this commitfest was to get 0001-0004 committed, partly because I > wasn't too sure whether the later patches might need some adjustment. Fair enough. I think we can reach agreement on that much pretty quickly. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-24T12:27:47Z
On Tue, Sep 23, 2025 at 5:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > What I'm saying is that I'd be much happier with 0003 if it looked > about like the attached. We do not need a heap of mechanism > redundantly proving that the planner is getting these things right > (and potentially containing its own bugs). Thanks for the demo. That doesn't actually Assert anything, so the commit message is a lie, but I get the point, and based on that, I think what we should do is just drop 0003 altogether for now. I don't need the ojrelids field for anything currently, and if I or someone else does later, we can always revisit this idea. Or, if it turns out that we later introduce more bugs that my version of 0003 would have caught, we can re-ask the question of whether we want to Assert something. I don't agree with your judgement that this is an unreasonable amount of mechanism for what it checks, but I'm entirely prepared to concede that 0003 is kind of clunky, and I also think that the rate at which we do new things in the planner is low enough that it could easily be decades or forever before we have another problem that this would have caught. Hence, I'm fine with dropping this patch. Let's call it "some code that Robert found useful for personal testing" and move on. > > Note that my goal for > > this commitfest was to get 0001-0004 committed, partly because I > > wasn't too sure whether the later patches might need some adjustment. > > Fair enough. I think we can reach agreement on that much pretty quickly. Cool. Let's focus on 0004 then, and possibly 0005 since it's somewhat related and you seem to have an idea that there could be a better way of solving that problem. That's not necessarily to say that 0005 would get committed this CF, unless we happen to agree vigorously on something, but if there's a way to work around needing 0005 or if it needs to be redone in some other form, it would be good to have some idea around that sooner rather than later. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-24T14:34:00Z
On Wed, Sep 24, 2025 at 8:27 AM Robert Haas <robertmhaas@gmail.com> wrote: > Cool. Let's focus on 0004 then, and possibly 0005 since it's somewhat > related and you seem to have an idea that there could be a better way > of solving that problem. That's not necessarily to say that 0005 would > get committed this CF, unless we happen to agree vigorously on > something, but if there's a way to work around needing 0005 or if it > needs to be redone in some other form, it would be good to have some > idea around that sooner rather than later. Here's a new patch set. 0004 is now 0001 and similarly all other patch numbers are -3, since the old 0001 and 0002 were committed together and the 0003 is abandoned. I made the following changes to old-0004/new-0001: - I rewrote the commit message. I'm not really sure this is any clearer about the motivation for this patch, but I tried. Suggestions appreciated. - CI was complaining about a warning from sublinktype_to_string, which I've tried to suppress by adding a dummy return to the end of the function. - You (Tom) complained about the lack of const on sublinktype_to_string, so this version has been const-ified. The const bled into the arguments to choose_plan_name() and subquery_planner(), and into the plan_name structure members within PlannerInfo and SubPlan. I don't know if this is the right thing to do, so feel free to set me straight. - You (Tom) also asked why not print InitPlan/SubPlan wherever we refer to subplans, so this version restores that behavior. I did that by putting logic to print the InitPlan or SubPlan prefix in ruleutils.c, while not including InitPlan or SubPlan in the SubPlan's plan_name field any more. The reason for this is that the purpose of the patch set is to assign names before planning, and the decision as to whether something is an InitPlan or a SubPlan is made after planning, so keeping "InitPlan" or "SubPlan" in the actual plan name undermines the whole point of the patch. I would argue that this is actually better on philosophical grounds, because I think that the fact that something is an expression rather than an EXISTS clause or whatever is part of the identify of the resulting object, whereas I think whether something is an InitPlan or a SubPlan is mostly interesting as a performance characteristic rather than as a definer of identity. I don't necessarily expect this position to be accepted without debate, but I prefer the EXPLAIN output with the patch to the pre-patch output. The remaining patches are simply rebased and are only included here in case we want to discuss how they could be better/different/unnecessary, especially what's now 0002, rather than because I'm looking to get something committed right away. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-24T22:03:19Z
Robert Haas <robertmhaas@gmail.com> writes: > Here's a new patch set. 0004 is now 0001 and similarly all other patch > numbers are -3, since the old 0001 and 0002 were committed together > and the 0003 is abandoned. I made the following changes to > old-0004/new-0001: I've not looked at all of these patches, but here's a review of v9-0001. > - I rewrote the commit message. I'm not really sure this is any > clearer about the motivation for this patch, but I tried. Suggestions > appreciated. It's much better, thanks. > - You (Tom) complained about the lack of const on > sublinktype_to_string, so this version has been const-ified. The const > bled into the arguments to choose_plan_name() and subquery_planner(), > and into the plan_name structure members within PlannerInfo and > SubPlan. I don't know if this is the right thing to do, so feel free > to set me straight. I don't think so. We do not have a nice story on marking Node fields const: it's very unclear for example what consequences that ought to have for copyObject(). Maybe somebody will tackle that issue someday, but it's not something to touch casually in a patch with other objectives. So I don't think we can make the plan_name fields const. The best solution I think is to make choose_plan_name() take a const string and return a non-const one. The attached v10-0001 is just like your v9-0001 except for doing the const stuff this way. I chose to fix the impedance mismatch within choose_plan_name() by having it pstrdup when it wants to just return the "name" string, but you could make a case for holding your nose and just casting away const there. > - You (Tom) also asked why not print InitPlan/SubPlan wherever we > refer to subplans, so this version restores that behavior. Thanks. I'm good with the output now (modulo the bug described below). Someone could potentially argue that this exposes more of the internals than we really ought to, such as the difference between expr and multiexpr SubLinks, but I'm okay with that. Aside from the const issue, something I don't really like at the coding level is the use of an "allroots" list. One reason is that it's partially redundant with the adjacent "subroots" list, but a bigger one is that we have transient roots that shouldn't be in there. An example here is pull_up_simple_subquery: it builds a clone of the query's PlannerInfo to help it use various infrastructure along the way to flattening the subquery, but that clone is not referenced anymore after the function exits. You were putting that into allroots, which seems to me to be a fundamental error, even more so because it went in with the same plan_name as the root it was cloned from. I think a better idea is to keep a list of just the subplan names that we've assigned so far. That has a far clearer charter, plus it can be updated immediately by choose_plan_name() instead of relying on the caller to do the right thing later. I coded this up, and was rather surprised to find that it changed some regression outputs. On investigation, that's because build_minmax_path() was actually doing the wrong thing later: it was putting the wrong root into allroots, so that "minmax_1" never became assigned and could be re-used later. I also observed that SS_process_ctes() was not using choose_plan_name() but simply assigning the user-written CTE name. I believe it's possible to use the same CTE name in different parts of a query tree, so this fails to achieve the stated purpose of making the names unique. I'm still a little bit uncomfortable about whether it's okay for pull_up_simple_subquery() to just do + subroot->plan_name = root->plan_name; rather than giving some other name to the transient subroot. I think it's okay because we are not making any meaningful planning decisions during the life of the subroot, just seeing if we can transform the subquery into a form that allows it to be pulled up. But you might think differently. Perhaps a potential compromise is to set the transient subroot's plan_name to NULL instead? Anyway, v10-0002 is a delta patch to use a list of subplan names instead of "allroots", and there are a couple of trivial cosmetic changes too. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-25T13:10:54Z
On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I don't think so. We do not have a nice story on marking Node fields > const: it's very unclear for example what consequences that ought to > have for copyObject(). Maybe somebody will tackle that issue someday, > but it's not something to touch casually in a patch with other > objectives. So I don't think we can make the plan_name fields const. > The best solution I think is to make choose_plan_name() take a const > string and return a non-const one. The attached v10-0001 is just like > your v9-0001 except for doing the const stuff this way. I chose to > fix the impedance mismatch within choose_plan_name() by having it > pstrdup when it wants to just return the "name" string, but you could > make a case for holding your nose and just casting away const there. Yeah, these are the kinds of trade-offs I never know how to make. I have a little difficulty believing that it's worth a pstrdup() to have the benefit of marking things const, but maybe it is, and the difference is probably microscopic either way. It's not like we're going to be faced with many planning problems involving gigantic numbers of subqueries. If this were a hot code path we'd want to think harder. > > - You (Tom) also asked why not print InitPlan/SubPlan wherever we > > refer to subplans, so this version restores that behavior. > > Thanks. I'm good with the output now (modulo the bug described > below). Someone could potentially argue that this exposes more > of the internals than we really ought to, such as the difference > between expr and multiexpr SubLinks, but I'm okay with that. I actually thought it looked kind of nice exposing some of that detail, but it's certainly a judgement call. > Aside from the const issue, something I don't really like at the > coding level is the use of an "allroots" list. One reason is that > it's partially redundant with the adjacent "subroots" list, but > a bigger one is that we have transient roots that shouldn't be > in there. An example here is pull_up_simple_subquery: it builds > a clone of the query's PlannerInfo to help it use various > infrastructure along the way to flattening the subquery, but > that clone is not referenced anymore after the function exits. > You were putting that into allroots, which seems to me to be > a fundamental error, even more so because it went in with the > same plan_name as the root it was cloned from. > > I think a better idea is to keep a list of just the subplan > names that we've assigned so far. That has a far clearer > charter, plus it can be updated immediately by choose_plan_name() > instead of relying on the caller to do the right thing later. > I coded this up, and was rather surprised to find that it changed > some regression outputs. On investigation, that's because > build_minmax_path() was actually doing the wrong thing later: > it was putting the wrong root into allroots, so that "minmax_1" > never became assigned and could be re-used later. Ooph, that's embarrassing. I think the reason that I ended up making a list of the roots themselves rather than the strings is that I was thinking that everything in this data structure would need to be a node, and I didn't want to cons up String nodes for every list element. Then later I marked that structure member read_write_ignore and never stopped to think that maybe then we didn't need nodes at all. So, in short, I think this is a great solution and thanks a ton for putting in the legwork to figure it out. > I also observed that SS_process_ctes() was not using > choose_plan_name() but simply assigning the user-written CTE > name. I believe it's possible to use the same CTE name in > different parts of a query tree, so this fails to achieve > the stated purpose of making the names unique. Ouch, that's another good catch. Also, even if the CTEs had to be unique among themselves, there could be a collision between a CTE name and a generated name. > I'm still a little bit uncomfortable about whether > it's okay for pull_up_simple_subquery() to just do > > + subroot->plan_name = root->plan_name; > > rather than giving some other name to the transient subroot. > I think it's okay because we are not making any meaningful planning > decisions during the life of the subroot, just seeing if we can > transform the subquery into a form that allows it to be pulled up. > But you might think differently. Perhaps a potential compromise > is to set the transient subroot's plan_name to NULL instead? I don't like NULL. Imagine that some meaningful planning decision does get made during the life of the subroot, and imagine further that some hook is called by means of which some extension can influence that decision. What plan_name do we want that extension to see? I think there's some argument for letting it see the plan name of the parent plan into which we're thinking about inlining the subquery, which I believe is the effect of the current coding, and there's perhaps also an argument for having a wholly new plan name there just to really identify that particular decision clearly, but if we put NULL, that's the name we use for the topmost query level. If we changed things so that the top query level gets called "main" or some other constant string, then it would make sense to use NULL as a sentinel value here to mean "undefined," but I kind of think we probably don't want to go there. > Anyway, v10-0002 is a delta patch to use a list of subplan > names instead of "allroots", and there are a couple of trivial > cosmetic changes too. OK, I'll try out these versions and let you know what I find out. Thanks much. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-25T13:43:24Z
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think a better idea is to keep a list of just the subplan >> names that we've assigned so far. That has a far clearer >> charter, plus it can be updated immediately by choose_plan_name() >> instead of relying on the caller to do the right thing later. >> I coded this up, and was rather surprised to find that it changed >> some regression outputs. On investigation, that's because >> build_minmax_path() was actually doing the wrong thing later: >> it was putting the wrong root into allroots, so that "minmax_1" >> never became assigned and could be re-used later. > Ooph, that's embarrassing. I think the reason that I ended up making a > list of the roots themselves rather than the strings is that I was > thinking that everything in this data structure would need to be a > node, and I didn't want to cons up String nodes for every list > element. Then later I marked that structure member read_write_ignore > and never stopped to think that maybe then we didn't need nodes at > all. So, in short, I think this is a great solution and thanks a ton > for putting in the legwork to figure it out. I think if we do decide later that the list-of-names needs to be Nodes, then converting them to String nodes is a perfectly fine solution. As you remark elsewhere, none of this is going to be more than microscopic compared to the overall cost of planning a subplan. But for right now, yeah, we don't need it. Anyway, it seems the only remaining issue is what name pull_up_simple_subquery should give to its transient root clone. I don't actually believe that it matters, so if you're content with re-using the parent name, let's just roll with that until someone complains. regards, tom lane
-
Re: plan shape work
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-26T01:20:24Z
Hi there, I've tried v10-000{1,2}+v9-0002 and v9-000{1,2}. I was curious whether the names choose_plan_name() chose for subqueries match the Subquery Scan names in the EXPLAIN plan. My guess is that since the former is chosen before planning and the latter after planning, they might differ. I think it's probably ok to have different naming mechanisms as long as the names are unique within themselves. But in case anyone else cares about the naming inconsistency, here's an example that shows it. -- applied patches I've applied v10-0001, v10-0002 and v9-0002. I needed v9-0002 because I want to see the plan_names in the debug plan and in the EXPLAIN(RANGE_TABLE) plan with pg_overexplain. (Applying v9-000{1,2} instead should give the same results) -- setup CREATE TABLE r (a int, b int); CREATE TABLE s (c int, d int); LOAD 'pg_overexplain'; SET debug_print_plan to on; SET client_min_messages to 'log'; -- query EXPLAIN (range_table, costs off) SELECT * FROM (SELECT a FROM (SELECT a, b FROM r WHERE b > 42 ORDER BY a) UNION ALL (SELECT c FROM (SELECT c, d FROM s WHERE d > 24 ORDER BY d))); -- plan QUERY PLAN ---------------------------------------------- Append Append RTIs: 1 -> Subquery Scan on unnamed_subquery_1 Scan RTI: 4 -> Sort Sort Key: r.a -> Seq Scan on r Filter: (b > 42) Scan RTI: 6 -> Subquery Scan on unnamed_subquery_2 Scan RTI: 5 -> Sort Sort Key: s.d -> Seq Scan on s Filter: (d > 24) Scan RTI: 7 RTI 1 (subquery, inherited, in-from-clause): Eref: unnamed_subquery (a) RTI 2 (subquery): Eref: unnamed_subquery (a) RTI 3 (subquery): Eref: unnamed_subquery (c) RTI 4 (subquery, in-from-clause): Eref: unnamed_subquery (a, b) RTI 5 (subquery, in-from-clause): Eref: unnamed_subquery (c, d) RTI 6 (relation, in-from-clause): Subplan: unnamed_subquery Eref: r (a, b) Relation: r Relation Kind: relation Relation Lock Mode: AccessShareLock Permission Info Index: 1 RTI 7 (relation, in-from-clause): Subplan: unnamed_subquery_1 Eref: s (c, d) Relation: s Relation Kind: relation Relation Lock Mode: AccessShareLock Permission Info Index: 2 Unprunable RTIs: 6 7 (41 rows) -- interesting part of the debug plan: :subplans <> :subrtinfos ( {SUBPLANRTINFO :plan_name unnamed_subquery :rtoffset 5 :dummy false } {SUBPLANRTINFO :plan_name unnamed_subquery_1 :rtoffset 6 :dummy false } It appears that in the EXPLAIN plan the subqueries are named "unnamed_subquery" (does not show up in the EXPLAIN output), "unnamed_subquery_1", and "unnamed_subquery_2"; whereas in the RTIs section from pg_overexplain, as well as in the debug plan's :subrtinfos section, the subplans are named "unnamed_subquery" and "unnamed_subquery_1". IIUC, the Subquery Scan names in the query plan, for example: -> Subquery Scan on unnamed_subquery_2 is the name assigned to this Subquery Scan node of RTI: 5, after planning. And the Subplan name of an RTI with pg_overexplain, for example: RTI 7 (relation, in-from-clause): Subplan: unnamed_subquery_1 Eref: s (c, d) is the Subplan name chosen before planning. RTI 7 here maps to the SUBPLANRTINFO with ":rtoffset 6" in the debug plan, which means it belongs to the Subplan named "unnamed_subquery_1". This is what I think causes confusion, because from the query plan we see that RTI 7 is under the Subquery Scan on "unnamed_subquery_2", not "unnamed_subquery_1". I think technically this is not a problem, since we can uniquely identify the Subplans using the names assigned before planning, and we can also uniquely identify the Subquery Scans in the EXPLAIN plan using the names assigned after planning. Still, I found it a bit confusing when looking at the EXPLAIN(RANGE_TABLE) output, where the same name "unnamed_subquery_1" not only doesn't mean the same plan node, but also not in the same branch of the plan tree. Thoughts? Best, Alex -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-26T01:40:56Z
On Thu, Sep 25, 2025 at 9:21 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > I've tried v10-000{1,2}+v9-0002 and v9-000{1,2}. I was curious whether > the names choose_plan_name() chose for subqueries match the Subquery > Scan names in the EXPLAIN plan. My guess is that since the former is > chosen before planning and the latter after planning, they might > differ. I think it's probably ok to have different naming mechanisms > as long as the names are unique within themselves. But in case anyone > else cares about the naming inconsistency, here's an example that > shows it. Yeah. Technically, these are not the same names: one set of names is the names assigned to the subqueries, and the other is the set of names assigned to the relations that get scanned by subquery scans. This may seem like a technicality, but that's not entirely the case, because in each case the chosen names are unique. If, for example, there were a table named unnamed_subquery that we were subquery-scanning, then you couldn't also have a subquery scan of that table, but you could still have a subquery with that name. But maybe there's still some way to improve this. It would probably be hard to make it perfect because of the fact that EXPLAIN names are unique across all relations in the query, as noted above. However, we might be able to make it so the names match in the absence of name conflicts with user specified aliases or table names. Or maybe we should consider some larger change to the EXPLAIN format so that we display subquery names instead of relation names. -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T01:53:30Z
Robert Haas <robertmhaas@gmail.com> writes: > But maybe there's still some way to improve this. It would probably be > hard to make it perfect because of the fact that EXPLAIN names are > unique across all relations in the query, as noted above. However, we > might be able to make it so the names match in the absence of name > conflicts with user specified aliases or table names. Or maybe we > should consider some larger change to the EXPLAIN format so that we > display subquery names instead of relation names. I'm kind of down on that last idea, because relation names/aliases are user-supplied or at least user-suppliable, whereas what choose_plan_name() generates is not under user control. So if we try to make this match, it should be in the direction of using the parent query's subquery alias whenever possible. But I fear that it'll be hard to make this match up exactly unless we choose to take the selection of unique relation aliases out of EXPLAIN altogether and make the planner do it. Which seems like a bad idea, because then we're paying that cost all the time, and it's not small for big queries. regards, tom lane
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-26T02:26:16Z
FWIW, I'm a bit concerned about the double for loop inside choose_plan_name(), especially since the outer loop runs with a true condition. Maybe I'm just worrying over nothing, as we probably don't expect a large number of subroots in practice, but the nested loops still make me a little uneasy. - Richard
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T02:37:20Z
Richard Guo <guofenglinux@gmail.com> writes: > FWIW, I'm a bit concerned about the double for loop inside > choose_plan_name(), especially since the outer loop runs with a true > condition. Maybe I'm just worrying over nothing, as we probably don't > expect a large number of subroots in practice, but the nested loops > still make me a little uneasy. I really doubt that a query could have enough subplans to make that a problem. But if I'm wrong, it's surely something we could improve in a localized way later. regards, tom lane
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-26T06:29:38Z
On Fri, Sep 26, 2025 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > FWIW, I'm a bit concerned about the double for loop inside > > choose_plan_name(), especially since the outer loop runs with a true > > condition. Maybe I'm just worrying over nothing, as we probably don't > > expect a large number of subroots in practice, but the nested loops > > still make me a little uneasy. > I really doubt that a query could have enough subplans to make > that a problem. But if I'm wrong, it's surely something we could > improve in a localized way later. I'm concerned not only about the potential for a large number of subplans but also because if there happens to be a bug within the nested loops, the always-true condition in the outer loop could cause an infinite loop. However, if you're confident that the code inside the loop is completely bug-free and will remain so through future changes, then this shouldn't be an issue. Looking at choose_plan_name(), IIUC, the nested loop is used to find the next unused suffix number for a given name. I'm wondering why not simply iterate through glob->subplanNames once, check the suffix number for each name matching the given base name, determine the current maximum suffix, and then use "max_suffix + 1" as the next unused suffix. This approach requires only a single pass through the list, and if there's a bug, the worst-case scenario would be a duplicate name rather than an infinite loop. It seems to me that this approach is both more efficient and less risky. - Richard
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-26T14:23:21Z
Richard Guo <guofenglinux@gmail.com> writes: > Looking at choose_plan_name(), IIUC, the nested loop is used to find > the next unused suffix number for a given name. I'm wondering why not > simply iterate through glob->subplanNames once, check the suffix > number for each name matching the given base name, determine the > current maximum suffix, and then use "max_suffix + 1" as the next > unused suffix. This approach requires only a single pass through the > list, and if there's a bug, the worst-case scenario would be a > duplicate name rather than an infinite loop. It seems to me that this > approach is both more efficient and less risky. "simply" is perhaps not the right adjective there. My guess is that this approach nets out to more code, more possibilities for bugs (especially in cases where one name is a prefix of another), and will be slower in typical cases with just a few subplan names. As an example of edge cases that your idea introduces, what happens if a user-written subquery name is "expr_999999999999999999999999" and then we need to generate a unique name based on "expr"? Now we have an integer-overflow situation to worry about, with possibly platform-dependent results. But it's Robert's patch, so he gets to make the call. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-26T14:23:29Z
On Fri, Sep 26, 2025 at 2:29 AM Richard Guo <guofenglinux@gmail.com> wrote: > Looking at choose_plan_name(), IIUC, the nested loop is used to find > the next unused suffix number for a given name. I'm wondering why not > simply iterate through glob->subplanNames once, check the suffix > number for each name matching the given base name, determine the > current maximum suffix, and then use "max_suffix + 1" as the next > unused suffix. This approach requires only a single pass through the > list, and if there's a bug, the worst-case scenario would be a > duplicate name rather than an infinite loop. It seems to me that this > approach is both more efficient and less risky. I feel like the current coding is more straightforward and this should be considered in terms of the chance of having bugs. Doing as you propose here would require starting with a prefix match of the string and then attempting a string-to-integer conversion on the remaining bytes. That's certainly doable but such things tend to be a bit fiddly: you have to make sure you do the right thing when you see a non-digit and when the value overflows or is zero. It wouldn't be that hard to get it right, but I think it would be a little trickier than we have now. If we find that the performance cost of this function is too high in some scenario, replacing it within an implementation along these lines would make sense to me, but I am not too worried about the current logic accidentally causing an infinite loop. I don't think that will happen but even if it does it should be a simple fix. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-29T01:47:50Z
On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > Looking at choose_plan_name(), IIUC, the nested loop is used to find > > the next unused suffix number for a given name. I'm wondering why not > > simply iterate through glob->subplanNames once, check the suffix > > number for each name matching the given base name, determine the > > current maximum suffix, and then use "max_suffix + 1" as the next > > unused suffix. This approach requires only a single pass through the > > list, and if there's a bug, the worst-case scenario would be a > > duplicate name rather than an infinite loop. It seems to me that this > > approach is both more efficient and less risky. > "simply" is perhaps not the right adjective there. My guess is that > this approach nets out to more code, more possibilities for bugs > (especially in cases where one name is a prefix of another), and > will be slower in typical cases with just a few subplan names. > > As an example of edge cases that your idea introduces, what happens > if a user-written subquery name is "expr_999999999999999999999999" > and then we need to generate a unique name based on "expr"? Now > we have an integer-overflow situation to worry about, with possibly > platform-dependent results. I'd argue that this hypothetical edge case can be resolved with a bit of canonicalization in how subplan names are represented internally. I think the issue you mentioned arises because there is no clearly distinction between the base name and the numeric suffix. I haven't spent much time thinking about it, but an off-the-cuff idea is to require that all subplan names in glob->subplanNames end with a suffix of the form "_<number>". (If no numeric suffix is required, we can use the suffix "_0".) With this convention, we can simply split on the last underscore: everything before it is the base name, and everything after is the numeric suffix. The user-written subquery name "expr_999999999999999999999999" would be internally represented as "expr_999999999999999999999999_0". Then, when we need to generate a unique name based on "expr", it won't match with the base name of that subquery name. With this canonicalization in place, my proposed approach is simply a matter of applying strrchr(name, '_') and tracking the maximum suffix number in a single pass over glob->subplanNames. I think this can be handled with straightforward, basic C code. It seems to me that this could also eliminate the need for the additional loop under the "if (!always_number)" branch in choose_plan_name(). By replacing a pass over the subplanNames list plus a nested loop with a single pass, I doubt that this would be slower in typical cases. - Richard
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-29T02:41:47Z
Richard Guo <guofenglinux@gmail.com> writes: > On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> As an example of edge cases that your idea introduces, what happens >> if a user-written subquery name is "expr_999999999999999999999999" >> and then we need to generate a unique name based on "expr"? Now >> we have an integer-overflow situation to worry about, with possibly >> platform-dependent results. > I'd argue that this hypothetical edge case can be resolved with a bit > of canonicalization in how subplan names are represented internally. [ raised eyebrow... ] How did you get to that from the complaint that Robert's patch was not obviously bug-free? (A complaint I thought was unmerited, but nevermind.) This proposal is neither simple, nor obviously bug-free. Moreover, in view of comments upthread, I think we should look with great suspicion on any proposal that involves changing user-supplied subquery aliases unnecessarily. regards, tom lane
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-29T08:52:18Z
On Mon, Sep 29, 2025 at 11:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > On Fri, Sep 26, 2025 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> As an example of edge cases that your idea introduces, what happens > >> if a user-written subquery name is "expr_999999999999999999999999" > >> and then we need to generate a unique name based on "expr"? Now > >> we have an integer-overflow situation to worry about, with possibly > >> platform-dependent results. > > I'd argue that this hypothetical edge case can be resolved with a bit > > of canonicalization in how subplan names are represented internally. > [ raised eyebrow... ] How did you get to that from the complaint > that Robert's patch was not obviously bug-free? (A complaint I > thought was unmerited, but nevermind.) I'm not sure I fully understand your point here. Apologies if I got it wrong. Firstly, my intention in the previous email was merely to propose a solution for my approach to address the edge case you raised. I don't see how this relates to my so-called "complaint" about Robert's patch not being obviously bug-free. You raised a case where my approach won't work, and I provided a solution to address it. That's all. Secondly, I don't think it's fair to characterize my concern as a complaint when I expressed that the nested loop with an always-true condition is vulnerable to bugs and could potentially cause an infinite loop if such a bug exists. In a nearby thread, I was asked whether I can guarantee my code is 100% bug-free. After some consideration, I think I cannot make such a guarantee, and I doubt that anyone realistically can. Given this, I think it's important that we try our best to write code that minimizes the potential bad-effect should a bug occur. Therefore, upon observing a nested loop with an always-true condition in the patch, I expressed my concern and suggested a possible improvement. However, I did not expect that concern to be treated as an unmerited complaint. > This proposal is neither > simple, nor obviously bug-free. Moreover, in view of comments > upthread, I think we should look with great suspicion on any > proposal that involves changing user-supplied subquery aliases > unnecessarily. It seems no one has attempted to code up the approach I suggested, so I went ahead and did it; please see the attached PoC patch. It's just a proof of concept to show what I have in mind, so please excuse the lack of comments and necessary assertions for now. I agree that this implementation cannot be guaranteed to be bug-free, but I'm not sure I agree that it's not simple. I'm also not convinced that it would be slower in typical cases. BTW, a small nit I just noticed: I suggest explicitly initializing glob->subplanNames in standard_planner(). It may be argued that this is pointless, as makeNode() zeroes all fields by default. But AFAICS subplanNames is the only field in PlannerGlobal that is not explicitly initialized. - Richard
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-09-29T14:12:31Z
On Mon, Sep 29, 2025 at 4:52 AM Richard Guo <guofenglinux@gmail.com> wrote: > It seems no one has attempted to code up the approach I suggested, so > I went ahead and did it; please see the attached PoC patch. It's just > a proof of concept to show what I have in mind, so please excuse the > lack of comments and necessary assertions for now. I think that if there are subqueries named expr_1 and expr_3, this will assign the name expr_4 next, whereas the previous patch assigns the name expr_2. That might not be a bug, but it's different and I don't like it as well. I also think that if there are subqueries named expr_1a and expr_2a, this will assign the name expr_3 next, whereas the previous patch assigns the name expr_1. I would consider that a clear bug. Your code will also see expr_01 and decide that the next name should be expr_2 rather than expr_1. I don't think that's right, either. I also think that it's a bug that your function sometimes returns a value different from the one it appends to canon_names. I don't really understand why you're so fixed on this point. I think that the code as I wrote it is quite a normal way to write code for that kind of thing. Sure, there are other things that we could do, but I wrote the code that way I did precisely in order to avoid behaviors like the ones I mention above. It would be possible to rearrange the code so that the termination condition for the loop was something like "!found", or to rewrite the loop as a do { ... } while (!found) construct as we do in set_rtable_names() for a very similar problem to the code that this is solving, but I think the generated machine code would be exactly the same and the code would not look as intuitive for a human to read. Somebody else might have had a different stylistic preference, but if you are going to object every time I write for (;;) or while (1), you're going to hate an awful lot of my code for, IMHO, very little reason. It's reasonable to be concerned about whether a loop will ever terminate, but the mere fact of putting the loop exit someplace other than the top of the loop isn't enough to say that there's a problem. -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-29T14:27:57Z
Robert Haas <robertmhaas@gmail.com> writes: > I don't really understand why you're so fixed on this point. I think > that the code as I wrote it is quite a normal way to write code for > that kind of thing. Also, we have numerous other places that generate de-duplicated names in pretty much this way (ruleutils.c's set_rtable_names being a very closely related case). I don't think we should go inventing some random new way to do that. If it turns out that Robert's code is too slow in practice, I would prefer to deal with that by using a hashtable to keep track of already-allocated names, not by changing the user-visible behavior. I'm content to wait for field complaints before building such logic though, because I really doubt that queries would ever have so many subplans as to be a problem. regards, tom lane
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-09-30T01:59:10Z
On Mon, Sep 29, 2025 at 11:12 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Sep 29, 2025 at 4:52 AM Richard Guo <guofenglinux@gmail.com> wrote: > > It seems no one has attempted to code up the approach I suggested, so > > I went ahead and did it; please see the attached PoC patch. It's just > > a proof of concept to show what I have in mind, so please excuse the > > lack of comments and necessary assertions for now. > I think that if there are subqueries named expr_1 and expr_3, this > will assign the name expr_4 next, whereas the previous patch assigns > the name expr_2. That might not be a bug, but it's different and I > don't like it as well. > > I also think that if there are subqueries named expr_1a and expr_2a, > this will assign the name expr_3 next, whereas the previous patch > assigns the name expr_1. I would consider that a clear bug. > > Your code will also see expr_01 and decide that the next name should > be expr_2 rather than expr_1. I don't think that's right, either. > > I also think that it's a bug that your function sometimes returns a > value different from the one it appends to canon_names. I don't think any of the bugs you described upthread exist in the PoC patch. The patch ensures that all names stored in glob->subplanNames are canonicalized to the format "$basename_$suffixnum". If no numeric suffix is required, the name is stored with a "_0" suffix. This guarantees a clear distinction between the base name and the numeric suffix for all names stored in the list. (Please note that this canonicalization applies only to how names are stored internally, not to user-visible names. So no user-visible behavior change here.) I don't really see how uncanonicalized names like expr_1a, expr_2a, or expr_01 would appear in the subplanNames list to begin with. Perhaps you're referring to user-supplied subquery aliases? In that case, the patch deliberately avoids matching expr to those names, since it only compares the base name. As a result, it would assign expr_1 as the next name, which is the expected behavior. This PoC patch passes all the regression tests, which at least, IMHO, suggests that it avoids such basic bugs. However, since both you and Tom feel this proposal doesn't make sense, I'll withdraw it. Apologies for any trouble this has caused. - Richard
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-10-06T19:13:59Z
On Mon, Sep 29, 2025 at 9:59 PM Richard Guo <guofenglinux@gmail.com> wrote: > I don't think any of the bugs you described upthread exist in the PoC > patch. The patch ensures that all names stored in glob->subplanNames > are canonicalized to the format "$basename_$suffixnum". If no numeric > suffix is required, the name is stored with a "_0" suffix. This > guarantees a clear distinction between the base name and the numeric > suffix for all names stored in the list. (Please note that this > canonicalization applies only to how names are stored internally, not > to user-visible names. So no user-visible behavior change here.) I studied your patch in more detail today and I find that you are correct. It doesn't do what I thought it did. However, it also doesn't guarantee uniqueness. Here is an example: robert.haas=# explain select (select random() limit 1), (select random() limit 2) from (select * from pg_class limit 1) expr_1; WARNING: choose_plan_name for name "expr" returns "expr_1" WARNING: choose_plan_name for name "expr" returns "expr_2" WARNING: choose_plan_name for name "expr_1" returns "expr_1" QUERY PLAN ------------------------------------------------------------------------- Subquery Scan on expr_1 (cost=0.03..0.08 rows=1 width=16) InitPlan expr_1 -> Limit (cost=0.00..0.01 rows=1 width=8) -> Result (cost=0.00..0.01 rows=1 width=8) InitPlan expr_2 -> Limit (cost=0.00..0.01 rows=1 width=8) -> Result (cost=0.00..0.01 rows=1 width=8) -> Limit (cost=0.00..0.04 rows=1 width=240) -> Seq Scan on pg_class (cost=0.00..18.16 rows=416 width=240) (9 rows) When I originally looked at the patch, I believed that the use of atoi() without sanity checking was going to cause the sorts of problems I was mentioning. That was wrong, since you never let a user-specified name leak directly into the list, but only names to which your code has added a numeric suffix. But that also means that you don't get conflicts between names that, in reality, should conflict. -- Robert Haas EDB: http://www.enterprisedb.com -
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-10-06T19:36:49Z
I see that Richard's PoC last patch confused cfbot. Here's a new version of just the patch proposed for commit for CfBot testing.
-
Re: plan shape work
Richard Guo <guofenglinux@gmail.com> — 2025-10-09T02:24:49Z
On Tue, Oct 7, 2025 at 4:37 AM Robert Haas <robertmhaas@gmail.com> wrote: > I see that Richard's PoC last patch confused cfbot. Here's a new > version of just the patch proposed for commit for CfBot testing. Does it make sense to explicitly initialize glob->subplanNames in standard_planner()? I understand this might seem pointless since makeNode() zeroes all fields by default, but subplanNames is currently the only field in PlannerGlobal that isn't explicitly initialized. I previously committed a patch (2c0ed86d3) to ensure all PlannerGlobal fields are explicitly initialized, and I'd prefer to maintain that consistency. I actually suggested the same in [1] (the last paragraph), but it seems to have been overlooked. [1] https://postgr.es/m/CAMbWs4-ysLvZiWp=w5=+noCMdX9FHFrrc0Wuk-TcUz1RDmEbkQ@mail.gmail.com - Richard
-
Re: plan shape work
Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-09T03:04:37Z
Richard Guo <guofenglinux@gmail.com> writes: > Does it make sense to explicitly initialize glob->subplanNames in > standard_planner()? I understand this might seem pointless since > makeNode() zeroes all fields by default, but subplanNames is currently > the only field in PlannerGlobal that isn't explicitly initialized. I > previously committed a patch (2c0ed86d3) to ensure all PlannerGlobal > fields are explicitly initialized, and I'd prefer to maintain that > consistency. We don't really have consensus on that point, I fear. I like the initialize-em-all-explicitly approach, but some other senior hackers think it's useless verbiage. My argument for doing it explicitly is that when adding a new field to a struct, one frequently searches for existing references to a nearby field. Without initialize-em-all, this risks missing places where you need to initialize your new field. If you'd only set it to zero, then fine ... but what if that particular place needs some other initial value? So I think omitting initializations-to-zero risks future bugs of omission. Some other folk don't find that argument very compelling, though. regards, tom lane
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-10-09T12:34:56Z
On Wed, Oct 8, 2025 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > We don't really have consensus on that point, I fear. I like the > initialize-em-all-explicitly approach, but some other senior hackers > think it's useless verbiage. I'm in the "useless verbiage" camp. > My argument for doing it explicitly is that when adding a new field > to a struct, one frequently searches for existing references to a > nearby field. Without initialize-em-all, this risks missing places > where you need to initialize your new field. If you'd only set it > to zero, then fine ... but what if that particular place needs some > other initial value? So I think omitting initializations-to-zero > risks future bugs of omission. Some other folk don't find that > argument very compelling, though. This problem does not typically happen for me because it is my habit to start by grepping for makeNode(Whatever) -- or for relevant palloc calls, for non-Node types -- at the very start of my research into any topic, and only later to look into specific fields. There might be a consistency argument for doing this in this case, if other nearby fields are doing the same thing, and if one of you wants to go and make it so I have much better things to do than spend time complaining about it. But any code I write is likely to rely on makeNode() or palloc0() to zero fields wherever relying on such a thing is convenient, unless somebody forces me to do otherwise in a particular case. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-12-01T20:31:06Z
On Wed, Sep 24, 2025 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Aside from the const issue, something I don't really like at the > coding level is the use of an "allroots" list. One reason is that > it's partially redundant with the adjacent "subroots" list, but > a bigger one is that we have transient roots that shouldn't be > in there. An example here is pull_up_simple_subquery: it builds > a clone of the query's PlannerInfo to help it use various > infrastructure along the way to flattening the subquery, but > that clone is not referenced anymore after the function exits. > You were putting that into allroots, which seems to me to be > a fundamental error, even more so because it went in with the > same plan_name as the root it was cloned from. > > I think a better idea is to keep a list of just the subplan > names that we've assigned so far. That has a far clearer > charter, plus it can be updated immediately by choose_plan_name() > instead of relying on the caller to do the right thing later. > I coded this up, and was rather surprised to find that it changed > some regression outputs. On investigation, that's because > build_minmax_path() was actually doing the wrong thing later: > it was putting the wrong root into allroots, so that "minmax_1" > never became assigned and could be re-used later. Today, I discovered a disadvantage of the change from allroots to subplanNames. The concern that you raise about transient roots still seems entirely valid to me. However, the allroots list - if correctly constructed to exclude such transient roots - seems to me to have potential utility for extensions that subplanNames doesn't. What I wanted to do was use planner_shutdown_hook to copy some information from each PlannerInfo to the extension_state field of the PlannedStmt and, without allroots, there's no easy way to find all of them. I do think there are other ways to solve that problem. For instance, we could add a hook that's called when we invoke subquery_planner, similar to the way that planner_hook can be used to intercept calls to planner(). The new hook could then do whatever it likes at the end of each call to subquery_planner(). That has the disadvantage of making each call to subquery_planner() a little more expensive, but it might turn out that such a hook has other utility. For what I was trying to do today, I can probably even solve it using set_join_pathlist_hook. I wanted to capture the relid sets of all baserels and joinrels where rel->unique_rel is non-NULL, and I think that I could do that by having set_join_pathlist_hook add notice when save_jointype is JOIN_UNIQUE_INNER or JOIN_UNIQUE_OUTER, and then add innerrel or outerrel as appropriate to a list hanging off an object attached using Get/SetPlannerGlobalExtensionState, making sure not to add the same one more than once. But that only works in this specific case, and it's pretty indirect even for that. So I'm wondering if we ought to step back and rethink a bit. subplanNames ensures that we assign a different name to every PlannerInfo, but it doesn't give you any help finding the PlannerInfo given the name, or enumerating all of the PlannerInfo objects that exist. If we went back to allroots, and just fixed the problem of temporary entries creeping into the list, the core code wouldn't really notice the difference, but I think extensions would have an easier time. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: plan shape work
Robert Haas <robertmhaas@gmail.com> — 2025-12-08T19:06:35Z
On Mon, Dec 1, 2025 at 3:31 PM Robert Haas <robertmhaas@gmail.com> wrote: > So I'm wondering if we ought to step back and rethink a bit. > subplanNames ensures that we assign a different name to every > PlannerInfo, but it doesn't give you any help finding the PlannerInfo > given the name, or enumerating all of the PlannerInfo objects that > exist. If we went back to allroots, and just fixed the problem of > temporary entries creeping into the list, the core code wouldn't > really notice the difference, but I think extensions would have an > easier time. > > Thoughts? Here's a couple of patches to bring back allroots. The first one adds allroots, makes choose_plan_name() use it, and adds assertions that the contents of allroots and subplanNames correspond. The second one removes subplanNames. I imagine that these would be collapsed for commit, but testing 0001 without 0002 seems useful for peace of mind. -- Robert Haas EDB: http://www.enterprisedb.com