Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Add pg_plan_advice contrib module.
- 5883ff30b02c 19 (unreleased) landed
Store information about Append node consolidation in the final plan.
- 7358abcc6076 19 (unreleased) landed
Store information about elided nodes in the final plan.
- 0d4391b265f8 19 (unreleased) landed
Store information about range-table flattening in the final plan.
- adbad833f3d9 19 (unreleased) landed
Allow for plugin control over path generation strategies.
- 4020b370f214 19 (unreleased) landed
Allow passing a pointer to GetNamedDSMSegment()'s init callback.
- 48d4a1423d2e 19 (unreleased) cited
Don't reset the pathlist of partitioned joinrels.
- 014f9a831a32 19 (unreleased) cited

pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-10-30T14:00:05Z

As I have mentioned on previous threads, for the past while I have
been working on planner extensibility. I've posted some extensibility
patches previously, and got a few of them committed in
Sepember/October with Tom's help, but I think the time has come a
patch which actually makes use of that infrastructure as well as some
further infrastructure that I'm also including in this posting.[1] The
final patch in this series adds a new contrib module called
pg_plan_advice. Very briefly, what pg_plan_advice knows how to do is
process a plan and emits a (potentially long) long text string in a
special-purpose mini-language that describes a bunch of key planning
decisions, such as the join order, selected join methods, types of
scans used to access individual tables, and where and how
partitionwise join and parallelism were used. You can then set
pg_plan_advice.advice to that string to get a future attempt to plan
the same query to reproduce those decisions, or (maybe a better idea)
you can trim that string down to constrain some decisions (e.g. the
join order) but not others (e.g. the join methods), or (if you want to
make your life more exciting) you can edit that advice string and
thereby attempt to coerce the planner into planning the query the way
you think best. There is a README that explains the design philosophy
and thinking in a lot more detail, which is a good place to start if
you're curious, and I implore you to read it if you're interested, and
*especially* if you're thinking of flaming me.

But that doesn't mean that you *shouldn't* flame me. There are a
remarkable number of things that someone could legitimately be unhappy
about in this patch set. First, any form of user control over the
planner tends to be a lightning rod for criticism around here. I've
come to believe that's the wrong way of thinking about it: we can want
to improve the planner over the long term and *also* want to have
tools available to work around problems with it in the short term.
Further, we should not imagine that we're going to solve problems that
have stumped other successful database projects any time in the
foreseeable future; no product will ever get 100% of cases right, and
you don't need to get to very obscure cases before other products
throw up their hands just as we do. But, second, even if you're OK
with the idea of some kind of user control over the planner, you could
very well be of the opinion that what I've implemented here is utter
crap. I've certainly had to make a ton of very opinionated decisions
to get to this point, and you are entitled to hate them. Of course, I
did have *reasons* for making the decisions, so if your operating
theory as to why I did something is that I'm a stupid moron, perhaps
consider an alternative explanation or two as well. Finally, even if
you're OK with the concept and feel that I've made some basically
reasonable design decisions, you might notice that the code is full of
bugs, needs a lot of cleanup, is missing features, lacks
documentation, and a bunch of other stuff. In that judgement, you
would be absolutely correct. I'm not posting it here because I'm
hoping to get it committed in November -- or at least, not THIS
November.  What I would like to do is getting some design feedback on
the preliminary patches, which I think will be more possible if
reviewers also have the main pg_plan_advice to look at as a way of
understanding why the exist, and also some feedback on the
pg_plan_advice patch itself.

Now I do want to caveat the statement that I am looking for feedback
just a little bit. I imagine that there will be some people reading
this who are already imagining how great life will be when they put
this into production, and begin complaining about either (1) features
that it's missing or (2) things that they don't like about the design
of the advice mini-language. What I'd ask you to keep in mind is that
you will not be able to put this into production unless and until
something gets committed, and getting this committed is probably going
to be super-hard even if you don't creep the scope, so maybe don't do
that, especially if you haven't read the README yet to understand what
the scope is actually intended to be. The details of the advice
mini-language are certainly open to negotiation; of everything, that
would be one of the easier things to change. However, keep in mind
that there are probably LOTS AND LOTS of people who all have their own
opinions about what decisions I should have made when designing that
mini-language, and an outcome where you personally get everything you
want and everyone who disagrees is out of luck is unlikely. In other
words, constructive suggestions for improvement are welcome, but
please think twice before turning this into a bikeshedding nightmare.
Now is the time to talk about whether I've got the overall design
somewhat correct moreso than whether I've spelled everything the way
you happen to prefer.[2]

I want to mention that, beyond the fact that I'm sure some people will
want to use something like this (with more feature and a lot fewer
bugs) in production, it seems to be super-useful for testing. We have
a lot of regression test cases that try to coerce the planner to do a
particular thing by manipulating enable_* GUCs, and I've spent a lot
of time trying to do similar things by hand, either for regression
test coverage or just private testing. This facility, even with all of
the bugs and limitations that it currently has, is exponentially more
powerful than frobbing enable_* GUCs. Once you get the hang of the
advice mini-language, you can very quickly experiment with all sorts
of plan shapes in ways that are currently very hard to do, and thereby
find out how expensive the planner thinks those things are and which
ones it thinks are even legal. So I see this as not only something
that people might find useful for in production deployments, but also
something that can potentially be really useful to advance PostgreSQL
development.

Which brings me to the question of where this code ought to go if it
goes anywhere at all. I decided to propose pg_plan_advice as a contrib
module rather than a part of core because I had to make a WHOLE lot of
opinionated design decisions just to get to the point of having
something that I could post and hopefully get feedback on. I figured
that all of those opinionated decisions would be a bit less
unpalatable if they were mostly encapsulated in a contrib module, with
the potential for some future patch author to write a different
contrib module that adopted different solutions to all of those
problems. But what I've also come to realize is that there's so much
infrastructure here that leaving the next person to reinvent it may
not be all that appealing. Query jumbling is a previous case where we
initially thought that different people might want to do different
things, but eventually realized that most people really just wanted
some solution that they didn't have to think too hard about. Likewise,
in this patch, the relation identifier system described in the README
is the only thing of its kind, to my knowledge, and any system that
wants to accomplish something similar to what pg_plan_advice does
would need a system like that. pg_hint_plan doesn't have something
like that, because pg_hint_plan is just trying to do hints. This is
trying to do round-trip-safe plan stability, where the system will
tell you how to refer unambiguously to a certain part of the query in
a way that will work correctly on every single query regardless of how
it's structured or how many times it refers to the same tables or to
different tables using the same aliases. If we say that we're never
going to put any of that infrastructure in core, then anyone who wants
to write a module to control the planner is going to need to start by
either (a) reinventing something similar, (b) cloning all the relevant
code, or (c) just giving up on the idea of unambiguous references to
parts of a query. None of those seem like great options, so now I'm
less sure whether contrib is actually the right place for this code,
but that's where I have put it for now. Feedback welcome, on this and
everything else.

Perhaps more than any other patch I've ever written, I know I'm
playing with fire here just by putting this out on the list, but I'm
nevertheless hopeful that something good can come of it, and I hope we
can have a constructive discussion about what that thing should be. I
think there is unquestionably is a lot of demand for the ability to
influence the planner in some form, but there is a lot of room for
debate about what exactly that should mean in practice. While I
personally am pretty happy with the direction of the code I've
written, modulo the large amount of not-yet-completed bug fixing and
cleanup, there's certainly plenty of room for other people to feel
differently, and finding out what other people think is, of course,
the whole point of posting things publicly before committing them --
or in this case, before even finishing them.[3] If you're interested
it contributing to the conversation, I urge you to start with the
following things: (1) the README in the final patch; (2) the
regression test examples in the final patch, which give a good sense
of what it actually looks like to use this; and (3) the earlier
patches, which show the minimum amount of core infrastructure that I
think we need in order to make something like this workable (ideas on
how to further reduce that footprint are very welcome).

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

[1] All of the earlier patches have been posted previously in some
form, but the commit messages have been rewritten for clarity, and the
"Allow for plugin control over path generation strategies" patch has
been heavily rewritten since it was last posted; the earlier versions
turned out to have substantial inadequacies.

[2] This is not to say that proposal to modify or improve the syntax
are unwelcome, but the bigger obstacle to getting something committed
here is probably reaching some agreement on the internal details. Any
changes to src/include/optimizer or src/backend/optimizer need careful
scrutiny from a design perspective. Also, keep in mind that the syntax
needs to fit what we can actually do: a proposal to change the syntax
to something that implies semantics we can't implement is a dead
letter.

[3] Note, however, that a proposal to achieve the same or similar
goals by different means is more welcome than a proposal that I should
have done some other project entirely. I've already put a lot of work
into these goals and hope to achieve them, at least to some degree,
before I start working toward something else.

Re: pg_plan_advice

Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-10-31T09:58:59Z

On Thu, Oct 30, 2025 at 3:00 PM Robert Haas <robertmhaas@gmail.com> wrote:

[..over 400kB of attachments, yay]

Thank You for working on this!

My gcc-13 was nitpicking a little bit (see
compilation_warnings_v1.txt), so attached is just a tiny diff to fix
some of those issues. After that, clang-20 run was clean too.

> First, any form of user control over the planner tends to be a lightning rod for criticism around here.

I do not know where this is coming from, but everybody I've talked to
was saying this is needed to handle real enterprise databases and
applications. I just really love it, how one could precisely adjust
the plan with this even with the presence of heavy aliasing:

postgres=# explain (plan_advice, costs off) SELECT * FROM (select *
from t1 a join t2 b using (id)) a, t2 b, t3 c WHERE a.id = b.id and
b.id = c.id;
                     QUERY PLAN
-----------------------------------------------------
 Merge Join
   Merge Cond: (a.id = c.id)
   ->  Merge Join
         Merge Cond: (a.id = b.id)
         ->  Index Scan using t1_pkey on t1 a
         ->  Index Scan using t2_pkey on t2 b
   ->  Sort
         Sort Key: c.id
         ->  Seq Scan on t3 c
 Supplied Plan Advice:
   SEQ_SCAN(ble5) /* not matched */
 Generated Plan Advice:
   JOIN_ORDER(a#2 b#2 c)
   MERGE_JOIN_PLAIN(b#2 c)
   SEQ_SCAN(c)
   INDEX_SCAN(a#2 public.t1_pkey )
   NO_GATHER(c a#2 b#2)
(17 rows)

postgres=# set pg_plan_advice.advice = 'SEQ_SCAN(b#2)';
SET
postgres=# explain (plan_advice, costs off) SELECT * FROM (select *
from t1 a join t2 b using (id)) a, t2 b, t3 c WHERE a.id = b.id and
b.id = c.id;
                     QUERY PLAN
----------------------------------------------------
 Hash Join
   Hash Cond: (b.id = a.id)
   ->  Seq Scan on t2 b
   ->  Hash
         ->  Merge Join
               Merge Cond: (a.id = c.id)
               ->  Index Scan using t1_pkey on t1 a
               ->  Sort
                     Sort Key: c.id
                     ->  Seq Scan on t3 c
 Supplied Plan Advice:
   SEQ_SCAN(b#2) /* matched */
 Generated Plan Advice:
   JOIN_ORDER(b#2 (a#2 c))
   MERGE_JOIN_PLAIN(c)
   HASH_JOIN(c)
   SEQ_SCAN(b#2 c)
   INDEX_SCAN(a#2 public.t1_pkey)
   NO_GATHER(c a#2 b#2)

To attract a little attention to the $thread, the only bigger design
(usability) question that keeps ringing in my head is how we are going
to bind it to specific queries without even issuing any SETs(or ALTER
USER) in the far future in the grand scheme of things. The discussed
query id (hash), full query text comparison, maybe even strstr(query ,
"partial hit") or regex all seem to be kind too limited in terms of
what crazy ORMs can come up with (each query will be potentially
slightly different, but if optimizer reference points are stable that
should nail it good enough, but just enabling it for the very specific
set of queries and not the others [with same aliases] is some major
challenge).

Due to this, at some point I was even thinking about some hashes for
every plan node (including hashes of subplans), e.g.:
 Merge Join // hash(MERGE_JOIN_PLAIN(b#2) + ';' somehashval1 + ';'+
somehahsval2 ) => somehashval3
   Merge Cond: (a.id = c.id)
   ->  Merge Join
         Merge Cond: (a.id = b.id)
         ->  Index Scan using t1_pkey on t1 a // hash(INDEX_SCAN(a#2
public.t1_pkey)) => somehashval1
         ->  Index Scan using t2_pkey on t2 b // hash(INDEX_SCAN(b#2
public.t2_pkey)) => somehashval2

and then having a way to use `somehashval3` (let's say it's SHA1) as a
way to activate the necessary advice. Something like having a way to
express it using plan_advice.on_subplanhashes_plan_advice =
'somehashval3: SEQ_SCAN(b#2)'. This would have the benefit of being
able to override multiple similiar SQL queries in one go rather than
collecting all possible query_ids, but probably it's stupid,
heavyweight, but that would be my dream ;)

-J.

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-10-31T12:51:34Z

On Fri, Oct 31, 2025 at 5:59 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> > First, any form of user control over the planner tends to be a lightning rod for criticism around here.
>
> I do not know where this is coming from, but everybody I've talked to
> was saying this is needed to handle real enterprise databases and
> applications. I just really love it, how one could precisely adjust
> the plan with this even with the presence of heavy aliasing:

Thanks for the kind words.

I'll respond to the points about compiler warnings later.

> To attract a little attention to the $thread, the only bigger design
> (usability) question that keeps ringing in my head is how we are going
> to bind it to specific queries without even issuing any SETs(or ALTER
> USER) in the far future in the grand scheme of things. The discussed
> query id (hash), full query text comparison, maybe even strstr(query ,
> "partial hit") or regex all seem to be kind too limited in terms of
> what crazy ORMs can come up with (each query will be potentially
> slightly different, but if optimizer reference points are stable that
> should nail it good enough, but just enabling it for the very specific
> set of queries and not the others [with same aliases] is some major
> challenge).

Yeah, I haven't really dealt with this problem yet.

> Due to this, at some point I was even thinking about some hashes for
> every plan node (including hashes of subplans),
[...]
>
> and then having a way to use `somehashval3` (let's say it's SHA1) as a
> way to activate the necessary advice. Something like having a way to

This doesn't make sense to me, because it seems circular. We can't use
anything in the plan to choose which advice string to use, because the
purpose of the advice string is to influence the choice of plan. In
other words, our choice of what advice string to use has to be based
on the properties of the query, not the plan. We can implement
anything we want to do in terms of exactly how that works: we can use
the query ID, or the query text, or the query node tree.
Hypothetically, we could call out to a user-defined function and pass
the query text or the query node tree as an argument and let it do
whatever it wants to decide on an advice string. The practical problem
here is computational cost -- any computation that gets performed for
every single query is going to have to be pretty cheap to avoid
creating a performance problem. That's why I thought matching on query
ID or exact matching on query text would likely be the most practical
approaches, aside from the obvious alternative of setting and
resetting pg_plan_advice.advice manually. But I haven't really
explored this area too much yet, because I need to get all the basics
working first.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Alastair Turner <minion@decodable.me> — 2025-10-31T21:17:41Z

On Fri, 31 Oct 2025, 12:51 Robert Haas, <robertmhaas@gmail.com> wrote:

> On Fri, Oct 31, 2025 at 5:59 AM Jakub Wartak
> <jakub.wartak@enterprisedb.com> wrote:
> > > First, any form of user control over the planner tends to be a
> lightning rod for criticism around here.
> >
> > I do not know where this is coming from, but everybody I've talked to
> > was saying this is needed to handle real enterprise databases and
> > applications. I just really love it, how one could precisely adjust
> > the plan with this even with the presence of heavy aliasing:
>

I really like the functionality of the current patch as well, even though I
am suspicious of user control over the planner. By giving concise, precise
control over a plan, this allows people who believe they can out-plan the
planner to test their alternative, and possibly fail.

Whatever other UIs and integrations you build as you develop this towards
you goal, please keep what's currently there user accessible. Not only for
testing code, but also for testing users' belief that they know better.

Alastair

Re: pg_plan_advice

Hannu Krosing <hannuk@google.com> — 2025-11-01T16:10:08Z

This weas recently shared in LinkedIn
https://www.vldb.org/pvldb/vol18/p5126-bress.pdf

For example it says that 31% of all queries are metadata queries, 78%
have LIMIT, 20% of queries have 10+ joins, with 0.52% exceeding 100
joins. , 12% of expressions have depths between 11-100 levels, some
exceeding 100. These deeply nested conditions create optimization
challenges benchmarks don't capture.etc.

This reinforces my belief thet we either should have some kind of
two-level optimization, where most queries are handled quickly but
with something to trigger a more elaborate optimisation and
investigation workflow.

Or alternatively we could just have an extra layer before the query is
sent to the database which deals with unwinding the product of
excessively stupid query generators (usually, but not always, some BI
tools :) )


On Fri, Oct 31, 2025 at 10:18 PM Alastair Turner <minion@decodable.me> wrote:
>
>
> On Fri, 31 Oct 2025, 12:51 Robert Haas, <robertmhaas@gmail.com> wrote:
>>
>> On Fri, Oct 31, 2025 at 5:59 AM Jakub Wartak
>> <jakub.wartak@enterprisedb.com> wrote:
>> > > First, any form of user control over the planner tends to be a lightning rod for criticism around here.
>> >
>> > I do not know where this is coming from, but everybody I've talked to
>> > was saying this is needed to handle real enterprise databases and
>> > applications. I just really love it, how one could precisely adjust
>> > the plan with this even with the presence of heavy aliasing:
>
>
> I really like the functionality of the current patch as well, even though I am suspicious of user control over the planner. By giving concise, precise control over a plan, this allows people who believe they can out-plan the planner to test their alternative, and possibly fail.
>
> Whatever other UIs and integrations you build as you develop this towards you goal, please keep what's currently there user accessible. Not only for testing code, but also for testing users' belief that they know better.
>
> Alastair

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-11-03T16:18:08Z

On Fri, Oct 31, 2025 at 5:17 PM Alastair Turner <minion@decodable.me> wrote:
> I really like the functionality of the current patch as well, even though I am suspicious of user control over the planner. By giving concise, precise control over a plan, this allows people who believe they can out-plan the planner to test their alternative, and possibly fail.

Indeed. The downside of letting people control anything is that they
may leverage that control to do something bad. However, I think it is
unlikely that very many people would prefer to write an entire query
plan by hand. If you wanted to do that, why would you being using
PostgreSQL in the first place? Furthermore, if somebody does try to do
that, I expect that they will find it frustrating and difficult: the
planner considers a large number of options even for simple queries
and an absolutely vast number of options for more difficult queries,
and a human being trying possibilities one by one is only ever going
to consider a tiny fraction of those possibilities. The ideal
possibility often won't be in that small subset of the search space,
and the user will be wasting their time. If that were the design goal
of this feature, I don't think it would be worth having.

But it isn't. As I say in the README, what I consider the principal
use case is reproducing plans that you know to have worked well in the
past. Sometimes, the planner is correct for a while and then it's
wrong later. We don't need to accept the proposition that users can
out-plan the planner. We only need to accept that they can tell good
plans from bad plans better than the planner. That is a low bar to
clear. The planner never finds out what happens when the plans that it
generates are actually executed, but users do. If they are
sufficiently experienced, they can make reasonable judgements about
whether the plan they're currently getting is one they'd like to
continue getting. Of course, they may make wrong judgements even then,
because they lack knowledge or experience or just make a mistake, but
it's not a farcically unreasonable thing to do. I've basically never
wanted to write my own query plan from scratch, but I've certainly
looked at many plans over the years and judged them to be great, or
terrible, or good for now but risky in the long-term; and I'm probably
not the only human being on the planet capable of making such
judgements with some degree of competence.

> Whatever other UIs and integrations you build as you develop this towards you goal, please keep what's currently there user accessible. Not only for testing code, but also for testing users' belief that they know better.

And this is also a good point. Knowledgeable and experienced users can
look at a plan that the planner generated, feel like it's bad, and
wonder why the planner picked it. You can try to figure that out by,
for example, setting enable_SOMETHING = false and re-running EXPLAIN,
but since there aren't that many such knobs relevant to any given
query, and since changing any of those knobs can affect large swathes
of the query and not just the part you're trying to understand better,
it can actually be really difficult to understand why the planner
thought that something was the best option. Sometimes you can't even
tell whether the planner thinks that the plan you expected to be
chosen is *impossible* or just *more expensive*, which is always one
of the things that I'm keen to find out when something weird is
happening. This can make answering that question a great deal easier.
If some important index is not getting used, you can say "no, really,
I want to see what happens with this query when you plan it with that
index" -- and then it either gives you a plan that does use that
index, and you can see how much more expensive it is and why, or it
still doesn't give you a plan using that index, and you know that the
index is inapplicable to the query or unusable in general for some
reason. You don't necessarily have it as a goal to coerce the planner
in production; your goal may very well be to find out why your belief
that you know better is incorrect.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-11-03T16:41:35Z

On Sat, Nov 1, 2025 at 12:10 PM Hannu Krosing <hannuk@google.com> wrote:
> This reinforces my belief thet we either should have some kind of
> two-level optimization, where most queries are handled quickly but
> with something to trigger a more elaborate optimisation and
> investigation workflow.
>
> Or alternatively we could just have an extra layer before the query is
> sent to the database which deals with unwinding the product of
> excessively stupid query generators (usually, but not always, some BI
> tools :) )

I'd like to keep the focus of this thread on the patches that I'm
proposing, rather than other ideas for improving the planner. I
actually agree with you that at least the first of these things might
be a very good idea, but that would be an entirely separate project
from these patches, and I feel a lot more qualified to do this project
than that one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

John Naylor <johncnaylorls@gmail.com> — 2025-11-04T11:47:48Z

On Thu, Oct 30, 2025 at 9:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
> First, any form of user control over the
> planner tends to be a lightning rod for criticism around here. I've
> come to believe that's the wrong way of thinking about it: we can want
> to improve the planner over the long term and *also* want to have
> tools available to work around problems with it in the short term.

The most frustrating real-world incidents I've had were in the course
of customers planning a major version upgrade, or worse, after
upgrading and finding that a 5 minute query now takes 5 hours. I
mention this to emphasize that workarounds will be needed also to deal
with rare unintended effects that arise from our very attempts to
improve the planner.

> Further, we should not imagine that we're going to solve problems that
> have stumped other successful database projects any time in the
> foreseeable future; no product will ever get 100% of cases right, and
> you don't need to get to very obscure cases before other products
> throw up their hands just as we do.

Right.

> it seems to be super-useful for testing. We have
> a lot of regression test cases that try to coerce the planner to do a
> particular thing by manipulating enable_* GUCs, and I've spent a lot
> of time trying to do similar things by hand, either for regression
> test coverage or just private testing. This facility, even with all of
> the bugs and limitations that it currently has, is exponentially more
> powerful than frobbing enable_* GUCs. Once you get the hang of the
> advice mini-language, you can very quickly experiment with all sorts
> of plan shapes in ways that are currently very hard to do, and thereby
> find out how expensive the planner thinks those things are and which
> ones it thinks are even legal. So I see this as not only something
> that people might find useful for in production deployments, but also
> something that can potentially be really useful to advance PostgreSQL
> development.

That sounds very useful as well.

--
John Naylor
Amazon Web Services

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-11-04T19:54:44Z

On Fri, Oct 31, 2025 at 5:59 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> My gcc-13 was nitpicking a little bit (see
> compilation_warnings_v1.txt), so attached is just a tiny diff to fix
> some of those issues. After that, clang-20 run was clean too.

Here's v2. Change log:

- Attempted to fix the compiler warnings. I didn't add elog() before
pg_unreachable() as you suggested; instead, I added a dummy return
afterwards. Let's see if that works. Also, I decided after reading the
comment for list_truncate() that what I'd done there was not going to
be acceptable, so I rewrote the code slightly. It now copies the list
when adding to it, instead of relying on the ability to use
list_truncate() to recreate the prior tstate.

- Deleted the SQL-callable pg_parse_advice function and related code.
That was useful to me early in development but I don't think anyone
will need it at this point; if you want to test whether an advice
string can be parsed, just try setting pg_plan_advice.advice.

- Fixed a couple of dumb bugs in pgpa_trove.c.

- Added a few more regression test scenarios.

- Fixed a couple of typos/thinkos.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-11-06T16:45:47Z

Here's v3. I've attempted to fix some more things that cfbot didn't
like, one of which was an actual bug in 0005, and I also fixed a
stupid few bugs in pgpa_collector.c and added a few more tests.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-17T14:42:45Z

Hi

On Thu Nov 6, 2025 at 1:45 PM -03, Robert Haas wrote:
> Here's v3. I've attempted to fix some more things that cfbot didn't
> like, one of which was an actual bug in 0005, and I also fixed a
> stupid few bugs in pgpa_collector.c and added a few more tests.
>
I've spent some time playing with these patches. I still don't have to
much comments on the syntax yet but I've noticed a small bug or perhaps
I'm missing something?

When I run CREATE EXTENSION pg_plan_advice I'm able to use the
EXPLAIN(plan_advice) but if try to open another connection, with the
extension already previously created, I'm unable to use once I drop and
re-create the extension.

tpch=# create extension pg_plan_advice;
ERROR:  extension "pg_plan_advice" already exists
tpch=# explain(plan_advice) select 1;
ERROR:  unrecognized EXPLAIN option "plan_advice"
LINE 1: explain(plan_advice) select 1;
                ^
tpch=# drop extension pg_plan_advice ;
DROP EXTENSION
tpch=# create extension pg_plan_advice;
CREATE EXTENSION
tpch=# explain(plan_advice) select 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
 Generated Plan Advice:
   NO_GATHER("*RESULT*")


And thanks for working on this. I think that this can be a very useful
feature for both users and for postgres hackers, +1 for the idea.

-- 
Matheus Alcantara
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-11-17T15:09:36Z

On Mon, Nov 17, 2025 at 9:42 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
> I've spent some time playing with these patches. I still don't have to
> much comments on the syntax yet but I've noticed a small bug or perhaps
> I'm missing something?

Cool, thanks for looking. I am guessing that the paucity of feedback
thus far is partly because there's a lot of stuff to absorb -- though
the main point at this stage is really to get some opinions on the
planner infrastructure/hooks, which don't necessarily require full
understanding of (never mind agreement with) the design of
pg_plan_advice itself.

> When I run CREATE EXTENSION pg_plan_advice I'm able to use the
> EXPLAIN(plan_advice) but if try to open another connection, with the
> extension already previously created, I'm unable to use once I drop and
> re-create the extension.

This is just an idiosyncrasy of PostgreSQL's extension framework.
Whether or not EXPLAIN (PLAN_ADVICE) works depends on whether the
shared module has been loaded, not whether the extension has been
created. The purpose of CREATE EXTENSION is to put SQL objects, such
as function definitions, into the database, but there's no SQL
required to enable EXPLAIN (PLAN_ADVICE) -- or for setting the
pg_plan_advice.advice GUC. However, running CREATE EXTENSION to
establish the function definitions will incidentally load the shared
module into that particular session.

Therefore, the best way to use this module is to add pg_plan_advice to
shared_preload_libraries. Alternatively, you can use
session_preload_libraries or run LOAD in an individual session. If you
don't care about the collector interface, that's really all you need.
If you do care about the collector interface, then in addition you
will need to run CREATE EXTENSION, so that the SQL functions needed to
access it are available.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-11-18T16:19:24Z

Here's v4. This version has some bug fixes and test case changes to
0005 and 0006, with the goal of getting CI to pass cleanly (which it
now does for me, but let's see if cfbot agrees).

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Dian Fay <di@nmfay.com> — 2025-11-23T00:43:40Z

On Tue Nov 18, 2025 at 11:19 AM EST, Robert Haas wrote:
> Here's v4. This version has some bug fixes and test case changes to
> 0005 and 0006, with the goal of getting CI to pass cleanly (which it
> now does for me, but let's see if cfbot agrees).

Thanks for working on this, Robert! I think the design seems solid (and
very powerful) from a user perspective. I was curious what would happen
with row-level security interactions so I tried it out on a toy example
I put together a while back. I found one case where scan advice fails on
an intentionally naive/bad policy implementation, but I'm not sure why
and it seems like the kind of weird corner case that might be useful to
reason about. See attached for the setup script, then:

set pg_plan_advice.advice = 'BITMAP_HEAP_SCAN(item public.item_tags_idx)';
set item_reader.allowed_tags = '{alpha,beta}';
set role item_reader;

explain (plan_advice, analyze, verbose, costs, timing)
select * from item
where value ilike 'a%' and tags && array[1];

Seq Scan on public.item  (cost=0.00..41777312.00 rows=54961 width=67) (actual time=2.947..8603.333 rows=6762.00 loops=1)
 Disabled: true
 Output: item.id, item.value, item.tags
 Filter: (EXISTS(SubPlan exists_1) AND (item.value ~~* 'a%'::text) AND (item.tags && '{1}'::integer[]))
 Rows Removed by Filter: 993238
 Buffers: shared hit=1012312
 SubPlan exists_1
   ->  Seq Scan on public.tag  (cost=0.00..41.75 rows=1 width=0) (actual time=0.008..0.008 rows=0.21 loops=1000000)
         Filter: ((current_setting('item_reader.allowed_tags'::text) IS NOT NULL) AND ((current_setting('item_reader.allowed_tags'::text))::text[] @> ARRAY[tag.name]) AND (item.tags @> ARRAY[tag.id]))
         Rows Removed by Filter: 18
         Buffers: shared hit=1000000
Planning Time: 1.168 ms
Supplied Plan Advice:
 BITMAP_HEAP_SCAN(item public.item_tags_idx) /* matched, failed */
Generated Plan Advice:
 SEQ_SCAN(item tag@exists_1)
 NO_GATHER(item tag@exists_1)
Execution Time: 8603.615 ms

Since the policies don't contain any execution boundaries, all the quals
should be going into a single bucket for planning if I understand the
process correctly. The bitmap heap scan should be a candidate given the
`tags &&` predicate (and indeed if I switch to a privileged role, the
advice matches successfully without any policies in the mix), but gdb
shows the walker bouncing out of pgpa_walker_contains_scan without any
candidate scans for the BITMAP_HEAP_SCAN strategy.

I do want to avoid getting bikesheddy about the advice language so I'll
forbear from syntax discussion, but one design thought with lower-level
implications did occur to me as I was playing with this: it might be
useful in some situations to influence the planner _away_ from known
worse paths while leaving it room to decide on the best other option. I
think the work you did in path management should make this pretty
straightforward for join and scan strategies, since it looks like you've
basically made the enable_* gucs a runtime-configurable bitmask (which
seems like a perfectly reasonable approach to my "have done some source
diving but not an internals hacker" eyes), and could disable one as
easily as forcing one.

"Don't use this one index" sounds more fiddly to implement, but also
less valuable since in that case you probably already know which other
index it should be using.

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-11-24T16:14:27Z

On Sat, Nov 22, 2025 at 7:43 PM Dian Fay <di@nmfay.com> wrote:
> Thanks for working on this, Robert!

Thanks for looking at it! I was hoping for a bit more in the way of
responses by now, honestly.

> Since the policies don't contain any execution boundaries, all the quals
> should be going into a single bucket for planning if I understand the
> process correctly. The bitmap heap scan should be a candidate given the
> `tags &&` predicate (and indeed if I switch to a privileged role, the
> advice matches successfully without any policies in the mix), but gdb
> shows the walker bouncing out of pgpa_walker_contains_scan without any
> candidate scans for the BITMAP_HEAP_SCAN strategy.

I can understand why it seems that way, but when I try setting
enable_seqscan=false instead of using pg_plan_advice, I get exactly
the same result. I think this is actually a great example both of why
this is actually a very powerful tool and also why it has the
potential to be really confusing. The power comes from the fact that
you can find out whether the planner thinks that the thing you want to
do is even possible. In this case, that's easy anyway because the
example is simple enough, but sometimes you can't set
enable_seqscan=false or similar because it would change too many other
things in the plan at that same time and you wouldn't be able to
compare. In those situations, this figures to be useful. However, all
this can do is tell you that the answer to the question "is this a
possible plan shape?" is "no". It cannot tell you why, and you may
easily find the result counterintuitive.

And honestly, this is one of the things I'm worried about if we go
forward with this, that we'll get a ton of people who think it doesn't
work because it doesn't force the planner to do things which the
planner rejects on non-cost considerations. We're going to need really
good documentation to explain to people that if you use this to try to
force a plan and you can't, that's not a bug, that's the planner
telling you that that plan shape is not able to be considered for some
reason. That won't keep people from complaining about things that
aren't really bugs, but at least it will mean that there's a link we
can give them to explain why the way they're thinking about it is
incorrect. However, that will just beg the next question of WHY the
planner doesn't think a certain plan can be considered, and honestly,
I've found over the years that I often need to resort to the source
code to answer those kinds of questions. People who are not good at
reading C source code are not going to like that answer very much, but
I still think it's better if they know THAT the planner thinks the
plan shape is impossible even if we can't tell them WHY the planner
thinks that the plan shape is impossible. We probably will want to
document at least some of the common reasons why this happens, to cut
down on getting the same questions over and over again.

In this particular case, I think the problem is that the user-supplied
qual item.tags @> ARRAY[id] is not leakproof and therefore must be
tested after the security qual. There's no way to use a Bitmap Heap
Scan without reversing the order of those tests.

> I do want to avoid getting bikesheddy about the advice language so I'll
> forbear from syntax discussion, but one design thought with lower-level
> implications did occur to me as I was playing with this: it might be
> useful in some situations to influence the planner _away_ from known
> worse paths while leaving it room to decide on the best other option. I
> think the work you did in path management should make this pretty
> straightforward for join and scan strategies, since it looks like you've
> basically made the enable_* gucs a runtime-configurable bitmask (which
> seems like a perfectly reasonable approach to my "have done some source
> diving but not an internals hacker" eyes), and could disable one as
> easily as forcing one.

I mostly agree. Saying not to use a sequential scan on a certain
table, or not to use a particular index, or not to use a particular
join method seem like things that would be potentially useful, and
they would be straightforward generalizations of what the code already
does. For me, that would principally be a way to understand better why
the planner chose what it did. I often wonder what the planner's
second choice would have been, but I don't just want the plan with the
second-cheapest overall cost, because that will be something just
trivially different. I want the cheapest plan that excludes some key
element of the current plan, so I can see a meaningfully different
alternative.

That said, I don't see this being a general thing that would make
sense across all of the tags that pg_plan_advice supports. For
example, NO_JOIN_ORDER() sounds hard to implement and largely useless.

The main reason I haven't done this is that I want to keep the focus
on plan stability, or said differently, on things that can properly
round-trip. You should be able to run a query with EXPLAIN
(PLAN_ADVICE), then set pg_plan_advice.advice to the resulting string,
rerun the query, and get the same plan with all of the advice
successfully matching. Since EXPLAIN (PLAN_ADVICE) would never emit
these proposed negative tags, we'd need to think a little bit harder
about how that stuff should be tested. That's not necessarily a big
deal or anything, but I didn't think it was an essential element of
the initial scope, so I left it out. I'm happy to add it in at some
point, or for someone else to do so, but not until this much is
working well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Dian Fay <di@nmfay.com> — 2025-11-30T03:16:44Z

On Mon Nov 24, 2025 at 11:14 AM EST, Robert Haas wrote:
> On Sat, Nov 22, 2025 at 7:43 PM Dian Fay <di@nmfay.com> wrote:
>> Since the policies don't contain any execution boundaries, all the quals
>> should be going into a single bucket for planning if I understand the
>> process correctly. The bitmap heap scan should be a candidate given the
>> `tags &&` predicate (and indeed if I switch to a privileged role, the
>> advice matches successfully without any policies in the mix), but gdb
>> shows the walker bouncing out of pgpa_walker_contains_scan without any
>> candidate scans for the BITMAP_HEAP_SCAN strategy.
>
> In this particular case, I think the problem is that the user-supplied
> qual item.tags @> ARRAY[id] is not leakproof and therefore must be
> tested after the security qual. There's no way to use a Bitmap Heap
> Scan without reversing the order of those tests.

Right, I keep forgetting the functions underneath those array operators
aren't leakproof. Thanks for digging.

> And honestly, this is one of the things I'm worried about if we go
> forward with this, that we'll get a ton of people who think it doesn't
> work because it doesn't force the planner to do things which the
> planner rejects on non-cost considerations. We're going to need really
> good documentation to explain to people that if you use this to try to
> force a plan and you can't, that's not a bug, that's the planner
> telling you that that plan shape is not able to be considered for some
> reason.

Once we're closer to consensus on pg_plan_advice or something like it
landing, I'm interested in helping out on this end of things!

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-05T19:57:09Z

On Sat, Nov 29, 2025 at 10:17 PM Dian Fay <di@nmfay.com> wrote:
> Once we're closer to consensus on pg_plan_advice or something like it
> landing, I'm interested in helping out on this end of things!

Thanks!

014f9a831a320666bf2195949f41710f970c54ad removes the need for what was
previously 0004, so here is a new patch series with that dropped, to
avoid confusing cfbot or human reviewers.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Greg Burd <greg@burd.me> — 2025-12-08T20:39:25Z

On Fri, Dec 5, 2025, at 2:57 PM, Robert Haas wrote:
> 014f9a831a320666bf2195949f41710f970c54ad removes the need for what was
> previously 0004, so here is a new patch series with that dropped, to
> avoid confusing cfbot or human reviewers.
>
> -- 
> Robert Haas
> EDB: http://www.enterprisedb.com

Hey Robert,

Thanks for working on this!  I think the idea has merit, I hope it lands sometime soon.

I've worked on extending PostgreSQL's planner for a JSONB-like document system. The new query shapes frequently caused mysterious planner issues. Having the ability to:

1. Dump detailed planner decisions for comparison during development
2. Record planner choices from customer databases to reproduce their issues
3. Apply fixed plans to specific queries as a quick customer workaround while addressing root causes in later releases

...would have been invaluable.

Yes, there's danger here ("with great power comes great responsibility"), but I see this as providing more information to make better decisions when working with the black art of planner logic.

ORM Query Variation Challenge

Jakob's point about "crazy ORMs" is important. ORMs generate queries with minor variations that should ideally match the same plan advice. I need to study the plan advice matching logic more deeply to understand how it handles query variations.

This reminded me of Erlang/Elixir's "parse transforms" - compiled code forms an AST that registered transforms can modify via pattern matching. The concept might be relevant here: pattern-matching portions of query ASTs to apply advice despite syntactic variations. I'd need to think more about whether this intersects well with the current design or if it's impractical, but it's worth exploring.

> Attachments:
> * v5-0001-Store-information-about-range-table-flattening-in.patch

contrib/pg_overexplain/pg_overexplain.c

+               /* Advance to next SubRTInfo, if it's time. */
+               if (lc_subrtinfo != NULL)
+               {
+                       next_rtinfo = lfirst(lc_subrtinfo);
+                       if (rti > next_rtinfo->rtoffset)

Should the test be >= not >? Unless I am I reading this wrong, when rti == rtoffset, that's the first entry of the new subplan's range table. That would mean that the current logic skips displaying the subplan name for the first RTE of each subplan.

in src/include/nodes/plannodes.h there is:

+typedef struct SubPlanRTInfo
+{
+       NodeTag         type;
+       const char *plan_name;
+       Index           rtoffset;
+       bool            dummy;
+} SubPlanRTInfo;

This is where I get confused, if rtoffset is an Index, then the comparison (above) in pg_overexplain uses rti > next_rtinfo->rtoffset where rti starts at 1. If rtoffset is 0 for the first subplan, the logic might be off-by-one, no?

> This commit teaches pg_overexplain'e RANGE_TABLE option to make use
Minor nit in the commit message, "pg_overexplain'e" should be "pg_overexplain's"

> * v5-0002-Store-information-about-elided-nodes-in-the-final.patch

+/*
+ * Record some details about a node removed from the plan during setrefs
+ * procesing, for the benefit of code trying to reconstruct planner decisions
+ * from examination of the final plan tree.
+ */

Nit, "procesing" should be "processing"

> * v5-0003-Store-information-about-Append-node-consolidation.patch

src/backend/optimizer/path/allpaths.c

/* Now consider each interesting sort ordering */
foreach(lcp, all_child_pathkeys)
{
        List       *subpaths = NIL;
        bool            subpaths_valid = true;
+       List       *subpath_cars = NIL;
        List       *startup_subpaths = NIL;
        bool            startup_subpaths_valid = true;
+       List       *startup_subpath_cars = NIL;
        List       *partial_subpaths = NIL;
+       List       *partial_subpath_cars = NIL;
        List       *pa_partial_subpaths = NIL;
        List       *pa_nonpartial_subpaths = NIL;
+       List       *pa_subpath_cars = NIL;

I find "cars" a bit cryptic (albeit clever), I think I've decoded it properly and it stands for "child_append_relid_sets", correct?  Could you add a comment or use a clearer name like subpath_child_relids or consolidated_relid_sets?

+accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths,
+                                                 List **child_append_relid_sets)
 {
        if (IsA(path, AppendPath))
        {
@@ -2219,6 +2256,8 @@ accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths)
                if (!apath->path.parallel_aware || apath->first_partial_path == 0)
                {
                        *subpaths = list_concat(*subpaths, apath->subpaths);
+                       *child_append_relid_sets =
+                               lappend(*child_append_relid_sets, path->parent->relids);

Is it possible that when pulling up multiple subpaths from an AppendPath, only ONE relid set is added to child_append_relid_sets, but MULTIPLE paths are added to subpaths?  If so, that would break the correspondence between the lists which would be bad, right?

src/include/nodes/pathnodes.h
+ * Whenever accumulate_append_subpath() allows us to consolidate multiple
+ * levels of Append paths are consolidated down to one, we store the RTI
+ * sets for the omitted paths in child_append_relid_sets. This is not necessary
+ * for planning or execution; we do it for the benefit of code that wants
+ * to inspect the final plan and understand how it came to be.

Minor: "paths are consolidated" is redundant, should be "paths consolidated" or "allows us to consolidate".

> * v5-0004-Allow-for-plugin-control-over-path-generation-str.patch

src/backend/optimizer/path/costsize.c
+       else
+               enable_mask |= PGS_CONSIDER_NONPARTIAL;

-       path->disabled_nodes = enable_seqscan ? 0 : 1;
+       path->disabled_nodes =
+               (baserel->pgs_mask & enable_mask) == enable_mask ? 0 : 1;

When parallel_workers > 0 the path is partial and doesn't need PGS_CONSIDER_NONPARTIAL. But if parallel_workers == 0, it's non-partial and DOES need it, right?  Would this mean that non-partial paths can be disabled even when the scan type itself (e.g., PGS_SEQSCAN) is enabled?  Intentional?

> * v5-0005-WIP-Add-pg_plan_advice-contrib-module.patch

It seems this is still WIP with a solid start, I'm not going to dig too much into it. :)

Keep it up, best.

-greg

Re: pg_plan_advice

Jacob Champion <jacob.champion@enterprisedb.com> — 2025-12-09T01:18:50Z

Hello,

On Fri, Dec 5, 2025 at 11:57 AM Robert Haas <robertmhaas@gmail.com> wrote:
> 014f9a831a320666bf2195949f41710f970c54ad removes the need for what was
> previously 0004, so here is a new patch series with that dropped, to
> avoid confusing cfbot or human reviewers.

I really like this idea! Telling the planner, "if you need to make a
decision for [this thing], choose [this way]," seems to be a really
nice way of sidestepping many of the concerns with "user control".

I've started an attempt to throw a fuzzer at this, because I'm pretty
useless when it comes to planner/optimizer review. I don't really know
what the overall fuzzing strategy is going to be, given the multiple
complicated inputs that have to be constructed and somehow correlated
with each other, but I'll try to start small and expand:

a) fuzz the parser first, because it's easy and we can get interesting inputs
b) fuzz the AST utilities, seeded with "successful" corpus members from a)
c) stare really hard at the corpus of b) and figure out how to
usefully mutate a PlannedStmt with it
d) use c) to fuzz pgpa_plan_walker, then pgpa_output_advice, then...?

I'm in the middle of an implementation of b) now, and it noticed the
following code (which probably bodes well for the fuzzer itself!):

>        if (rid->partnsp == NULL)
>            result = psprintf("%s/%s", result,
>                              quote_identifier(rid->partnsp));

I assume that should be quote_identifier(rid->partrel)?

= Other Notes =

GCC 11 complains about the following code in pgpa_collect_advice():

>        dsa_area   *area = pg_plan_advice_dsa_area();
>        dsa_pointer ca_pointer;
>
>        pgpa_make_collected_advice(userid, dbid, queryId, now,
>                                   query_string, advice_string, area,
>                                   &ca_pointer);
>        pgpa_store_shared_advice(ca_pointer);

It doesn't know that area is guaranteed to be non-NULL, so it can't
prove that ca_pointer is initialized.

(GCC also complains about unique_nonjoin_rtekind() not initializing
the rtekind, but I think that's because of a bug [1].)

--Jacob

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107838

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-09T19:34:43Z

On Mon, Dec 8, 2025 at 3:39 PM Greg Burd <greg@burd.me> wrote:
> Thanks for working on this!  I think the idea has merit, I hope it lands sometime soon.

Thanks. I think it needs a good deal more review first, but I
appreciate the support.

> > Attachments:
> > * v5-0001-Store-information-about-range-table-flattening-in.patch
>
> contrib/pg_overexplain/pg_overexplain.c
>
> +               /* Advance to next SubRTInfo, if it's time. */
> +               if (lc_subrtinfo != NULL)
> +               {
> +                       next_rtinfo = lfirst(lc_subrtinfo);
> +                       if (rti > next_rtinfo->rtoffset)
>
> Should the test be >= not >? Unless I am I reading this wrong, when rti == rtoffset, that's the first entry of the new subplan's range table. That would mean that the current logic skips displaying the subplan name for the first RTE of each subplan.

I don't think so. I think I actually had it that way at one point, and
I believe I found that it was wrong. RTIs are 1-based, so the smallest
per-subquery RTI is 1. rtoffset is the amount that must be added to
the per-subquery RTI to get a "flat" RTI that can be used to index
into the final range table. But if you find that theoretical argument
unconvincing, by all means please test it and see what happens!

> > This commit teaches pg_overexplain'e RANGE_TABLE option to make use
> Minor nit in the commit message, "pg_overexplain'e" should be "pg_overexplain's"

Thanks, fixed in my local branch.

> > * v5-0002-Store-information-about-elided-nodes-in-the-final.patch
>
> +/*
> + * Record some details about a node removed from the plan during setrefs
> + * procesing, for the benefit of code trying to reconstruct planner decisions
> + * from examination of the final plan tree.
> + */
>
> Nit, "procesing" should be "processing"

Thanks, fixed in my local branch.

> > * v5-0003-Store-information-about-Append-node-consolidation.patch
>
> src/backend/optimizer/path/allpaths.c
>
> /* Now consider each interesting sort ordering */
> foreach(lcp, all_child_pathkeys)
> {
>         List       *subpaths = NIL;
>         bool            subpaths_valid = true;
> +       List       *subpath_cars = NIL;
>         List       *startup_subpaths = NIL;
>         bool            startup_subpaths_valid = true;
> +       List       *startup_subpath_cars = NIL;
>         List       *partial_subpaths = NIL;
> +       List       *partial_subpath_cars = NIL;
>         List       *pa_partial_subpaths = NIL;
>         List       *pa_nonpartial_subpaths = NIL;
> +       List       *pa_subpath_cars = NIL;
>
> I find "cars" a bit cryptic (albeit clever), I think I've decoded it properly and it stands for "child_append_relid_sets", correct?  Could you add a comment or use a clearer name like subpath_child_relids or consolidated_relid_sets?

I certainly admit that this is a bit too clever. I am not entirely
sure how to make it less clever. There needs to be a
child-append-relid-sets list corresponding to every current and future
subpath list, and the names of some of those subpath lists are already
quite long, so whatever naming convention we choose for the "cars"
lists had better not add too much more length to the variable name. I
felt like someone looking at this might initially be confused by what
"cars" meant, but then I thought that they would probably look at how
the variable was used and see that it was for example being passed as
the second argument to get_singleton_append_subpath(), which is named
child_append_relid_sets, or being passed to create_append_path or
create_merge_append_path, which also use that naming. I figured that
this would clear up the confusion pretty quickly. I could certainly
add a comment above this block of variable assignments saying
something like "for each list of paths, we must also maintain a list
of child append relid sets, etc. etc." but I worried that this would
create as much confusion as it solved, i.e. somebody reading the code
would be going: why is this comment here? Is it trying to tell me that
there's something weirder going on than what is anyway obvious?

If I get more opinions that some clarification is needed here, I'm
happy to change it, especially if those opinions agree with each other
on exactly what to change, but I think for now I'll leave it as it is.

> +accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths,
> +                                                 List **child_append_relid_sets)
>  {
>         if (IsA(path, AppendPath))
>         {
> @@ -2219,6 +2256,8 @@ accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths)
>                 if (!apath->path.parallel_aware || apath->first_partial_path == 0)
>                 {
>                         *subpaths = list_concat(*subpaths, apath->subpaths);
> +                       *child_append_relid_sets =
> +                               lappend(*child_append_relid_sets, path->parent->relids);
>
> Is it possible that when pulling up multiple subpaths from an AppendPath, only ONE relid set is added to child_append_relid_sets, but MULTIPLE paths are added to subpaths?  If so, that would break the correspondence between the lists which would be bad, right?

That would indeed be bad, but I'm not clear on how you think it could
happen. Can you clarify?

> src/include/nodes/pathnodes.h
> + * Whenever accumulate_append_subpath() allows us to consolidate multiple
> + * levels of Append paths are consolidated down to one, we store the RTI
> + * sets for the omitted paths in child_append_relid_sets. This is not necessary
> + * for planning or execution; we do it for the benefit of code that wants
> + * to inspect the final plan and understand how it came to be.
>
> Minor: "paths are consolidated" is redundant, should be "paths consolidated" or "allows us to consolidate".

Thanks, fixed in my local branch.

> > * v5-0004-Allow-for-plugin-control-over-path-generation-str.patch
>
> src/backend/optimizer/path/costsize.c
> +       else
> +               enable_mask |= PGS_CONSIDER_NONPARTIAL;
>
> -       path->disabled_nodes = enable_seqscan ? 0 : 1;
> +       path->disabled_nodes =
> +               (baserel->pgs_mask & enable_mask) == enable_mask ? 0 : 1;
>
> When parallel_workers > 0 the path is partial and doesn't need PGS_CONSIDER_NONPARTIAL. But if parallel_workers == 0, it's non-partial and DOES need it, right?  Would this mean that non-partial paths can be disabled even when the scan type itself (e.g., PGS_SEQSCAN) is enabled?  Intentional?

See this comment:

 * Finally, unsetting PGS_CONSIDER_NONPARTIAL disables all non-partial paths
 * except those that use Gather or Gather Merge. In most other cases, a
 * plugin can nudge the planner toward a particular strategy by disabling
 * all of the others, but that doesn't work here: unsetting PGS_SEQSCAN,
 * for instance, would disable both partial and non-partial sequential scans.

> It seems this is still WIP with a solid start, I'm not going to dig too much into it. :)
>
> Keep it up, best.

Thanks for the review so far!

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-09T19:45:47Z

On Mon, Dec 8, 2025 at 8:19 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> I really like this idea! Telling the planner, "if you need to make a
> decision for [this thing], choose [this way]," seems to be a really
> nice way of sidestepping many of the concerns with "user control".
>
> I've started an attempt to throw a fuzzer at this, because I'm pretty
> useless when it comes to planner/optimizer review. I don't really know
> what the overall fuzzing strategy is going to be, given the multiple
> complicated inputs that have to be constructed and somehow correlated
> with each other, but I'll try to start small and expand:
>
> a) fuzz the parser first, because it's easy and we can get interesting inputs
> b) fuzz the AST utilities, seeded with "successful" corpus members from a)
> c) stare really hard at the corpus of b) and figure out how to
> usefully mutate a PlannedStmt with it
> d) use c) to fuzz pgpa_plan_walker, then pgpa_output_advice, then...?

Cool. I'm bad at fuzzing, but I think fuzzing by someone who is good
at it is very promising for this kind of patch.

> I'm in the middle of an implementation of b) now, and it noticed the
> following code (which probably bodes well for the fuzzer itself!):
>
> >        if (rid->partnsp == NULL)
> >            result = psprintf("%s/%s", result,
> >                              quote_identifier(rid->partnsp));
>
> I assume that should be quote_identifier(rid->partrel)?

Yes, thanks. Fixed locally. By the way, if your fuzzer can also
produces some things to add contrib/pg_plan_advice/sql for cases like
this, that would be quite helpful. Ideally I would have caught this
with a manually-written test case, but obviously that didn't happen.

> = Other Notes =
>
> GCC 11 complains about the following code in pgpa_collect_advice():
>
> >        dsa_area   *area = pg_plan_advice_dsa_area();
> >        dsa_pointer ca_pointer;
> >
> >        pgpa_make_collected_advice(userid, dbid, queryId, now,
> >                                   query_string, advice_string, area,
> >                                   &ca_pointer);
> >        pgpa_store_shared_advice(ca_pointer);
>
> It doesn't know that area is guaranteed to be non-NULL, so it can't
> prove that ca_pointer is initialized.

I don't know what to do about that. I can understand why it might be
unable to prove that, but I don't see an obvious way to change the
code that would make life easier. I could add Assert(area != NULL)
before the call to pgpa_make_collected_advice() if that helps.

> (GCC also complains about unique_nonjoin_rtekind() not initializing
> the rtekind, but I think that's because of a bug [1].)

This one could be fixed with a dummy initialization, if needed.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

amit <amitlangote09@gmail.com> — 2025-12-10T11:20:38Z

Hi Robert,

On Thu, Oct 30, 2025 at 11:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
> As I have mentioned on previous threads, for the past while I have
> been working on planner extensibility. I've posted some extensibility
> patches previously, and got a few of them committed in
> Sepember/October with Tom's help, but I think the time has come a
> patch which actually makes use of that infrastructure as well as some
> further infrastructure that I'm also including in this posting.[1] The
> final patch in this series adds a new contrib module called
> pg_plan_advice. Very briefly, what pg_plan_advice knows how to do is
> process a plan and emits a (potentially long) long text string in a
> special-purpose mini-language that describes a bunch of key planning
> decisions, such as the join order, selected join methods, types of
> scans used to access individual tables, and where and how
> partitionwise join and parallelism were used. You can then set
> pg_plan_advice.advice to that string to get a future attempt to plan
> the same query to reproduce those decisions, or (maybe a better idea)
> you can trim that string down to constrain some decisions (e.g. the
> join order) but not others (e.g. the join methods), or (if you want to
> make your life more exciting) you can edit that advice string and
> thereby attempt to coerce the planner into planning the query the way
> you think best. There is a README that explains the design philosophy
> and thinking in a lot more detail, which is a good place to start if
> you're curious, and I implore you to read it if you're interested, and
> *especially* if you're thinking of flaming me.

Thanks for posting this.  Looks very interesting to me.

These are just high-level comments after browsing the patches and
reading some bits like pgpa_identifier to get myself familiarized with
the project.  I like that the key concept here is plan stability
rather than plan control, because that framing makes it easier to
treat this as infrastructure instead of policy.

> I want to mention that, beyond the fact that I'm sure some people will
> want to use something like this (with more feature and a lot fewer
> bugs) in production, it seems to be super-useful for testing. We have
> a lot of regression test cases that try to coerce the planner to do a
> particular thing by manipulating enable_* GUCs, and I've spent a lot
> of time trying to do similar things by hand, either for regression
> test coverage or just private testing. This facility, even with all of
> the bugs and limitations that it currently has, is exponentially more
> powerful than frobbing enable_* GUCs. Once you get the hang of the
> advice mini-language, you can very quickly experiment with all sorts
> of plan shapes in ways that are currently very hard to do, and thereby
> find out how expensive the planner thinks those things are and which
> ones it thinks are even legal. So I see this as not only something
> that people might find useful for in production deployments, but also
> something that can potentially be really useful to advance PostgreSQL
> development.

+1, the testing benefits make this worthwhile.

> Which brings me to the question of where this code ought to go if it
> goes anywhere at all. I decided to propose pg_plan_advice as a contrib
> module rather than a part of core because I had to make a WHOLE lot of
> opinionated design decisions just to get to the point of having
> something that I could post and hopefully get feedback on. I figured
> that all of those opinionated decisions would be a bit less
> unpalatable if they were mostly encapsulated in a contrib module, with
> the potential for some future patch author to write a different
> contrib module that adopted different solutions to all of those
> problems. But what I've also come to realize is that there's so much
> infrastructure here that leaving the next person to reinvent it may
> not be all that appealing. Query jumbling is a previous case where we
> initially thought that different people might want to do different
> things, but eventually realized that most people really just wanted
> some solution that they didn't have to think too hard about. Likewise,
> in this patch, the relation identifier system described in the README
> is the only thing of its kind, to my knowledge, and any system that
> wants to accomplish something similar to what pg_plan_advice does
> would need a system like that. pg_hint_plan doesn't have something
> like that, because pg_hint_plan is just trying to do hints. This is
> trying to do round-trip-safe plan stability, where the system will
> tell you how to refer unambiguously to a certain part of the query in
> a way that will work correctly on every single query regardless of how
> it's structured or how many times it refers to the same tables or to
> different tables using the same aliases. If we say that we're never
> going to put any of that infrastructure in core, then anyone who wants
> to write a module to control the planner is going to need to start by
> either (a) reinventing something similar, (b) cloning all the relevant
> code, or (c) just giving up on the idea of unambiguous references to
> parts of a query. None of those seem like great options, so now I'm
> less sure whether contrib is actually the right place for this code,
> but that's where I have put it for now. Feedback welcome, on this and
> everything else.

On the relation identifier system: IMHO this part doesn't seem as
opinionated as the advice mini-language. The requirements pretty much
dictate the design -- you need alias names and occurrence counters to
handle self-joins, partition fields for partitioned tables, and a
string representation to survive dump/restore. There doesn't seem to
be much flexibility in that.

Given that, it seems more practical to put this in core from the
start. Extensions that might want to build plan-advice-like
functionality shouldn’t have to clone this logic and wait another
release for something that’s already well-defined and deterministic.
The mini-language is opinionated and belongs in contrib, but the
identifier infrastructure just solves a fundamental problem cleanly.

On the infrastructure patches (0001-0005): these look sensible. The
range table flattening info, elided node tracking, and append node
consolidation preserve information that's currently lost -- there's
some additional overhead to track this, but it's fixed per-relation
per-subquery, which seems reasonable.  The path generation hooks
(0005) are a clear improvement: moving from global enable_* GUCs to
per-RelOptInfo pgs_mask gives extensions the granularity they need for
relation-specific and join-specific decisions. Yes, you need C code to
use them, but you'd need to write C code to do something of value in
this area anyway, and the hooks give you control that GUCs can't
provide.

Overall, I'm supportive of getting these committed once they're ready.
contrib/pg_plan_advice is a compelling proof-of-concept for why these
hooks are needed.

I'll try to post more specific comments once I've read this some more.

--
Thanks, Amit Langote

Re: pg_plan_advice

Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-10T11:43:39Z

On Fri, Dec 5, 2025 at 8:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
[..]
> 014f9a831a320666bf2195949f41710f970c54ad removes the need for what was
> previously 0004, so here is a new patch series with that dropped, to
> avoid confusing cfbot or human reviewers.

Quick-question regarding cross-interactions of the extensions: would
it be possible for auto_explain to have something like
auto_explain.log_custom_options='PLAN_ADVICES' so that it could be
dumping the advice of the queries involved . I can see there is
ApplyExtensionExplainOption() and that would have to probably be used
by auto_explain(?) Or is there any other better way or perhaps it
somehow is against some design or it's just outside of initial scope?
This would solve two problems:
a) sometimes explaining manually (psql) is simply not realistic as it
is being run by app only
b) auto_explain could log nested queries and could print plan advices
along the way, which can be very painful process otherwise
(reverse-engineering how the optimizer would name things  in more
complex queries run from inside PLPGSQL functions)

BTW, some feedback: the plan advices (plan fixing) seems to work fine
for nested queries inside PLPGSQL, and also I've discovered (?) that
one can do even today with patchset the following:
   alter function blah(bigint) set pg_plan_advice.advice =
'NESTED_LOOP_MATERIALIZE(b)';
which seems to be pretty cool, because it allows more targeted fixes
without even having capability of fixing plans for specific query_id
(as discussed earlier).

For the generation part, the only remaining thing is how it integrates
with partitions (especially the ones being dynamically created/dropped
over time). Right now one needs to keep the advice(s) in sync after
altering the partitions, but it could be expected that some form of
regexp/partition-templating would be built into pg_plan_advices
instead. Anyway, I think this one should go into documentation just as
known-limitations for now.

While scratching my head on how to prove that this is not crashing
I've also checked below ones (TLDR all ok):
1. PG_TEST_INITDB_EXTRA_OPTS="-c
shared_preload_libraries='pg_plan_advice'"  meson test  # It was clean
2. PG_TEST_INITDB_EXTRA_OPTS="-c
shared_preload_libraries='pg_plan_advice'" PGOPTIONS="-c
pg_plan_advice.advice=NESTED_LOOP_MATERIALIZE(certainlynotused)" meson
test # This had several failures, but all is OK: it's just some of
them had to additional (expected) text inside regression.diffs:
NESTED_LOOP_MATERIALIZE(certainlynotused) /* not matched */
3. PG_TEST_INITDB_EXTRA_OPTS="-c
shared_preload_libraries='pg_plan_advice' -c
pg_plan_advice.shared_collection_limit=42"  meson test # It was clean
too

-J.

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-10T13:33:11Z

On Wed, Dec 10, 2025 at 6:43 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> Quick-question regarding cross-interactions of the extensions: would
> it be possible for auto_explain to have something like
> auto_explain.log_custom_options='PLAN_ADVICES' so that it could be
> dumping the advice of the queries involved

Yes, I had the same idea. I think the tricky part here is that an
option can have an argument. Most options will probably have Boolean
arguments, but there are existing in-core counterexamples, such as
FORMAT. We should try to figure out the GUC in such a way that it can
be used either to set a Boolean option by just specifying it, or that
it can be used to set an option to a value by writing both. Maybe it's
fine if the GUC value is just a comma-separated list of entries, and
each entry can either be an option name or an option name followed by
a space followed by an option value, i.e. if FORMAT were custom, then
you could write auto_explain.log_custom_options='format xml,
plan_advice' or auto_explain.log_custom_options='plan_advice true,
range_table false' and have sensible things happen. In fact, very
possibly the GUC should just accept any options whether in-core or
out-of-core and not distinguish, so it would be more like
auto_explain.log_options.

> BTW, some feedback: the plan advices (plan fixing) seems to work fine
> for nested queries inside PLPGSQL, and also I've discovered (?) that
> one can do even today with patchset the following:
>    alter function blah(bigint) set pg_plan_advice.advice =
> 'NESTED_LOOP_MATERIALIZE(b)';
> which seems to be pretty cool, because it allows more targeted fixes
> without even having capability of fixing plans for specific query_id
> (as discussed earlier).

Yes, this is a big advantage of reusing the GUC machinery for this
purpose (but see the thread on "[PATCH] Allow complex data for GUC
extra").

> For the generation part, the only remaining thing is how it integrates
> with partitions (especially the ones being dynamically created/dropped
> over time). Right now one needs to keep the advice(s) in sync after
> altering the partitions, but it could be expected that some form of
> regexp/partition-templating would be built into pg_plan_advices
> instead. Anyway, I think this one should go into documentation just as
> known-limitations for now.

Right. I don't think trying to address this at this stage makes sense.
To maintain my sanity, I want to focus for now only on things that
round-trip: that is, we can generate it, and then we can accept that
same stuff. If we're using a parallel plan for every partition e.g.
they are all sequential scans or all index scans, we could generate
SEQ_SCAN(foo/*) or similar and then we could accept that. But figuring
that out would take a bunch of additional infrastructure that I don't
have the time or energy to create right this minute, and I don't see
it as anywhere close to essential for v1. Some other problems here:

1. What happens when a small number of partitions are different? The
code puts quite a bit of energy into detecting conflicting advice, and
honestly probably should put even more, and you might say, well, if
there's just one partition that used an index scan, then I still want
the advice to read SEQ_SCAN(foo/*) INDEX_SCAN(foo/foo23 foo23_a_idx)
and not signal a conflict, but that's slightly unprincipled.

2. INDEX_SCAN() specifications and similar will tend not to be
different for every partition because the index names will be
different for every partition. You might want something that says "for
each partition of foo, use the index on that partition that is a child
of this index on the parent".

Long run, there's a lot of things that can be added to this to make it
more concise (and more expressive, too). Another similar idea is to
have something like NO_GATHER_UNLESS_I_SAID_SO() so that a
non-parallel query doesn't have to do NO_GATHER(every single relation
including all the partitions). I'm pretty sure this is a valuable
idea, but, again, it's not essential for v1.

> While scratching my head on how to prove that this is not crashing
> I've also checked below ones (TLDR all ok):
> 1. PG_TEST_INITDB_EXTRA_OPTS="-c
> shared_preload_libraries='pg_plan_advice'"  meson test  # It was clean
> 2. PG_TEST_INITDB_EXTRA_OPTS="-c
> shared_preload_libraries='pg_plan_advice'" PGOPTIONS="-c
> pg_plan_advice.advice=NESTED_LOOP_MATERIALIZE(certainlynotused)" meson
> test # This had several failures, but all is OK: it's just some of
> them had to additional (expected) text inside regression.diffs:
> NESTED_LOOP_MATERIALIZE(certainlynotused) /* not matched */
> 3. PG_TEST_INITDB_EXTRA_OPTS="-c
> shared_preload_libraries='pg_plan_advice' -c
> pg_plan_advice.shared_collection_limit=42"  meson test # It was clean
> too

You can set pg_plan_advice.always_explain_supplied_advice=false to
clean up some of the noise here. This kind of testing is why I
invented that option. I think that in production, we REALLY REALLY
want any supplied advice to show up in the EXPLAIN plan even if the
user did not specify the PLAN_ADVICE option to EXPLAIN. Otherwise,
understanding what is going on with an EXPLAIN plan that a
hypothetical customer sends to a hypothetical PostgreSQL expert who
has to support said hypothetical customer will be a miserable
experience. But for testing purposes, it's nice to be able to shut it
off so you don't get random regression diffs.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-10T14:54:13Z

On Wed, Dec 10, 2025 at 6:20 AM Amit Langote <amitlangote09@gmail.com> wrote:
> These are just high-level comments after browsing the patches and
> reading some bits like pgpa_identifier to get myself familiarized with
> the project.  I like that the key concept here is plan stability
> rather than plan control, because that framing makes it easier to
> treat this as infrastructure instead of policy.

Thanks, I agree. I'm sure people will use this for plan control, but
if you start with that, then it's really unclear what things you
should allow to be controlled and what things not. Defining the focus
as plan stability makes round-trip safety a priority and the scope of
what you can request is what the planner could have generated had the
costing come out just so. There's still some definitional questions at
the margin, but IMHO it's much less fuzzy.

> On the relation identifier system: IMHO this part doesn't seem as
> opinionated as the advice mini-language. The requirements pretty much
> dictate the design -- you need alias names and occurrence counters to
> handle self-joins, partition fields for partitioned tables, and a
> string representation to survive dump/restore. There doesn't seem to
> be much flexibility in that.

Right. There's some flexibility. For instance, you could handle
partitions using occurrence numbers, which would actually save a bunch
of code, but that seems obviously worse in terms of user experience.
Also, you could if you wanted key it off of the name of the table
rather than the relation alias used for the table. I think that's also
worse but possibly it's debatable. You could change the order of the
pieces in the representation; e.g. maybe plan_name should come first
rather than last; or you could change the separator characters. But,
honestly, none of that strikes me as sufficient grounds to want
multiple implementations. If the choices I've made don't seem good to
other people, then we should just change them and hopefully find
something everybody can live with. It's a bit like the way that
extension SQL scripts use "--" as a separator: maybe not everybody
agrees that this is the absolutely most elegant choice, but nobody's
proposing a a second version of the extension mechanism just to do
something different.

> Given that, it seems more practical to put this in core from the
> start. Extensions that might want to build plan-advice-like
> functionality shouldn’t have to clone this logic and wait another
> release for something that’s already well-defined and deterministic.
> The mini-language is opinionated and belongs in contrib, but the
> identifier infrastructure just solves a fundamental problem cleanly.

It's not quite as easy to make a sharp distinction between these
things as someone might hope. Note that the lexer and parser handle
the whole mini-language, which includes parsing the relation
identifiers. That doesn't of course mean that the code to *generate*
relation identifiers couldn't be in core, and I actually had it that
way at one point, but it's not very much code and I wasn't too
impressed with how that turned out. It seemed to couple the core code
to the extension more tightly than necessary for not much real
benefit.

But that's not to say I disagree with you categorically. Suppose we
decided (and I'm not saying we should) to start showing relation
identifiers in EXPLAIN output instead of identifying things in EXPLAIN
output as we do today. Maybe we even decide to show elided subqueries
and similar as first-class parts of the EXPLAIN output, also using
relation identifier syntax. That would be a pretty significant change,
and would destabilize a WHOLE LOT of regression test outputs, but then
relation identifiers become a first-class PostgreSQL concept that
everyone who looks at EXPLAIN output will encounter and, probably,
come to understand. Then obviously the relation identifier generation
code needs to be in core, and that makes total sense because we're
actually using it for something, and arguably we've made life easier
for everyone who wants to use pg_plan_advice in the future because
they're already familiar with the identifier convention. The downside
is everyone has to get used to the new EXPLAIN output even if they
don't care about pg_plan_advice or hate it with a fiery passion.

So my point here is that there are things we can decide to do to make
some or all of this "core," but IMHO it's not just as simple as saying
"this is in, that's out". It's more about deciding what the end state
ought to look like, and how integrated this stuff ought to be into the
fabric of PostgreSQL. I started with the minimal level of integration:
little pieces of core infrastructure, all used by a giant extension.
Now we need to either decide that's where we want to settle, or decide
to push to some greater or lesser degree toward more integration.

> On the infrastructure patches (0001-0005): these look sensible. The
> range table flattening info, elided node tracking, and append node
> consolidation preserve information that's currently lost -- there's
> some additional overhead to track this, but it's fixed per-relation
> per-subquery, which seems reasonable.  The path generation hooks
> (0005) are a clear improvement: moving from global enable_* GUCs to
> per-RelOptInfo pgs_mask gives extensions the granularity they need for
> relation-specific and join-specific decisions. Yes, you need C code to
> use them, but you'd need to write C code to do something of value in
> this area anyway, and the hooks give you control that GUCs can't
> provide.
>
> Overall, I'm supportive of getting these committed once they're ready.
> contrib/pg_plan_advice is a compelling proof-of-concept for why these
> hooks are needed.

Great. I don't think there's anything terribly controversial in
0001-0004. I think the comments and so on might need improving and
there could be little mini-bugs or whatever, but basically I think
they work and I don't anticipate any major problems. However, I'd want
at least one other person to do a detailed review before committing
anything. 0005 might be a little more controversial. There's some
design choices to dislike (though I believe I've made them for good
reason) and there's a question of whether it's as complete as we want.
It might be fine to commit it the way it is and just adjust it later
if we find that something ought to be different, but it's also
possible that we should think harder about some of the choices or hold
off for a bit while other parts of this effort move forward. I'm happy
to hear opinions on the best strategy here.

> I'll try to post more specific comments once I've read this some more.

Thanks for the review so far, and that sounds great!

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Corey Huinker <corey.huinker@gmail.com> — 2025-12-10T21:09:24Z

On Wed, Dec 10, 2025 at 9:54 AM Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Dec 10, 2025 at 6:20 AM Amit Langote <amitlangote09@gmail.com>
> wrote:
> > These are just high-level comments after browsing the patches and
> > reading some bits like pgpa_identifier to get myself familiarized with
> > the project.  I like that the key concept here is plan stability
> > rather than plan control, because that framing makes it easier to
> > treat this as infrastructure instead of policy.
>
> Thanks, I agree. I'm sure people will use this for plan control, but
> if you start with that, then it's really unclear what things you
> should allow to be controlled and what things not. Defining the focus
> as plan stability makes round-trip safety a priority and the scope of
> what you can request is what the planner could have generated had the
> costing come out just so. There's still some definitional questions at
> the margin, but IMHO it's much less fuzzy.
>

I couldn't have said this any better than Amit did. In my experience, lack
of a plan stability feature is far and away the most cited reason for not
porting to PostgreSQL. They want query plan stability first and foremost.
The amount of plan tweaking they do is actually pretty minimal, once they
get good-enough performance during user acceptance they want to encase
those query plans in amber because that's what the customer signed-off on.
After that, they're happy to scan the performance trendlines, and only make
tweaks when it's worth a change request.

But that's not to say I disagree with you categorically. Suppose we
> decided (and I'm not saying we should) to start showing relation
> identifiers in EXPLAIN output instead of identifying things in EXPLAIN
> output as we do today. Maybe we even decide to show elided subqueries
> and similar as first-class parts of the EXPLAIN output, also using
> relation identifier syntax. That would be a pretty significant change,
> and would destabilize a WHOLE LOT of regression test outputs, but then
> relation identifiers become a first-class PostgreSQL concept that
> everyone who looks at EXPLAIN output will encounter and, probably,
> come to understand.

I think the change would be worth the destabilization, because it makes it
so much easier to talk about complex query plans. Additionally, it would
make it reasonable to programmatically extract portions of a plan, allowing
for much more fine-grained regression tests regarding plans.

Showing the elided subqueries would be a huge benefit, outlining the
benefits that the planner is giving you "for free".

> > On the infrastructure patches (0001-0005): these look sensible. The
> > range table flattening info, elided node tracking, and append node
>

One thing I am curious about is that by tracking the elided nodes, would it
make more sense in the long run to have the initial post-naming plan tree
be immutable, and generate a separate copy minus the elided parts?

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-10T21:29:50Z

On Wed, Dec 10, 2025 at 4:09 PM Corey Huinker <corey.huinker@gmail.com> wrote:
> I think the change would be worth the destabilization, because it makes it so much easier to talk about complex query plans. Additionally, it would make it reasonable to programmatically extract portions of a plan, allowing for much more fine-grained regression tests regarding plans.

I'll wait for more votes before thinking about doing anything about
this, because I have my doubts about whether the consensus will
actually go in favor of such a large change. Or maybe someone else
would like to try mocking it up (even if somewhat imperfectly) so we
can all see just how large an impact it makes.

>> > On the infrastructure patches (0001-0005): these look sensible. The
>> > range table flattening info, elided node tracking, and append node
>
> One thing I am curious about is that by tracking the elided nodes, would it make more sense in the long run to have the initial post-naming plan tree be immutable, and generate a separate copy minus the elided parts?

Probably not. Having two entire copies of the plan tree would be
pretty expensive. I think that we've bet on the right idea, namely,
that the primary consumer of plan trees should be the executor, and
the primary goal should be to create plan trees that make the executor
run fast. I believe the right approach is basically what we do today:
you're allowed to put things into the plan that aren't technically
necessary for execution, if they're useful for instrumentation and
observability purposes and they don't add an unreasonable amount of
overhead. These patches basically just extend that existing principle
to a few new things.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-11T15:09:47Z

On Fri, Dec 5, 2025 at 2:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
> 014f9a831a320666bf2195949f41710f970c54ad removes the need for what was
> previously 0004, so here is a new patch series with that dropped, to
> avoid confusing cfbot or human reviewers.

Here's v6, with minor improvements over v5.

0001: Unchanged.

0002, 0003: Unchanged except for typo fixes pointed out by reviewers.

0004: I've improved the hook placement, which was previously such as
to make correct unique-semijoin handling impossible, and I improved
the associated comment about how to use the hook, based on experience
trying to actually do so.

0005: Fixed a small bug related to unique-semijoin handling (other
problems remain). Tidied things up to avoid producing non-actionable
NO_GATHER() advice in a number of cases, per some off-list feedback
from Ajaykumar Pal.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Jacob Champion <jacob.champion@enterprisedb.com> — 2025-12-12T01:11:09Z

On Tue, Dec 9, 2025 at 11:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> By the way, if your fuzzer can also
> produces some things to add contrib/pg_plan_advice/sql for cases like
> this, that would be quite helpful. Ideally I would have caught this
> with a manually-written test case, but obviously that didn't happen.

Sure! (They'll need to be golfed down.) Here are three entries that
hit the crash, each on its own line:

> join_order(qoe((nested_l oindex_scanp_plain))se(nested_loop_plain)nested_loo/_pseq_scanlain)
> join_order(qoe((nested_loop_plain))se(nested_loop_plain)nesemij/insted_loop_plain)
> gather(gather(gar(g/ther0))gtaher(gathethga))

Something the fuzzer really likes is zero-length identifiers ("").
Maybe that's by design, but I thought I'd mention it since the
standard lexer doesn't allow that and syntax.sql doesn't exercise it.

> > It doesn't know that area is guaranteed to be non-NULL, so it can't
> > prove that ca_pointer is initialized.
>
> I don't know what to do about that. I can understand why it might be
> unable to prove that, but I don't see an obvious way to change the
> code that would make life easier. I could add Assert(area != NULL)
> before the call to pgpa_make_collected_advice() if that helps.

With USE_ASSERT_CHECKING, that should help, but I'm not sure if it
does without. (I could have sworn there was a conversation about that
at some point but I can't remember any of the keywords.) Could also
just make a dummy assignment. Or tag pg_plan_advice_dsa_area() with
__attribute__((returns_nonnull)), but that's more portability work.

--Jacob

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-12T17:36:17Z

On Thu, Dec 11, 2025 at 8:11 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> Sure! (They'll need to be golfed down.) Here are three entries that
> hit the crash, each on its own line:
>
> > join_order(qoe((nested_l oindex_scanp_plain))se(nested_loop_plain)nested_loo/_pseq_scanlain)
> > join_order(qoe((nested_loop_plain))se(nested_loop_plain)nesemij/insted_loop_plain)
> > gather(gather(gar(g/ther0))gtaher(gathethga))

At least for me, setting pg_plan_advice.advice to any of these strings
does not provoke a crash. What I discovered after a bit of
experimentation is that you get the crash if you (a) set the string to
something like this and then (b) run an EXPLAIN. Turns out, I already
had a test in syntax.sql that is sufficient to provoke the crash, so,
locally, I've added 'EXPLAIN SELECT 1' after each test case in
syntax.sql that is expected to successfully alter the value of the
GUC.

> Something the fuzzer really likes is zero-length identifiers ("").
> Maybe that's by design, but I thought I'd mention it since the
> standard lexer doesn't allow that and syntax.sql doesn't exercise it.

That's not by design. I've added a matching error check locally.

> > > It doesn't know that area is guaranteed to be non-NULL, so it can't
> > > prove that ca_pointer is initialized.
> >
> > I don't know what to do about that. I can understand why it might be
> > unable to prove that, but I don't see an obvious way to change the
> > code that would make life easier. I could add Assert(area != NULL)
> > before the call to pgpa_make_collected_advice() if that helps.
>
> With USE_ASSERT_CHECKING, that should help, but I'm not sure if it
> does without. (I could have sworn there was a conversation about that
> at some point but I can't remember any of the keywords.) Could also
> just make a dummy assignment. Or tag pg_plan_advice_dsa_area() with
> __attribute__((returns_nonnull)), but that's more portability work.

As in initialize ca_pointer to InvalidDsaPointer?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Jacob Champion <jacob.champion@enterprisedb.com> — 2025-12-12T18:09:44Z

On Fri, Dec 12, 2025 at 9:36 AM Robert Haas <robertmhaas@gmail.com> wrote:
> At least for me, setting pg_plan_advice.advice to any of these strings
> does not provoke a crash. What I discovered after a bit of
> experimentation is that you get the crash if you (a) set the string to
> something like this and then (b) run an EXPLAIN.

Makes sense (this fuzzer was exercising pgpa_format_advice_target()).

> > With USE_ASSERT_CHECKING, that should help, but I'm not sure if it
> > does without. (I could have sworn there was a conversation about that
> > at some point but I can't remember any of the keywords.) Could also
> > just make a dummy assignment. Or tag pg_plan_advice_dsa_area() with
> > __attribute__((returns_nonnull)), but that's more portability work.
>
> As in initialize ca_pointer to InvalidDsaPointer?

Yeah.

Next bit of fuzzer feedback: I need the following diff in
pgpa_trove_add_to_hash() to avoid a crash when the hashtable starts to
fill up:

>     element = pgpa_trove_entry_insert(hash, key, &found);
> +   if (!found)
> +       element->indexes = NULL;
>     element->indexes = bms_add_member(element->indexes, index);

The advice string that triggered this is horrific, but I can send it
to you offline if you're morbidly curious. (I can spend time to
minimize it or I can get more fuzzer coverage, and I'd rather do the
latter right now :D)

--Jacob

Re: pg_plan_advice

Ajay Pal <ajay.pal.k@gmail.com> — 2025-12-15T06:30:43Z

During further testing of the plan_advice patch's latest version, I
observed that the following query is generating a no_gather plan. This
specific plan structure is not being accepted by the query planner.

postgres=*# set local pg_plan_advice.advice='NO_GATHER("*RESULT*")';
SET
postgres=*# explain ( plan_advice) SELECT
CAST('99999999999999999999999999999999999999999999999999' AS NUMERIC);
                QUERY PLAN
-------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=32)
 Supplied Plan Advice:
   NO_GATHER("*RESULT*") /* not matched */
 Generated Plan Advice:
   NO_GATHER("*RESULT*")
(5 rows)

Thanks
Ajay

On Fri, Dec 12, 2025 at 11:40 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
>
> On Fri, Dec 12, 2025 at 9:36 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > At least for me, setting pg_plan_advice.advice to any of these strings
> > does not provoke a crash. What I discovered after a bit of
> > experimentation is that you get the crash if you (a) set the string to
> > something like this and then (b) run an EXPLAIN.
>
> Makes sense (this fuzzer was exercising pgpa_format_advice_target()).
>
> > > With USE_ASSERT_CHECKING, that should help, but I'm not sure if it
> > > does without. (I could have sworn there was a conversation about that
> > > at some point but I can't remember any of the keywords.) Could also
> > > just make a dummy assignment. Or tag pg_plan_advice_dsa_area() with
> > > __attribute__((returns_nonnull)), but that's more portability work.
> >
> > As in initialize ca_pointer to InvalidDsaPointer?
>
> Yeah.
>
> Next bit of fuzzer feedback: I need the following diff in
> pgpa_trove_add_to_hash() to avoid a crash when the hashtable starts to
> fill up:
>
> >     element = pgpa_trove_entry_insert(hash, key, &found);
> > +   if (!found)
> > +       element->indexes = NULL;
> >     element->indexes = bms_add_member(element->indexes, index);
>
> The advice string that triggered this is horrific, but I can send it
> to you offline if you're morbidly curious. (I can spend time to
> minimize it or I can get more fuzzer coverage, and I'd rather do the
> latter right now :D)
>
> --Jacob
>
>

Re: pg_plan_advice

Jacob Champion <jacob.champion@enterprisedb.com> — 2025-12-15T16:37:43Z

On Fri, Dec 12, 2025 at 10:09 AM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> Next bit of fuzzer feedback:

And another bit, but this time I was able to minimize into a
regression case, attached.

This comment in pgpa_identifier_matches_target() seems to be incorrect:

>     /*
>      * The identifier must specify a schema, but the target may leave the
>      * schema NULL to match anything.
>      */

But I don't know whether that's because the assumption itself is
wrong, or because a layer above hasn't filtered something out before
getting to this point.

--Jacob

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-15T20:06:08Z

Here's v7.

In 0001, I removed "const" from a node's struct declaration, because
Tom gave me some feedback to avoid that on another recent patch, and I
noticed I had done it here also. 0002, 0003, and 0004 are unchanged.

In 0005:

- Refactored the code to avoid issuing SEMIJOIN_NON_UNIQUE() advice in
cases where uniqueness wasn't actually considered.
- Adjusted the code not to issue NO_GATHER() advice for non-relation
RTEs. (This is the issue reported by Ajay Pal in a recent message to
this thread, which was also mentioned in an XXX in the code.)
- Reject zero-length delimited identifiers, per Jacob's email.
- Properly initialize element->indexes in pgpa_trove_add_to_hash, per
Jacob'e email.
- Add gather((d d/d.d)) test case, per Jacob, and fix the related bug
in pgpa_identifier_matches_target, per Jacob's email.
- Add EXPLAIN SELECT 1 after various test cases in syntax.sql, to
improve test coverage, per analysis of why the existing test case
didn't catch a bug previously reported by Jacob.
- Added a dummy initialization to pgpa_collector.c to placate nervous
compilers, per discussion with Jacob.

I think this mostly catches me up on responding to issues reported
here, although there is one thing reported to me off-list that I
haven't dealt with yet. If there's anything reported on thread that
isn't addressed here, let me know.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-17T10:12:40Z

On Mon, Dec 15, 2025 at 9:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Here's v7.
[..]

OK, so I've tested today from Your's branch directly, so I hope that
was also v7. Given the following q20 query:

SELECT s_name, s_address
FROM supplier, nation
WHERE s_suppkey in
    (SELECT ps_suppkey
     FROM partsupp
     WHERE ps_partkey in
         (SELECT p_partkey
          FROM part
          WHERE p_name LIKE 'forest%' )
       AND ps_availqty >
         (SELECT 0.5 * sum(l_quantity)
          FROM lineitem
          WHERE l_partkey = ps_partkey
            AND l_suppkey = ps_suppkey
            AND l_shipdate >= DATE '1994-01-01'
            AND l_shipdate < DATE '1994-01-01' + INTERVAL '1' year ) )
  AND s_nationkey = n_nationkey
  AND n_name = 'CANADA'
ORDER BY s_name;

in normal conditions (w/o advice) the above query generates:

 Sort  (cost=1010985030.44..1010985030.59 rows=61 width=51)
   Sort Key: supplier.s_name
   ->  Nested Loop  (cost=0.42..1010985028.63 rows=61 width=51)
         Join Filter: (nation.n_nationkey = supplier.s_nationkey)
         ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4)
               Filter: (n_name = 'CANADA'::bpchar)
         ->  Nested Loop Semi Join  (cost=0.42..1010985008.29
rows=1522 width=55)
               Join Filter: (partsupp.ps_suppkey = supplier.s_suppkey)
               ->  Seq Scan on supplier  (cost=0.00..249.30 rows=7730 width=59)
               ->  Materialize  (cost=0.42..1010755994.57 rows=1973 width=4)
                     ->  Nested Loop  (cost=0.42..1010755984.71
rows=1973 width=4)
                           ->  Seq Scan on part  (cost=0.00..4842.25
rows=1469 width=4)
                                 Filter: ((p_name)::text ~~ 'forest%'::text)
                           ->  Index Scan using pk_partsupp on
partsupp  (cost=0.42..688053.87 rows=1 width=8)
                                 Index Cond: (ps_partkey = part.p_partkey)
                                 Filter: ((ps_availqty)::numeric >
(SubPlan expr_1))
                                 SubPlan expr_1
                                   ->  Aggregate
(cost=172009.42..172009.44 rows=1 width=32)
                                         ->  Seq Scan on lineitem
(cost=0.00..172009.42 rows=1 width=5)
                                               Filter: ((l_shipdate >=
'1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
without time zone) AND (l_partkey = partsupp.ps_partkey) AND
(l_suppkey = partsupp.ps_suppkey))


 Generated Plan Advice:
   JOIN_ORDER(nation (supplier (part partsupp)))
   NESTED_LOOP_PLAIN(partsupp partsupp) <--- [X]
   NESTED_LOOP_MATERIALIZE(partsupp)
   SEQ_SCAN(nation supplier part lineitem@expr_1)
   INDEX_SCAN(partsupp public.pk_partsupp)
   SEMIJOIN_NON_UNIQUE((partsupp part))
   NO_GATHER(supplier nation partsupp part lineitem@expr_1)

Please see the - I think it's confusing? -
NESTED_LOOP_MATERIALIZE(partsupp partsupp) - that's 2x the same
string? This causes it to turn into below plan -- I've marked the
problem with [X]

 Sort  (cost=50035755.50..50035755.66 rows=61 width=51)
   Sort Key: supplier.s_name
   ->  Nested Loop  (cost=12562154.32..50035753.70 rows=61 width=51)
         Join Filter: (nation.n_nationkey = supplier.s_nationkey)
         ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4)
               Filter: (n_name = 'CANADA'::bpchar)
         ->  Nested Loop Semi Join  (cost=12562154.32..50035733.36
rows=1522 width=55)
             [X] -- missing Join Filter here
               ->  Seq Scan on supplier  (cost=0.00..249.30 rows=7730 width=59)
               [X] -- HJ instead of Materialize+Nested Loop below:
               ->  Hash Join  (cost=12562154.32..12567002.09 rows=1 width=4)
                     Hash Cond: (part.p_partkey = partsupp.ps_partkey)
                     ->  Seq Scan on part  (cost=0.00..4842.25
rows=1469 width=4)
                           Filter: ((p_name)::text ~~ 'forest%'::text)
                     ->  Hash  (cost=12562154.02..12562154.02 rows=24 width=8)
                           ->  Index Scan using pk_partsupp on
partsupp  (cost=0.42..12562154.02 rows=24 width=8)
                                 [X] -- wrong Index Cond below
(suppkey instead of partkey)
                                 Index Cond: (ps_suppkey = supplier.s_suppkey)
                                 Filter: ((ps_availqty)::numeric >
(SubPlan expr_1))
                                 SubPlan expr_1
                                   ->  Aggregate
(cost=172009.42..172009.44 rows=1 width=32)
                                         ->  Seq Scan on lineitem
(cost=0.00..172009.42 rows=1 width=5)
                                               Filter: ((l_shipdate >=
'1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
without time zone) AND (l_partkey = partsupp.ps_partkey) AND
(l_suppkey = partsupp.ps_suppkey))

Supplied Plan Advice:
   SEQ_SCAN(nation) /* matched */
   SEQ_SCAN(supplier) /* matched */
   SEQ_SCAN(part) /* matched */
   SEQ_SCAN(lineitem@expr_1) /* matched */
   INDEX_SCAN(partsupp public.pk_partsupp) /* matched */
   JOIN_ORDER(nation (supplier (part partsupp))) /* matched, conflicting */
   NESTED_LOOP_PLAIN(partsupp) /* matched, conflicting */
   NESTED_LOOP_PLAIN(partsupp) /* matched, conflicting */
   NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting, failed */
   SEMIJOIN_NON_UNIQUE((partsupp part)) /* matched, conflicting */
   NO_GATHER(supplier) /* matched */
   NO_GATHER(nation) /* matched */
   NO_GATHER(partsupp) /* matched */
   NO_GATHER(part) /* matched */
   NO_GATHER(lineitem@expr_1) /* matched */

So the difference is basically between:
    set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(partsupp
partsupp) NESTED_LOOP_MATERIALIZE(partsupp) [..]';
which causes wrong plan and outcome:
    NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting, failed */

and apparently proper advice like below which has better yield:
    set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(part partsupp)
NESTED_LOOP_MATERIALIZE(partsupp) [..]';
which is not generated , but caused good plan, however it also prints:
   NESTED_LOOP_PLAIN(part) /* matched, conflicting, failed */
   NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting */
but that seems "failed" there, seems to be untrue?

Another idea is perhaps, we could have some elog(WARNING) - but not
Asserts() - in assert-only enabled build that could alert us in case
of duplicated entries being detected for the same ops in
pg_plan_advice_explain_feedback()?

-J.

Re: pg_plan_advice

Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-17T13:44:02Z

On Wed, Dec 17, 2025 at 11:12 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
>
> On Mon, Dec 15, 2025 at 9:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > Here's v7.
> [..]
>[..q20..]

OK, now for the q10:

 Sort
   Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric -
lineitem.l_discount)))) DESC
   ->  Finalize GroupAggregate
         Group Key: customer.c_custkey, nation.n_name
         ->  Gather Merge
               Workers Planned: 2
               ->  Partial GroupAggregate
                     Group Key: customer.c_custkey, nation.n_name
                     ->  Sort
                           Sort Key: customer.c_custkey, nation.n_name
                           ->  Hash Join
                                 Hash Cond: (customer.c_nationkey =
nation.n_nationkey)
                                 ->  Parallel Hash Join
                                       Hash Cond: (orders.o_custkey =
customer.c_custkey)
                                       ->  Nested Loop
                                             ->  Parallel Seq Scan on orders
                                                   Filter:
((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01
00:00:00'::timestamp without time zone))
                                             ->  Index Scan using
lineitem_l_orderkey_idx_l_returnflag on lineitem
                                                   Index Cond:
(l_orderkey = orders.o_orderkey)
                                       ->  Parallel Hash
                                             ->  Parallel Seq Scan on customer
                                 ->  Hash
                                       ->  Seq Scan on nation
 Generated Plan Advice:
   JOIN_ORDER(orders lineitem customer nation)
   NESTED_LOOP_PLAIN(lineitem)
   HASH_JOIN(customer nation)
   SEQ_SCAN(orders customer nation)
   INDEX_SCAN(lineitem public.lineitem_l_orderkey_idx_l_returnflag)
   GATHER_MERGE((customer orders lineitem nation))

but when set the advice it generates wrong NL instead of expected
Parallel HJ (so another way to fix is to simply disable PQ, yuck),
but:

 Sort
   Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric -
lineitem.l_discount)))) DESC
   ->  Finalize GroupAggregate
         Group Key: customer.c_custkey, nation.n_name
         ->  Gather Merge
               Workers Planned: 2
               ->  Partial GroupAggregate
                     Group Key: customer.c_custkey, nation.n_name
                     ->  Sort
                           Sort Key: customer.c_custkey, nation.n_name
                           ->  Nested Loop
                                 ->  Hash Join
                                       Hash Cond:
(customer.c_nationkey = nation.n_nationkey)
                                       ->  Parallel Hash Join
                                             Hash Cond:
(orders.o_custkey = customer.c_custkey)
                                             ->  Parallel Seq Scan on orders
                                                   Filter:
((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01
00:00:00'::timestamp without time zone))
                                             ->  Parallel Hash
                                                   ->  Parallel Seq
Scan on customer
                                       ->  Hash
                                             ->  Seq Scan on nation
                                 ->  Index Scan using
lineitem_l_orderkey_idx_l_returnflag on lineitem
                                       Index Cond: (l_orderkey =
orders.o_orderkey)
 Supplied Plan Advice:
   SEQ_SCAN(orders) /* matched */
   SEQ_SCAN(customer) /* matched */
   SEQ_SCAN(nation) /* matched */
   INDEX_SCAN(lineitem public.lineitem_l_orderkey_idx_l_returnflag) /*
matched */
   JOIN_ORDER(orders lineitem customer nation) /* matched,
conflicting, failed */
   NESTED_LOOP_PLAIN(lineitem) /* matched, conflicting */
   HASH_JOIN(customer) /* matched, conflicting */
   HASH_JOIN(nation) /* matched, conflicting */
   GATHER_MERGE((customer orders lineitem nation)) /* matched */

So to me it looks like in Generated Plan Advice we:
- have proper HASH_JOIN(customer nation)
- but it somehow forgot to include "HASH_JOIN(orders)" to cover for
that Parallel Hash Join on (orders.o_custkey = customer.c_custkey)
with input from NL. After adding that manually, it achieves the same
input plan properly.

Please let me know if I'm wrong, I was kind of thinking Parallel is
not fully supported, but README/tests seem to state otherwise.

-J.

Re: pg_plan_advice

Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-18T12:27:27Z

On Wed, Dec 17, 2025 at 2:44 PM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
>
> On Wed, Dec 17, 2025 at 11:12 AM Jakub Wartak
> <jakub.wartak@enterprisedb.com> wrote:
> >
> > On Mon, Dec 15, 2025 at 9:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > Here's v7.
> > [..]
> >[..q20..]
>
> OK, now for the q10:

Hi, this is a follow-up just to the q10.

> So to me it looks like in Generated Plan Advice we:
> - have proper HASH_JOIN(customer nation)
> - but it somehow forgot to include "HASH_JOIN(orders)" to cover for
> that Parallel Hash Join on (orders.o_custkey = customer.c_custkey)
> with input from NL. After adding that manually, it achieves the same
> input plan properly.
[..]

Well, it's quite a ride with the Q10 and I partially wrong with above:

0. The reported earlier wrong missing "HASH_JOIN(orders customer)" -
that part was okay
1. The Incremental Sort is being used in the original plan, but is
still IS not reflected in the generated advice.
2a. I've noticed Memoize/Index Scan was not being respected for "nation"
2b. Seq scan for nation was being done for "nation"

So total modification list, I've ended up doing (+ for adding , - for removing):

+ HASH_JOIN(orders customer) -- from earlier reply
+ NESTED_LOOP_MEMOIZE(nation)
+ INDEX_SCAN(nation public.pk_nation)
- HASH_JOIN(customer nation) -- as it was we were having NL() in org plan
SEQ_SCAN(orders customer nation) ==> SEQ_SCAN(orders customer)

In full the best shape seems to be Q10 with pg_plan_advice.advice =
'HASH_JOIN(orders customer) JOIN_ORDER(orders lineitem customer
nation)    NESTED_LOOP_PLAIN(lineitem)    SEQ_SCAN(orders customer)
INDEX_SCAN(lineitem public.lineitem_l_orderkey_idx_l_returnflag)
GATHER_MERGE((customer orders lineitem nation))
NESTED_LOOP_MEMOIZE(nation)';

which yields:
 Sort
   Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric -
lineitem.l_discount)))) DESC
   ->  GroupAggregate
         Group Key: customer.c_custkey, nation.n_name
         ->  Gather Merge
               Workers Planned: 2
               ->  Sort
                     Sort Key: customer.c_custkey, nation.n_name
                     ->  Nested Loop
                           ->  Parallel Hash Join
                                 Hash Cond: (orders.o_custkey =
customer.c_custkey)
                                 ->  Nested Loop
                                       ->  Parallel Seq Scan on orders
                                             Filter: ((o_orderdate >=
'1993-10-01'::date) AND (o_orderdate < '1994-01-01
00:00:00'::timestamp without time zone))
                                       ->  Index Scan using
lineitem_l_orderkey_idx_l_returnflag on lineitem
                                             Index Cond: (l_orderkey =
orders.o_orderkey)
                                 ->  Parallel Hash
                                       ->  Parallel Seq Scan on customer
                           ->  Memoize
                                 Cache Key: customer.c_nationkey
                                 Cache Mode: logical
                                 ->  Index Scan using pk_nation on nation
                                       Index Cond: (n_nationkey =
customer.c_nationkey)

but that Incremental Sort *is* still missing. In original plan we are doing
   Incremental Sort (Sort Key: customer.c_custkey, nation.n_name,
Presorted Key: customer.c_custkey)
   <-- .... Sort(Sort Key: customer.c_custkey)

However, even with my overrides I haven't found an immediately obvious
way to force it to use Incremental Sort on a specific field, so it
just sorts on two at once. Maybe it's something that should be
expressed through GATHER_MERGE()?, but that's not obvious how and
where. In terms of raw performance , it seems to be very similiar
(98ms +/- 8ms even between those two).

-J.

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-18T13:36:44Z

On Wed, Dec 17, 2025 at 5:12 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
>  Sort  (cost=1010985030.44..1010985030.59 rows=61 width=51)
>    Sort Key: supplier.s_name
>    ->  Nested Loop  (cost=0.42..1010985028.63 rows=61 width=51)
>          Join Filter: (nation.n_nationkey = supplier.s_nationkey)
>          ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4)
>                Filter: (n_name = 'CANADA'::bpchar)
>          ->  Nested Loop Semi Join  (cost=0.42..1010985008.29
> rows=1522 width=55)
>                Join Filter: (partsupp.ps_suppkey = supplier.s_suppkey)
>                ->  Seq Scan on supplier  (cost=0.00..249.30 rows=7730 width=59)
>                ->  Materialize  (cost=0.42..1010755994.57 rows=1973 width=4)
>                      ->  Nested Loop  (cost=0.42..1010755984.71
> rows=1973 width=4)
>                            ->  Seq Scan on part  (cost=0.00..4842.25
> rows=1469 width=4)
>                                  Filter: ((p_name)::text ~~ 'forest%'::text)
>                            ->  Index Scan using pk_partsupp on
> partsupp  (cost=0.42..688053.87 rows=1 width=8)
>                                  Index Cond: (ps_partkey = part.p_partkey)
>                                  Filter: ((ps_availqty)::numeric >
> (SubPlan expr_1))
>                                  SubPlan expr_1
>                                    ->  Aggregate
> (cost=172009.42..172009.44 rows=1 width=32)
>                                          ->  Seq Scan on lineitem
> (cost=0.00..172009.42 rows=1 width=5)
>                                                Filter: ((l_shipdate >=
> '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
> without time zone) AND (l_partkey = partsupp.ps_partkey) AND
> (l_suppkey = partsupp.ps_suppkey))
>
>
>  Generated Plan Advice:
>    JOIN_ORDER(nation (supplier (part partsupp)))
>    NESTED_LOOP_PLAIN(partsupp partsupp) <--- [X]
>    NESTED_LOOP_MATERIALIZE(partsupp)
>    SEQ_SCAN(nation supplier part lineitem@expr_1)
>    INDEX_SCAN(partsupp public.pk_partsupp)
>    SEMIJOIN_NON_UNIQUE((partsupp part))
>    NO_GATHER(supplier nation partsupp part lineitem@expr_1)

Yeah, that's not right. There are three nested loops here, so we
should have three pieces of nested loop advice.
NESTED_LOOP_MATERIALIZE(partsupp) covers the innermost nested loop.
The other two are NESTED_LOOP_PLAIN, but the advice should cover all
the tables on the inner side of the join. I think it should read:

NESTED_LOOP_PLAIN((part partsupp) (supplier part partsupp))

Ordering isn't significant here, so NESTED_LOOP_PLAIN((part supplier
partsupp) (partsupp part)) would be logically equivalent. Doesn't
matter exactly what we output here, but it shouldn't be just partsupp.

> and apparently proper advice like below which has better yield:
>     set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(part partsupp)

This isn't quite what you want, because this says that part should be
on the outer side of a NESTED_LOOP_PLAIN by itself and partsupp should
also be on the outer side of a NESTED_LOOP_PLAIN by itself. You need
the extra set of parentheses to indicate that the join product of
those two tables should be on the outer side of a NESTED_LOOP_PLAIN,
rather than each table individually.

What must be happening here is that either pgpa_join.c (maybe with
complicity from pgpa_walker.c) is not populating the
pgpa_plan_walker_context's join_strategies[JSTRAT_NESTED_LOOP_PLAIN]
member correctly, or else pgpa_output.c is not serializing it to text
correctly. I suspect the former is a more likely but I'm not sure
exactly what's happening.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Robert Haas <robertmhaas@gmail.com> — 2025-12-18T20:39:13Z

On Wed, Dec 17, 2025 at 8:44 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> OK, now for the q10:
>
>  Sort
>    Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric -
> lineitem.l_discount)))) DESC
>    ->  Finalize GroupAggregate
>          Group Key: customer.c_custkey, nation.n_name
>          ->  Gather Merge
>                Workers Planned: 2
>                ->  Partial GroupAggregate
>                      Group Key: customer.c_custkey, nation.n_name
>                      ->  Sort
>                            Sort Key: customer.c_custkey, nation.n_name
>                            ->  Hash Join
>                                  Hash Cond: (customer.c_nationkey =
> nation.n_nationkey)
>                                  ->  Parallel Hash Join
>                                        Hash Cond: (orders.o_custkey =
> customer.c_custkey)
>                                        ->  Nested Loop
>                                              ->  Parallel Seq Scan on orders
>                                                    Filter:
> ((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01
> 00:00:00'::timestamp without time zone))
>                                              ->  Index Scan using
> lineitem_l_orderkey_idx_l_returnflag on lineitem
>                                                    Index Cond:
> (l_orderkey = orders.o_orderkey)
>                                        ->  Parallel Hash
>                                              ->  Parallel Seq Scan on customer
>                                  ->  Hash
>                                        ->  Seq Scan on nation
>  Generated Plan Advice:
>    JOIN_ORDER(orders lineitem customer nation)
>    NESTED_LOOP_PLAIN(lineitem)
>    HASH_JOIN(customer nation)
>    SEQ_SCAN(orders customer nation)
>    INDEX_SCAN(lineitem public.lineitem_l_orderkey_idx_l_returnflag)
>    GATHER_MERGE((customer orders lineitem nation))

This looks correct to me.

> but when set the advice it generates wrong NL instead of expected
> Parallel HJ (so another way to fix is to simply disable PQ, yuck),
> but:

This is obviously bad. I'm not quite sure what happened here, but my
guess is that something prevented the JOIN_ORDER advice from being
applied cleanly and then everything went downhill from there. I wonder
if JOIN_ORDER doesn't interact properly with incremental sorts --
that's a situation for which I don't think I have existing test
coverage.

> So to me it looks like in Generated Plan Advice we:
> - have proper HASH_JOIN(customer nation)
> - but it somehow forgot to include "HASH_JOIN(orders)" to cover for
> that Parallel Hash Join on (orders.o_custkey = customer.c_custkey)
> with input from NL. After adding that manually, it achieves the same
> input plan properly.

The first table in the JOIN_ORDER() specification isn't supposed to
have a join method specification, because the join method specifier
says what appears on the inner, i.e. second, arm of the join.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_plan_advice

Haibo Yan <tristan.yim@gmail.com> — 2025-12-29T23:33:53Z

Hi Robert,

Thank you very much for your work on the pg_plan_advice patch series. It is
an impressive and substantial contribution, and it seems like a meaningful
step forward toward addressing long-standing query plan stability issues in
PostgreSQL.

While reviewing the v7 patches, I noticed a few points that I wanted to
raise for discussion:
1. GEQO interaction (patch 4):
Since GEQO relies on randomized search, is there a risk that the optimizer
may fail to explore the specific join order or path that is being enforced
by the advice mask? In that case, could this lead to failures such as
inability to construct the required join relation or excessive planning
time if the desired path is not sampled?
2. Parallel query serialization (patches 1–3):
Several new fields (subrtinfos, elidedNodes, child_append_relid_sets) are
added to PlannedStmt, but I did not see corresponding changes in outfuncs.c
/ readfuncs.c. Without serialization support, parallel workers executing
subplans or Append nodes may not receive this metadata. Is this handled
elsewhere, or is it something still pending?
3. Alias handling when generating advice (patch 5):
In pgpa_output_relation_name, the advice string is generated using
get_rel_name(relid), which resolves to the underlying table name rather
than the RTE alias. In self-join cases this could be ambiguous (e.g.,
my_table vs my_table). Would it be more appropriate to use the RTE alias
when available?
4. Minor typo (patch 4):
In src/include/nodes/relation.h, parititonwise appears to be a typo and
should likely be partitionwise.

I hope these comments are helpful, and I apologize in advance if any of
this is already addressed elsewhere in the series.

Best regards,
Haibo

On Mon, Dec 15, 2025 at 12:06 PM Robert Haas <robertmhaas@gmail.com> wrote:

> Here's v7.
>
> In 0001, I removed "const" from a node's struct declaration, because
> Tom gave me some feedback to avoid that on another recent patch, and I
> noticed I had done it here also. 0002, 0003, and 0004 are unchanged.
>
> In 0005:
>
> - Refactored the code to avoid issuing SEMIJOIN_NON_UNIQUE() advice in
> cases where uniqueness wasn't actually considered.
> - Adjusted the code not to issue NO_GATHER() advice for non-relation
> RTEs. (This is the issue reported by Ajay Pal in a recent message to
> this thread, which was also mentioned in an XXX in the code.)
> - Reject zero-length delimited identifiers, per Jacob's email.
> - Properly initialize element->indexes in pgpa_trove_add_to_hash, per
> Jacob'e email.
> - Add gather((d d/d.d)) test case, per Jacob, and fix the related bug
> in pgpa_identifier_matches_target, per Jacob's email.
> - Add EXPLAIN SELECT 1 after various test cases in syntax.sql, to
> improve test coverage, per analysis of why the existing test case
> didn't catch a bug previously reported by Jacob.
> - Added a dummy initialization to pgpa_collector.c to placate nervous
> compilers, per discussion with Jacob.
>
> I think this mostly catches me up on responding to issues reported
> here, although there is one thing reported to me off-list that I
> haven't dealt with yet. If there's anything reported on thread that
> isn't addressed here, let me know.
>
> Thanks,
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>

Re: pg_plan_advice

Lukas Fittl <lukas@fittl.com> — 2025-12-30T01:15:18Z

On Mon, Dec 15, 2025 at 12:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Here's v7.

I'm excited about this patch series, and in an effort to help land the
infrastructure, here is a review of 0001 - 0003 to start:

For 0001, I'm not sure the following comment is correct:

> /* When recursing = true, it's an unplanned or dummy subquery. */
> rtinfo->dummy = recursing;

Later in that function we only recurse if its a dummy subquery - in the
case of an unplanned subquery (rel->subroot == NULL)
add_rtes_to_flat_rtable won't be called again (instead the relation RTEs
are directly added to the finalrtable). Maybe we can
clarify that comment as "When recursing = true, it's a dummy subquery or
its children.".

From my medium-level understanding of the planner, I don't think the lack
of tracking unplanned subqueries
in subrtinfos is a problem, at least for the pg_plan_advice type use cases.

---

For 0002:

It might be helpful to clarify in a comment that ElidedNode's plan_node_id
represents the surviving node, not that of the elided node.

I also noticed that this currently doesn't support cases where multiple
nodes are elided, e.g. with multi-level table partitioning:

CREATE TABLE pt (l1 date, l2 text) PARTITION BY RANGE (l1);
CREATE TABLE pt_202512 PARTITION OF pt FOR VALUES FROM ('2025-12-01') TO
('2026-01-01') PARTITION BY LIST (l2);
CREATE TABLE pt_202512_TEST PARTITION OF pt_202512 FOR VALUES IN ('TEST');

EXPLAIN (RANGE_TABLE) SELECT * FROM pt WHERE l1 = '2025-12-15' AND l2 =
'TEST';

                            QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on pt_202512_test pt  (cost=0.00..29.05 rows=1 width=36)
   Filter: ((l1 = '2025-12-15'::date) AND (l2 = 'TEST'::text))
   Scan RTI: 3
   Elided Node Type: Append
   Elided Node RTIs: 1    <=== This is missing RTI 2
 RTI 1 (relation, inherited, in-from-clause):
   Relation: pt
 RTI 2 (relation, inherited, in-from-clause):
   Relation: pt_202512
 RTI 3 (relation, in-from-clause):
   Relation: pt_202512_test
 Unprunable RTIs: 1 2 3

In a quick test, adding child_append_relid_sets (from 0003) to the relids
being passed to record_elided_node fixes
that. Presumably the case of partitionwise join relids doesn't matter,
because that would prevent it being elided.

---

For 0003:

I also find the "cars" variable suffix a bit hard to understand, but not
sure a comment next to the variables is that useful.
Separately, the noise generated by all the additional "_cars" variables
isn't great.

I wonder a little bit if we couldn't introduce a better abstraction here,
e.g. a struct "AppendPathInput" that contains the
two related lists, and gets populated by
accumulate_append_subpath/get_singleton_append_subpath and then
passed to create_append_path as a single argument.

---

Note that 0005 needs a rebase,
since 48d4a1423d2e92d10077365532d92e059ba2eb2e changed the
GetNamedDSMSegment API.

You may also want to move the CF entry to the PG19-4 commitfest so CFbot
runs again.

Thanks,
Lukas

-- 
Lukas Fittl