Thread

[PATCH] Optionally record Plan IDs to track plan changes for a query

Lukas Fittl <lukas@fittl.com> — 2025-01-02T20:46:04Z
Hi all,

Inspired by a prior proposal by Sami Imseih for tracking Plan IDs [0], as
well as extensions like pg_stat_plans [1] (unmaintained), pg_store_plans
[2] (not usable on production, see notes later) and aurora_stat_plans [3]
(enabled by default on AWS), this proposed patch set adds:

1. An updated in-core facility to optionally track Plan IDs based on
hashing the plan nodes during the existing treewalk in setrefs.c -
controlled by the new "compute_plan_id" GUC
2. An example user of plan IDs with a new pg_stat_plans extension in
contrib, that also records the first plan text with EXPLAIN (COSTS OFF)

My overall perspective is that (1) is best done in-core to keep overhead
low, whilst (2) could be done outside of core (or merged with a future
pg_stat_statements) and is included here mainly for illustration purposes.

Notes including what constitutes a plan ID follow, after a quick example:

## Example

Having the planid + an extension that records it, plus the first plan text,
lets you track different plans for the same query:

bench=# SELECT * FROM pgbench_accounts WHERE aid = 123;
bench=# SET enable_indexscan = off;
bench=# SELECT * FROM pgbench_accounts WHERE aid = 123;
bench=# SELECT queryid, planid, plan FROM pg_stat_plans WHERE plan LIKE
'%pgbench%';
       queryid        |        planid        |
 plan
----------------------+----------------------+------------------------------------------------------------
 -5986989572677096226 | -2057350818695327558 | Index Scan using
pgbench_accounts_pkey on pgbench_accounts+
                      |                      |   Index Cond: (aid = 123)
 -5986989572677096226 |  2815444815385882663 | Bitmap Heap Scan on
pgbench_accounts                      +
                      |                      |   Recheck Cond: (aid = 123)
                              +
                      |                      |   ->  Bitmap Index Scan on
pgbench_accounts_pkey          +
                      |                      |         Index Cond: (aid =
123)

And this also supports showing the plan for a currently running query (call
count is zero in such cases):

session 1:
bench# SELECT pg_sleep(100), COUNT(*) FROM pgbench_accounts;

session 2:
bench=# SELECT query, plan FROM pg_stat_activity
  JOIN pg_stat_plans ON (usesysid = userid AND datid = dbid AND query_id =
queryid AND plan_id = planid)
  WHERE query LIKE 'SELECT pg_sleep%';
                         query                         |
 plan
-------------------------------------------------------+------------------------------------
 SELECT pg_sleep(100), COUNT(*) FROM pgbench_accounts; | Aggregate
                +
                                                       |   ->  Seq Scan on
pgbench_accounts

## What is a plan ID?

My overall hypothesis here is that identifying different plan shapes for
the same normalized query (i.e. queryid) is useful, because it lets you
detect use of different plan choices such as which join order or index was
used based on different input parameters (or different column statistics
due to a recent ANALYZE) for the same normalized query.

You can get this individually for a given query with EXPLAIN of course, but
if you want to track this over time the only workable mechanism in my
experience is auto_explain, which is good for sampling outliers, but bad
for getting a comprehensive view of which plans where used and how often.

To me the closest to what I consider a "plan shape" is the output of
EXPLAIN (COSTS OFF), that is, the plan nodes and their filters/conditions,
but discarding the exact costs as well as ignoring any execution
statistics. The idea behind the proposed plan ID implementation is trying
to match that by hashing plan nodes, similar to how query IDs hash
post-parse analysis query nodes.

One notable edge case are plans that involve partitions - those could of
course lead to a lot of different planids for a given queryid, based on how
many partitions were pruned. We could consider special casing this, e.g. by
trying to be smart about declarative partitioning, and considering plans to
be identical if they scan the same number of partitions with the same scan
methods. However this could also be done by an out-of-core extension,
either by defining a better planid mechanism, or maintaining a grouped
planid of sorts based on the internal planid.

The partitions problem reminds me a bit of the IN list problem with
pg_stat_statements (which we still haven't resolved) - despite the problem
the extension has been successfully used for many years by many Postgres
users, even for those workloads where you have thousands of entries for the
same query with different IN list lengths.

## Why does this need to be in core?

Unfortunately both existing open-source extensions I'm familiar with are
not suitable for production use. Out of the two, only pg_store_plans [2] is
being maintained, however it carries significant overhead because it
calculates the plan ID by hashing the EXPLAIN text output every time a
query is executed.

My colleague Marko (CCed) and I evaluated whether pg_store_plans could be
modified to instead calculate the planid by hashing the plan tree, and ran
into three issues:

1. The existing node jumbling in core is not usable by extensions, and it
is necessary to have something like it for hashing Filters/Conds
(ultimately requiring us to duplicate all of it in the extension, and keep
maintaining that for every major release)
2. Whilst its cheap enough, it seems unnecessary to do an additional tree
walk when setrefs.c already walks the plan tree in a near-final state
3. It seems useful to enable showing the plan shape of a currently running
query (e.g. to identify whether a plan regression causes the query to run
forever), and this is much easier to do by adding planid to
pg_stat_activity, like the queryid

I also suspect that Aurora's implementation in [3] had some in-core
modifications to enable it work efficiently, but I'm not familiar with any
implementation details beyond what's in the public documentation.

## Implementation notes

The attached patch set includes two preparatory patches that could be
committed independently if deemed useful:

The first patch allows use of node jumbling by other unit files /
extensions, which would help an out-of-core extension avoid duplicating all
the node jumbling code.

The second patch adds a function for the extensible cumulative statistics
system to drop all entries for a given statistics kind. This already exists
for resetting, but in case of a dynamic list of entries its more useful to
be able to drop all of them when "reset" is called.

The third patch adds plan ID tracking in core. This is turned off by
default, and can be enabled by setting "compute_plan_id" to "on". Plan IDs
are shown in pg_stat_activity, as well as EXPLAIN and auto_explain output,
to allow matching a given plan ID to a plan text, without requiring the use
of an extension. There are some minor TODOs in the plan jumbling logic that
I haven't finalized yet. There is also an open question whether we should
use the node attribute mechanism instead of custom jumbling logic?

The fourth patch adds the pg_stat_plans contrib extension, for illustrative
purposes. This is inspired by pg_stat_statements, but intentionally kept
separate for easier review and since it does not use an external file and
could technically be used independently. We may want to develop this into a
unified pg_stat_statements+plans in-core mechanism in the future, but I
think that is best kept for a separate discussion.

The pg_stat_plans extension utilizes the cumulative statistics system for
tracking statistics (extensible thanks to recent changes!), as well as
dynamic shared memory to track plan texts up to a given limit (2kB by
default). As a side note, managing extra allocations with the new
extensible stats is a bit cumbersome - it would be helpful to have a hook
for cleaning up data associated to entries (like a DSA allocation).

Thanks,
Lukas

[0]:
https://www.postgresql.org/message-id/flat/604E3199-2DD2-47DD-AC47-774A6F97DCA9%40amazon.com
[1]: https://github.com/2ndQuadrant/pg_stat_plans
[2]: https://ossc-db.github.io/pg_store_plans/
[3]:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora_stat_plans.html

-- 
Lukas Fittl