Thread
-
[PATCH] Optionally record Plan IDs to track plan changes for a query
Lukas Fittl <lukas@fittl.com> — 2025-01-02T20:46:04Z
Hi all, Inspired by a prior proposal by Sami Imseih for tracking Plan IDs [0], as well as extensions like pg_stat_plans [1] (unmaintained), pg_store_plans [2] (not usable on production, see notes later) and aurora_stat_plans [3] (enabled by default on AWS), this proposed patch set adds: 1. An updated in-core facility to optionally track Plan IDs based on hashing the plan nodes during the existing treewalk in setrefs.c - controlled by the new "compute_plan_id" GUC 2. An example user of plan IDs with a new pg_stat_plans extension in contrib, that also records the first plan text with EXPLAIN (COSTS OFF) My overall perspective is that (1) is best done in-core to keep overhead low, whilst (2) could be done outside of core (or merged with a future pg_stat_statements) and is included here mainly for illustration purposes. Notes including what constitutes a plan ID follow, after a quick example: ## Example Having the planid + an extension that records it, plus the first plan text, lets you track different plans for the same query: bench=# SELECT * FROM pgbench_accounts WHERE aid = 123; bench=# SET enable_indexscan = off; bench=# SELECT * FROM pgbench_accounts WHERE aid = 123; bench=# SELECT queryid, planid, plan FROM pg_stat_plans WHERE plan LIKE '%pgbench%'; queryid | planid | plan ----------------------+----------------------+------------------------------------------------------------ -5986989572677096226 | -2057350818695327558 | Index Scan using pgbench_accounts_pkey on pgbench_accounts+ | | Index Cond: (aid = 123) -5986989572677096226 | 2815444815385882663 | Bitmap Heap Scan on pgbench_accounts + | | Recheck Cond: (aid = 123) + | | -> Bitmap Index Scan on pgbench_accounts_pkey + | | Index Cond: (aid = 123) And this also supports showing the plan for a currently running query (call count is zero in such cases): session 1: bench# SELECT pg_sleep(100), COUNT(*) FROM pgbench_accounts; session 2: bench=# SELECT query, plan FROM pg_stat_activity JOIN pg_stat_plans ON (usesysid = userid AND datid = dbid AND query_id = queryid AND plan_id = planid) WHERE query LIKE 'SELECT pg_sleep%'; query | plan -------------------------------------------------------+------------------------------------ SELECT pg_sleep(100), COUNT(*) FROM pgbench_accounts; | Aggregate + | -> Seq Scan on pgbench_accounts ## What is a plan ID? My overall hypothesis here is that identifying different plan shapes for the same normalized query (i.e. queryid) is useful, because it lets you detect use of different plan choices such as which join order or index was used based on different input parameters (or different column statistics due to a recent ANALYZE) for the same normalized query. You can get this individually for a given query with EXPLAIN of course, but if you want to track this over time the only workable mechanism in my experience is auto_explain, which is good for sampling outliers, but bad for getting a comprehensive view of which plans where used and how often. To me the closest to what I consider a "plan shape" is the output of EXPLAIN (COSTS OFF), that is, the plan nodes and their filters/conditions, but discarding the exact costs as well as ignoring any execution statistics. The idea behind the proposed plan ID implementation is trying to match that by hashing plan nodes, similar to how query IDs hash post-parse analysis query nodes. One notable edge case are plans that involve partitions - those could of course lead to a lot of different planids for a given queryid, based on how many partitions were pruned. We could consider special casing this, e.g. by trying to be smart about declarative partitioning, and considering plans to be identical if they scan the same number of partitions with the same scan methods. However this could also be done by an out-of-core extension, either by defining a better planid mechanism, or maintaining a grouped planid of sorts based on the internal planid. The partitions problem reminds me a bit of the IN list problem with pg_stat_statements (which we still haven't resolved) - despite the problem the extension has been successfully used for many years by many Postgres users, even for those workloads where you have thousands of entries for the same query with different IN list lengths. ## Why does this need to be in core? Unfortunately both existing open-source extensions I'm familiar with are not suitable for production use. Out of the two, only pg_store_plans [2] is being maintained, however it carries significant overhead because it calculates the plan ID by hashing the EXPLAIN text output every time a query is executed. My colleague Marko (CCed) and I evaluated whether pg_store_plans could be modified to instead calculate the planid by hashing the plan tree, and ran into three issues: 1. The existing node jumbling in core is not usable by extensions, and it is necessary to have something like it for hashing Filters/Conds (ultimately requiring us to duplicate all of it in the extension, and keep maintaining that for every major release) 2. Whilst its cheap enough, it seems unnecessary to do an additional tree walk when setrefs.c already walks the plan tree in a near-final state 3. It seems useful to enable showing the plan shape of a currently running query (e.g. to identify whether a plan regression causes the query to run forever), and this is much easier to do by adding planid to pg_stat_activity, like the queryid I also suspect that Aurora's implementation in [3] had some in-core modifications to enable it work efficiently, but I'm not familiar with any implementation details beyond what's in the public documentation. ## Implementation notes The attached patch set includes two preparatory patches that could be committed independently if deemed useful: The first patch allows use of node jumbling by other unit files / extensions, which would help an out-of-core extension avoid duplicating all the node jumbling code. The second patch adds a function for the extensible cumulative statistics system to drop all entries for a given statistics kind. This already exists for resetting, but in case of a dynamic list of entries its more useful to be able to drop all of them when "reset" is called. The third patch adds plan ID tracking in core. This is turned off by default, and can be enabled by setting "compute_plan_id" to "on". Plan IDs are shown in pg_stat_activity, as well as EXPLAIN and auto_explain output, to allow matching a given plan ID to a plan text, without requiring the use of an extension. There are some minor TODOs in the plan jumbling logic that I haven't finalized yet. There is also an open question whether we should use the node attribute mechanism instead of custom jumbling logic? The fourth patch adds the pg_stat_plans contrib extension, for illustrative purposes. This is inspired by pg_stat_statements, but intentionally kept separate for easier review and since it does not use an external file and could technically be used independently. We may want to develop this into a unified pg_stat_statements+plans in-core mechanism in the future, but I think that is best kept for a separate discussion. The pg_stat_plans extension utilizes the cumulative statistics system for tracking statistics (extensible thanks to recent changes!), as well as dynamic shared memory to track plan texts up to a given limit (2kB by default). As a side note, managing extra allocations with the new extensible stats is a bit cumbersome - it would be helpful to have a hook for cleaning up data associated to entries (like a DSA allocation). Thanks, Lukas [0]: https://www.postgresql.org/message-id/flat/604E3199-2DD2-47DD-AC47-774A6F97DCA9%40amazon.com [1]: https://github.com/2ndQuadrant/pg_stat_plans [2]: https://ossc-db.github.io/pg_store_plans/ [3]: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora_stat_plans.html -- Lukas Fittl