Thread
-
Cleaning up PREPARE query strings?
Julien Rouhaud <rjuju123@gmail.com> — 2025-12-24T15:43:13Z
Hi, Currently prepared statements store the whole query string that was submitted by the client at the time of the PREPARE as-is. This is usually fine, but if that query was a multi-statement query string it can lead to a waste of memory. There are some pattern that are more likely to have such overhead, mine being an application with a fixed set of prepared statements that are sent at the connection start using a single query to avoid extra round trips. One naive example of the outcome is as follow: #= PREPARE s1 AS SELECT 1\; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE relname = $1\; PREPARE s3(int, int) AS SELECT $1 + $2; PREPARE PREPARE PREPARE =# SELECT name, statement FROM pg_prepared_statements ; name | statement ------+---------------------------------------------------------------------------- s1 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+ | relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2; s2 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+ | relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2; s3 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+ | relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2; (3 rows) The more prepared statements you have the bigger the waste. This is also not particularly readable for people who want to rely on the pg_prepared_statements views, as you need to parse the query again yourself to figure out what exactly is the associated query. I assume that some other patterns could lead to other kind of problems. For instance if the query string includes a prepared statement and some DML, it could lead some automated program to replay both the PREPARE and DML when only the PREPARE was intended. I'm attaching a POC patch to fix that behavior by teaching PREPARE to clean the passed query text the same way as pg_stat_statements. Since it relies on the location saved during parsing the overhead should be minimal, and only present when some space can actually be saved. Note that I first tried to have the cleanup done in CreateCachedPlan so that it's done everywhere including things like the extended protocol but this lead to too many issues so I ended up doing it for an explicit PREPARE statement only. With this patch applied, the above scenario gives this new output: =# SELECT name, statement FROM pg_prepared_statements ; name | statement ------+---------------------------------------------------- s1 | PREPARE s1 AS SELECT 1 s2 | PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+ | relname = $1 s3 | PREPARE s3(int, int) AS SELECT $1 + $2 (3 rows) One possible issue is that any comment present at the beginning of the query text would be discarded. I'm not sure if that's something used by e.g. pg_hint_plan, but if yes it's always possible to put the statement in front of the SELECT (or other actual first keyword) rather than the PREPARE itself to preserve it.