Thread

  1. Cleaning up PREPARE query strings?

    Julien Rouhaud <rjuju123@gmail.com> — 2025-12-24T15:43:13Z

    Hi,
    
    Currently prepared statements store the whole query string that was submitted
    by the client at the time of the PREPARE as-is.  This is usually fine, but if
    that query was a multi-statement query string it can lead to a waste of memory.
    
    There are some pattern that are more likely to have such overhead, mine being
    an application with a fixed set of prepared statements that are sent at the
    connection start using a single query to avoid extra round trips.
    
    One naive example of the outcome is as follow:
    
    #= PREPARE s1 AS SELECT 1\; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE
    relname = $1\; PREPARE s3(int, int) AS SELECT $1 + $2;
    PREPARE
    PREPARE
    PREPARE
    
    =# SELECT name, statement FROM pg_prepared_statements ;
     name |                                 statement
    ------+----------------------------------------------------------------------------
     s1   | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
          | relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
     s2   | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
          | relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
     s3   | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
          | relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
    (3 rows)
    
    The more prepared statements you have the bigger the waste.  This is also not
    particularly readable for people who want to rely on the pg_prepared_statements
    views, as you need to parse the query again yourself to figure out what exactly
    is the associated query.
    
    I assume that some other patterns could lead to other kind of problems.  For
    instance if the query string includes a prepared statement and some DML, it
    could lead some automated program to replay both the PREPARE and DML when only
    the PREPARE was intended.
    
    I'm attaching a POC patch to fix that behavior by teaching PREPARE to clean the
    passed query text the same way as pg_stat_statements.  Since it relies on the
    location saved during parsing the overhead should be minimal, and only present
    when some space can actually be saved.  Note that I first tried to have the
    cleanup done in CreateCachedPlan so that it's done everywhere including things
    like the extended protocol but this lead to too many issues so I ended up doing
    it for an explicit PREPARE statement only.
    
    With this patch applied, the above scenario gives this new output:
    
    =# SELECT name, statement FROM pg_prepared_statements ;
     name |                     statement
    ------+----------------------------------------------------
     s1   | PREPARE s1 AS SELECT 1
     s2   | PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
          | relname = $1
     s3   | PREPARE s3(int, int) AS SELECT $1 + $2
    (3 rows)
    
    One possible issue is that any comment present at the beginning of the query
    text would be discarded.  I'm not sure if that's something used by e.g.
    pg_hint_plan, but if yes it's always possible to put the statement in front of
    the SELECT (or other actual first keyword) rather than the PREPARE itself to
    preserve it.