Thread

  1. Refactor query normalization into core query jumbling

    Sami Imseih <samimseih@gmail.com> — 2025-12-19T00:44:17Z

    Hi,
    
    A while back, there was a discussion about moving the query
    normalization code out of pg_stat_statements and into core
    query jumbling [0]. This conversation also led to hardening
    the hooks (currently only post_parse_analyze) that receive
    JumbleState [1].
    
    For the first point, `generate_normalized_query` takes the query
    string and the jumbleState with constant locations from core
    query jumbling. There is nothing uniquely special about
    pg_stat_statements in this code, and it can be used by other extensions.
    Extensions could even pass their own query string and JumbleState
    if they choose. Patch 0001 moves this function and its helpers to
    queryjumblefuncs.c.
    
    A Github search also shows that there are quite a few extensions
    that may be copying this code [2] [3], which means they will lose
    out on potential improvements and fixes.
    
    As part of this patch, the moved code itself did not change,
    but I did improve a comment:
    
    ```
    * If query_loc > 0, then "query" has been advanced by that much compared to
    * the original string start, so we need to translate the provided locations
    * to compensate.  (This lets us avoid re-scanning statements before the one
    * of interest, so it's worth doing.)
    ```
    
    This comment appeared at the top of generate_normalized_query and
    fill_in_constant_lengths. I simplified it and moved a shortened
    version inside the functions.
    
    Also, I did not think `fill_in_constant_lengths` should be a global
    function, as I cannot think of a good reason it would be used
    on its own, though someone may have a different opinion there.
    
    For the second point, since JumbleState can be shared by multiple
    extensions, hooks should receive it as a const pointer. This
    signals read-only intent and prevents extensions from
    accidentally modifying it through the hook. This change is in
    0002. This does mean that extensions will need to update their
    hooks, but I do not see that as an issue for a major version.
    
    Thoughts?
    
    [0] https://postgr.es/m/aQA9v9nLu5qsX8IE%40paquier.xyz
    [1] https://postgr.es/m/202510281023.4u5aszccvsct%40alvherre.pgsql
    [2] https://github.com/search?q=fill_in_constant_lengths&type=code
    [3] https://github.com/search?q=generate_normalized_query&type=code
    
    --
    Sami Imseih
    Amazon Web Services (AWS)