Thread

  1. Re: Refactor query normalization into core query jumbling

    Lukas Fittl <lukas@fittl.com> — 2025-12-30T02:34:18Z

    On Wed, Dec 24, 2025 at 9:33 AM Sami Imseih <samimseih@gmail.com> wrote:
    
    > > that would be pushed across more stacks of the
    > > post-parse-analyze hook.  Doing that would allow us to pull the plug
    > > into making JumbleState and the original copies of clocations a set of
    > > consts when given to extensions.
    >
    > Yes correct. The hook signature will turn into, so all extensions now
    > just have a constant pointer to the jumblestate. They can of course
    > work hard to cast away constants, but at that point, they are doing
    > things the wrong way.
    >
    > ```
    >  typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
    >
    >         Query *query,
    >
    >         const JumbleState *jstate);
    > ```
    >
    > This of course will be a big breaking change to all the extensions out
    > there using this hook. This is not just about a signature change of
    > the hook, but an author now has to figure out how to deal with a
    > constant JumbleState in their code, which may not be trivial.
    >
    
    I wonder if there is a single extension out there that actually utilizes
    JumbleState in that hook -
    it was added in 5fd9dfa5f50e4906c35133a414ebec5b6d518493 presumably because
    pg_stat_statements
    needed it, but even Julien at the time was critical of adding it, mainly
    considering it for pgss needs if I read
    the archive correctly [0]. CCing Julien to correct me if I misinterpret the
    historic record here.
    
    Reading through the discussion in this thread here I do wonder a bit if we
    shouldn't just take that out of the
    hook again, and instead provide separate functions for getting the
    normalized query string (generating it the
    first time its called).
    
    My proposal in v3 was a bit more softer, and keeps the hook
    > backwards compatible. Basically, we keep JumbleState a non-constant,
    > but provide core APIs for any  operation, such as
    > generate_normalized_query,
    > that needs to modify the state. So, my approach was not about enforcing a
    > read-only JumbleState, but about providing the API to dissuade an author
    > from modifying a JumbleState.
    
    
    Given the lack of public APIs to modify JumbleState today, I don't see how
    an extension would do
    modifications in a meaningful way, short of copying the code. I think we
    should be a bit bolder here in
    enforcing a convention, either clearly making it read-only or dropping the
    argument again.
    
    Thinking through a couple of theoretical use cases:
    
    1) An extension that wants to display normalized query strings
    
    This seems to be the biggest kind of what I can find with code search.
    Extensions like pg_stat_monitor [1], that
    want to do something like pg_stat_statements, and have to copy a bunch of
    normalization code today that is 1:1 what
    Postgres does. Such extensions don't need the JumbleState argument if they
    can get the normalized text directly.
    
    2) An extension that wants to capture parameter values
    
    Some extensions may want to remember additional context for normalized
    queries, like pg_tracing's logic for
    optionally passing parameter values (post normalization) in the trace
    context [2]. If we kept passing a read-only
    JumbleState then such extensions could presumably still get this, but I
    wonder if it wouldn't be better for core to
    have a helper for this?
    
    3) An extension that modifies the JumbleState to cause different
    normalization to happen
    
    I'm not aware of any extensions doing this, and I couldn't find any
    either.  And since such theoretical extensions
    can directly modify query->queryId in the same hook, why not do that?
    
    4) Extensions using the presence of JumbleState to determine whether query
    IDs are available (?)
    
    I noticed pg_hint_plan checks the JumbleState argument to determine whether
    or not to run an early check
    of the hint logic. I'm a little confused by this (and if this is about
    query IDs being available, couldn't it just check the
    Query having a non-zero queryID?), but FWIW: [3]
    
    [0]:
    https://www.postgresql.org/message-id/flat/CAOBaU_ZbeQrUESCuLGk3sRZWxXFGaBBO39CxSsFxLeZGicUrJw%40mail.gmail.com#9a9bc47aa1b4b25ee2e3de1388c4c346
    [1]:
    https://github.com/percona/pg_stat_monitor/blob/main/pg_stat_monitor.c#L476
    [2]:
    https://github.com/DataDog/pg_tracing/blob/main/src/pg_tracing_query_process.c#L428
    [3]:
    https://github.com/ossc-db/pg_hint_plan/blob/master/pg_hint_plan.c#L3111
    
    Thanks,
    Lukas
    
    -- 
    Lukas Fittl