Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Mark JumbleState as a const in the post_parse_analyze hook

  2. Move pg_stat_statements query jumbling to core.

  1. Refactor query normalization into core query jumbling

    Sami Imseih <samimseih@gmail.com> — 2025-12-19T00:44:17Z

    Hi,
    
    A while back, there was a discussion about moving the query
    normalization code out of pg_stat_statements and into core
    query jumbling [0]. This conversation also led to hardening
    the hooks (currently only post_parse_analyze) that receive
    JumbleState [1].
    
    For the first point, `generate_normalized_query` takes the query
    string and the jumbleState with constant locations from core
    query jumbling. There is nothing uniquely special about
    pg_stat_statements in this code, and it can be used by other extensions.
    Extensions could even pass their own query string and JumbleState
    if they choose. Patch 0001 moves this function and its helpers to
    queryjumblefuncs.c.
    
    A Github search also shows that there are quite a few extensions
    that may be copying this code [2] [3], which means they will lose
    out on potential improvements and fixes.
    
    As part of this patch, the moved code itself did not change,
    but I did improve a comment:
    
    ```
    * If query_loc > 0, then "query" has been advanced by that much compared to
    * the original string start, so we need to translate the provided locations
    * to compensate.  (This lets us avoid re-scanning statements before the one
    * of interest, so it's worth doing.)
    ```
    
    This comment appeared at the top of generate_normalized_query and
    fill_in_constant_lengths. I simplified it and moved a shortened
    version inside the functions.
    
    Also, I did not think `fill_in_constant_lengths` should be a global
    function, as I cannot think of a good reason it would be used
    on its own, though someone may have a different opinion there.
    
    For the second point, since JumbleState can be shared by multiple
    extensions, hooks should receive it as a const pointer. This
    signals read-only intent and prevents extensions from
    accidentally modifying it through the hook. This change is in
    0002. This does mean that extensions will need to update their
    hooks, but I do not see that as an issue for a major version.
    
    Thoughts?
    
    [0] https://postgr.es/m/aQA9v9nLu5qsX8IE%40paquier.xyz
    [1] https://postgr.es/m/202510281023.4u5aszccvsct%40alvherre.pgsql
    [2] https://github.com/search?q=fill_in_constant_lengths&type=code
    [3] https://github.com/search?q=generate_normalized_query&type=code
    
    --
    Sami Imseih
    Amazon Web Services (AWS)
    
  2. Re: Refactor query normalization into core query jumbling

    Bertrand Drouvot <bertranddrouvot.pg@gmail.com> — 2025-12-19T06:49:29Z

    Hi,
    
    On Thu, Dec 18, 2025 at 06:44:17PM -0600, Sami Imseih wrote:
    > For the second point, since JumbleState can be shared by multiple
    > extensions, hooks should receive it as a const pointer. This
    > signals read-only intent and prevents extensions from
    > accidentally modifying it through the hook. This change is in
    > 0002. This does mean that extensions will need to update their
    > hooks, but I do not see that as an issue for a major version.
    
    I can see that 0001 is doing:
    
    -static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
    
    and then:
    
    +static void
    +fill_in_constant_lengths(const JumbleState *jstate, const char *query,
    
    Should the const addition be in 0002? But...
    
    While looking at fill_in_constant_lengths(), I can see that it is doing:
    
    locs = jstate->clocations;
    
    and then things like:
    
    locs[i].length = -1
    
    or
    
    locs[i].length = strlen(yyextra.scanbuf + loc)
    
    While this is technically correct so the compiler does not complain (because
    clocations is a non const pointer in JumbleState and the added const does not
    apply to what clocations points to), I think that adding const here is misleading.
    
    Indeed, the function clearly modifies data accessible through the parameter.
    
    Thoughts?
    
    Regards,
    
    -- 
    Bertrand Drouvot
    PostgreSQL Contributors Team
    RDS Open Source Databases
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  3. Re: Refactor query normalization into core query jumbling

    Sami Imseih <samimseih@gmail.com> — 2025-12-19T21:36:16Z

    > While this is technically correct so the compiler does not complain (because
    > clocations is a non const pointer in JumbleState and the added const does not
    > apply to what clocations points to), I think that adding const here is misleading.
    
    Yes, I am not happy with this. I initially thought it would be OK to
    modify the JumbleState as long as it is done in a core function, but
    after much thought, this is neither a good idea nor safe. If a pointer
    is marked as a Constant, we should not modify it.
    
    So, I went back to think about this, and the core problem as I see it
    is that multiple hooks on the chain can modify the constant lengths. For
    example, pg_stat_statements can now modify the constant lengths in
    JumbleState via fill_in_constant_lengths, and the same JumbleState can
    have its constant locations modified in a different way.
    
    At this time, constant lengths are the only part of the JumbleState that
    is not set by core. So, I don't think post_parse_analyze receiving
    JumbleState as a constant is the right approach, nor do we need it.
    
    I think we should allow JumbleState to define a normalization callback,
    which defaults to a core normalization function rather than an
    extension specific one:
    
    ```
        jstate->normalize_query = GenerateNormalizedQuery;
    ```
    
    This way, any extension that wishes to return a normalized string from
    the same JumbleState can invoke this callback and get consistent results.
    pg_stat_statements and other extensions with a need to normalize a query
    string based on the locations of a JumbleState do not need to care about the
    internals of normalization, they simply invoke the callback and
    receive the final
    string.
    
    So v2-0001 implements this callback and moves the normalization logic
    into core. Both changes must be done in the same patch.  The comments
    are also updated where they are no longer applicable or could be improved.
    
    One additional improvement that this patch did not include is a bool in
    JumbleState that tracks whether a normalized string has already been
    generated. This way, repeated calls to the callback would not need to
    regenerate the string; only the first call would perform the work,
    while subsequent calls could simply return the previously computed
    normalized string.
    
    I do like the simplicity of this approach and it removes pg_stat_statements
    from having to own the normalization code and how well different extensions
    sharing the same JumbleState can play together. Not yet sure if this is the
    correct direction, and I am open to other ideas.
    
    --
    Sami Imseih
    Amazon Web Services (AWS)
    
  4. Re: Refactor query normalization into core query jumbling

    Michael Paquier <michael@paquier.xyz> — 2025-12-19T23:22:52Z

    On Fri, Dec 19, 2025 at 03:36:16PM -0600, Sami Imseih wrote:
    > This way, any extension that wishes to return a normalized string from
    > the same JumbleState can invoke this callback and get consistent results.
    > pg_stat_statements and other extensions with a need to normalize a query
    > string based on the locations of a JumbleState do not need to care about the
    > internals of normalization, they simply invoke the callback and
    > receive the final
    > string.
    
    Hmm.  I did not wrap completely my head with your problem, but,
    assuming that what you are proposing goes in the right direction, I am 
    wondering if we should not expose a bit more the jumble query APIs so
    as the normal default callback can be reused by out-of-core rather
    than hide it entirely.  This would mean exposing
    GenerateNormalizedQuery(), which also giving a way for callers of
    JumbleQuery() to pass down a custom callback?  This would imply
    thinking harder about the initialization state we expect in the
    structure, but I think that we should try to design things so as
    extensions do not need to copy-paste more code from the core tree at
    the end, just less of it.
    
    Of course, this sentence is written with the same line of thoughts as
    previously mentioned in the other thread we have discussed: extensions
    should not be allowed to update a JumbleState after it's been set by
    the backend code, so as once the same JumbleState pointer is passed
    down across multiple extensions they don't get confused.  If an
    extension wants to use their own policy within the JumbleState, they
    had better recreate a new independent one if they are unhappy about
    has been generated previously.
    --
    Michael
    
  5. Re: Refactor query normalization into core query jumbling

    Sami Imseih <samimseih@gmail.com> — 2025-12-22T23:40:10Z

    > > This way, any extension that wishes to return a normalized string from
    > > the same JumbleState can invoke this callback and get consistent results.
    > > pg_stat_statements and other extensions with a need to normalize a query
    > > string based on the locations of a JumbleState do not need to care about the
    > > internals of normalization, they simply invoke the callback and
    > > receive the final
    > > string.
    >
    > Hmm.  I did not wrap completely my head with your problem, but,
    > assuming that what you are proposing goes in the right direction,
    
    The first goal is to move all query-normalization-related infrastructure
    that pg_stat_statements (and other extensions) rely on into core, so
    extensions no longer need to copy or reimplement normalization logic and
    can all depend on a single, shared implementation.
    
    In addition, query normalization necessarily modifies JumbleState (to
    record constant locations and lengths). This responsibility should not
    fall to extensions and should instead be delegated to core. I will argue
    that the current design, in which extensions handle this directly, is a
    layering violation.
    
    As a first step, we can move generate_normalized_query to core as a global
    function, allowing extensions to simply call it.
    
    > I am wondering if we should not expose a bit more the jumble query APIs so
    > as the normal default callback can be reused by out-of-core rather
    > than hide it entirely.  This would mean exposing
    > GenerateNormalizedQuery(), which also giving a way for callers of
    > JumbleQuery() to pass down a custom callback?  This would imply
    > thinking harder about the initialization state we expect in the
    > structure, but I think that we should try to design things so as
    > extensions do not need to copy-paste more code from the core tree at
    > the end, just less of it.
    
    ... and this will be taking the next step which is providing callbacks
    and making
    more jumbling utilities global. This will require more discussion, but I
    would think we would expose InitJumble() and it will do the bare minimum
    to initialize a JumbleState, and some fields that can define callbacks after
    the fact. There will be a callback for a normalization function and a
    callback function that will allow the user to implement jumbling functions
    for nodes that are currently not included in queryjumblefuncs.switch.c, or
    perhaps they can override the existing logic in this generated file.
    
    > Of course, this sentence is written with the same line of thoughts as
    > previously mentioned in the other thread we have discussed: extensions
    > should not be allowed to update a JumbleState after it's been set by
    > the backend code, so as once the same JumbleState pointer is passed
    > down across multiple extensions they don't get confused.  If an
    > extension wants to use their own policy within the JumbleState, they
    > had better recreate a new independent one if they are unhappy about
    > has been generated previously.
    
    Yes, correct. If we provide the interface to create an additional JumbleState,
    they can create an independent state.
    
    For this thread, I would like to focus on the first goal.
    
    What do you think?
    
    --
    Sami Imseih
    Amazon Web Services (AWS)
    
    
    
    
  6. Re: Refactor query normalization into core query jumbling

    zengman <zengman@halodbtech.com> — 2025-12-23T03:39:14Z

    Hi,
    
    Though this may be tangential to the current topic, I have long been wanting to revise the two instances of `Assert(len_to_wrt >= 0);`
    in the code to the implementation below. Would you kindly advise if this modification is worthwhile?
    
    ```
    	if (len_to_wrt > 0)
    	{
    		memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
    		n_quer_loc += len_to_wrt;
    	}
    ```
    
    --
    Regards,
    Man Zeng
    www.openhalo.org
  7. Re: Refactor query normalization into core query jumbling

    Sami Imseih <samimseih@gmail.com> — 2025-12-23T16:35:18Z

    > Though this may be tangential to the current topic, I have long been wanting to revise the two instances of `Assert(len_to_wrt >= 0);`
    > in the code to the implementation below. Would you kindly advise if this modification is worthwhile?
    >
    > ```
    >         if (len_to_wrt > 0)
    >         {
    >                 memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
    >                 n_quer_loc += len_to_wrt;
    >         }
    > ```
    
    I am not sure I like this idea. The len_to_wrt being less than 0
    indicates a bug which will be hidden behind such a loop. I
    think it's better to keep the Assert as-is.
    
    > > > This way, any extension that wishes to return a normalized string from
    > > > the same JumbleState can invoke this callback and get consistent results.
    > > > pg_stat_statements and other extensions with a need to normalize a query
    > > > string based on the locations of a JumbleState do not need to care about the
    > > > internals of normalization, they simply invoke the callback and
    > > > receive the final
    > > > string.
    > >
    > > Hmm.  I did not wrap completely my head with your problem, but,
    > > assuming that what you are proposing goes in the right direction,
    >
    > The first goal is to move all query-normalization-related infrastructure
    > that pg_stat_statements (and other extensions) rely on into core, so
    > extensions no longer need to copy or reimplement normalization logic and
    > can all depend on a single, shared implementation.
    >
    > In addition, query normalization necessarily modifies JumbleState (to
    > record constant locations and lengths). This responsibility should not
    > fall to extensions and should instead be delegated to core. I will argue
    > that the current design, in which extensions handle this directly, is a
    > layering violation.
    >
    > As a first step, we can move generate_normalized_query to core as a global
    > function, allowing extensions to simply call it.
    
    v3 implements this approach without a callback. This establishes a clear
    boundary: core owns JumbleState modifications, extensions consume the
    results through the API.
    
    I kept the post_parse_analyze hook signature unchanged since
    GenerateNormalizedQuery still needs to modify JumbleState
    (to populate constant lengths). Making it const would be misleading.
    
    For the second part of making more jumbling utilities global, this can
    be taken up in another thread. I am now thinking we would be better
    off actually taking all the generic jumbling routines and separating
    them into their own C file. So we can have a new file like primjumble.c
    which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
    can worry about its core business of jumbling a Query tree. Anyhow,
    these are just some high level thoughts on this part for now.
    
    --
    Sami Imseih
    Amazon Web Services (AWS)
    
  8. Re: Refactor query normalization into core query jumbling

    zengman <zengman@halodbtech.com> — 2025-12-24T02:50:16Z

    > I am not sure I like this idea. The len_to_wrt being less than 0
    > indicates a bug which will be hidden behind such a loop. I
    > think it's better to keep the Assert as-is.
    
    Well, my main concern is that assertions are unlikely to be enabled in production environments. 
    Also, the new patch looks really good. Besides, it seems you accidentally forgot to cc Michael.
    
    --
    Regards,
    Man Zeng
    www.openhalo.org
  9. Re: Refactor query normalization into core query jumbling

    Michael Paquier <michael@paquier.xyz> — 2025-12-24T02:57:45Z

    On Wed, Dec 24, 2025 at 10:50:16AM +0800, zengman wrote:
    > Besides, it seems you accidentally forgot to cc Michael.
    
    This is not a problem.  I still get the messages by being registered
    to this list :D 
    --
    Michael
    
  10. Re: Refactor query normalization into core query jumbling

    Michael Paquier <michael@paquier.xyz> — 2025-12-24T04:06:59Z

    On Tue, Dec 23, 2025 at 10:35:18AM -0600, Sami Imseih wrote:
    > > Though this may be tangential to the current topic, I have long been wanting to revise the two instances of `Assert(len_to_wrt >= 0);`
    > > in the code to the implementation below. Would you kindly advise if this modification is worthwhile?
    > >
    > > ```
    > >         if (len_to_wrt > 0)
    > >         {
    > >                 memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
    > >                 n_quer_loc += len_to_wrt;
    > >         }
    > > ```
    > 
    > I am not sure I like this idea. The len_to_wrt being less than 0
    > indicates a bug which will be hidden behind such a loop. I
    > think it's better to keep the Assert as-is.
    
    This set of asserts are critical to keep.  An incorrect computation is
    a synonym of an out-of-bound write in this case, which would classify
    any problem as something that could be worth a CVE.
    
    > v3 implements this approach without a callback. This establishes a clear
    > boundary: core owns JumbleState modifications, extensions consume the
    > results through the API.
    > 
    > I kept the post_parse_analyze hook signature unchanged since
    > GenerateNormalizedQuery still needs to modify JumbleState
    > (to populate constant lengths). Making it const would be misleading.
    > 
    > For the second part of making more jumbling utilities global, this can
    > be taken up in another thread. I am now thinking we would be better
    > off actually taking all the generic jumbling routines and separating
    > them into their own C file. So we can have a new file like primjumble.c
    > which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
    > can worry about its core business of jumbling a Query tree. Anyhow,
    > these are just some high level thoughts on this part for now.
    
    Hmm.  Moving from pgss to the core backup the code that updates the
    clocations of an existing JumbleState does not solve the issue that
    we should not modify the internals of the JumbleState computed.  So
    this does not move the ball at all.
    
    The updates of clocations depend on GenerateNormalizedQuery(), which
    depends on the pre-work done in CleanQuerytext() to get a query string
    and its location for a multi-query string case.  If we really want to
    make a JumbleState not touchable at all, we could either push more
    work of CleanQuerytext() and GenerateNormalizedQuery() into core.
    
    Or  a second thing that we can do, which would be actually much
    simpler, is to rework entirely fill_in_constant_lengths() so as we
    generate a *copy* of the location data PGSS wants rather than force it
    into a JumbleState that would be pushed across more stacks of the
    post-parse-analyze hook.  Doing that would allow us to pull the plug
    into making JumbleState and the original copies of clocations a set of
    consts when given to extensions.
    --
    Michael
    
  11. Re: Refactor query normalization into core query jumbling

    Sami Imseih <samimseih@gmail.com> — 2025-12-24T17:32:45Z

    > > For the second part of making more jumbling utilities global, this can
    > > be taken up in another thread. I am now thinking we would be better
    > > off actually taking all the generic jumbling routines and separating
    > > them into their own C file. So we can have a new file like primjumble.c
    > > which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
    > > can worry about its core business of jumbling a Query tree. Anyhow,
    > > these are just some high level thoughts on this part for now.
    >
    > Hmm.  Moving from pgss to the core backup the code that updates the
    > clocations of an existing JumbleState does not solve the issue that
    > we should not modify the internals of the JumbleState computed.  So
    > this does not move the ball at all.
    
    This is for the point "I am wondering if we should not expose a bit more
    the jumble query APIs" [0], which as I mention is a separate topic and is
    not about extensions being able to modify JumbleState.
    
    > The updates of clocations depend on GenerateNormalizedQuery(), which
    > depends on the pre-work done in CleanQuerytext() to get a query string
    > and its location for a multi-query string case.  If we really want to
    > make a JumbleState not touchable at all, we could either push more
    > work of CleanQuerytext() and GenerateNormalizedQuery() into core.
    
    When you say “push more work into core,” I understand that to mean this work
    should occur during jumbling. If so, there are several complications.
    
    1/ CleanQueryText() requires access to the query text, which we do not have
    during core jumbling as of 2ecbb0a4935.
    
    2/ GenerateNormalizedQuery() should be invoked on demand by the extension
    that needs the normalized string, for example when pg_stat_statements needs
    to store a query string.
    
    Both operations are expensive, especially GenerateNormalizedQuery(). Doing
    this work on every JumbleQuery() would introduce unnecessary overhead.
    
    > Or  a second thing that we can do, which would be actually much
    > simpler, is to rework entirely fill_in_constant_lengths() so as we
    > generate a *copy* of the location data PGSS wants rather than force it
    > into a JumbleState
    
    yes, that's an idea that did cross my mind. I have hesitation of copying
    clocations, but that may not be such a big deal, since it will only be
    occurring on demand when the extension requests a normalized string.
    
    > that would be pushed across more stacks of the
    > post-parse-analyze hook.  Doing that would allow us to pull the plug
    > into making JumbleState and the original copies of clocations a set of
    > consts when given to extensions.
    
    Yes correct. The hook signature will turn into, so all extensions now
    just have a constant pointer to the jumblestate. They can of course
    work hard to cast away constants, but at that point, they are doing
    things the wrong way.
    
    ```
     typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
    
            Query *query,
    
            const JumbleState *jstate);
    ```
    
    This of course will be a big breaking change to all the extensions out
    there using this hook. This is not just about a signature change of
    the hook, but an author now has to figure out how to deal with a
    constant JumbleState in their code, which may not be trivial.
    
    My proposal in v3 was a bit more softer, and keeps the hook
    backwards compatible. Basically, we keep JumbleState a non-constant,
    but provide core APIs for any  operation, such as generate_normalized_query,
    that needs to modify the state. So, my approach was not about enforcing a
    read-only JumbleState, but about providing the API to dissuade an author
    from modifying a JumbleState.
    If we need a stronger API contract, then we can go with the hook receving
    JumbleState as a constant and implement the copy of clocations to return
    back to extensions that need to normalize a query. We also have to keep
    in mind that if there are future requirements where the state has to be
    modified by an extension, we will need to implement more copy functions
    for those field.
    
    [0] https://www.postgresql.org/message-id/aUXeTEpSymo6n7lF%40paquier.xyz
    
    --
    Sami Imseih
    Amazon Web Services (AWS)
    
    
    
    
  12. Re: Refactor query normalization into core query jumbling

    Lukas Fittl <lukas@fittl.com> — 2025-12-30T02:34:18Z

    On Wed, Dec 24, 2025 at 9:33 AM Sami Imseih <samimseih@gmail.com> wrote:
    
    > > that would be pushed across more stacks of the
    > > post-parse-analyze hook.  Doing that would allow us to pull the plug
    > > into making JumbleState and the original copies of clocations a set of
    > > consts when given to extensions.
    >
    > Yes correct. The hook signature will turn into, so all extensions now
    > just have a constant pointer to the jumblestate. They can of course
    > work hard to cast away constants, but at that point, they are doing
    > things the wrong way.
    >
    > ```
    >  typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,
    >
    >         Query *query,
    >
    >         const JumbleState *jstate);
    > ```
    >
    > This of course will be a big breaking change to all the extensions out
    > there using this hook. This is not just about a signature change of
    > the hook, but an author now has to figure out how to deal with a
    > constant JumbleState in their code, which may not be trivial.
    >
    
    I wonder if there is a single extension out there that actually utilizes
    JumbleState in that hook -
    it was added in 5fd9dfa5f50e4906c35133a414ebec5b6d518493 presumably because
    pg_stat_statements
    needed it, but even Julien at the time was critical of adding it, mainly
    considering it for pgss needs if I read
    the archive correctly [0]. CCing Julien to correct me if I misinterpret the
    historic record here.
    
    Reading through the discussion in this thread here I do wonder a bit if we
    shouldn't just take that out of the
    hook again, and instead provide separate functions for getting the
    normalized query string (generating it the
    first time its called).
    
    My proposal in v3 was a bit more softer, and keeps the hook
    > backwards compatible. Basically, we keep JumbleState a non-constant,
    > but provide core APIs for any  operation, such as
    > generate_normalized_query,
    > that needs to modify the state. So, my approach was not about enforcing a
    > read-only JumbleState, but about providing the API to dissuade an author
    > from modifying a JumbleState.
    
    
    Given the lack of public APIs to modify JumbleState today, I don't see how
    an extension would do
    modifications in a meaningful way, short of copying the code. I think we
    should be a bit bolder here in
    enforcing a convention, either clearly making it read-only or dropping the
    argument again.
    
    Thinking through a couple of theoretical use cases:
    
    1) An extension that wants to display normalized query strings
    
    This seems to be the biggest kind of what I can find with code search.
    Extensions like pg_stat_monitor [1], that
    want to do something like pg_stat_statements, and have to copy a bunch of
    normalization code today that is 1:1 what
    Postgres does. Such extensions don't need the JumbleState argument if they
    can get the normalized text directly.
    
    2) An extension that wants to capture parameter values
    
    Some extensions may want to remember additional context for normalized
    queries, like pg_tracing's logic for
    optionally passing parameter values (post normalization) in the trace
    context [2]. If we kept passing a read-only
    JumbleState then such extensions could presumably still get this, but I
    wonder if it wouldn't be better for core to
    have a helper for this?
    
    3) An extension that modifies the JumbleState to cause different
    normalization to happen
    
    I'm not aware of any extensions doing this, and I couldn't find any
    either.  And since such theoretical extensions
    can directly modify query->queryId in the same hook, why not do that?
    
    4) Extensions using the presence of JumbleState to determine whether query
    IDs are available (?)
    
    I noticed pg_hint_plan checks the JumbleState argument to determine whether
    or not to run an early check
    of the hint logic. I'm a little confused by this (and if this is about
    query IDs being available, couldn't it just check the
    Query having a non-zero queryID?), but FWIW: [3]
    
    [0]:
    https://www.postgresql.org/message-id/flat/CAOBaU_ZbeQrUESCuLGk3sRZWxXFGaBBO39CxSsFxLeZGicUrJw%40mail.gmail.com#9a9bc47aa1b4b25ee2e3de1388c4c346
    [1]:
    https://github.com/percona/pg_stat_monitor/blob/main/pg_stat_monitor.c#L476
    [2]:
    https://github.com/DataDog/pg_tracing/blob/main/src/pg_tracing_query_process.c#L428
    [3]:
    https://github.com/ossc-db/pg_hint_plan/blob/master/pg_hint_plan.c#L3111
    
    Thanks,
    Lukas
    
    -- 
    Lukas Fittl
    
  13. Re: Refactor query normalization into core query jumbling

    Michael Paquier <michael@paquier.xyz> — 2025-12-30T04:31:30Z

    On Mon, Dec 29, 2025 at 06:34:18PM -0800, Lukas Fittl wrote:
    > I noticed pg_hint_plan checks the JumbleState argument to determine whether
    > or not to run an early check of the hint logic. I'm a little
    > confused by this (and if this is about query IDs being available,
    > couldn't it just check the  Query having a non-zero queryID?), but
    > FWIW: [3]
    
    See also the code before 32ced2e13e75.  We wanted a JumbleState in
    this case to be able to generate a normalized query for the hint
    table.  Since this commit we only rely on the query ID in the hint
    table, so it would not matter for pg_hint_plan if the JumbleState is
    removed from this portions of the hooks.
    
    Saying that, yes I think that you are right, we should be able to
    remove this dependency on JumbleState in
    pg_hint_plan_post_parse_analyze() and rely on the query ID.  I'm
    writing down a note about cleaning that on the HEAD branch of the
    module..
    --
    Michael
    
  14. Re: Refactor query normalization into core query jumbling

    Bertrand Drouvot <bertranddrouvot.pg@gmail.com> — 2025-12-30T08:13:32Z

    Hi,
    
    On Tue, Dec 23, 2025 at 10:35:18AM -0600, Sami Imseih wrote:
    > v3 implements this approach without a callback. This establishes a clear
    > boundary: core owns JumbleState modifications, extensions consume the
    > results through the API.
    > 
    
    Thanks for the new patch version.
    
    Some random comments:
    
    === 1
    
    +       SetConstantLengths((JumbleState *) jstate, query, query_loc);
    
    This cast seems unnecessary.
    
    === 2
    
    +CompLocation(const void *a, const void *b)
    
    In the commit message I can see "Functions are renamed to match core naming
    conventions" but wasn't comp_location() better? 
    
    === 3
    
    +                       /*
    +                        * generate the normalized query. Note that the normalized
    +                        * representation may well vary depending on just which
    +                        * "equivalent" query is used to create the hashtable entry. We
    +                        * assume this is OK.
    +                        */
    +                       norm_query = GenerateNormalizedQuery(jstate, query,
    
    Should part of this comment be on top of the GenerateNormalizedQuery()
    definition instead?
    
    Regards,
    
    -- 
    Bertrand Drouvot
    PostgreSQL Contributors Team
    RDS Open Source Databases
    Amazon Web Services: https://aws.amazon.com