Thread
-
Re: Refactor query normalization into core query jumbling
Sami Imseih <samimseih@gmail.com> — 2025-12-19T21:36:16Z
> While this is technically correct so the compiler does not complain (because > clocations is a non const pointer in JumbleState and the added const does not > apply to what clocations points to), I think that adding const here is misleading. Yes, I am not happy with this. I initially thought it would be OK to modify the JumbleState as long as it is done in a core function, but after much thought, this is neither a good idea nor safe. If a pointer is marked as a Constant, we should not modify it. So, I went back to think about this, and the core problem as I see it is that multiple hooks on the chain can modify the constant lengths. For example, pg_stat_statements can now modify the constant lengths in JumbleState via fill_in_constant_lengths, and the same JumbleState can have its constant locations modified in a different way. At this time, constant lengths are the only part of the JumbleState that is not set by core. So, I don't think post_parse_analyze receiving JumbleState as a constant is the right approach, nor do we need it. I think we should allow JumbleState to define a normalization callback, which defaults to a core normalization function rather than an extension specific one: ``` jstate->normalize_query = GenerateNormalizedQuery; ``` This way, any extension that wishes to return a normalized string from the same JumbleState can invoke this callback and get consistent results. pg_stat_statements and other extensions with a need to normalize a query string based on the locations of a JumbleState do not need to care about the internals of normalization, they simply invoke the callback and receive the final string. So v2-0001 implements this callback and moves the normalization logic into core. Both changes must be done in the same patch. The comments are also updated where they are no longer applicable or could be improved. One additional improvement that this patch did not include is a bool in JumbleState that tracks whether a normalized string has already been generated. This way, repeated calls to the callback would not need to regenerate the string; only the first call would perform the work, while subsequent calls could simply return the previously computed normalized string. I do like the simplicity of this approach and it removes pg_stat_statements from having to own the normalization code and how well different extensions sharing the same JumbleState can play together. Not yet sure if this is the correct direction, and I am open to other ideas. -- Sami Imseih Amazon Web Services (AWS)