Thread

  1. Re: plan shape work

    Andrei Lepikhov <lepihov@gmail.com> — 2025-08-27T09:41:00Z

    On 19/5/2025 20:01, Robert Haas wrote:
    > Hope you find this interesting. If you do, let me know what you think.Thanks for looking in this direction!
    
    Since 2017, we designed features that should 'memorise' the experience 
    of previous executions, checking the PlanState tree and instrumentation 
    after execution.
    The first dumb prototype you should know - AQO. The following, more 
    robust code is 'replan' [1]. These two features utilise the core patch, 
    and I hope this patch can be removed after a few adjustments to the core.
    In fact, 'replan' is used to stop execution in the middle (if the 
    time/memory usage limit is achieved), pass through the state, collect 
    valuable data and execute again. That's why using data for the 
    subsequent execution needs a matching query tree, and that data may be 
    irrelevant for a different set of constants.
    
    The lessons learned during design and after some usage in real life:
    1. Subplans should be uniquely identified during the planning. Like you 
    mentioned, two similar subplans on the same query level may be executed 
    differently and provide different row numbers to the 'knowledge base'. I 
    used the subquery_planner hack that generated a unique ID for each subplan.
    2. Each node should be identified - I used a kind of signature at each 
    RelOptInfo node - just a hash generated by restrictinfos and underlying 
    signatures. This approach needs a create_plan hook to copy the signature 
    to the Plan nodes.
    3. A great source of fluctuations is 'never executed' nodes - because 
    depending on the constant, the subtree may never be executed or produce 
    tons of tuples - I resolved it by just ignoring 'never executed' 
    results, no in-core changes needed.
    4. A tree of paths may implement a single RelOptInfo. I have saved the 
    signature at the top of the Plan node for the corresponding RelOptInfo 
    and have never encountered any problems yet. It limits the use of 
    gathered data on cardinality/group number/peak memory consumption 
    somewhat, but not significantly.
    
    I think the node extension fields and hooks, debated in the thread [2], 
    may be enough to enable extensions to implement such a feature.
    
    What's more, I personally prefer to have a hook that allows extension 
    checking of a condition during execution - it may be done inside the 
    ExecProcNode() by calling the node-specific hook, which an extension may 
    initialise in the PlanState structure before the start of execution.
    Additionally, it would be beneficial to have a hook for error processing 
    at the top of the portal execution code - this is a key point for 
    managing query resources. If we need to stop and re-execute the query, 
    it is the only place where we can release all the resources assigned.
    One more helpful thing - an option to postpone receiver sending data out 
    of the instance-side, even result headings. We may want to decide on the 
    query plan after some tuples are produced, and if something has already 
    been sent to the user, we can't just stop and rebuild the plan.
    
    [1] https://postgrespro.com/docs/enterprise/16/realtime-query-replanning
    [2] 
    https://www.postgresql.org/message-id/flat/CA%2BTgmoYxfg90rw13%2BJcYwn4dwSC%2Bagw7o8-A%2BfA3M0fh96pg8w%40mail.gmail.com
    
    -- 
    regards, Andrei Lepikhov