Thread

  1. Re: [SQL/PGQ] Early pruning for GRAPH_TABLE path generation

    Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> — 2026-05-14T06:25:51Z

    Hi Junwang,
    Thanks for working on this. The performance improvement is impressive.
    I haven't verified it myself though at this time.
    
    Henson said it right. The first version separately enumeration and
    conversion clearly to keep things simple. Things like variable length
    element patterns, embedded path patterns can change the way we
    enumerate the paths. Whatever the enumeration method is failing early
    is best strategy. However, the question is whether your implementation
    of "failing early" is going to make it difficult implement the above
    mentioned advanced features or simplify it or not affect those at all.
    We need to think and discuss a bit.
    
    Delaying "failing early" implementation till we implement those
    features is safest strategy. If the lack of it makes the feature
    prohibitively useless, we will need to implement it early and deal
    with the difficulties it brings. But the examples so far are mostly
    artificial, not practical. That doesn't make me feel like it's worth
    taking the risk. There are many other bugs that need to be fixed.
    
    I am very glad that this patch demonstrates that "failing early"
    improves things significantly. So we will incorporate this strategy
    sooner or later.
    
    On Thu, May 14, 2026 at 6:08 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
    >
    
    Some cosmetic comments on v2
    
    + if (list_length(graph_path) == 1)
    + return true;
    
    I would move this earlier in the function, to make it more readable.
    
    rev_feasible is clever, but may need more comments.
    
    How do we make sure that the edge direction checks in
    generate_query_for_graph_path and graph_path_edge_is_feasible remain
    consistent? What I had in mind instead was to start constructing Join
    tree while we are creating paths so that edge direction feasibility
    checks also create the edge connection quals avoiding the duplicate
    logic. However, we will need to make sure that any unusable objects we
    create during that process are discarded in time.
    
    It will be better to place CHECK_FOR_INTERRUPTS alongside stack depth
    check. But for it to be added there, we need an evidence that the
    function is really consuming a lot of time. Your earlier measurements
    hint towards that, but they are overall query times. Can you please
    share actual time spent in this function out of the total execution
    time?
    
    If you separate CHECK_FOR_INTERRUPTS change as a separate patch, it
    can be committed independent of the optimization.
    
    -- 
    Best Wishes,
    Ashutosh Bapat