Thread

Re: [SQL/PGQ] Early pruning for GRAPH_TABLE path generation

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> — 2026-05-14T06:25:51Z
Hi Junwang,
Thanks for working on this. The performance improvement is impressive.
I haven't verified it myself though at this time.

Henson said it right. The first version separately enumeration and
conversion clearly to keep things simple. Things like variable length
element patterns, embedded path patterns can change the way we
enumerate the paths. Whatever the enumeration method is failing early
is best strategy. However, the question is whether your implementation
of "failing early" is going to make it difficult implement the above
mentioned advanced features or simplify it or not affect those at all.
We need to think and discuss a bit.

Delaying "failing early" implementation till we implement those
features is safest strategy. If the lack of it makes the feature
prohibitively useless, we will need to implement it early and deal
with the difficulties it brings. But the examples so far are mostly
artificial, not practical. That doesn't make me feel like it's worth
taking the risk. There are many other bugs that need to be fixed.

I am very glad that this patch demonstrates that "failing early"
improves things significantly. So we will incorporate this strategy
sooner or later.

On Thu, May 14, 2026 at 6:08 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
>

Some cosmetic comments on v2

+ if (list_length(graph_path) == 1)
+ return true;

I would move this earlier in the function, to make it more readable.

rev_feasible is clever, but may need more comments.

How do we make sure that the edge direction checks in
generate_query_for_graph_path and graph_path_edge_is_feasible remain
consistent? What I had in mind instead was to start constructing Join
tree while we are creating paths so that edge direction feasibility
checks also create the edge connection quals avoiding the duplicate
logic. However, we will need to make sure that any unusable objects we
create during that process are discarded in time.

It will be better to place CHECK_FOR_INTERRUPTS alongside stack depth
check. But for it to be added there, we need an evidence that the
function is really consuming a lot of time. Your earlier measurements
hint towards that, but they are overall query times. Can you please
share actual time spent in this function out of the total execution
time?

If you separate CHECK_FOR_INTERRUPTS change as a separate patch, it
can be committed independent of the optimization.

-- 
Best Wishes,
Ashutosh Bapat