Thread
-
Re: Row pattern recognition
jian he <jian.universality@gmail.com> — 2026-05-30T06:45:25Z
Hi. -- Consecutive VAR merge: A A+ -> a{2,} -- Tests line 251: child->max == RPR_QUANTITY_INF branch in mergeConsecutiveVars -- prev: A{1,1} (finite), child: A+ (infinite) triggers line 251 evaluation EXPLAIN (COSTS OFF) SELECT COUNT(*) OVER w FROM rpr_plan WINDOW w AS (ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING PATTERN (A A+) DEFINE A AS val > 0); -- Consecutive VAR merge: A A+ -> a{2,} -- Tests line 251: child->max == RPR_QUANTITY_INF branch in mergeConsecutiveVars -- prev: A{1,1} (finite), child: A+ (infinite) triggers line 251 evaluation EXPLAIN (COSTS OFF) SELECT COUNT(*) OVER w FROM rpr_plan WINDOW w AS (ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING PATTERN (A A+) DEFINE A AS val > 0); Comments like "Tests line 251" should not appear in SQL, since the line number would change. So, we need to rephrase this comment. Please don't delete it, it may clarify the patch for future readers. /* * nfa_add_state_unique * * Add a state to ctx->states at the END, only if no duplicate exists. * Returns true if state was added, false if duplicate found (state is freed). * Earlier states have better lexical order (DFS traversal order), so existing wins. */ static bool nfa_add_state_unique(WindowAggState *winstate, RPRNFAContext *ctx, RPRNFAState *state) The "at the END" confuses me. It may also refers to RPR_VARID_END / RPRElemIsEnd. How about something like: "Add the state to the end of the ctx->states linked list, but only if a duplicate state is not already present." I think a bunch of these uppercase ENDs should just be lowercase "end", capitalizing them may collides visually with RPR_VARID_END and RPRElemIsEnd and makes the comments harder to read. In nfa_match: "Non-VAR elements (ALT, END, FIN) are kept as-is for advance phase." Here END does mean RPRElemIsEnd, right? That one's probably fine as uppercase since it's listed alongside other element kinds. But then: /* * Evaluate VAR elements against current row. For VARs that reach max * count with END next, advance through END chain inline so absorb phase * can compare states at judgment points. */ I'm think END here means the tail of the RPRNFAContext->RPRNFAState linked list, not the element kind -- which is exactly the ambiguity I'm worried about. Could we lowercase the list-end ones and keep uppercase only when actually referring to the END element kind? The comment under nfa_advance_var is confusing (for me). /* After a successful match, count >= 1, so at least one must be true */ nfa_advance_var doesn't actually know anything about match status (i think), it can be reached with count == 0. I added this elog(INFO) to confirm: if (count < 1 && currentPos > -1) elog(INFO, "should not reach, count is %d, currentPos=%ld", count, currentPos); ...and it does fire, so the comment's premise isn't quite right. ------------------------------------------------ please check the attached minor refactoring. 1. Refactor struct WindowAggState: Remove the nfaStateSize and nfaVisitedNWords fields because they are constant, and we can easily compute it, seems not necessary to stay in WindowAggState. 2. Rename nfa_context_alloc() to nfa_context_make(), nfa_state_alloc() to nfa_state_make(), nfa_state_create() to nfa_state_clone(). Rationale: We have makeNode, changing "alloc" to "make" would be more intuitive, IMHO. In tablecmd.c, we have CloneForeignKeyConstraints, which is based on existing information, creating a new node, here we are doing the same in nfa_state_create. 3. Refactor functions: nfa_advance_alt, nfa_advance_begin, nfa_advance_end, and nfa_advance_var. Because the (RPRPatternElement *elem) parameter is unnecessary. -- jian https://www.enterprisedb.com/