nocfbot-0030-Add-inline-comments-design-notes.txt

text/plain

Filename: nocfbot-0030-Add-inline-comments-design-notes.txt
Type: text/plain
Part: 29
Message: Re: Row pattern recognition
From 76b60fb538906346144be7430b858ddd471ec3ab Mon Sep 17 00:00:00 2001
From: Henson Choi <assam258@gmail.com>
Date: Tue, 7 Apr 2026 20:49:27 +0900
Subject: [PATCH 30/40] Add inline comments for complex RPR algorithms and
 design notes

Document END chain traversal in nfa_match(), fast-forward paths
in nfa_advance_end(), absorption safety rules with navigation
lookup table, per-context evaluation strategy table, fixed-length
group unrolling rationale, and BEGIN/END pointer layout diagram.
---
 src/backend/executor/execRPR.c   | 97 ++++++++++++++++++++++++++------
 src/backend/optimizer/plan/rpr.c | 41 ++++++++++++--
 2 files changed, 118 insertions(+), 20 deletions(-)

diff --git a/src/backend/executor/execRPR.c b/src/backend/executor/execRPR.c
index baee45ce54e..7ba7b6fb672 100644
--- a/src/backend/executor/execRPR.c
+++ b/src/backend/executor/execRPR.c
@@ -440,7 +440,14 @@
  *   Case 2: GROUP+ with fixed-length children (min == max, recursively)
  *           e.g., (A B)+, (A B{2})+, ((A (B C){2}){2})+
  *           -> ABSORBABLE_BRANCH on all elements within the group,
- *             ABSORBABLE | ABSORBABLE_BRANCH on END
+ *              ABSORBABLE | ABSORBABLE_BRANCH on END
+ *
+ *           Why this is safe: when every child has min == max, the group
+ *           is semantically equivalent to unrolling its body into {1,1}
+ *           elements.  E.g., (A B{2})+ behaves like (A B B)+.  Each
+ *           iteration consumes a fixed number of rows, so an earlier
+ *           context's count always dominates a later one's (monotonicity).
+ *
  *   Case 3: GROUP+ whose body starts with VAR+ (e.g., (A+ B)+)
  *           -> Recurses from BEGIN into the body, applying Case 1.
  *             ABSORBABLE | ABSORBABLE_BRANCH set on A.
@@ -647,6 +654,19 @@
  * variables.  The original nav_match_start and currentpos are saved and
  * restored after re-evaluation.
  *
+ * Summary of evaluation strategy by navigation content:
+ *
+ *   Navigation content               evaluation
+ *   -------------------------------------------------------
+ *   No navigation                    shared (once per row)
+ *   PREV/NEXT only                   shared (once per row)
+ *   LAST (no offset)                 shared (once per row)
+ *   LAST (with offset)               per-context
+ *   FIRST (any)                      per-context
+ *   Compound (inner FIRST)           per-context
+ *   Compound (inner LAST, no off.)   shared (once per row)
+ *   Compound (inner LAST, w/off.)    per-context
+ *
  * VI-5. Tuplestore Mark and Trim (nodeWindowAgg.c)
  *
  * Navigation functions require access to past rows via the tuplestore.
@@ -762,11 +782,26 @@
  *   (b) Unbounded frame (ROWS BETWEEN CURRENT ROW AND UNBOUNDED
  *       FOLLOWING).  Limited frames apply differently to each context,
  *       breaking the monotonicity principle.
- *   (c) No match_start-dependent navigation in DEFINE.  FIRST,
- *       LAST-with-offset, and compound navigation referencing match_start
- *       (PREV_FIRST, NEXT_FIRST, PREV_LAST/NEXT_LAST with offset)
- *       cause different contexts to evaluate to different values for the
- *       same row, breaking monotonicity.
+ *   (c) No match_start-dependent navigation in DEFINE.
+ *
+ *       Mechanism: each context has a different matchStartRow, so FIRST
+ *       resolves to a different row for each context at the same
+ *       currentpos.  An earlier context's DEFINE result no longer
+ *       subsumes a later one's, making count-dominance comparison
+ *       invalid.  Rather than comparing matchStartRow at runtime
+ *       (which would complicate the absorb path), any match_start
+ *       dependency disables absorption entirely.
+ *
+ *       Navigation content              match_start dep.  absorption
+ *       ------------------------------------------------------------
+ *       No navigation                   none              safe
+ *       PREV/NEXT only                  none              safe
+ *       LAST (no offset)                none              safe
+ *       LAST (with offset)              boundary check    unsafe
+ *       FIRST (any)                     direct            unsafe
+ *       Compound (inner FIRST)          direct            unsafe
+ *       Compound (inner LAST, no off.)  none              safe
+ *       Compound (inner LAST, w/off.)   boundary chk      unsafe
  *
  * Runtime conditions (evaluated per context pair):
  *
@@ -2260,7 +2295,13 @@ nfa_absorb_contexts(WindowAggState *winstate)
  * nfa_eval_var_match
  *
  * Evaluate if a VAR element matches the current row.
- * Undefined variables (varId >= defineVariableList length) default to TRUE.
+ *
+ * varMatched is a pre-evaluated boolean array indexed by varId, computed
+ * once per row by evaluating all DEFINE expressions.  NULL means no DEFINE
+ * clauses exist (only possible during early development/testing).
+ *
+ * Per SQL:2016 R020, pattern variables not listed in DEFINE are implicitly
+ * TRUE -- they match every row.  This is checked via varId >= list_length.
  */
 static bool
 nfa_eval_var_match(WindowAggState *winstate, RPRPatternElement *elem,
@@ -2337,9 +2378,20 @@ nfa_match(WindowAggState *winstate, RPRNFAContext *ctx, bool *varMatched)
 
 				/*
 				 * For VAR at max count with END next, advance through END
-				 * chain to reach the absorption judgment point. Only
+				 * chain to reach the absorption judgment point.  Only
 				 * deterministic exits (count >= max, max finite) are handled;
 				 * unbounded VARs stay for advance phase.
+				 *
+				 * In nested patterns like ((A B){2}){3}, a VAR reaching its
+				 * max triggers an exit cascade: inner END increments inner
+				 * group count, which may itself reach max, requiring an exit
+				 * to the next outer END.  The loop below walks this chain.
+				 *
+				 * ABSORBABLE_BRANCH marks elements inside the absorbable
+				 * region; ABSORBABLE marks the outermost judgment point
+				 * where count-dominance is evaluated.  We chain through
+				 * BRANCH elements until reaching the ABSORBABLE point or
+				 * an element that can still loop (count < max).
 				 */
 				if (RPRElemIsAbsorbableBranch(elem) &&
 					!RPRElemIsAbsorbable(elem) &&
@@ -2561,12 +2613,25 @@ nfa_advance_end(WindowAggState *winstate, RPRNFAContext *ctx,
 		RPRPatternElement *jumpElem;
 		RPRNFAState *ffState = NULL;
 
-		/* Snapshot state for ff path before modifying for loop-back */
+		/*
+		 * Two paths are explored in parallel when the group body is
+		 * nullable (RPR_ELEM_EMPTY_LOOP):
+		 *
+		 * 1. Primary path: loop back and attempt real matches in the
+		 *    next iteration (state, modified below).
+		 *
+		 * 2. Fast-forward path: skip directly to after the group,
+		 *    treating all remaining required iterations as empty
+		 *    matches (ffState, handled after the primary path).
+		 *
+		 * The snapshot must be taken BEFORE modifying state for the
+		 * loop-back, since both paths diverge from the same point.
+		 */
 		if (RPRElemCanEmptyLoop(elem))
 			ffState = nfa_state_create(winstate, state->elemIdx,
 									   state->counts, state->isAbsorbable);
 
-		/* Loop back for real matches (primary path) */
+		/* Primary path: loop back for real matches */
 		for (int d = depth + 1; d < pattern->maxDepth; d++)
 			state->counts[d] = 0;
 		state->elemIdx = elem->jump;
@@ -2575,12 +2640,12 @@ nfa_advance_end(WindowAggState *winstate, RPRNFAContext *ctx,
 						  currentPos);
 
 		/*
-		 * Fast-forward fallback for nullable bodies.  E.g. (A?){2,3} when A
-		 * doesn't match: the loop-back produces empty iterations that cycle
-		 * detection would kill.  Instead, exit directly treating all
-		 * remaining required iterations as empty.  Route to elem->next (not
-		 * nfa_advance_end) to avoid creating competing greedy/reluctant loop
-		 * states.
+		 * Fast-forward path for nullable bodies.  E.g. (A?){2,3} when
+		 * A doesn't match: the primary loop-back produces empty
+		 * iterations that cycle detection would kill.  Instead, exit
+		 * directly with count satisfied.  Route to elem->next (not
+		 * nfa_advance_end) to avoid creating competing greedy/reluctant
+		 * loop states.
 		 */
 		if (ffState != NULL)
 		{
diff --git a/src/backend/optimizer/plan/rpr.c b/src/backend/optimizer/plan/rpr.c
index 767a214016c..754fcd53099 100644
--- a/src/backend/optimizer/plan/rpr.c
+++ b/src/backend/optimizer/plan/rpr.c
@@ -478,6 +478,19 @@ mergeConsecutiveAlts(List *children)
  * mergeGroupPrefixSuffix
  *		Merge sequence prefix/suffix into GROUP with matching children.
  *
+ * When a GROUP's children appear as a prefix before and/or suffix after
+ * the GROUP in a SEQ, absorb them by incrementing the GROUP's quantifier.
+ * This runs iteratively: A B A B (A B)+ A B -> (A B){5,}.
+ *
+ * Algorithm:
+ *   For each GROUP encountered in the sequence:
+ *   1. PREFIX phase: compare the last N elements already in the result
+ *      list against the GROUP's children.  On match, remove them from
+ *      result and increment the GROUP's min/max.  Repeat until no match.
+ *   2. SUFFIX phase: compare the next N elements in the input against
+ *      the GROUP's children.  On match, skip them (via skipUntil) and
+ *      increment min/max.  Repeat until no match.
+ *
  * Examples:
  *   A B (A B)+ -> (A B){2,}
  *   (A B)+ A B -> (A B){2,}
@@ -813,8 +826,16 @@ tryMultiplyQuantifiers(RPRPatternNode *pattern)
 	}
 
 	/*
-	 * Case 2/3: Safe when child is finite AND (outer is exact OR child is
-	 * {1,1})
+	 * Case 2: Outer exact (min == max): (A{2,3}){4} -> A{8,12}.
+	 *         Safe because every iteration produces the same range.
+	 *
+	 * Case 3: Child {1,1}: (A){2,5} -> A{2,5}.
+	 *         Safe because the child contributes exactly one per
+	 *         iteration, so the outer range maps directly.
+	 *
+	 * Unsafe example: (A{2}){2,3} produces counts 4 or 6 only, not
+	 * the full range 4..6, so we cannot flatten when child has a
+	 * non-trivial range AND outer is also a range.
 	 */
 	if (child->max != RPR_QUANTITY_INF &&
 		(pattern->min == pattern->max ||
@@ -824,6 +845,7 @@ tryMultiplyQuantifiers(RPRPatternNode *pattern)
 		if (new_min_64 >= RPR_QUANTITY_INF)
 			return pattern;
 
+		/* Outer unbounded: result is unbounded regardless of child */
 		if (pattern->max == RPR_QUANTITY_INF)
 			new_max_64 = RPR_QUANTITY_INF;
 		else
@@ -1186,8 +1208,19 @@ fillRPRPatternVar(RPRPatternNode *node, RPRPattern *pat, int *idx, RPRDepth dept
  * fillRPRPatternGroup
  *		Fill a GROUP pattern and its children.
  *
- * Creates elements for group content at increased depth, plus an END marker
- * if the group has a non-trivial quantifier.
+ * Creates elements for group content at increased depth, plus BEGIN/END
+ * marker pair if the group has a non-trivial quantifier (not {1,1}).
+ *
+ * Element layout for (A B){2,3}:
+ *
+ *   [BEGIN]  [A]  [B]  [END]  [next element...]
+ *     |                  |          ^
+ *     |                  +-- jump --+ (loop back to first child)
+ *     +---- jump -------------------+ (skip to after END)
+ *
+ * BEGIN.jump points past END (skip path when count >= max or min == 0).
+ * END.jump points to the first child (loop-back path).
+ * BEGIN.next and END.next are set later by finalizeRPRPattern().
  *
  * Returns true if this group is nullable.  A group is nullable when its
  * min is 0 (can be skipped entirely) or its body is nullable (every path
-- 
2.50.1 (Apple Git-155)