Thread

  1. Re: Row pattern recognition

    jian he <jian.universality@gmail.com> — 2026-05-26T03:20:04Z

    Hi.
    
    Disclaimer: I had some private off-list discussions with Henson Choi
    regarding execRPR.c, and the NFA.
    The following are more formal comments regarding execRPR.c and the NFA
    algorithm, based on v47-0001 to v47-0009, I haven't applied the
    incremental diff yet.
    (Since the incremental diffs are scattered across several threads, a
    unified v48 would be better).
    (I'm still wrapping my head around the NFA in execRPR.c, so take the
    comments below with a grain of salt.)
    
    
    PATTERN (A B C D)
    We can short-circuit and exit early if any of the evaluations (A, B,
    or C) fail in nfa_match.
    This is necessary since the chance of a pattern element evaluation
    returning false is not rare, I think.
    
    
    For src/backend/executor/README.rpr:
    We should explicitly explain 'absorbable' and 'absorption' somewhere in
    README.rpr, as the text currently just assumes the reader knows what they mean.
    Using some example illustrate "absorption" meaning, put it on
    README.rpr would be great.
    We can also mention that 'DFS' refers to Depth-First Search".
    
    ``````
      (4) Call nfa_advance(initialAdvance=true)
    ``````
    In V47, the variable `initialAdvance` does not exist.
    
    In nfa_advance_var, after the first Assert, we can add:
    Assert(elem->next <  pattern->numElements);
    
    ExecRPRFinalizeAllContexts seems unnecessary; I commented it out,
    rerun the regress tests
    (TESTS='test_setup rpr_base rpr_nfa rpr_explain rpr_integration rpr'
    meson test -C $BUILD3 --num-processes 20 --suite regress --verbose)
    Only two SQL tests in rpr_explain.sql failed.
    
    SELECT first_value(id) OVER w AS match_start FROM stock_ticks
    WINDOW w AS ( ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED
    FOLLOWING AFTER MATCH SKIP PAST LAST ROW PATTERN ((A B) {2}) DEFINE A
    AS price < 100, B AS price < 100);
    
    The query above invokes the following code. Since the PATTERN above is
    not greedy, is the comment below incorrect?
    ``````
        else
        {
            /* Greedy: enter first, skip second */
            state->elemIdx = elem->next;
            nfa_route_to_elem(winstate, ctx, state,
                              &elements[state->elemIdx], currentPos);
            if (skipState != NULL)
            {
                nfa_route_to_elem(winstate, ctx, skipState,
                                  &elements[elem->jump], currentPos);
            }
        }
    ``````
    
    nfa_advance_var
    ```
        else if (canExit)
        {
            state->counts[depth] = 0;
            state->elemIdx = elem->next;
            nextElem = &elements[state->elemIdx];
            /*
             * Update isAbsorbable for target element (monotonic: AND preserves
             * false)
             */
            state->isAbsorbable = state->isAbsorbable &&
                RPRElemIsAbsorbableBranch(nextElem);
            /* See comment above: increment outer END count for quantified VARs */
            if (RPRElemIsEnd(nextElem))
            {
                if (state->counts[nextElem->depth] < RPR_COUNT_MAX)
                    state->counts[nextElem->depth]++;
            }
            nfa_route_to_elem(winstate, ctx, state, nextElem, currentPos);
        }
    ```
    The above ELSE IF overrides all RPRNFAState field values except
    RPRNFAState->next.
    Should we set RPRNFAState->next to NULL?
    (If I add ``state->next = NULL;`` in the above ELSE IF branch, all the
    regress tests still pass)
    Some comments about the situation of (RPRNFAState) state->next would be great.
    This also happens in `` if (canLoop && canExit)``, ``if (reluctant)`` branch.
    
    For function nfa_advance_var, I don't understand the meaning of the
    variable "count", after the first Assert I have added below:
    
        if (count > 2 && !state->isAbsorbable && RPRElemIsReluctant(elem))
            elog(INFO, "count is %d and elem is reluctant and isAbsorbable
    is false XXXXX", count);
        if (count > 2 && state->isAbsorbable && RPRElemIsReluctant(elem))
            elog(INFO, "count is %d and elem is reluctant and isAbsorbable
    is true XXXXX", count);
    
    Rerunning the regress tests shows that count >= 3 occurs very infrequently.
    I'm don't understand isAbsorbable, so I am not sure if the second
    ``elog(INFO`` is actually reachable or not.
    Can we add more complex queries (more count >= 3) to check if the
    "count" variable is working correctly?
    
    
    In function nfa_add_state_unique:
        /* Mark VAR in visited before duplicate check to prevent DFS loops */
        winstate->nfaVisitedElems[WORDNUM(state->elemIdx)] |=
            ((bitmapword) 1 << BITNUM(state->elemIdx));
    
    I honestly don't understand the purpose of the code block above. But it doesn't
    seem to influence the subsequent FOR LOOP;
    It only modifies its own variables without any other side effects within
    nfa_add_state_unique. Could we add some comments explaining which external
    functions rely on this code and why it belongs in nfa_add_state_unique?
    
    
    nfa_states_equal
    compareDepth = elem->depth + 1; /* depth 0 needs 1 count, etc. */
    The comment above isn't helpful, IMHO, and I don't understand it.
    We should focus on why compareDepth should be ```elem->depth + 1```.
    
    
    function nfa_add_state_unique return value is not being used?
    Do we need to do something with the return value, or is this expected?
    (I don't have an opinion on it, I guess it would be better to raise this issue)
    
    
    In nfa_advance_alt, during the main WHILE loop, I think altElem->depth
    must be larger than elem->depth.
    Therefore we can do
    ``````
            if (altElem->depth == elem->depth)
                elog(ERROR, "nfa_advance_alt altElem->depth should not be
    the same as elem->depth reached");
            if (altElem->depth < elem->depth)
                break;
    ``````
    
    
    
    --
    jian
    https://www.enterprisedb.com/