Thread

  1. Re: Row pattern recognition

    jian he <jian.universality@gmail.com> — 2026-05-28T13:23:59Z

    Hi.
    
    In src/include/nodes/execnodes.h, we're adding quite a few fields to
    WindowAggState that are only used for RPR queries.
    Should we consolidate these fields behind a single pointer (named
    RPRContext) to keep the WindowAggState size smaller for non-RPR
    queries?
    
    In function get_reduced_frame_status, I made the following changes:
    ``````
    bool conditionA;
    bool conditionB;
    bool conditionC;
    bool conditionD;
    if (!winstate->rpr_match_valid)
        return RF_NOT_DETERMINED;
    conditionA = (pos == start && winstate->rpr_match_matched && length == 0);
    conditionB = (pos < start || pos >= start + length);
    conditionC = !winstate->rpr_match_matched;
    conditionD = (pos == start);
    if ((conditionA && conditionB) || (conditionA && conditionC) ||
        (conditionA && conditionD))
    {
        if (conditionA)
            elog(INFO, "conditionA is true");
        if (conditionB)
            elog(INFO, "conditionB is true");
        if (conditionC)
            elog(INFO, "conditionC is true");
        if (conditionD)
            elog(INFO, "conditionD is true");
    }
    if ((conditionB && conditionC) || (conditionB && conditionD) ||
    (conditionC && conditionD))
    {
        elog(INFO, "more than 2 branch is true, should not be reached");
        if (!(conditionC && conditionD))
            elog(INFO, "not (conditionCD is true)");
        if ((conditionB && conditionD))
            elog(INFO, "conditionB and conditionD is true)");
    }
    ``````
    I re-ran the regression tests and noticed it's entirely possible for multiple
    conditions to be true at once. Because of this, all the IF blocks in
    get_reduced_frame_status (after the initial one) are order-dependent. If I move
    one IF statement above another, get_reduced_frame_status may return a
    different result.
    Is this the expected behavior? It seems like a potential logic flaw.
    
    ------------------------------------------------------------------------------------
    VIII-3. Absorption Conditions
    
    Planner-time prerequisites (all must hold for absorption to be enabled):
    
      (a) SKIP PAST LAST ROW.  SKIP TO NEXT ROW creates overlapping
          contexts that cannot be safely absorbed.
      (b) Unbounded frame (ROWS BETWEEN CURRENT ROW AND UNBOUNDED
          FOLLOWING).  Limited frames apply differently to each context,
          breaking the monotonicity principle.
      (c) No match_start-dependent navigation in DEFINE.
    
          Mechanism: each context has a different matchStartRow, so FIRST
          resolves to a different row for each context at the same
          currentpos.  An earlier context's DEFINE result no longer
          subsumes a later one's, making count-dominance comparison
          invalid.  Rather than comparing matchStartRow at runtime
          (which would complicate the absorb path), any match_start
          dependency disables absorption entirely.
    
          Navigation content              match_start dep.  absorption
          ------------------------------------------------------------
          No navigation                   none              safe
          PREV/NEXT only                  none              safe
          LAST (no offset)                none              safe
          LAST (with offset)              boundary check    unsafe
          FIRST (any)                     direct            unsafe
          Compound (inner FIRST)          direct            unsafe
          Compound (inner LAST, no off.)  none              safe
          Compound (inner LAST, w/off.)   boundary chk      unsafe
    ------------------------------------------------------------------------------------
    "boundary chk" should be "boundary check".
    column "match_start dep.", I don't understand the meaning of "direct"
    and "boundary check".
    "count-dominance" seems like an invented word, but I could not find
    the explanation.
    "match_start-dependent" occurred many times, it would also deserve an
    explanation.
    We can also rename it as match_start_dependent.
    
    ----------------------------------------------------
    Attached is a minor refactoring of ExecRPRProcessRow, no need for
    boolean hasLimitedFrame.
    ``if (hasLimitedFrame && winstate->endOffsetValue != 0)`` seems wrong
    to me, endOffsetValue is a Datum,
    and you directly compare it with 0.
    see also calculate_frame_offsets.
    ----------------------------------------------------
    ExecRPRProcessRow,
    nfa_evaluate_row
    ```
      if (!window_gettupleslot(winobj, pos, slot))
            return false;            /* No row exists */
    ```
    indicates that it's not possible for (currentPos > ctxFrameEnd).
    therefore
    `````
        if (currentPos >= ctxFrameEnd)
        {
            /* Frame boundary exceeded: force mismatch */
            nfa_match(winstate, ctx, NULL);
            continue;
        }
    `````
    should change to ```if (currentPos == ctxFrameEnd)```.
    With that, the comment wording "exceeded" seems wrong too.
    
    The comment 'Frame boundary exceeded' also needs updating since
    it's no longer exceeding the boundary.
    ----------------------------------------------------
    Since I couldn't generate a coverage report, I am using elog(INFO) to test
    reachability. Rerun the regress tests, you will find out that the ELSE IF branch
    below is not reached by current regress tests.
    ``````
        else if (targetCtx->states == NULL)
        {
            /* Context already completed - skip to result registration */
            goto register_result;
            elog(INFO, "XXXX reached");
        }
    ``````
    
    
    
    --
    jian
    https://www.enterprisedb.com/