Thread

  1. Re: Row pattern recognition

    Henson Choi <assam258@gmail.com> — 2026-05-05T04:26:56Z

    Hi Tatsuo,
    
    Thanks for the close look.  Inline below, then a sketch of what I
    plan to put in v48 so you can pick what to pull first.
    
    v47 starts to check whether range variable qualified expressions are
    > used in DEFINE clause. If used, raise an error. This is good because
    > we don't support the syntax yet (pattern variable range var), or it's
    > prohibited (from clause range var). However, the error message may not
    > be appropreate for the case when complex data type is involved.
    >
    > CREATE TEMP TABLE item (name TEXT, amount INT);
    > CREATE TABLE
    > CREATE TEMP TABLE sales(items item);
    > CREATE TABLE
    > SELECT (items).name, (items).amount, count(*) OVER w
    > FROM sales WINDOW w AS (
    >     ORDER BY (items).name
    >     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    >     AFTER MATCH SKIP PAST LAST ROW
    >     PATTERN (A+)
    >     DEFINE A AS (A.items).amount > 10
    > );
    > ERROR:  pattern variable qualified column reference "a.items" is not
    > supported in DEFINE clause
    > LINE 7:     DEFINE A AS (A.items).amount > 10
    >                          ^
    > If I change the DEFINE clause to:
    >
    >     DEFINE A AS (sales.items).amount > 10
    >
    > I get:
    >
    > ERROR:  range variable qualified column reference "sales.items" is not
    > allowed in DEFINE clause
    > LINE 8:     DEFINE A AS (sales.items).amount > 10
    >                          ^
    >
    > In both cases, "a.items" or "sales.items" in the error messages are
    > not column names, therefore the wording "column reference" in the
    > error messages are not appropriate.
    >
    
    Agreed.  Even for the simple non-composite cases the noun is fuzzy
    once A_Indirection enters the picture.  "expression" reads naturally
    in both cases.
    
    
    > In order to fix the issue, I think we need to add code to understand
    > range var qualification form "A.item" or "sales.items" in complex data
    > types case. But is it worth the trouble? The reword is just nicer
    > error messages. If we support MEASURES, the code is useful. But so far
    > we have decided to not support it in the first cut of RPR.
    >
    
    Not now.  Teaching the pre-check about the surrounding A_Indirection
    duplicates work that MEASURES will need anyway, and the only visible
    gain before MEASURES is a more accurate echo of the source text.
    
    Instead I'll mark the limitation in three places so future MEASURES
    work can't miss it: a block comment above the pre-check that names
    the limitation and the revisit point, an XXX cross-reference in
    parse_rpr.c pointing back to the pre-check, and composite-type cases
    in rpr_base whose expected output quotes only the ColumnRef portion
    -- so when indirection-aware quoting lands, those outputs churn as
    a tripwire.
    
    
    > Maybe we should just change the error messages something like below
    > for now?
    >
    > ERROR:  pattern variable qualified expression "a.items" is not supported
    > in DEFINE clause
    > ERROR:  range variable qualified expression "sales.items" is not allowed
    > in DEFINE clause
    
    
    Yes -- planned for v48.  While I'm in that path I'll also clean up
    two adjacent issues:
    
      - the pre-check is binary, so an unknown qualifier (typo,
        undefined name) is misreported as a range-variable reference
        with SYNTAX_ERROR instead of falling through to the standard
        "column does not exist" / "missing FROM-clause entry"
        diagnostic;
    
      - parse_rpr.c and the SELECT docs claim DEFINE-name "collection"
        and "filtering during planning"; the actual behavior is
        validate-and-reject at parse analysis.
    
    
    ------------------------------------------------------------------
    v48 follow-up plan
    ------------------------------------------------------------------
    
    The items below are independent, so they can ship as separate
    patches or as a single batched posting -- whichever you prefer.
    Order is just for narrative; nothing depends on anything else.
    
      [A] Sizable refactor: collapse the four DEFINE walkers across
          parser/planner/executor into a single phase-tagged
          traversal.  On that base, reject volatile / NextValueExpr
          in DEFINE (the NFA may re-evaluate predicates during
          backtracking; STABLE / IMMUTABLE remain accepted), and
          bundle a STABLE/IMMUTABLE baseline test as a guard against
          accidental over-rejection.
    
      [B] Make the empty-match path observable in tests: replace the
          stale XXX comments in rpr_nfa with the actual behavior, and
          add EXPLAIN ANALYZE coverage in rpr_explain that surfaces
          "NFA: N matched (len 0/0/0.0)" so the NFA-found-but-window-
          empty case is regression-visible.
    
      [C] Trim the per-advance NFA visited-bitmap reset to a high-water
          mark range instead of the full bitmap.  Tradeoff: two int16
          comparisons added per visit, paying off for larger NFAs but
          added overhead for single-word bitmaps; semantics unchanged.
          I'll leave the decision to apply to your judgment.
    
      [D] DEFINE qualifier diagnostic: tri-classify (pattern var /
          range var / fall-through), reword to "expression", add
          unknown-qualifier and composite-type tests, and sync the
          adjacent stale comments and SELECT doc.
    
    If any of [A]..[D] looks misjudged or you'd prefer a different
    slicing, I'll reshape before posting.
    
    Best,
    Henson