Thread
-
Re: Row pattern recognition
Henson Choi <assam258@gmail.com> — 2026-05-05T04:26:56Z
Hi Tatsuo, Thanks for the close look. Inline below, then a sketch of what I plan to put in v48 so you can pick what to pull first. v47 starts to check whether range variable qualified expressions are > used in DEFINE clause. If used, raise an error. This is good because > we don't support the syntax yet (pattern variable range var), or it's > prohibited (from clause range var). However, the error message may not > be appropreate for the case when complex data type is involved. > > CREATE TEMP TABLE item (name TEXT, amount INT); > CREATE TABLE > CREATE TEMP TABLE sales(items item); > CREATE TABLE > SELECT (items).name, (items).amount, count(*) OVER w > FROM sales WINDOW w AS ( > ORDER BY (items).name > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING > AFTER MATCH SKIP PAST LAST ROW > PATTERN (A+) > DEFINE A AS (A.items).amount > 10 > ); > ERROR: pattern variable qualified column reference "a.items" is not > supported in DEFINE clause > LINE 7: DEFINE A AS (A.items).amount > 10 > ^ > If I change the DEFINE clause to: > > DEFINE A AS (sales.items).amount > 10 > > I get: > > ERROR: range variable qualified column reference "sales.items" is not > allowed in DEFINE clause > LINE 8: DEFINE A AS (sales.items).amount > 10 > ^ > > In both cases, "a.items" or "sales.items" in the error messages are > not column names, therefore the wording "column reference" in the > error messages are not appropriate. > Agreed. Even for the simple non-composite cases the noun is fuzzy once A_Indirection enters the picture. "expression" reads naturally in both cases. > In order to fix the issue, I think we need to add code to understand > range var qualification form "A.item" or "sales.items" in complex data > types case. But is it worth the trouble? The reword is just nicer > error messages. If we support MEASURES, the code is useful. But so far > we have decided to not support it in the first cut of RPR. > Not now. Teaching the pre-check about the surrounding A_Indirection duplicates work that MEASURES will need anyway, and the only visible gain before MEASURES is a more accurate echo of the source text. Instead I'll mark the limitation in three places so future MEASURES work can't miss it: a block comment above the pre-check that names the limitation and the revisit point, an XXX cross-reference in parse_rpr.c pointing back to the pre-check, and composite-type cases in rpr_base whose expected output quotes only the ColumnRef portion -- so when indirection-aware quoting lands, those outputs churn as a tripwire. > Maybe we should just change the error messages something like below > for now? > > ERROR: pattern variable qualified expression "a.items" is not supported > in DEFINE clause > ERROR: range variable qualified expression "sales.items" is not allowed > in DEFINE clause Yes -- planned for v48. While I'm in that path I'll also clean up two adjacent issues: - the pre-check is binary, so an unknown qualifier (typo, undefined name) is misreported as a range-variable reference with SYNTAX_ERROR instead of falling through to the standard "column does not exist" / "missing FROM-clause entry" diagnostic; - parse_rpr.c and the SELECT docs claim DEFINE-name "collection" and "filtering during planning"; the actual behavior is validate-and-reject at parse analysis. ------------------------------------------------------------------ v48 follow-up plan ------------------------------------------------------------------ The items below are independent, so they can ship as separate patches or as a single batched posting -- whichever you prefer. Order is just for narrative; nothing depends on anything else. [A] Sizable refactor: collapse the four DEFINE walkers across parser/planner/executor into a single phase-tagged traversal. On that base, reject volatile / NextValueExpr in DEFINE (the NFA may re-evaluate predicates during backtracking; STABLE / IMMUTABLE remain accepted), and bundle a STABLE/IMMUTABLE baseline test as a guard against accidental over-rejection. [B] Make the empty-match path observable in tests: replace the stale XXX comments in rpr_nfa with the actual behavior, and add EXPLAIN ANALYZE coverage in rpr_explain that surfaces "NFA: N matched (len 0/0/0.0)" so the NFA-found-but-window- empty case is regression-visible. [C] Trim the per-advance NFA visited-bitmap reset to a high-water mark range instead of the full bitmap. Tradeoff: two int16 comparisons added per visit, paying off for larger NFAs but added overhead for single-word bitmaps; semantics unchanged. I'll leave the decision to apply to your judgment. [D] DEFINE qualifier diagnostic: tri-classify (pattern var / range var / fall-through), reword to "expression", add unknown-qualifier and composite-type tests, and sync the adjacent stale comments and SELECT doc. If any of [A]..[D] looks misjudged or you'd prefer a different slicing, I'll reshape before posting. Best, Henson