v47-0006-Row-pattern-recognition-patch-docs.patch

application/octet-stream

Filename: v47-0006-Row-pattern-recognition-patch-docs.patch
Type: application/octet-stream
Part: 5
Message: Re: Row pattern recognition
From 597ccd24e405949e5d320eee0cb1f7262ad3068e Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <ishii@postgresql.org>
Date: Sat, 2 May 2026 13:40:29 +0900
Subject: [PATCH v47 6/9] Row pattern recognition patch (docs).

---
 doc/src/sgml/advanced.sgml         | 145 ++++++++++++++++++++++++++++-
 doc/src/sgml/func/func-window.sgml | 121 ++++++++++++++++++++++++
 doc/src/sgml/ref/select.sgml       |  91 +++++++++++++++++-
 3 files changed, 350 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 3286c2cf0b2..11c2416df51 100644
--- a/doc/src/sgml/advanced.sgml
+++ b/doc/src/sgml/advanced.sgml
@@ -552,13 +552,150 @@ WHERE pos &lt; 3;
     two rows for each department).
    </para>
 
+   <para>
+    Row Pattern Common Syntax can be used to perform Row Pattern Recognition
+    in a query. The Row Pattern Common Syntax includes two sub
+    clauses: <literal>DEFINE</literal>
+    and <literal>PATTERN</literal>. <literal>DEFINE</literal> defines
+    row pattern variables along with an expression. The expression must be a
+    logical expression, which means it must
+    return <literal>TRUE</literal>, <literal>FALSE</literal>
+    or <literal>NULL</literal>. The expression may comprise column references
+    and functions. Window functions, aggregate functions and subqueries are
+    not allowed. An example of <literal>DEFINE</literal> is as follows.
+
+<programlisting>
+DEFINE
+ LOWPRICE AS price &lt;= 100,
+ UP AS price &gt; PREV(price),
+ DOWN AS price &lt; PREV(price)
+</programlisting>
+
+    Note that <function>PREV</function> returns the <literal>price</literal>
+    column in the previous row if it's called in a context of row pattern
+    recognition. Thus in the second line the row pattern variable "UP"
+    is <literal>TRUE</literal> when the price column in the current row is
+    greater than the price column in the previous row. Likewise, "DOWN"
+    is <literal>TRUE</literal> when the
+    <literal>price</literal> column in the current row is lower than
+    the <literal>price</literal> column in the previous row.
+   </para>
+   <para>
+    Once <literal>DEFINE</literal> exists, <literal>PATTERN</literal> can be
+    used. <literal>PATTERN</literal> defines a sequence of rows that satisfies
+    conditions defined in the <literal>DEFINE</literal> clause.  For example
+    the following <literal>PATTERN</literal> defines a sequence of rows starting
+    with a row satisfying "LOWPRICE", then one or more rows satisfying
+    "UP" and finally one or more rows satisfying "DOWN". Pattern variables can
+    be followed by quantifiers: "+" means one or more matches, "*" means zero
+    or more matches, "?" means zero or one match, "{n}" (n &gt; 0) means exactly
+    n matches, "{n,}" (n &gt;= 0) means at least n matches, "{,m}" (m &gt; 0) means
+    at most m matches, and "{n,m}" (0 &lt;= n &lt;= m, 0 &lt; m) means between n and m
+    matches.  Patterns can be grouped using parentheses and combined using
+    alternation (the vertical bar "|" for OR). For example, "(UP DOWN)+"
+    matches one or more repetitions of UP followed by DOWN.  If a sequence of
+    rows which satisfies the PATTERN is found, in the starting row all columns
+    or functions are shown in the target list. Note that aggregations only
+    look into the matched rows, rather than the whole frame. On the second or
+    subsequent rows all window functions are shown as NULL. Aggregates on
+    non-starting rows return their initial value: for example,
+    <function>count()</function> returns 0 and <function>sum()</function>
+    returns NULL. For rows that do not match the PATTERN, columns are shown
+    as NULL too. Example of a <literal>SELECT</literal> using
+    the <literal>DEFINE</literal> and <literal>PATTERN</literal> clause is as
+    follows.
+
+<programlisting>
+SELECT company, tdate, price,
+ first_value(price) OVER w,
+ max(price) OVER w,
+ count(price) OVER w
+FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ INITIAL
+ PATTERN (LOWPRICE UP+ DOWN+)
+ DEFINE
+  LOWPRICE AS price &lt;= 100,
+  UP AS price &gt; PREV(price),
+  DOWN AS price &lt; PREV(price)
+);
+</programlisting>
+<screen>
+ company  |   tdate    | price | first_value | max | count 
+----------+------------+-------+-------------+-----+-------
+ company1 | 2023-07-01 |   100 |         100 | 200 |     4
+ company1 | 2023-07-02 |   200 |             |     |     0
+ company1 | 2023-07-03 |   150 |             |     |     0
+ company1 | 2023-07-04 |   140 |             |     |     0
+ company1 | 2023-07-05 |   150 |             |     |     0
+ company1 | 2023-07-06 |    90 |          90 | 130 |     4
+ company1 | 2023-07-07 |   110 |             |     |     0
+ company1 | 2023-07-08 |   130 |             |     |     0
+ company1 | 2023-07-09 |   120 |             |     |     0
+ company1 | 2023-07-10 |   130 |             |     |     0
+(10 rows)
+</screen>
+   </para>
+
+   <para>
+    Row Pattern Recognition internally uses a nondeterministic finite
+    automaton (NFA) to match patterns. For patterns with unbounded
+    quantifiers (e.g., <literal>A+</literal> or <literal>(A B)+</literal>),
+    the NFA may need to track many active matching contexts simultaneously,
+    which could potentially lead to O(n<superscript>2</superscript>)
+    complexity as the number of rows increases.
+   </para>
+
+   <para>
+    Before execution, <productname>PostgreSQL</productname> automatically
+    optimizes patterns to simplify their structure. This includes flattening
+    nested sequences and alternations, merging consecutive identical variables
+    (e.g., <literal>A{2,3} A{1,2}</literal> becomes <literal>A{3,5}</literal>),
+    removing duplicate alternatives
+    (e.g., <literal>(A | B | A)</literal> becomes <literal>(A | B)</literal>),
+    and simplifying nested quantifiers
+    (e.g., <literal>(A*)*</literal> becomes <literal>A*</literal>).
+    These optimizations reduce pattern complexity and also decrease
+    nesting depth, making the 253-level depth limit rarely encountered.
+    They are applied transparently and can be observed
+    in <command>EXPLAIN</command> output.
+   </para>
+
+   <para>
+    To mitigate the O(n<superscript>2</superscript>) complexity described
+    above, <productname>PostgreSQL</productname> also employs
+    a context absorption optimization. When a pattern starts with a greedy
+    unbounded element, newer matching contexts cannot produce longer matches
+    than older contexts. By detecting and eliminating these redundant
+    contexts, the matching complexity is reduced from
+    O(n<superscript>2</superscript>) to O(n) for many common patterns.
+   </para>
+
+   <para>
+    When examining query plans for Row Pattern Recognition with
+    <command>EXPLAIN</command>, the pattern output may include special
+    markers that indicate optimization opportunities. A double quote
+    <literal>"</literal> marks where pattern absorption can occur,
+    and a single quote <literal>'</literal> marks absorbable elements
+    within a branch. For example, <literal>a+"</literal> indicates that
+    repeated matches of <literal>a</literal> can be absorbed, while
+    <literal>(a' b')+"</literal> shows that both <literal>a</literal>
+    and <literal>b</literal> within the group are absorbable.
+    These markers are primarily useful for understanding internal
+    optimization behavior.
+   </para>
+
    <para>
     When a query involves multiple window functions, it is possible to write
     out each one with a separate <literal>OVER</literal> clause, but this is
-    duplicative and error-prone if the same windowing behavior is wanted
-    for several functions.  Instead, each windowing behavior can be named
-    in a <literal>WINDOW</literal> clause and then referenced in <literal>OVER</literal>.
-    For example:
+    duplicative and error-prone if the same windowing behavior is wanted for
+    several functions.  Instead, each windowing behavior can be named in
+    a <literal>WINDOW</literal> clause and then referenced
+    in <literal>OVER</literal>.  For example:
 
 <programlisting>
 SELECT sum(salary) OVER w, avg(salary) OVER w
diff --git a/doc/src/sgml/func/func-window.sgml b/doc/src/sgml/func/func-window.sgml
index bcf755c9ebc..d1da105ad0f 100644
--- a/doc/src/sgml/func/func-window.sgml
+++ b/doc/src/sgml/func/func-window.sgml
@@ -278,6 +278,127 @@
    <function>nth_value</function>.
   </para>
 
+  <para>
+   Row Pattern Recognition navigation functions are listed in
+   <xref linkend="functions-rpr-navigation-table"/>.  These functions
+   can be used to describe the DEFINE clause of Row Pattern Recognition.
+  </para>
+
+   <table id="functions-rpr-navigation-table">
+    <title>Row Pattern Navigation Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>prev</primary>
+        </indexterm>
+        <function>prev</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+        <returnvalue>anyelement</returnvalue>
+       </para>
+       <para>
+        Returns <parameter>value</parameter> evaluated at the row that is
+        <parameter>offset</parameter> rows before the current row within
+        the partition;
+        returns NULL if the target row is outside the partition.
+        <parameter>offset</parameter> defaults to 1 if omitted.
+        <parameter>offset</parameter> must be a non-negative integer;
+        an offset of 0 refers to the current row itself.
+        <parameter>offset</parameter> must not be NULL.
+        Can only be used in a <literal>DEFINE</literal> clause.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>next</primary>
+        </indexterm>
+        <function>next</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+        <returnvalue>anyelement</returnvalue>
+       </para>
+       <para>
+        Returns <parameter>value</parameter> evaluated at the row that is
+        <parameter>offset</parameter> rows after the current row within
+        the partition;
+        returns NULL if the target row is outside the partition.
+        <parameter>offset</parameter> defaults to 1 if omitted.
+        <parameter>offset</parameter> must be a non-negative integer;
+        an offset of 0 refers to the current row itself.
+        <parameter>offset</parameter> must not be NULL.
+        Can only be used in a <literal>DEFINE</literal> clause.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>first</primary>
+        </indexterm>
+        <function>first</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+        <returnvalue>anyelement</returnvalue>
+       </para>
+       <para>
+        Returns <parameter>value</parameter> evaluated at the row that is
+        <parameter>offset</parameter> rows after the match start row;
+        returns NULL if the target row is beyond the current row.
+        <parameter>offset</parameter> defaults to 0 if omitted, referring to the
+        match start row itself.
+        <parameter>offset</parameter> must be a non-negative integer.
+        <parameter>offset</parameter> must not be NULL.
+        Can only be used in a <literal>DEFINE</literal> clause.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>last</primary>
+        </indexterm>
+        <function>last</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+        <returnvalue>anyelement</returnvalue>
+       </para>
+       <para>
+        Returns <parameter>value</parameter> evaluated at the row that is
+        <parameter>offset</parameter> rows before the current row within
+        the match;
+        returns NULL if the target row is before the match start row.
+        <parameter>offset</parameter> defaults to 0 if omitted, referring to the
+        current row itself.
+        <parameter>offset</parameter> must be a non-negative integer.
+        <parameter>offset</parameter> must not be NULL.
+        Can only be used in a <literal>DEFINE</literal> clause.
+       </para></entry>
+      </row>
+
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <function>PREV</function> and <function>NEXT</function> may wrap
+    <function>FIRST</function> or <function>LAST</function> for compound
+    navigation. For example,
+    <literal>PREV(FIRST(val, 2), 3)</literal> fetches the value at
+    3 rows before the row that is 2 rows after the match start.
+    The reverse nesting (<function>FIRST</function>/<function>LAST</function>
+    wrapping <function>PREV</function>/<function>NEXT</function>) is not
+    permitted. Same-category nesting (e.g.,
+    <function>PREV</function> inside <function>PREV</function>) is also
+    prohibited.
+   </para>
+
   <note>
    <para>
     The SQL standard defines a <literal>FROM FIRST</literal> or <literal>FROM LAST</literal>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 09b6ce809bb..5272d6c0bfa 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1022,8 +1022,8 @@ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceabl
     The <replaceable class="parameter">frame_clause</replaceable> can be one of
 
 <synopsis>
-{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
-{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
+{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [ <replaceable>row_pattern_common_syntax</replaceable> ]
+{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [ <replaceable>row_pattern_common_syntax</replaceable> ]
 </synopsis>
 
     where <replaceable>frame_start</replaceable>
@@ -1130,9 +1130,94 @@ EXCLUDE NO OTHERS
     a given peer group will be in the frame or excluded from it.
    </para>
 
+   <para>
+    The
+    optional <replaceable class="parameter">row_pattern_common_syntax</replaceable>
+    defines the <firstterm>Row Pattern Recognition condition</firstterm> for
+    this
+    window. <replaceable class="parameter">row_pattern_common_syntax</replaceable>
+    includes the following subclauses.
+
+<synopsis>
+[ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]
+[ INITIAL | SEEK ]
+PATTERN ( <replaceable class="parameter">pattern_variable_name</replaceable> [ <replaceable>quantifier</replaceable> ] [ ... ] )
+DEFINE <replaceable class="parameter">definition_variable_name</replaceable> AS <replaceable class="parameter">expression</replaceable> [, ...]
+</synopsis>
+    <literal>AFTER MATCH SKIP PAST LAST ROW</literal> or <literal>AFTER MATCH
+    SKIP TO NEXT ROW</literal> controls how to proceed to the next row position
+    after a match is found. With <literal>AFTER MATCH SKIP PAST LAST
+    ROW</literal> (the default) the next row position is next to the last row of
+    the previous match. On the other hand, with <literal>AFTER MATCH SKIP TO NEXT
+    ROW</literal> the next row position is next to the first row of the previous
+    match. <literal>INITIAL</literal> or <literal>SEEK</literal> specifies from
+    which row in the frame pattern matching begins.
+    If <literal>INITIAL</literal> is specified, the match must start
+    from the first row in the frame. If <literal>SEEK</literal> is specified,
+    the set of matching rows does not necessarily start from the first row. The
+    default is <literal>INITIAL</literal>. Currently
+    only <literal>INITIAL</literal> is supported. <literal>DEFINE</literal>
+    defines definition variables along with a boolean
+    expression. <literal>PATTERN</literal> defines a sequence of rows that
+    satisfies certain conditions using variables defined
+    in the <literal>DEFINE</literal> clause (an empty <literal>PATTERN()</literal>
+    is not supported). Each pattern variable can be followed by a quantifier
+    to specify how many times it should match:
+    <literal>*</literal> (zero or more),
+    <literal>+</literal> (one or more),
+    <literal>?</literal> (zero or one),
+    <literal>{</literal><replaceable>n</replaceable><literal>}</literal> (exactly <replaceable>n</replaceable> times, n &gt; 0),
+    <literal>{</literal><replaceable>n</replaceable><literal>,}</literal> (at least <replaceable>n</replaceable> times, n &gt;= 0),
+    <literal>{,</literal><replaceable>m</replaceable><literal>}</literal> (at most <replaceable>m</replaceable> times, m &gt; 0), or
+    <literal>{</literal><replaceable>n</replaceable><literal>,</literal><replaceable>m</replaceable><literal>}</literal>
+    (between <replaceable>n</replaceable> and <replaceable>m</replaceable> times, 0 &lt;= n &lt;= m, 0 &lt; m).
+    Reluctant quantifiers (e.g., <literal>*?</literal>, <literal>+?</literal>,
+    <literal>??</literal>, <literal>{</literal><replaceable>n</replaceable><literal>,</literal><replaceable>m</replaceable><literal>}?</literal>)
+    are supported.
+    The exclusion (<literal>{-</literal> and <literal>-}</literal>)
+    is not supported.
+    Patterns can be grouped using parentheses, and alternation (OR) can be
+    expressed using the vertical bar <literal>|</literal>.
+    For example, <literal>(A B)+</literal> matches one or more repetitions
+    of the sequence A followed by B, and <literal>A | B</literal> matches
+    either A or B.
+    If a pattern variable is not defined in
+    the <literal>DEFINE</literal> clause, it is not automatically added
+    to the <literal>DEFINE</literal> clause. Instead, the executor evaluates
+    the variable as <literal>TRUE</literal> at execution time, behaving as if
+    the following definition existed.
+
+<synopsis>
+<literal>variable_name</literal> AS TRUE
+</synopsis>
+
+    Conversely, variables defined in the <literal>DEFINE</literal> clause
+    but not used in the <literal>PATTERN</literal> clause are filtered out
+    during query planning.
+   </para>
+
+   <para>
+    Note that the maximum number of unique pattern variables
+    used in the <literal>PATTERN</literal> clause is 251.
+    If this limit is exceeded, an error will be raised.
+    Additionally, the maximum nesting depth of pattern groups
+    (parentheses) is 253 levels.
+    However, pattern optimizations such as flattening nested sequences
+    and simplifying nested quantifiers may reduce the effective depth,
+    so this limit is rarely reached in practice.
+   </para>
+
+   <para>
+    The SQL standard defines more subclauses: <literal>MEASURES</literal>
+    and <literal>SUBSET</literal>. They are not currently supported
+    in <productname>PostgreSQL</productname>. Also in the standard there are
+    more variations in <literal>AFTER MATCH</literal> clause.
+   </para>
+
    <para>
     The purpose of a <literal>WINDOW</literal> clause is to specify the
-    behavior of <firstterm>window functions</firstterm> appearing in the query's
+    behavior of <firstterm>window functions</firstterm> appearing in the
+    query's
     <link linkend="sql-select-list"><command>SELECT</command> list</link> or
     <link linkend="sql-orderby"><literal>ORDER BY</literal></link> clause.
     These functions
-- 
2.43.0