v47-0006-Row-pattern-recognition-patch-docs.patch
application/octet-stream
Filename: v47-0006-Row-pattern-recognition-patch-docs.patch
Type: application/octet-stream
Part: 5
Message:
Re: Row pattern recognition
From 597ccd24e405949e5d320eee0cb1f7262ad3068e Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <ishii@postgresql.org>
Date: Sat, 2 May 2026 13:40:29 +0900
Subject: [PATCH v47 6/9] Row pattern recognition patch (docs).
---
doc/src/sgml/advanced.sgml | 145 ++++++++++++++++++++++++++++-
doc/src/sgml/func/func-window.sgml | 121 ++++++++++++++++++++++++
doc/src/sgml/ref/select.sgml | 91 +++++++++++++++++-
3 files changed, 350 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 3286c2cf0b2..11c2416df51 100644
--- a/doc/src/sgml/advanced.sgml
+++ b/doc/src/sgml/advanced.sgml
@@ -552,13 +552,150 @@ WHERE pos < 3;
two rows for each department).
</para>
+ <para>
+ Row Pattern Common Syntax can be used to perform Row Pattern Recognition
+ in a query. The Row Pattern Common Syntax includes two sub
+ clauses: <literal>DEFINE</literal>
+ and <literal>PATTERN</literal>. <literal>DEFINE</literal> defines
+ row pattern variables along with an expression. The expression must be a
+ logical expression, which means it must
+ return <literal>TRUE</literal>, <literal>FALSE</literal>
+ or <literal>NULL</literal>. The expression may comprise column references
+ and functions. Window functions, aggregate functions and subqueries are
+ not allowed. An example of <literal>DEFINE</literal> is as follows.
+
+<programlisting>
+DEFINE
+ LOWPRICE AS price <= 100,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+</programlisting>
+
+ Note that <function>PREV</function> returns the <literal>price</literal>
+ column in the previous row if it's called in a context of row pattern
+ recognition. Thus in the second line the row pattern variable "UP"
+ is <literal>TRUE</literal> when the price column in the current row is
+ greater than the price column in the previous row. Likewise, "DOWN"
+ is <literal>TRUE</literal> when the
+ <literal>price</literal> column in the current row is lower than
+ the <literal>price</literal> column in the previous row.
+ </para>
+ <para>
+ Once <literal>DEFINE</literal> exists, <literal>PATTERN</literal> can be
+ used. <literal>PATTERN</literal> defines a sequence of rows that satisfies
+ conditions defined in the <literal>DEFINE</literal> clause. For example
+ the following <literal>PATTERN</literal> defines a sequence of rows starting
+ with a row satisfying "LOWPRICE", then one or more rows satisfying
+ "UP" and finally one or more rows satisfying "DOWN". Pattern variables can
+ be followed by quantifiers: "+" means one or more matches, "*" means zero
+ or more matches, "?" means zero or one match, "{n}" (n > 0) means exactly
+ n matches, "{n,}" (n >= 0) means at least n matches, "{,m}" (m > 0) means
+ at most m matches, and "{n,m}" (0 <= n <= m, 0 < m) means between n and m
+ matches. Patterns can be grouped using parentheses and combined using
+ alternation (the vertical bar "|" for OR). For example, "(UP DOWN)+"
+ matches one or more repetitions of UP followed by DOWN. If a sequence of
+ rows which satisfies the PATTERN is found, in the starting row all columns
+ or functions are shown in the target list. Note that aggregations only
+ look into the matched rows, rather than the whole frame. On the second or
+ subsequent rows all window functions are shown as NULL. Aggregates on
+ non-starting rows return their initial value: for example,
+ <function>count()</function> returns 0 and <function>sum()</function>
+ returns NULL. For rows that do not match the PATTERN, columns are shown
+ as NULL too. Example of a <literal>SELECT</literal> using
+ the <literal>DEFINE</literal> and <literal>PATTERN</literal> clause is as
+ follows.
+
+<programlisting>
+SELECT company, tdate, price,
+ first_value(price) OVER w,
+ max(price) OVER w,
+ count(price) OVER w
+FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ INITIAL
+ PATTERN (LOWPRICE UP+ DOWN+)
+ DEFINE
+ LOWPRICE AS price <= 100,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+</programlisting>
+<screen>
+ company | tdate | price | first_value | max | count
+----------+------------+-------+-------------+-----+-------
+ company1 | 2023-07-01 | 100 | 100 | 200 | 4
+ company1 | 2023-07-02 | 200 | | | 0
+ company1 | 2023-07-03 | 150 | | | 0
+ company1 | 2023-07-04 | 140 | | | 0
+ company1 | 2023-07-05 | 150 | | | 0
+ company1 | 2023-07-06 | 90 | 90 | 130 | 4
+ company1 | 2023-07-07 | 110 | | | 0
+ company1 | 2023-07-08 | 130 | | | 0
+ company1 | 2023-07-09 | 120 | | | 0
+ company1 | 2023-07-10 | 130 | | | 0
+(10 rows)
+</screen>
+ </para>
+
+ <para>
+ Row Pattern Recognition internally uses a nondeterministic finite
+ automaton (NFA) to match patterns. For patterns with unbounded
+ quantifiers (e.g., <literal>A+</literal> or <literal>(A B)+</literal>),
+ the NFA may need to track many active matching contexts simultaneously,
+ which could potentially lead to O(n<superscript>2</superscript>)
+ complexity as the number of rows increases.
+ </para>
+
+ <para>
+ Before execution, <productname>PostgreSQL</productname> automatically
+ optimizes patterns to simplify their structure. This includes flattening
+ nested sequences and alternations, merging consecutive identical variables
+ (e.g., <literal>A{2,3} A{1,2}</literal> becomes <literal>A{3,5}</literal>),
+ removing duplicate alternatives
+ (e.g., <literal>(A | B | A)</literal> becomes <literal>(A | B)</literal>),
+ and simplifying nested quantifiers
+ (e.g., <literal>(A*)*</literal> becomes <literal>A*</literal>).
+ These optimizations reduce pattern complexity and also decrease
+ nesting depth, making the 253-level depth limit rarely encountered.
+ They are applied transparently and can be observed
+ in <command>EXPLAIN</command> output.
+ </para>
+
+ <para>
+ To mitigate the O(n<superscript>2</superscript>) complexity described
+ above, <productname>PostgreSQL</productname> also employs
+ a context absorption optimization. When a pattern starts with a greedy
+ unbounded element, newer matching contexts cannot produce longer matches
+ than older contexts. By detecting and eliminating these redundant
+ contexts, the matching complexity is reduced from
+ O(n<superscript>2</superscript>) to O(n) for many common patterns.
+ </para>
+
+ <para>
+ When examining query plans for Row Pattern Recognition with
+ <command>EXPLAIN</command>, the pattern output may include special
+ markers that indicate optimization opportunities. A double quote
+ <literal>"</literal> marks where pattern absorption can occur,
+ and a single quote <literal>'</literal> marks absorbable elements
+ within a branch. For example, <literal>a+"</literal> indicates that
+ repeated matches of <literal>a</literal> can be absorbed, while
+ <literal>(a' b')+"</literal> shows that both <literal>a</literal>
+ and <literal>b</literal> within the group are absorbable.
+ These markers are primarily useful for understanding internal
+ optimization behavior.
+ </para>
+
<para>
When a query involves multiple window functions, it is possible to write
out each one with a separate <literal>OVER</literal> clause, but this is
- duplicative and error-prone if the same windowing behavior is wanted
- for several functions. Instead, each windowing behavior can be named
- in a <literal>WINDOW</literal> clause and then referenced in <literal>OVER</literal>.
- For example:
+ duplicative and error-prone if the same windowing behavior is wanted for
+ several functions. Instead, each windowing behavior can be named in
+ a <literal>WINDOW</literal> clause and then referenced
+ in <literal>OVER</literal>. For example:
<programlisting>
SELECT sum(salary) OVER w, avg(salary) OVER w
diff --git a/doc/src/sgml/func/func-window.sgml b/doc/src/sgml/func/func-window.sgml
index bcf755c9ebc..d1da105ad0f 100644
--- a/doc/src/sgml/func/func-window.sgml
+++ b/doc/src/sgml/func/func-window.sgml
@@ -278,6 +278,127 @@
<function>nth_value</function>.
</para>
+ <para>
+ Row Pattern Recognition navigation functions are listed in
+ <xref linkend="functions-rpr-navigation-table"/>. These functions
+ can be used to describe the DEFINE clause of Row Pattern Recognition.
+ </para>
+
+ <table id="functions-rpr-navigation-table">
+ <title>Row Pattern Navigation Functions</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ Function
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>prev</primary>
+ </indexterm>
+ <function>prev</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+ <returnvalue>anyelement</returnvalue>
+ </para>
+ <para>
+ Returns <parameter>value</parameter> evaluated at the row that is
+ <parameter>offset</parameter> rows before the current row within
+ the partition;
+ returns NULL if the target row is outside the partition.
+ <parameter>offset</parameter> defaults to 1 if omitted.
+ <parameter>offset</parameter> must be a non-negative integer;
+ an offset of 0 refers to the current row itself.
+ <parameter>offset</parameter> must not be NULL.
+ Can only be used in a <literal>DEFINE</literal> clause.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>next</primary>
+ </indexterm>
+ <function>next</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+ <returnvalue>anyelement</returnvalue>
+ </para>
+ <para>
+ Returns <parameter>value</parameter> evaluated at the row that is
+ <parameter>offset</parameter> rows after the current row within
+ the partition;
+ returns NULL if the target row is outside the partition.
+ <parameter>offset</parameter> defaults to 1 if omitted.
+ <parameter>offset</parameter> must be a non-negative integer;
+ an offset of 0 refers to the current row itself.
+ <parameter>offset</parameter> must not be NULL.
+ Can only be used in a <literal>DEFINE</literal> clause.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>first</primary>
+ </indexterm>
+ <function>first</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+ <returnvalue>anyelement</returnvalue>
+ </para>
+ <para>
+ Returns <parameter>value</parameter> evaluated at the row that is
+ <parameter>offset</parameter> rows after the match start row;
+ returns NULL if the target row is beyond the current row.
+ <parameter>offset</parameter> defaults to 0 if omitted, referring to the
+ match start row itself.
+ <parameter>offset</parameter> must be a non-negative integer.
+ <parameter>offset</parameter> must not be NULL.
+ Can only be used in a <literal>DEFINE</literal> clause.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>last</primary>
+ </indexterm>
+ <function>last</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+ <returnvalue>anyelement</returnvalue>
+ </para>
+ <para>
+ Returns <parameter>value</parameter> evaluated at the row that is
+ <parameter>offset</parameter> rows before the current row within
+ the match;
+ returns NULL if the target row is before the match start row.
+ <parameter>offset</parameter> defaults to 0 if omitted, referring to the
+ current row itself.
+ <parameter>offset</parameter> must be a non-negative integer.
+ <parameter>offset</parameter> must not be NULL.
+ Can only be used in a <literal>DEFINE</literal> clause.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ <function>PREV</function> and <function>NEXT</function> may wrap
+ <function>FIRST</function> or <function>LAST</function> for compound
+ navigation. For example,
+ <literal>PREV(FIRST(val, 2), 3)</literal> fetches the value at
+ 3 rows before the row that is 2 rows after the match start.
+ The reverse nesting (<function>FIRST</function>/<function>LAST</function>
+ wrapping <function>PREV</function>/<function>NEXT</function>) is not
+ permitted. Same-category nesting (e.g.,
+ <function>PREV</function> inside <function>PREV</function>) is also
+ prohibited.
+ </para>
+
<note>
<para>
The SQL standard defines a <literal>FROM FIRST</literal> or <literal>FROM LAST</literal>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 09b6ce809bb..5272d6c0bfa 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1022,8 +1022,8 @@ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceabl
The <replaceable class="parameter">frame_clause</replaceable> can be one of
<synopsis>
-{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
-{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
+{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [ <replaceable>row_pattern_common_syntax</replaceable> ]
+{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [ <replaceable>row_pattern_common_syntax</replaceable> ]
</synopsis>
where <replaceable>frame_start</replaceable>
@@ -1130,9 +1130,94 @@ EXCLUDE NO OTHERS
a given peer group will be in the frame or excluded from it.
</para>
+ <para>
+ The
+ optional <replaceable class="parameter">row_pattern_common_syntax</replaceable>
+ defines the <firstterm>Row Pattern Recognition condition</firstterm> for
+ this
+ window. <replaceable class="parameter">row_pattern_common_syntax</replaceable>
+ includes the following subclauses.
+
+<synopsis>
+[ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]
+[ INITIAL | SEEK ]
+PATTERN ( <replaceable class="parameter">pattern_variable_name</replaceable> [ <replaceable>quantifier</replaceable> ] [ ... ] )
+DEFINE <replaceable class="parameter">definition_variable_name</replaceable> AS <replaceable class="parameter">expression</replaceable> [, ...]
+</synopsis>
+ <literal>AFTER MATCH SKIP PAST LAST ROW</literal> or <literal>AFTER MATCH
+ SKIP TO NEXT ROW</literal> controls how to proceed to the next row position
+ after a match is found. With <literal>AFTER MATCH SKIP PAST LAST
+ ROW</literal> (the default) the next row position is next to the last row of
+ the previous match. On the other hand, with <literal>AFTER MATCH SKIP TO NEXT
+ ROW</literal> the next row position is next to the first row of the previous
+ match. <literal>INITIAL</literal> or <literal>SEEK</literal> specifies from
+ which row in the frame pattern matching begins.
+ If <literal>INITIAL</literal> is specified, the match must start
+ from the first row in the frame. If <literal>SEEK</literal> is specified,
+ the set of matching rows does not necessarily start from the first row. The
+ default is <literal>INITIAL</literal>. Currently
+ only <literal>INITIAL</literal> is supported. <literal>DEFINE</literal>
+ defines definition variables along with a boolean
+ expression. <literal>PATTERN</literal> defines a sequence of rows that
+ satisfies certain conditions using variables defined
+ in the <literal>DEFINE</literal> clause (an empty <literal>PATTERN()</literal>
+ is not supported). Each pattern variable can be followed by a quantifier
+ to specify how many times it should match:
+ <literal>*</literal> (zero or more),
+ <literal>+</literal> (one or more),
+ <literal>?</literal> (zero or one),
+ <literal>{</literal><replaceable>n</replaceable><literal>}</literal> (exactly <replaceable>n</replaceable> times, n > 0),
+ <literal>{</literal><replaceable>n</replaceable><literal>,}</literal> (at least <replaceable>n</replaceable> times, n >= 0),
+ <literal>{,</literal><replaceable>m</replaceable><literal>}</literal> (at most <replaceable>m</replaceable> times, m > 0), or
+ <literal>{</literal><replaceable>n</replaceable><literal>,</literal><replaceable>m</replaceable><literal>}</literal>
+ (between <replaceable>n</replaceable> and <replaceable>m</replaceable> times, 0 <= n <= m, 0 < m).
+ Reluctant quantifiers (e.g., <literal>*?</literal>, <literal>+?</literal>,
+ <literal>??</literal>, <literal>{</literal><replaceable>n</replaceable><literal>,</literal><replaceable>m</replaceable><literal>}?</literal>)
+ are supported.
+ The exclusion (<literal>{-</literal> and <literal>-}</literal>)
+ is not supported.
+ Patterns can be grouped using parentheses, and alternation (OR) can be
+ expressed using the vertical bar <literal>|</literal>.
+ For example, <literal>(A B)+</literal> matches one or more repetitions
+ of the sequence A followed by B, and <literal>A | B</literal> matches
+ either A or B.
+ If a pattern variable is not defined in
+ the <literal>DEFINE</literal> clause, it is not automatically added
+ to the <literal>DEFINE</literal> clause. Instead, the executor evaluates
+ the variable as <literal>TRUE</literal> at execution time, behaving as if
+ the following definition existed.
+
+<synopsis>
+<literal>variable_name</literal> AS TRUE
+</synopsis>
+
+ Conversely, variables defined in the <literal>DEFINE</literal> clause
+ but not used in the <literal>PATTERN</literal> clause are filtered out
+ during query planning.
+ </para>
+
+ <para>
+ Note that the maximum number of unique pattern variables
+ used in the <literal>PATTERN</literal> clause is 251.
+ If this limit is exceeded, an error will be raised.
+ Additionally, the maximum nesting depth of pattern groups
+ (parentheses) is 253 levels.
+ However, pattern optimizations such as flattening nested sequences
+ and simplifying nested quantifiers may reduce the effective depth,
+ so this limit is rarely reached in practice.
+ </para>
+
+ <para>
+ The SQL standard defines more subclauses: <literal>MEASURES</literal>
+ and <literal>SUBSET</literal>. They are not currently supported
+ in <productname>PostgreSQL</productname>. Also in the standard there are
+ more variations in <literal>AFTER MATCH</literal> clause.
+ </para>
+
<para>
The purpose of a <literal>WINDOW</literal> clause is to specify the
- behavior of <firstterm>window functions</firstterm> appearing in the query's
+ behavior of <firstterm>window functions</firstterm> appearing in the
+ query's
<link linkend="sql-select-list"><command>SELECT</command> list</link> or
<link linkend="sql-orderby"><literal>ORDER BY</literal></link> clause.
These functions
--
2.43.0