Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Add temporal FOREIGN KEY contraints
- 89f908a6d0ac 18.0 cited
Remove obsolete executor cleanup code
- d060e921ea5a 17.0 cited

Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-25T12:05:09Z

Attached is a PoC patch to implement "Row pattern recognition" (RPR)
in SQL:2016 (I know SQL:2023 is already out, but I don't have access
to it). Actually SQL:2016 defines RPR in two places[1]:

    Feature R010, “Row pattern recognition: FROM clause”
    Feature R020, “Row pattern recognition: WINDOW clause”

The patch includes partial support for R020 part.

- What is RPR?

RPR provides a way to search series of data using regular expression
patterns. Suppose you have a stock database.

CREATE TABLE stock (
       company TEXT,
       tdate DATE,
       price BIGINT);

You want to find a "V-shaped" pattern: i.e. price goes down for 1 or
more days, then goes up for 1 or more days. If we try to implement the
query in PostgreSQL, it could be quite complex and inefficient.

RPR provides convenient way to implement the query.

SELECT company, tdate, price, rpr(price) OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 PATTERN (START DOWN+ UP+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);

"PATTERN" and "DEFINE" are the key clauses of RPR. DEFINE defines 3
"row pattern variables" namely START, UP and DOWN. They are associated
with logical expressions namely "TRUE", "price > PREV(price)", and
"price < PREV(price)". Note that "PREV" function returns price column
in the previous row. So, UP is true if price is higher than previous
day. On the other hand, DOWN is true if price is lower than previous
day.  PATTERN uses the row pattern variables to create a necessary
pattern.  In this case, the first row is always match because START is
always true, and second or more rows match with "UP" ('+' is a regular
expression representing one or more), and subsequent rows match with
"DOWN".

Here is the sample output.

 company  |   tdate    | price | rpr  
----------+------------+-------+------
 company1 | 2023-07-01 |   100 |     
 company1 | 2023-07-02 |   200 |  200 -- 200->150->140->150
 company1 | 2023-07-03 |   150 |  150 -- 150->140->150
 company1 | 2023-07-04 |   140 |     
 company1 | 2023-07-05 |   150 |  150 -- 150->90->110->130
 company1 | 2023-07-06 |    90 |     
 company1 | 2023-07-07 |   110 |     
 company1 | 2023-07-08 |   130 |     
 company1 | 2023-07-09 |   120 |     
 company1 | 2023-07-10 |   130 |     

rpr shows the first row if all the patterns are satisfied. In the
example above 200, 150, 150 are the cases.  Other rows are shown as
NULL. For example, on 2023-07-02 price starts with 200, then goes down
to 150 then 140 but goes up 150 next day.

As far as I know, only Oracle implements RPR (only R010. R020 is not
implemented) among OSS/commercial RDBMSs. There are a few DWH software
having RPR. According to [2] they are Snowflake and MS Stream
Analytics. It seems Trino is another one[3].

- Note about the patch

The current implementation is not only a subset of the standard, but
is different from it in some places. The example above is written as
follows according to the standard:

SELECT company, tdate, startprice OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 MEASURES
  START.price AS startprice
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 PATTERN (START DOWN+ UP+)
 DEFINE
  START AS TRUE,
  UP AS UP.price > PREV(UP.price),
  DOWN AS DOWN.price < PREV(DOWN.price)
);

Notice that rpr(price) is written as START.price and startprice in the
standard. MEASURES defines variable names used in the target list used
with "OVER window". As OVER only allows functions in PostgreSQL, I had
to make up a window function "rpr" which performs the row pattern
recognition task.  I was not able to find a way to implement
expressions like START.price (START is not a table alias). Any
suggestion is greatly appreciated.

The differences from the standard include:

o MEASURES is not supported
o SUBSET is not supported
o FIRST, LAST, CLASSIFIER are not supported
o PREV/NEXT in the standard accept more complex arguments
o Regular expressions other than "+" are not supported
o Only AFTER MATCH SKIP TO NEXT ROW is supported (if AFTER MATCH is
  not specified, AFTER MATCH SKIP TO NEXT ROW is assumed. In the
  standard AFTER MATCH SKIP PAST LAST ROW is assumed in this case). I
  have a plan to implement AFTER MATCH SKIP PAST LAST ROW though.
o INITIAL or SEEK are not supported ((behaves as if INITIAL is specified)
o Aggregate functions associated with window clause using RPR do not respect RPR

It seems RPR in the standard is quite complex. I think we can start
with a small subset of RPR then we could gradually enhance the
implementation.

Comments and suggestions are welcome.

[1] https://sqlperformance.com/2019/04/t-sql-queries/row-pattern-recognition-in-sql
[2] https://link.springer.com/article/10.1007/s13222-022-00404-3
[3] https://trino.io/docs/current/sql/pattern-recognition-in-window.html
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-06-25T21:08:35Z

On 6/25/23 14:05, Tatsuo Ishii wrote:
> Attached is a PoC patch to implement "Row pattern recognition" (RPR)
> in SQL:2016 (I know SQL:2023 is already out, but I don't have access
> to it). Actually SQL:2016 defines RPR in two places[1]:
> 
>      Feature R010, “Row pattern recognition: FROM clause”
>      Feature R020, “Row pattern recognition: WINDOW clause”
> 
> The patch includes partial support for R020 part.

I have been dreaming of and lobbying for someone to pick up this 
feature.  I will be sure to review it from a standards perspective and 
will try my best to help with the technical aspect, but I am not sure to 
have the qualifications for that.

THANK YOU!

 > (I know SQL:2023 is already out, but I don't have access to it)

If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead 
of just its technical specification.

https://www.iso.org/standard/78936.html

> - What is RPR?
> 
> RPR provides a way to search series of data using regular expression
> patterns. Suppose you have a stock database.
> 
> CREATE TABLE stock (
>         company TEXT,
>         tdate DATE,
>         price BIGINT);
> 
> You want to find a "V-shaped" pattern: i.e. price goes down for 1 or
> more days, then goes up for 1 or more days. If we try to implement the
> query in PostgreSQL, it could be quite complex and inefficient.
> 
> RPR provides convenient way to implement the query.
> 
> SELECT company, tdate, price, rpr(price) OVER w FROM stock
>   WINDOW w AS (
>   PARTITION BY company
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (START DOWN+ UP+)
>   DEFINE
>    START AS TRUE,
>    UP AS price > PREV(price),
>    DOWN AS price < PREV(price)
> );
> 
> "PATTERN" and "DEFINE" are the key clauses of RPR. DEFINE defines 3
> "row pattern variables" namely START, UP and DOWN. They are associated
> with logical expressions namely "TRUE", "price > PREV(price)", and
> "price < PREV(price)". Note that "PREV" function returns price column
> in the previous row. So, UP is true if price is higher than previous
> day. On the other hand, DOWN is true if price is lower than previous
> day.  PATTERN uses the row pattern variables to create a necessary
> pattern.  In this case, the first row is always match because START is
> always true, and second or more rows match with "UP" ('+' is a regular
> expression representing one or more), and subsequent rows match with
> "DOWN".
> 
> Here is the sample output.
> 
>   company  |   tdate    | price | rpr
> ----------+------------+-------+------
>   company1 | 2023-07-01 |   100 |
>   company1 | 2023-07-02 |   200 |  200 -- 200->150->140->150
>   company1 | 2023-07-03 |   150 |  150 -- 150->140->150
>   company1 | 2023-07-04 |   140 |
>   company1 | 2023-07-05 |   150 |  150 -- 150->90->110->130
>   company1 | 2023-07-06 |    90 |
>   company1 | 2023-07-07 |   110 |
>   company1 | 2023-07-08 |   130 |
>   company1 | 2023-07-09 |   120 |
>   company1 | 2023-07-10 |   130 |
> 
> rpr shows the first row if all the patterns are satisfied. In the
> example above 200, 150, 150 are the cases.  Other rows are shown as
> NULL. For example, on 2023-07-02 price starts with 200, then goes down
> to 150 then 140 but goes up 150 next day.

I don't understand this.  RPR in a window specification limits the 
window to the matched rows, so this looks like your rpr() function is 
just the regular first_value() window function that we already have?

> As far as I know, only Oracle implements RPR (only R010. R020 is not
> implemented) among OSS/commercial RDBMSs. There are a few DWH software
> having RPR. According to [2] they are Snowflake and MS Stream
> Analytics. It seems Trino is another one[3].

I thought DuckDB had it already, but it looks like I was wrong.

> - Note about the patch
> 
> The current implementation is not only a subset of the standard, but
> is different from it in some places. The example above is written as
> follows according to the standard:
> 
> SELECT company, tdate, startprice OVER w FROM stock
>   WINDOW w AS (
>   PARTITION BY company
>   MEASURES
>    START.price AS startprice
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (START DOWN+ UP+)
>   DEFINE
>    START AS TRUE,
>    UP AS UP.price > PREV(UP.price),
>    DOWN AS DOWN.price < PREV(DOWN.price)
> );
> 
> Notice that rpr(price) is written as START.price and startprice in the
> standard. MEASURES defines variable names used in the target list used
> with "OVER window". As OVER only allows functions in PostgreSQL, I had
> to make up a window function "rpr" which performs the row pattern
> recognition task.  I was not able to find a way to implement
> expressions like START.price (START is not a table alias). Any
> suggestion is greatly appreciated.

As in your example, you cannot have START.price outside of the window 
specification; it can only go in the MEASURES clause.  Only startprice 
is allowed outside and it gets its qualification from the OVER.  Using 
w.startprice might have been better but that would require window names 
to be in the same namespace as range tables.

This currently works in Postgres:

   SELECT RANK() OVER w
   FROM (VALUES (1)) AS w (x)
   WINDOW w AS (ORDER BY w.x);

> The differences from the standard include:
> 
> o MEASURES is not supported
 > o FIRST, LAST, CLASSIFIER are not supported
 > o PREV/NEXT in the standard accept more complex arguments
 > o INITIAL or SEEK are not supported ((behaves as if INITIAL is specified)

Okay, for now.

> o SUBSET is not supported

Is this because you haven't done it yet, or because you ran into 
problems trying to do it?

> o Regular expressions other than "+" are not supported

This is what I had a hard time imagining how to do while thinking about 
it.  The grammar is so different here and we allow many more operators 
(like "?" which is also the standard parameter symbol).  People more 
expert than me will have to help here.

> o Only AFTER MATCH SKIP TO NEXT ROW is supported (if AFTER MATCH is
>    not specified, AFTER MATCH SKIP TO NEXT ROW is assumed. In the
>    standard AFTER MATCH SKIP PAST LAST ROW is assumed in this case). I
>    have a plan to implement AFTER MATCH SKIP PAST LAST ROW though.

In this case, we should require the user to specify AFTER MATCH SKIP TO 
NEXT ROW so that behavior doesn't change when we implement the standard 
default.  (Your patch might do this already.)

> o Aggregate functions associated with window clause using RPR do not respect RPR

I do not understand what this means.

> It seems RPR in the standard is quite complex. I think we can start
> with a small subset of RPR then we could gradually enhance the
> implementation.

I have no problem with that as long as we don't paint ourselves into a 
corner.

> Comments and suggestions are welcome.

I have not looked at the patch yet, but is the reason for doing R020 
before R010 because you haven't done the MEASURES clause yet?

In any case, I will be watching this with a close eye, and I am eager to 
help in any way I can.
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-26T01:05:20Z

> I have been dreaming of and lobbying for someone to pick up this
> feature.  I will be sure to review it from a standards perspective and
> will try my best to help with the technical aspect, but I am not sure
> to have the qualifications for that.
> 
> THANK YOU!

Thank you for looking into my proposal.

>> (I know SQL:2023 is already out, but I don't have access to it)
> 
> If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead
> of just its technical specification.
> 
> https://www.iso.org/standard/78936.html

Thanks for the info.

> I don't understand this.  RPR in a window specification limits the
> window to the matched rows, so this looks like your rpr() function is
> just the regular first_value() window function that we already have?

No, rpr() is different from first_value(). rpr() returns the argument
value at the first row in a frame only when matched rows found. On the
other hand first_value() returns the argument value at the first row
in a frame unconditionally.

company  |   tdate    | price | rpr  | first_value 
----------+------------+-------+------+-------------
 company1 | 2023-07-01 |   100 |      |         100
 company1 | 2023-07-02 |   200 |  200 |         200
 company1 | 2023-07-03 |   150 |  150 |         150
 company1 | 2023-07-04 |   140 |      |         140
 company1 | 2023-07-05 |   150 |  150 |         150
 company1 | 2023-07-06 |    90 |      |          90
 company1 | 2023-07-07 |   110 |      |         110
 company1 | 2023-07-08 |   130 |      |         130
 company1 | 2023-07-09 |   120 |      |         120
 company1 | 2023-07-10 |   130 |      |         130

For example, a frame starting with (tdate = 2023-07-02, price = 200)
consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
with (tdate = 2023-07-03, price = 150). This frame satisfies the
pattern too (price = 150, 140, 150), and rpr retus 150... and so on.

> As in your example, you cannot have START.price outside of the window
> specification; it can only go in the MEASURES clause.  Only startprice
> is allowed outside and it gets its qualification from the OVER.  Using
> w.startprice might have been better but that would require window
> names to be in the same namespace as range tables.
> 
> This currently works in Postgres:
> 
>   SELECT RANK() OVER w
>   FROM (VALUES (1)) AS w (x)
>   WINDOW w AS (ORDER BY w.x);

Interesting.

>> o SUBSET is not supported
> 
> Is this because you haven't done it yet, or because you ran into
> problems trying to do it?

Because it seems SUBSET is not useful without MEASURES support. Thus
my plan is, firstly implement MEASURES, then SUBSET. What do you
think?

>> o Regular expressions other than "+" are not supported
> 
> This is what I had a hard time imagining how to do while thinking
> about it.  The grammar is so different here and we allow many more
> operators (like "?" which is also the standard parameter symbol).
> People more expert than me will have to help here.

Yes, that is a problem.

> In this case, we should require the user to specify AFTER MATCH SKIP
> TO NEXT ROW so that behavior doesn't change when we implement the
> standard default.  (Your patch might do this already.)

Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.

>> o Aggregate functions associated with window clause using RPR do not
>> respect RPR
> 
> I do not understand what this means.

Ok, let me explain. See example below. In my understanding "count"
should retun the number of rows in a frame restriced by the match
condition. For example at the first line (2023-07-01 | 100) count
returns 10. I think this should be 0 because the "restriced" frame
starting at the line contains no matched row. On the other hand the
(restricted) frame starting at second line (2023-07-02 | 200) contains
4 rows, thus count should return 4, instead of 9.

SELECT company, tdate, price, rpr(price) OVER w, count(*) OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 PATTERN (START DOWN+ UP+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);

company  |   tdate    | price | rpr  | count 
----------+------------+-------+------+-------
 company1 | 2023-07-01 |   100 |      |    10
 company1 | 2023-07-02 |   200 |  200 |     9
 company1 | 2023-07-03 |   150 |  150 |     8
 company1 | 2023-07-04 |   140 |      |     7
 company1 | 2023-07-05 |   150 |  150 |     6
 company1 | 2023-07-06 |    90 |      |     5
 company1 | 2023-07-07 |   110 |      |     4
 company1 | 2023-07-08 |   130 |      |     3
 company1 | 2023-07-09 |   120 |      |     2
 company1 | 2023-07-10 |   130 |      |     1

>> It seems RPR in the standard is quite complex. I think we can start
>> with a small subset of RPR then we could gradually enhance the
>> implementation.
> 
> I have no problem with that as long as we don't paint ourselves into a
> corner.

Totally agreed.

>> Comments and suggestions are welcome.
> 
> I have not looked at the patch yet, but is the reason for doing R020
> before R010 because you haven't done the MEASURES clause yet?

One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
harder for me because modifying main SELECT clause could be a hard
work. Another reason is, I had no idea how to implement PREV/NEXT in
other than in WINDOW clause. Other people might feel differently
though.

> In any case, I will be watching this with a close eye, and I am eager
> to help in any way I can.

Thank you! I am looking forward to comments on my patch.  Also any
idea how to implement MEASURES clause is welcome.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-26T08:45:07Z

>> In this case, we should require the user to specify AFTER MATCH SKIP
>> TO NEXT ROW so that behavior doesn't change when we implement the
>> standard default.  (Your patch might do this already.)
> 
> Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
> patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.

Attached is the v2 patch to add support for AFTER MATCH SKIP PAST LAST
ROW and AFTER MATCH SKIP PAST LAST ROW. The default is AFTER MATCH
SKIP PAST LAST ROW as the standard default. Here are some examples to
demonstrate how those clauses affect the query result.

SELECT i, rpr(i) OVER w
  FROM (VALUES (1), (2), (3), (4)) AS v (i)
  WINDOW w AS (
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   AFTER MATCH SKIP PAST LAST ROW
   PATTERN (A B)
   DEFINE
    A AS i <= 2,
    B AS i <= 3
);
 i | rpr 
---+-----
 1 |   1
 2 |    
 3 |    
 4 |    
(4 rows)

In this example rpr starts from i = 1 and find that row i = 1
satisfies A, and row i = 2 satisfies B. Then rpr moves to row i = 3
and find that it does not satisfy A, thus the result is NULL. Same
thing can be said to row i = 4.

SELECT i, rpr(i) OVER w
  FROM (VALUES (1), (2), (3), (4)) AS v (i)
  WINDOW w AS (
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   AFTER MATCH SKIP TO NEXT ROW
   PATTERN (A B)
   DEFINE
    A AS i <= 2,
    B AS i <= 3
);
 i | rpr 
---+-----
 1 |   1
 2 |   2
 3 |    
 4 |    
(4 rows)

In this example rpr starts from i = 1 and find that row i = 1
satisfies A, and row i = 2 satisfies B (same as above). Then rpr moves
to row i = 2, rather than 3 because AFTER MATCH SKIP TO NEXT ROW is
specified.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-06-26T22:38:20Z

On 6/26/23 03:05, Tatsuo Ishii wrote:
>> I don't understand this.  RPR in a window specification limits the
>> window to the matched rows, so this looks like your rpr() function is
>> just the regular first_value() window function that we already have?
> 
> No, rpr() is different from first_value(). rpr() returns the argument
> value at the first row in a frame only when matched rows found. On the
> other hand first_value() returns the argument value at the first row
> in a frame unconditionally.
> 
> company  |   tdate    | price | rpr  | first_value
> ----------+------------+-------+------+-------------
>   company1 | 2023-07-01 |   100 |      |         100
>   company1 | 2023-07-02 |   200 |  200 |         200
>   company1 | 2023-07-03 |   150 |  150 |         150
>   company1 | 2023-07-04 |   140 |      |         140
>   company1 | 2023-07-05 |   150 |  150 |         150
>   company1 | 2023-07-06 |    90 |      |          90
>   company1 | 2023-07-07 |   110 |      |         110
>   company1 | 2023-07-08 |   130 |      |         130
>   company1 | 2023-07-09 |   120 |      |         120
>   company1 | 2023-07-10 |   130 |      |         130
> 
> For example, a frame starting with (tdate = 2023-07-02, price = 200)
> consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
> thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
> CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
> with (tdate = 2023-07-03, price = 150). This frame satisfies the
> pattern too (price = 150, 140, 150), and rpr retus 150... and so on.


Okay, I see the problem now, and why you need the rpr() function.

You are doing this as something that happens over a window frame, but it 
is actually something that *reduces* the window frame.  The pattern 
matching needs to be done when the frame is calculated and not when any 
particular function is applied over it.

This query (with all the defaults made explicit):

SELECT s.company, s.tdate, s.price,
        FIRST_VALUE(s.tdate) OVER w,
        LAST_VALUE(s.tdate) OVER w,
        lowest OVER w
FROM stock AS s
WINDOW w AS (
   PARTITION BY s.company
   ORDER BY s.tdate
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   EXCLUDE NO OTHERS
   MEASURES
     LAST(DOWN) AS lowest
   AFTER MATCH SKIP PAST LAST ROW
   INITIAL PATTERN (START DOWN+ UP+)
   DEFINE
     START AS TRUE,
     UP AS price > PREV(price),
     DOWN AS price < PREV(price)
);

Should produce this result:

  company  |   tdate    | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
  company1 | 07-01-2023 |   100 |             |            |
  company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
  company1 | 07-03-2023 |   150 |             |            |
  company1 | 07-04-2023 |   140 |             |            |
  company1 | 07-05-2023 |   150 |             |            |
  company1 | 07-06-2023 |    90 |             |            |
  company1 | 07-07-2023 |   110 |             |            |
  company1 | 07-08-2023 |   130 | 07-05-2023  | 07-05-2023 |    120
  company1 | 07-09-2023 |   120 |             |            |
  company1 | 07-10-2023 |   130 |             |            |
(10 rows)

Or if we switch to AFTER MATCH SKIP TO NEXT ROW, then we get:

  company  |   tdate    | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
  company1 | 07-01-2023 |   100 |             |            |
  company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
  company1 | 07-03-2023 |   150 | 07-03-2023  | 07-05-2023 |    140
  company1 | 07-04-2023 |   140 |             |            |
  company1 | 07-05-2023 |   150 | 07-05-2023  | 07-08-2023 |     90
  company1 | 07-06-2023 |    90 |             |            |
  company1 | 07-07-2023 |   110 |             |            |
  company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |    120
  company1 | 07-09-2023 |   120 |             |            |
  company1 | 07-10-2023 |   130 |             |            |
(10 rows)

And then if we change INITIAL to SEEK:

  company  |   tdate    | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
  company1 | 07-01-2023 |   100 | 07-02-2023  | 07-05-2023 |    140
  company1 | 07-02-2023 |   200 | 07-02-2023  | 07-05-2023 |    140
  company1 | 07-03-2023 |   150 | 07-03-2023  | 07-05-2023 |    140
  company1 | 07-04-2023 |   140 | 07-05-2023  | 07-08-2023 |     90
  company1 | 07-05-2023 |   150 | 07-05-2023  | 07-08-2023 |     90
  company1 | 07-06-2023 |    90 | 07-08-2023  | 07-10-2023 |    120
  company1 | 07-07-2023 |   110 | 07-08-2023  | 07-10-2023 |    120
  company1 | 07-08-2023 |   130 | 07-08-2023  | 07-10-2023 |    120
  company1 | 07-09-2023 |   120 |             |            |
  company1 | 07-10-2023 |   130 |             |            |
(10 rows)

Since the pattern recognition is part of the frame, the window 
aggregates should Just Work.


>>> o SUBSET is not supported
>>
>> Is this because you haven't done it yet, or because you ran into
>> problems trying to do it?
> 
> Because it seems SUBSET is not useful without MEASURES support. Thus
> my plan is, firstly implement MEASURES, then SUBSET. What do you
> think?


SUBSET elements can be used in DEFINE clauses, but I do not think this 
is important compared to other features.


>>> Comments and suggestions are welcome.
>>
>> I have not looked at the patch yet, but is the reason for doing R020
>> before R010 because you haven't done the MEASURES clause yet?
> 
> One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
> harder for me because modifying main SELECT clause could be a hard
> work. Another reason is, I had no idea how to implement PREV/NEXT in
> other than in WINDOW clause. Other people might feel differently
> though.


I think we could do this with a single tuplesort if we use backtracking 
(which might be really slow for some patterns).  I have not looked into 
it in any detail.

We would need to be able to remove tuples from the end (even if only 
logically), and be able to update tuples inside the store.  Both of 
those needs come from backtracking and possibly changing the classifier.

Without backtracking, I don't see how we could do it without have a 
separate tuplestore for every current possible match.


>> In any case, I will be watching this with a close eye, and I am eager
>> to help in any way I can.
> 
> Thank you! I am looking forward to comments on my patch.  Also any
> idea how to implement MEASURES clause is welcome.


I looked at your v2 patches a little bit and the only comment that I 
currently have on the code is you spelled PERMUTE as PREMUTE. 
Everything else is hopefully explained above.
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-28T00:58:19Z

> Okay, I see the problem now, and why you need the rpr() function.
> 
> You are doing this as something that happens over a window frame, but
> it is actually something that *reduces* the window frame.  The pattern
> matching needs to be done when the frame is calculated and not when
> any particular function is applied over it.

Yes. (I think the standard calls the window frame as "full window
frame" in context of RPR to make a contrast with the subset of the
frame rows restricted by RPR. The paper I refered to as [2] claims
that the latter window frame is called "reduced window frame" in the
standard but I wasn't able to find the term in the standard.)

I wanted to demonstate that pattern matching logic is basically
correct in the PoC patch. Now what I need to do is, move the row
pattern matching logic to somewhere inside nodeWindowAgg so that
"restricted window frame" can be applied to all window functions and
window aggregates. Currently I am looking into update_frameheadpos()
and update_frametailpos() which calculate the frame head and tail
against current row. What do you think?

> This query (with all the defaults made explicit):
> 
> SELECT s.company, s.tdate, s.price,
>        FIRST_VALUE(s.tdate) OVER w,
>        LAST_VALUE(s.tdate) OVER w,
>        lowest OVER w
> FROM stock AS s
> WINDOW w AS (
>   PARTITION BY s.company
>   ORDER BY s.tdate
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   EXCLUDE NO OTHERS
>   MEASURES
>     LAST(DOWN) AS lowest
>   AFTER MATCH SKIP PAST LAST ROW
>   INITIAL PATTERN (START DOWN+ UP+)
>   DEFINE
>     START AS TRUE,
>     UP AS price > PREV(price),
>     DOWN AS price < PREV(price)
> );
> 
> Should produce this result:

[snip]

Thanks for the examples. I agree with the expected query results.

>>>> o SUBSET is not supported
>>>
>>> Is this because you haven't done it yet, or because you ran into
>>> problems trying to do it?
>> Because it seems SUBSET is not useful without MEASURES support. Thus
>> my plan is, firstly implement MEASURES, then SUBSET. What do you
>> think?
> 
> 
> SUBSET elements can be used in DEFINE clauses, but I do not think this
> is important compared to other features.

Ok.

>>> I have not looked at the patch yet, but is the reason for doing R020
>>> before R010 because you haven't done the MEASURES clause yet?
>> One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
>> harder for me because modifying main SELECT clause could be a hard
>> work. Another reason is, I had no idea how to implement PREV/NEXT in
>> other than in WINDOW clause. Other people might feel differently
>> though.
> 
> 
> I think we could do this with a single tuplesort if we use
> backtracking (which might be really slow for some patterns).  I have
> not looked into it in any detail.
> 
> We would need to be able to remove tuples from the end (even if only
> logically), and be able to update tuples inside the store.  Both of
> those needs come from backtracking and possibly changing the
> classifier.
> 
> Without backtracking, I don't see how we could do it without have a
> separate tuplestore for every current possible match.

Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
into Window clause with RPR?

> I looked at your v2 patches a little bit and the only comment that I
> currently have on the code is you spelled PERMUTE as
> PREMUTE. Everything else is hopefully explained above.

Thanks. Will fix.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-06-28T12:17:00Z

Small question.

> This query (with all the defaults made explicit):
> 
> SELECT s.company, s.tdate, s.price,
>        FIRST_VALUE(s.tdate) OVER w,
>        LAST_VALUE(s.tdate) OVER w,
>        lowest OVER w
> FROM stock AS s
> WINDOW w AS (
>   PARTITION BY s.company
>   ORDER BY s.tdate
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   EXCLUDE NO OTHERS
>   MEASURES
>     LAST(DOWN) AS lowest
>   AFTER MATCH SKIP PAST LAST ROW
>   INITIAL PATTERN (START DOWN+ UP+)
>   DEFINE
>     START AS TRUE,
>     UP AS price > PREV(price),
>     DOWN AS price < PREV(price)
> );

>     LAST(DOWN) AS lowest

should be "LAST(DOWN.price) AS lowest"?

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-06-28T22:30:43Z

On 6/28/23 14:17, Tatsuo Ishii wrote:
> Small question.
> 
>> This query (with all the defaults made explicit):
>>
>> SELECT s.company, s.tdate, s.price,
>>         FIRST_VALUE(s.tdate) OVER w,
>>         LAST_VALUE(s.tdate) OVER w,
>>         lowest OVER w
>> FROM stock AS s
>> WINDOW w AS (
>>    PARTITION BY s.company
>>    ORDER BY s.tdate
>>    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>    EXCLUDE NO OTHERS
>>    MEASURES
>>      LAST(DOWN) AS lowest
>>    AFTER MATCH SKIP PAST LAST ROW
>>    INITIAL PATTERN (START DOWN+ UP+)
>>    DEFINE
>>      START AS TRUE,
>>      UP AS price > PREV(price),
>>      DOWN AS price < PREV(price)
>> );
> 
>>      LAST(DOWN) AS lowest
> 
> should be "LAST(DOWN.price) AS lowest"?

Yes, it should be.  And the tdate='07-08-2023' row in the first 
resultset should have '07-08-2023' and '07-10-2023' as its 4th and 5th 
columns.

Since my brain is doing the processing instead of postgres, I made some 
human errors. :-)
-- 
Vik Fearing

Re: Row pattern recognition

Jacob Champion <jchampion@timescale.com> — 2023-07-19T16:30:40Z

Hello,

Thanks for working on this! We're interested in RPR as well, and I've
been trying to get up to speed with the specs, to maybe make myself
useful.

On 6/27/23 17:58, Tatsuo Ishii wrote:
> Yes. (I think the standard calls the window frame as "full window
> frame" in context of RPR to make a contrast with the subset of the
> frame rows restricted by RPR. The paper I refered to as [2] claims
> that the latter window frame is called "reduced window frame" in the
> standard but I wasn't able to find the term in the standard.)

19075-5 discusses that, at least; not sure about other parts of the spec.

> Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
> into Window clause with RPR?

Are we guaranteed to always have an equivalent window clause? There seem
to be many differences between the two, especially when it comes to ONE
ROW/ALL ROWS PER MATCH.

--

To add onto what Vik said above:

>> It seems RPR in the standard is quite complex. I think we can start
>> with a small subset of RPR then we could gradually enhance the
>> implementation.
> 
> I have no problem with that as long as we don't paint ourselves into a 
> corner.

To me, PATTERN looks like an area where we may want to support a broader
set of operations in the first version. The spec has a bunch of
discussion around cases like empty matches, match order of alternation
and permutation, etc., which are not possible to express or test with
only the + quantifier. Those might be harder to get right in a v2, if we
don't at least keep them in mind for v1?

> +static List *
> +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef)
> +{
> +   List        *patterns;

My compiler complains about the `patterns` variable here, which is
returned without ever being initialized. (The caller doesn't seem to use
it.)

> +-- basic test using PREV
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + INITIAL
> + PATTERN (START UP+ DOWN+)
> + DEFINE
> +  START AS TRUE,
> +  UP AS price > PREV(price),
> +  DOWN AS price < PREV(price)
> +);

nitpick: IMO the tests should be making use of ORDER BY in the window
clauses.

This is a very big feature. I agree with you that MEASURES seems like a
very important "next piece" to add. Are there other areas where you
would like reviewers to focus on right now (or avoid)?

Thanks!
--Jacob

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-20T05:15:13Z

> Hello,
> 
> Thanks for working on this! We're interested in RPR as well, and I've
> been trying to get up to speed with the specs, to maybe make myself
> useful.

Thank you for being interested in this.

> 19075-5 discusses that, at least; not sure about other parts of the spec.

Thanks for the info. Unfortunately I don't have 19075-5 though.

>> Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
>> into Window clause with RPR?
> 
> Are we guaranteed to always have an equivalent window clause? There seem
> to be many differences between the two, especially when it comes to ONE
> ROW/ALL ROWS PER MATCH.

You are right. I am not 100% sure if the rewriting is possible at this
point.

> To add onto what Vik said above:
> 
>>> It seems RPR in the standard is quite complex. I think we can start
>>> with a small subset of RPR then we could gradually enhance the
>>> implementation.
>> 
>> I have no problem with that as long as we don't paint ourselves into a 
>> corner.
> 
> To me, PATTERN looks like an area where we may want to support a broader
> set of operations in the first version.

Me too but...

> The spec has a bunch of
> discussion around cases like empty matches, match order of alternation
> and permutation, etc., which are not possible to express or test with
> only the + quantifier. Those might be harder to get right in a v2, if we
> don't at least keep them in mind for v1?

Currently my patch has a limitation for the sake of simple
implementation: a pattern like "A+" is parsed and analyzed in the raw
parser. This makes subsequent process much easier because the pattern
element variable (in this case "A") and the quantifier (in this case
"+") is already identified by the raw parser. However there are much
more cases are allowed in the standard as you already pointed out. For
those cases probably we should give up to parse PATTERN items in the
raw parser, and instead the raw parser just accepts the elements as
Sconst?

>> +static List *
>> +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef)
>> +{
>> +   List        *patterns;
> 
> My compiler complains about the `patterns` variable here, which is
> returned without ever being initialized. (The caller doesn't seem to use
> it.)

Will fix.

>> +-- basic test using PREV
>> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
>> + WINDOW w AS (
>> + PARTITION BY company
>> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>> + INITIAL
>> + PATTERN (START UP+ DOWN+)
>> + DEFINE
>> +  START AS TRUE,
>> +  UP AS price > PREV(price),
>> +  DOWN AS price < PREV(price)
>> +);
> 
> nitpick: IMO the tests should be making use of ORDER BY in the window
> clauses.

Right. Will fix.

> This is a very big feature. I agree with you that MEASURES seems like a
> very important "next piece" to add. Are there other areas where you
> would like reviewers to focus on right now (or avoid)?

Any comments, especially on the PREV/NEXT implementation part is
welcome. Currently the DEFINE expression like "price > PREV(price)" is
prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
hack and should be gotten ride of before v1 is committed. Better idea?

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <jchampion@timescale.com> — 2023-07-20T23:36:37Z

Hi Ishii-san,

On 7/19/23 22:15, Tatsuo Ishii wrote:
> Currently my patch has a limitation for the sake of simple
> implementation: a pattern like "A+" is parsed and analyzed in the raw
> parser. This makes subsequent process much easier because the pattern
> element variable (in this case "A") and the quantifier (in this case
> "+") is already identified by the raw parser. However there are much
> more cases are allowed in the standard as you already pointed out. For
> those cases probably we should give up to parse PATTERN items in the
> raw parser, and instead the raw parser just accepts the elements as
> Sconst?

Is there a concern that the PATTERN grammar can't be represented in
Bison? I thought it was all context-free... Or is the concern that the
parse tree of the pattern is hard to feed into a regex engine?

> Any comments, especially on the PREV/NEXT implementation part is
> welcome. Currently the DEFINE expression like "price > PREV(price)" is
> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
> hack and should be gotten ride of before v1 is committed. Better idea?

I'm not familiar enough with this code yet to offer very concrete
suggestions, sorry... But at some point in the future, we need to be
able to skip forward and backward from arbitrary points, like

    DEFINE B AS B.price > PREV(FIRST(A.price), 3)

so there won't be just one pair of "previous and next" tuples. Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.

--

Taking a closer look at the regex engine:

It looks like the + qualifier has trouble when it meets the end of the
frame. For instance, try removing the last row of the 'stock' table in
rpr.sql; some of the final matches will disappear unexpectedly. Or try a
pattern like

    PATTERN ( a+ )
     DEFINE a AS TRUE

which doesn't seem to match anything in my testing.

There's also the issue of backtracking in the face of reclassification,
as I think Vik was alluding to upthread. The pattern

    PATTERN ( a+ b+ )
     DEFINE a AS col = 2,
            b AS col = 2

doesn't match a sequence of values (2 2 ...) with the current
implementation, even with a dummy row at the end to avoid the
end-of-frame bug.

(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I _think_ should be the correct results.)

I'm not quite understanding the match loop in evaluate_pattern(). It
looks like we're building up a string to pass to the regex engine, but
by the we call regexp_instr, don't we already know whether or not the
pattern will match based on the expression evaluation we've done?

Thanks,
--Jacob

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-07-21T00:07:44Z

On 7/21/23 01:36, Jacob Champion wrote:
> There's also the issue of backtracking in the face of reclassification,
> as I think Vik was alluding to upthread.

We definitely need some kind of backtracking or other reclassification 
method.

> (I've attached two failing tests against v2, to hopefully better
> illustrate, along with what I_think_  should be the correct results.)

Almost.  You are matching 07-01-2023 but the condition is "price > 100".
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-21T06:16:48Z

Hi,

> Hi Ishii-san,
> 
> On 7/19/23 22:15, Tatsuo Ishii wrote:
>> Currently my patch has a limitation for the sake of simple
>> implementation: a pattern like "A+" is parsed and analyzed in the raw
>> parser. This makes subsequent process much easier because the pattern
>> element variable (in this case "A") and the quantifier (in this case
>> "+") is already identified by the raw parser. However there are much
>> more cases are allowed in the standard as you already pointed out. For
>> those cases probably we should give up to parse PATTERN items in the
>> raw parser, and instead the raw parser just accepts the elements as
>> Sconst?
> 
> Is there a concern that the PATTERN grammar can't be represented in
> Bison? I thought it was all context-free...

I don't know at this point. I think context-free is not enough to be
repsented in Bison. The grammer also needs to be LALR(1).  Moreover,
adding the grammer to existing parser may generate shift/reduce
errors.

> Or is the concern that the
> parse tree of the pattern is hard to feed into a regex engine?

One small concern is how to convert pattern variables to regex
expression which our regex enginer understands. Suppose,

PATTERN UP+

Currently I convert "UP+" to "U+" so that it can be fed to the regexp
engine. In order to do that, we need to know which part of the pattern
(UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
for more complex regular expressions it would be not, unless PATTERN
grammer can be analyzed by our parser to know which part is the
pattern variable.

>> Any comments, especially on the PREV/NEXT implementation part is
>> welcome. Currently the DEFINE expression like "price > PREV(price)" is
>> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
>> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
>> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
>> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
>> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
>> hack and should be gotten ride of before v1 is committed. Better idea?
> 
> I'm not familiar enough with this code yet to offer very concrete
> suggestions, sorry... But at some point in the future, we need to be
> able to skip forward and backward from arbitrary points, like
> 
>     DEFINE B AS B.price > PREV(FIRST(A.price), 3)
> 
> so there won't be just one pair of "previous and next" tuples.

Yes, I know.

> Maybe
> that can help clarify the design? It feels like it'll need to eventually
> be a "real" function that operates on the window state, even if it
> doesn't support all of the possible complexities in v1.

Unfortunately an window function can not call other window functions.

> Taking a closer look at the regex engine:
> 
> It looks like the + qualifier has trouble when it meets the end of the
> frame. For instance, try removing the last row of the 'stock' table in
> rpr.sql; some of the final matches will disappear unexpectedly. Or try a
> pattern like
> 
>     PATTERN ( a+ )
>      DEFINE a AS TRUE
> 
> which doesn't seem to match anything in my testing.
> 
> There's also the issue of backtracking in the face of reclassification,
> as I think Vik was alluding to upthread. The pattern
> 
>     PATTERN ( a+ b+ )
>      DEFINE a AS col = 2,
>             b AS col = 2
> 
> doesn't match a sequence of values (2 2 ...) with the current
> implementation, even with a dummy row at the end to avoid the
> end-of-frame bug.
> 
> (I've attached two failing tests against v2, to hopefully better
> illustrate, along with what I _think_ should be the correct results.)

Thanks. I will look into this.

> I'm not quite understanding the match loop in evaluate_pattern(). It
> looks like we're building up a string to pass to the regex engine, but
> by the we call regexp_instr, don't we already know whether or not the
> pattern will match based on the expression evaluation we've done?

For "+" yes. But for more complex regular expression like '{n}', we
need to call our regexp engine to check if the pattern matches.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <jchampion@timescale.com> — 2023-07-21T23:14:12Z

On 7/20/23 17:07, Vik Fearing wrote:
> On 7/21/23 01:36, Jacob Champion wrote:
>> (I've attached two failing tests against v2, to hopefully better
>> illustrate, along with what I_think_  should be the correct results.)
> 
> Almost.  You are matching 07-01-2023 but the condition is "price > 100".

D'oh. Correction attached. I think :)

Thanks,
--Jacob

Re: Row pattern recognition

Jacob Champion <jchampion@timescale.com> — 2023-07-21T23:16:18Z

On 7/20/23 23:16, Tatsuo Ishii wrote:
> I don't know at this point. I think context-free is not enough to be
> repsented in Bison. The grammer also needs to be LALR(1).  Moreover,
> adding the grammer to existing parser may generate shift/reduce
> errors.

Ah. It's been too long since my compilers classes; I will pipe down.

> One small concern is how to convert pattern variables to regex
> expression which our regex enginer understands. Suppose,
> 
> PATTERN UP+
> 
> Currently I convert "UP+" to "U+" so that it can be fed to the regexp
> engine. In order to do that, we need to know which part of the pattern
> (UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
> for more complex regular expressions it would be not, unless PATTERN
> grammer can be analyzed by our parser to know which part is the
> pattern variable.

Is the eventual plan to generate multiple alternatives, and run the
regex against them one at a time?

>> I'm not familiar enough with this code yet to offer very concrete
>> suggestions, sorry... But at some point in the future, we need to be
>> able to skip forward and backward from arbitrary points, like
>>
>>     DEFINE B AS B.price > PREV(FIRST(A.price), 3)
>>
>> so there won't be just one pair of "previous and next" tuples.
> 
> Yes, I know.

I apologize. I got overexplain-y.

>> Maybe
>> that can help clarify the design? It feels like it'll need to eventually
>> be a "real" function that operates on the window state, even if it
>> doesn't support all of the possible complexities in v1.
> 
> Unfortunately an window function can not call other window functions.

Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case? Or
does it make sense to split the pattern navigation "functions" into
their own new concept, and maybe borrow some of the window function
infrastructure for it?

Thanks!
--Jacob

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-07-21T23:38:01Z

On 7/22/23 01:14, Jacob Champion wrote:
> On 7/20/23 17:07, Vik Fearing wrote:
>> On 7/21/23 01:36, Jacob Champion wrote:
>>> (I've attached two failing tests against v2, to hopefully better
>>> illustrate, along with what I_think_  should be the correct results.)
>>
>> Almost.  You are matching 07-01-2023 but the condition is "price > 100".
> 
> D'oh. Correction attached. I think :)

This looks correct to my human brain.  Thanks!
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-22T01:11:49Z

>> One small concern is how to convert pattern variables to regex
>> expression which our regex enginer understands. Suppose,
>> 
>> PATTERN UP+
>> 
>> Currently I convert "UP+" to "U+" so that it can be fed to the regexp
>> engine. In order to do that, we need to know which part of the pattern
>> (UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
>> for more complex regular expressions it would be not, unless PATTERN
>> grammer can be analyzed by our parser to know which part is the
>> pattern variable.
> 
> Is the eventual plan to generate multiple alternatives, and run the
> regex against them one at a time?

Yes, that's my plan.

>>> I'm not familiar enough with this code yet to offer very concrete
>>> suggestions, sorry... But at some point in the future, we need to be
>>> able to skip forward and backward from arbitrary points, like
>>>
>>>     DEFINE B AS B.price > PREV(FIRST(A.price), 3)
>>>
>>> so there won't be just one pair of "previous and next" tuples.
>> 
>> Yes, I know.
> 
> I apologize. I got overexplain-y.

No problem. Thank you for reminding me it.

>>> Maybe
>>> that can help clarify the design? It feels like it'll need to eventually
>>> be a "real" function that operates on the window state, even if it
>>> doesn't support all of the possible complexities in v1.
>> 
>> Unfortunately an window function can not call other window functions.
> 
> Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?

I am not sure at this point. Current PostgreSQL executor creates
WindowStatePerFuncData for each window function and aggregate
appearing in OVER clause. This means PREV/NEXT and other row pattern
navigation operators cannot have their own WindowStatePerFuncData if
they do not appear in OVER clauses in a query even if PREV/NEXT
etc. are defined as window function.

> Or
> does it make sense to split the pattern navigation "functions" into
> their own new concept, and maybe borrow some of the window function
> infrastructure for it?

Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-07-22T02:54:43Z

On 7/22/23 03:11, Tatsuo Ishii wrote:
>>>> Maybe
>>>> that can help clarify the design? It feels like it'll need to eventually
>>>> be a "real" function that operates on the window state, even if it
>>>> doesn't support all of the possible complexities in v1.
>>> Unfortunately an window function can not call other window functions.
>> Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?

> I am not sure at this point. Current PostgreSQL executor creates
> WindowStatePerFuncData for each window function and aggregate
> appearing in OVER clause. This means PREV/NEXT and other row pattern
> navigation operators cannot have their own WindowStatePerFuncData if
> they do not appear in OVER clauses in a query even if PREV/NEXT
> etc. are defined as window function.
> 
>> Or
>> does it make sense to split the pattern navigation "functions" into
>> their own new concept, and maybe borrow some of the window function
>> infrastructure for it?

> Maybe. Suppose a window function executes row pattern matching using
> price > PREV(price). The window function already receives
> WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
> PREV, we could let PREV do the real work (getting previous tuple).
> I have not tried this yet, though.


I don't understand this logic.  Window functions work over a window 
frame.  What we are talking about here is *defining* a window frame. 
How can a window function execute row pattern matching?
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-22T06:14:46Z

> On 7/22/23 03:11, Tatsuo Ishii wrote:
>>>>> Maybe
>>>>> that can help clarify the design? It feels like it'll need to
>>>>> eventually
>>>>> be a "real" function that operates on the window state, even if it
>>>>> doesn't support all of the possible complexities in v1.
>>>> Unfortunately an window function can not call other window functions.
>>> Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?
> 
>> I am not sure at this point. Current PostgreSQL executor creates
>> WindowStatePerFuncData for each window function and aggregate
>> appearing in OVER clause. This means PREV/NEXT and other row pattern
>> navigation operators cannot have their own WindowStatePerFuncData if
>> they do not appear in OVER clauses in a query even if PREV/NEXT
>> etc. are defined as window function.
>> 
>>> Or
>>> does it make sense to split the pattern navigation "functions" into
>>> their own new concept, and maybe borrow some of the window function
>>> infrastructure for it?
> 
>> Maybe. Suppose a window function executes row pattern matching using
>> price > PREV(price). The window function already receives
>> WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
>> PREV, we could let PREV do the real work (getting previous tuple).
>> I have not tried this yet, though.
> 
> 
> I don't understand this logic.  Window functions work over a window
> frame.

Yes.

> What we are talking about here is *defining* a window
> frame.

Well, we are defining a "reduced" window frame within a (full) window
frame. A "reduced" window frame is calculated each time when a window
function is called.

> How can a window function execute row pattern matching?

A window function is called for each row fed by an outer plan. It
fetches current, previous and next row to execute pattern matching. If
it matches, the window function moves to next row and repeat the
process, until pattern match fails.

Below is an example window function to execute pattern matching (I
will include this in the v3 patch). row_is_in_reduced_frame() is a
function to execute pattern matching. It returns the number of rows in
the reduced frame if pattern match succeeds. If succeeds, the function
returns the last row in the reduced frame instead of the last row in
the full window frame.

/*
 * last_value
 * return the value of VE evaluated on the last row of the
 * window frame, per spec.
 */
Datum
window_last_value(PG_FUNCTION_ARGS)
{
	WindowObject winobj = PG_WINDOW_OBJECT();
	Datum		result;
	bool		isnull;
	int64		abspos;
	int			num_reduced_frame;

	abspos = WinGetCurrentPosition(winobj);
	num_reduced_frame = row_is_in_reduced_frame(winobj, abspos);

	if (num_reduced_frame == 0)
		/* no RPR is involved */
		result = WinGetFuncArgInFrame(winobj, 0,
									  0, WINDOW_SEEK_TAIL, true,
									  &isnull, NULL);
	else if (num_reduced_frame > 0)
		/* get last row value in the reduced frame */
		result = WinGetFuncArgInFrame(winobj, 0,
									  num_reduced_frame - 1, WINDOW_SEEK_HEAD, true,
									  &isnull, NULL);
	else
		/* RPR is involved and current row is unmatched or skipped */
		isnull = true;

	if (isnull)
		PG_RETURN_NULL();

	PG_RETURN_DATUM(result);
}

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-07-23T21:29:46Z

On 7/22/23 08:14, Tatsuo Ishii wrote:
>> On 7/22/23 03:11, Tatsuo Ishii wrote:
>>> Maybe. Suppose a window function executes row pattern matching using
>>> price > PREV(price). The window function already receives
>>> WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
>>> PREV, we could let PREV do the real work (getting previous tuple).
>>> I have not tried this yet, though.
>>
>> I don't understand this logic.  Window functions work over a window
>> frame.
> 
> Yes.
> 
>> What we are talking about here is *defining* a window
>> frame.
> 
> Well, we are defining a "reduced" window frame within a (full) window
> frame. A "reduced" window frame is calculated each time when a window
> function is called.


Why?  It should only be recalculated when the current row changes and we 
need a new frame.  The reduced window frame *is* the window frame for 
all functions over that window.


>> How can a window function execute row pattern matching?
> 
> A window function is called for each row fed by an outer plan. It
> fetches current, previous and next row to execute pattern matching. If
> it matches, the window function moves to next row and repeat the
> process, until pattern match fails.
> 
> Below is an example window function to execute pattern matching (I
> will include this in the v3 patch). row_is_in_reduced_frame() is a
> function to execute pattern matching. It returns the number of rows in
> the reduced frame if pattern match succeeds. If succeeds, the function
> returns the last row in the reduced frame instead of the last row in
> the full window frame.


I strongly disagree with this.  Window function do not need to know how 
the frame is defined, and indeed they should not.  WinGetFuncArgInFrame 
should answer yes or no and the window function just works on that. 
Otherwise we will get extension (and possibly even core) functions that 
don't handle the frame properly.
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-24T00:22:40Z

>>> What we are talking about here is *defining* a window
>>> frame.
>> Well, we are defining a "reduced" window frame within a (full) window
>> frame. A "reduced" window frame is calculated each time when a window
>> function is called.
> 
> 
> Why?  It should only be recalculated when the current row changes and
> we need a new frame.  The reduced window frame *is* the window frame
> for all functions over that window.

We already recalculate a frame each time a row is processed even
without RPR. See ExecWindowAgg.

Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
which means the frame head is changed each time current row position
changes.

> I strongly disagree with this.  Window function do not need to know
> how the frame is defined, and indeed they should not.

We already break the rule by defining *support functions. See
windowfuncs.c.

> WinGetFuncArgInFrame should answer yes or no and the window function
> just works on that. Otherwise we will get extension (and possibly even
> core) functions that don't handle the frame properly.

Maybe I can move row_is_in_reduced_frame into WinGetFuncArgInFrame
just for convenience.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-07-24T23:14:37Z

On 7/24/23 02:22, Tatsuo Ishii wrote:
>>>> What we are talking about here is *defining* a window
>>>> frame.
>>> Well, we are defining a "reduced" window frame within a (full) window
>>> frame. A "reduced" window frame is calculated each time when a window
>>> function is called.
>>
>>
>> Why?  It should only be recalculated when the current row changes and
>> we need a new frame.  The reduced window frame *is* the window frame
>> for all functions over that window.
> 
> We already recalculate a frame each time a row is processed even
> without RPR. See ExecWindowAgg.

Yes, after each row.  Not for each function.

> Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
> which means the frame head is changed each time current row position
> changes.

Off topic for now: I wonder why this restriction is in place and whether 
we should respect or ignore it.  That is a discussion for another time, 
though.

>> I strongly disagree with this.  Window function do not need to know
>> how the frame is defined, and indeed they should not.
> 
> We already break the rule by defining *support functions. See
> windowfuncs.c.

The support functions don't know anything about the frame, they just 
know when a window function is monotonically increasing and execution 
can either stop or be "passed through".

>> WinGetFuncArgInFrame should answer yes or no and the window function
>> just works on that. Otherwise we will get extension (and possibly even
>> core) functions that don't handle the frame properly.
> 
> Maybe I can move row_is_in_reduced_frame into WinGetFuncArgInFrame
> just for convenience.

I have two comments about this:

It isn't just for convenience, it is for correctness.  The window 
functions do not need to know which rows they are *not* operating on.

There is no such thing as a "full" or "reduced" frame.  The standard 
uses those terms to explain the difference between before and after RPR 
is applied, but window functions do not get to choose which frame they 
apply over.  They only ever apply over the reduced window frame.
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-25T12:35:04Z

Hi,

> diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out
> index 6bf8818911..f3fd22de2a 100644
> --- a/src/test/regress/expected/rpr.out
> +++ b/src/test/regress/expected/rpr.out
> @@ -230,6 +230,79 @@ SELECT company, tdate, price, rpr(price) OVER w FROM stock
>   company2 | 07-10-2023 |  1300 |     
>  (20 rows)
>  
> +-- match everything
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ORDER BY tdate
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + AFTER MATCH SKIP TO NEXT ROW

It seems it's a result with AFTER MATCH SKIP PAST LAST ROW.

> + INITIAL
> + PATTERN (A+)
> + DEFINE
> +  A AS TRUE
> +);
> + company  |   tdate    | price | rpr 
> +----------+------------+-------+-----
> + company1 | 07-01-2023 |   100 | 100
> + company1 | 07-02-2023 |   200 |    
> + company1 | 07-03-2023 |   150 |    
> + company1 | 07-04-2023 |   140 |    
> + company1 | 07-05-2023 |   150 |    
> + company1 | 07-06-2023 |    90 |    
> + company1 | 07-07-2023 |   110 |    
> + company1 | 07-08-2023 |   130 |    
> + company1 | 07-09-2023 |   120 |    
> + company1 | 07-10-2023 |   130 |    
> + company2 | 07-01-2023 |    50 |  50
> + company2 | 07-02-2023 |  2000 |    
> + company2 | 07-03-2023 |  1500 |    
> + company2 | 07-04-2023 |  1400 |    
> + company2 | 07-05-2023 |  1500 |    
> + company2 | 07-06-2023 |    60 |    
> + company2 | 07-07-2023 |  1100 |    
> + company2 | 07-08-2023 |  1300 |    
> + company2 | 07-09-2023 |  1200 |    
> + company2 | 07-10-2023 |  1300 |    
> +(20 rows)
> +
> +-- backtracking with reclassification of rows
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ORDER BY tdate
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + AFTER MATCH SKIP TO NEXT ROW
> + INITIAL
> + PATTERN (A+ B+)
> + DEFINE
> +  A AS price > 100,
> +  B AS price > 100
> +);
> + company  |   tdate    | price | rpr  
> +----------+------------+-------+------
> + company1 | 07-01-2023 |   100 |     
> + company1 | 07-02-2023 |   200 |  200
> + company1 | 07-03-2023 |   150 |     
> + company1 | 07-04-2023 |   140 |     
> + company1 | 07-05-2023 |   150 |     
> + company1 | 07-06-2023 |    90 |     
> + company1 | 07-07-2023 |   110 |  110
> + company1 | 07-08-2023 |   130 |     
> + company1 | 07-09-2023 |   120 |     
> + company1 | 07-10-2023 |   130 |     
> + company2 | 07-01-2023 |    50 |     
> + company2 | 07-02-2023 |  2000 | 2000
> + company2 | 07-03-2023 |  1500 |     
> + company2 | 07-04-2023 |  1400 |     
> + company2 | 07-05-2023 |  1500 |     
> + company2 | 07-06-2023 |    60 |     
> + company2 | 07-07-2023 |  1100 | 1100
> + company2 | 07-08-2023 |  1300 |     
> + company2 | 07-09-2023 |  1200 |     
> + company2 | 07-10-2023 |  1300 |     
> +(20 rows)
> +
>  --
>  -- Error cases
>  --
> diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
> index 951c9abfe9..f1cd0369f4 100644
> --- a/src/test/regress/sql/rpr.sql
> +++ b/src/test/regress/sql/rpr.sql
> @@ -94,6 +94,33 @@ SELECT company, tdate, price, rpr(price) OVER w FROM stock
>    UPDOWN AS price > PREV(price) AND price > NEXT(price)
>  );
>  
> +-- match everything
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ORDER BY tdate
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + AFTER MATCH SKIP TO NEXT ROW
> + INITIAL
> + PATTERN (A+)
> + DEFINE
> +  A AS TRUE
> +);
> +
> +-- backtracking with reclassification of rows
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ORDER BY tdate
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + AFTER MATCH SKIP TO NEXT ROW
> + INITIAL
> + PATTERN (A+ B+)
> + DEFINE
> +  A AS price > 100,
> +  B AS price > 100
> +);
> +
>  --
>  -- Error cases
>  --

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-26T12:21:34Z

Attached is the v3 patch. In this patch following changes are made.

(1) I completely changed the pattern matching engine so that it
performs backtracking. Now the engine evaluates all pattern elements
defined in PATTER against each row, saving matched pattern variables
in a string per row. For example if the pattern element A and B
evaluated to true, a string "AB" is created for current row.

This continues until all pattern matching fails or encounters the end
of full window frame/partition. After that, the pattern matching
engine creates all possible "pattern strings" and apply the regular
expression matching to each. For example if we have row 0 = "AB" row 1
= "C", possible pattern strings are: "AC" and "BC".

If it matches, the length of matching substring is saved. After all
possible trials are done, the longest matching substring is chosen and
it becomes the width (number of rows) in the reduced window frame.

See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
in nodeWindowAggs.c for more details. For now I use a naive depth
first search and probably there is a lot of rooms for optimization
(for example rewriting it without using
recursion). Suggestions/patches are welcome.

Jacob Champion wrote:
> It looks like the + qualifier has trouble when it meets the end of the
> frame. For instance, try removing the last row of the 'stock' table in
> rpr.sql; some of the final matches will disappear unexpectedly. Or try a
> pattern like
> 
>     PATTERN ( a+ )
>      DEFINE a AS TRUE
> 
> which doesn't seem to match anything in my testing.
> 
> There's also the issue of backtracking in the face of reclassification,
> as I think Vik was alluding to upthread. The pattern
> 
>     PATTERN ( a+ b+ )
>      DEFINE a AS col = 2,
>             b AS col = 2

With the new engine, cases above do not fail anymore. See new
regression test cases. Thanks for providing valuable test cases!

(2) Make window functions RPR aware. Now first_value, last_value, and
nth_value recognize RPR (maybe first_value do not need any change?)

Vik Fearing wrote:
> I strongly disagree with this.  Window function do not need to know
> how the frame is defined, and indeed they should not.
> WinGetFuncArgInFrame should answer yes or no and the window function
> just works on that. Otherwise we will get extension (and possibly even
> core) functions that don't handle the frame properly.

So I moved row_is_in_reduce_frame into WinGetFuncArgInFrame so that
those window functions are not needed to be changed.

(3) Window function rpr was removed. We can use first_value instead.

(4) Remaining tasks/issues.

- For now I disable WinSetMarkPosition because RPR often needs to
  access a row before the mark is set. We need to fix this in the
  future.

- I am working on making window aggregates RPR aware now. The
  implementation is in progress and far from completeness. An example
  is below. I think row 2, 3, 4 of "count" column should be NULL
  instead of 3, 2, 0, 0. Same thing can be said to other
  rows. Probably this is an effect of moving aggregate but I still
  studying the window aggregation code.

SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 ORDER BY tdate
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 AFTER MATCH SKIP PAST LAST ROW
 INITIAL
 PATTERN (START UP+ DOWN+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);
 company  |   tdate    | first_value | count 
----------+------------+-------------+-------
 company1 | 2023-07-01 |         100 |     4
 company1 | 2023-07-02 |             |     3
 company1 | 2023-07-03 |             |     2
 company1 | 2023-07-04 |             |     0
 company1 | 2023-07-05 |             |     0
 company1 | 2023-07-06 |          90 |     4
 company1 | 2023-07-07 |             |     3
 company1 | 2023-07-08 |             |     2
 company1 | 2023-07-09 |             |     0
 company1 | 2023-07-10 |             |     0
 company2 | 2023-07-01 |          50 |     4
 company2 | 2023-07-02 |             |     3
 company2 | 2023-07-03 |             |     2
 company2 | 2023-07-04 |             |     0
 company2 | 2023-07-05 |             |     0
 company2 | 2023-07-06 |          60 |     4
 company2 | 2023-07-07 |             |     3
 company2 | 2023-07-08 |             |     2
 company2 | 2023-07-09 |             |     0
 company2 | 2023-07-10 |             |     0

- If attributes appearing in DEFINE are not used in the target list, query fails.

SELECT company, tdate, count(*) OVER w FROM stock
 WINDOW w AS (
 PARTITION BY company
 ORDER BY tdate
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 AFTER MATCH SKIP PAST LAST ROW
 INITIAL
 PATTERN (START UP+ DOWN+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);
ERROR:  attribute number 3 exceeds number of columns 2

This is because attributes appearing in DEFINE are not added to the
target list. I am looking for way to teach planner to add attributes
appearing in DEFINE.

I am going to add this thread to CommitFest and plan to add both of
you as reviewers. Thanks in advance.
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-26T20:22:30Z

> I am going to add this thread to CommitFest and plan to add both of
> you as reviewers. Thanks in advance.

Done.
https://commitfest.postgresql.org/44/4460/

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-28T07:09:53Z

>> We already recalculate a frame each time a row is processed even
>> without RPR. See ExecWindowAgg.
> 
> Yes, after each row.  Not for each function.

Ok, I understand now. Closer look at the code, I realized that each
window function calls update_frameheadpos, which computes the frame
head position. But actually it checks winstate->framehead_valid and if
it's already true (probably by other window function), then it does
nothing.

>> Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
>> which means the frame head is changed each time current row position
>> changes.
> 
> Off topic for now: I wonder why this restriction is in place and
> whether we should respect or ignore it.  That is a discussion for
> another time, though.

My guess is, it is because other than ROWS BETWEEN CURRENT ROW has
little or no meaning. Consider following example:

SELECT i, first_value(i) OVER w
  FROM (VALUES (1), (2), (3), (4)) AS v (i)
  WINDOW w AS (
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   AFTER MATCH SKIP PAST LAST ROW
   PATTERN (A)
   DEFINE
    A AS i = 1 OR i = 3
);

In this example ROWS BETWEEN CURRENT ROW gives frames with i = 1 and i
= 3.

 i | first_value 
---+-------------
 1 |           1
 2 |            
 3 |           3
 4 |            
(4 rows)

But what would happen with ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING?  Probably the frame i = 3 will be missed as
at i = 2, PATTERN is not satisfied and compution of the reduced frame
stops.

 i | first_value 
---+-------------
 1 |           1
 2 |            
 3 |           
 4 |            
(4 rows)

This is not very useful for users.

>>> I strongly disagree with this.  Window function do not need to know
>>> how the frame is defined, and indeed they should not.
>> We already break the rule by defining *support functions. See
>> windowfuncs.c.
> The support functions don't know anything about the frame, they just
> know when a window function is monotonically increasing and execution
> can either stop or be "passed through".

I see following code in window_row_number_support:

		/*
		 * The frame options can always become "ROWS BETWEEN UNBOUNDED
		 * PRECEDING AND CURRENT ROW".  row_number() always just increments by
		 * 1 with each row in the partition.  Using ROWS instead of RANGE
		 * saves effort checking peer rows during execution.
		 */
		req->frameOptions = (FRAMEOPTION_NONDEFAULT |
							 FRAMEOPTION_ROWS |
							 FRAMEOPTION_START_UNBOUNDED_PRECEDING |
							 FRAMEOPTION_END_CURRENT_ROW);

I think it not only knows about frame but it even changes the frame
options. This seems far from "don't know anything about the frame", no?

> I have two comments about this:
> 
> It isn't just for convenience, it is for correctness.  The window
> functions do not need to know which rows they are *not* operating on.
> 
> There is no such thing as a "full" or "reduced" frame.  The standard
> uses those terms to explain the difference between before and after
> RPR is applied, but window functions do not get to choose which frame
> they apply over.  They only ever apply over the reduced window frame.

I agree that "full window frame" and "reduced window frame" do not
exist at the same time, and in the end (after computation of reduced
frame), only "reduced" frame is visible to window
functions/aggregates. But I still do think that "full window frame"
and "reduced window frame" are important concept to explain/understand
how PRP works.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-07-28T08:56:26Z

On 7/28/23 09:09, Tatsuo Ishii wrote:
>>> We already recalculate a frame each time a row is processed even
>>> without RPR. See ExecWindowAgg.
>>
>> Yes, after each row.  Not for each function.
> 
> Ok, I understand now. Closer look at the code, I realized that each
> window function calls update_frameheadpos, which computes the frame
> head position. But actually it checks winstate->framehead_valid and if
> it's already true (probably by other window function), then it does
> nothing.
> 
>>> Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
>>> which means the frame head is changed each time current row position
>>> changes.
>>
>> Off topic for now: I wonder why this restriction is in place and
>> whether we should respect or ignore it.  That is a discussion for
>> another time, though.
> 
> My guess is, it is because other than ROWS BETWEEN CURRENT ROW has
> little or no meaning. Consider following example:

Yes, that makes sense.

>>>> I strongly disagree with this.  Window function do not need to know
>>>> how the frame is defined, and indeed they should not.
>>> We already break the rule by defining *support functions. See
>>> windowfuncs.c.
>> The support functions don't know anything about the frame, they just
>> know when a window function is monotonically increasing and execution
>> can either stop or be "passed through".
> 
> I see following code in window_row_number_support:
> 
> 		/*
> 		 * The frame options can always become "ROWS BETWEEN UNBOUNDED
> 		 * PRECEDING AND CURRENT ROW".  row_number() always just increments by
> 		 * 1 with each row in the partition.  Using ROWS instead of RANGE
> 		 * saves effort checking peer rows during execution.
> 		 */
> 		req->frameOptions = (FRAMEOPTION_NONDEFAULT |
> 							 FRAMEOPTION_ROWS |
> 							 FRAMEOPTION_START_UNBOUNDED_PRECEDING |
> 							 FRAMEOPTION_END_CURRENT_ROW);
> 
> I think it not only knows about frame but it even changes the frame
> options. This seems far from "don't know anything about the frame", no?

That's the planner support function.  The row_number() function itself 
is not even allowed to *have* a frame, per spec.  We allow it, but as 
you can see from that support function, we completely replace it.

So all of the partition-level window functions are not affected by RPR 
anyway.

>> I have two comments about this:
>>
>> It isn't just for convenience, it is for correctness.  The window
>> functions do not need to know which rows they are *not* operating on.
>>
>> There is no such thing as a "full" or "reduced" frame.  The standard
>> uses those terms to explain the difference between before and after
>> RPR is applied, but window functions do not get to choose which frame
>> they apply over.  They only ever apply over the reduced window frame.
> 
> I agree that "full window frame" and "reduced window frame" do not
> exist at the same time, and in the end (after computation of reduced
> frame), only "reduced" frame is visible to window
> functions/aggregates. But I still do think that "full window frame"
> and "reduced window frame" are important concept to explain/understand
> how PRP works.

If we are just using those terms for documentation, then okay.
-- 
Vik Fearing

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-07-28T09:21:25Z

On 7/26/23 14:21, Tatsuo Ishii wrote:
> Attached is the v3 patch. In this patch following changes are made.

Excellent.  Thanks!

A few quick comments:

- PERMUTE is still misspelled as PREMUTE

- PATTERN variables do not have to exist in the DEFINE clause.  They are 
considered TRUE if not present.

> (1) I completely changed the pattern matching engine so that it
> performs backtracking. Now the engine evaluates all pattern elements
> defined in PATTER against each row, saving matched pattern variables
> in a string per row. For example if the pattern element A and B
> evaluated to true, a string "AB" is created for current row.
> 
> This continues until all pattern matching fails or encounters the end
> of full window frame/partition. After that, the pattern matching
> engine creates all possible "pattern strings" and apply the regular
> expression matching to each. For example if we have row 0 = "AB" row 1
> = "C", possible pattern strings are: "AC" and "BC".
> 
> If it matches, the length of matching substring is saved. After all
> possible trials are done, the longest matching substring is chosen and
> it becomes the width (number of rows) in the reduced window frame.
> 
> See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
> in nodeWindowAggs.c for more details. For now I use a naive depth
> first search and probably there is a lot of rooms for optimization
> (for example rewriting it without using
> recursion). Suggestions/patches are welcome.

My own reviews will only focus on correctness for now.  Once we get a 
good set of regression tests all passing, I will focus more on 
optimization.  Of course, others might want to review the performance now.

> Vik Fearing wrote:
>> I strongly disagree with this.  Window function do not need to know
>> how the frame is defined, and indeed they should not.
>> WinGetFuncArgInFrame should answer yes or no and the window function
>> just works on that. Otherwise we will get extension (and possibly even
>> core) functions that don't handle the frame properly.
> 
> So I moved row_is_in_reduce_frame into WinGetFuncArgInFrame so that
> those window functions are not needed to be changed.
> 
> (3) Window function rpr was removed. We can use first_value instead.

Excellent.

> (4) Remaining tasks/issues.
> 
> - For now I disable WinSetMarkPosition because RPR often needs to
>    access a row before the mark is set. We need to fix this in the
>    future.

Noted, and agreed.

> - I am working on making window aggregates RPR aware now. The
>    implementation is in progress and far from completeness. An example
>    is below. I think row 2, 3, 4 of "count" column should be NULL
>    instead of 3, 2, 0, 0. Same thing can be said to other
>    rows. Probably this is an effect of moving aggregate but I still
>    studying the window aggregation code.

This tells me again that RPR is not being run in the right place.  All 
windowed aggregates and frame-level window functions should Just Work 
with no modification.

> SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM stock
>   WINDOW w AS (
>   PARTITION BY company
>   ORDER BY tdate
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   AFTER MATCH SKIP PAST LAST ROW
>   INITIAL
>   PATTERN (START UP+ DOWN+)
>   DEFINE
>    START AS TRUE,
>    UP AS price > PREV(price),
>    DOWN AS price < PREV(price)
> );
>   company  |   tdate    | first_value | count
> ----------+------------+-------------+-------
>   company1 | 2023-07-01 |         100 |     4
>   company1 | 2023-07-02 |             |     3
>   company1 | 2023-07-03 |             |     2
>   company1 | 2023-07-04 |             |     0
>   company1 | 2023-07-05 |             |     0
>   company1 | 2023-07-06 |          90 |     4
>   company1 | 2023-07-07 |             |     3
>   company1 | 2023-07-08 |             |     2
>   company1 | 2023-07-09 |             |     0
>   company1 | 2023-07-10 |             |     0
>   company2 | 2023-07-01 |          50 |     4
>   company2 | 2023-07-02 |             |     3
>   company2 | 2023-07-03 |             |     2
>   company2 | 2023-07-04 |             |     0
>   company2 | 2023-07-05 |             |     0
>   company2 | 2023-07-06 |          60 |     4
>   company2 | 2023-07-07 |             |     3
>   company2 | 2023-07-08 |             |     2
>   company2 | 2023-07-09 |             |     0
>   company2 | 2023-07-10 |             |     0

In this scenario, row 1's frame is the first 5 rows and specified SKIP 
PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are 
skipped) and the result of the outer count should be 0 for all of them 
because there are no rows in the frame.

When we get to adding count in the MEASURES clause, there will be a 
difference between no match and empty match, but that does not apply here.

> I am going to add this thread to CommitFest and plan to add both of
> you as reviewers. Thanks in advance.

My pleasure.  Thank you for working on this difficult feature.
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-28T11:02:30Z

>> Attached is the v3 patch. In this patch following changes are made.
> 
> Excellent.  Thanks!

You are welcome!

> A few quick comments:
> 
> - PERMUTE is still misspelled as PREMUTE

Oops. Will fix.

> - PATTERN variables do not have to exist in the DEFINE clause.  They are
> - considered TRUE if not present.

Do you think we really need this? I found a criticism regarding this.

https://link.springer.com/article/10.1007/s13222-022-00404-3
"3.2 Explicit Definition of All Row Pattern Variables"

What do you think?

>> - I am working on making window aggregates RPR aware now. The
>>    implementation is in progress and far from completeness. An example
>>    is below. I think row 2, 3, 4 of "count" column should be NULL
>>    instead of 3, 2, 0, 0. Same thing can be said to other
>>    rows. Probably this is an effect of moving aggregate but I still
>>    studying the window aggregation code.
> 
> This tells me again that RPR is not being run in the right place.  All
> windowed aggregates and frame-level window functions should Just Work
> with no modification.

I am not touching each aggregate function. I am modifying
eval_windowaggregates() in nodeWindowAgg.c, which calls each aggregate
function. Do you think it's not the right place to make window
aggregates RPR aware?

>> SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM
>> stock
>>   WINDOW w AS (
>>   PARTITION BY company
>>   ORDER BY tdate
>>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>   AFTER MATCH SKIP PAST LAST ROW
>>   INITIAL
>>   PATTERN (START UP+ DOWN+)
>>   DEFINE
>>    START AS TRUE,
>>    UP AS price > PREV(price),
>>    DOWN AS price < PREV(price)
>> );
>>   company  |   tdate    | first_value | count
>> ----------+------------+-------------+-------
>>   company1 | 2023-07-01 |         100 |     4
>>   company1 | 2023-07-02 |             |     3
>>   company1 | 2023-07-03 |             |     2
>>   company1 | 2023-07-04 |             |     0
>>   company1 | 2023-07-05 |             |     0
>>   company1 | 2023-07-06 |          90 |     4
>>   company1 | 2023-07-07 |             |     3
>>   company1 | 2023-07-08 |             |     2
>>   company1 | 2023-07-09 |             |     0
>>   company1 | 2023-07-10 |             |     0
>>   company2 | 2023-07-01 |          50 |     4
>>   company2 | 2023-07-02 |             |     3
>>   company2 | 2023-07-03 |             |     2
>>   company2 | 2023-07-04 |             |     0
>>   company2 | 2023-07-05 |             |     0
>>   company2 | 2023-07-06 |          60 |     4
>>   company2 | 2023-07-07 |             |     3
>>   company2 | 2023-07-08 |             |     2
>>   company2 | 2023-07-09 |             |     0
>>   company2 | 2023-07-10 |             |     0
> 
> In this scenario, row 1's frame is the first 5 rows and specified SKIP
> PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are
> skipped) and the result of the outer count should be 0 for all of them
> because there are no rows in the frame.

Ok. Just I want to make sure. If it's other aggregates like sum or
avg, the result of the outer aggregates should be NULL.

> When we get to adding count in the MEASURES clause, there will be a
> difference between no match and empty match, but that does not apply
> here.

Can you elaborate more? I understand that "no match" and "empty match"
are different things. But I do not understand how the difference
affects the result of count.

>> I am going to add this thread to CommitFest and plan to add both of
>> you as reviewers. Thanks in advance.
> 
> My pleasure.  Thank you for working on this difficult feature.

Thank you for accepting being registered as a reviewer. Your comments
are really helpful.
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-07-28T12:36:58Z

On 7/28/23 13:02, Tatsuo Ishii wrote:
>>> Attached is the v3 patch. In this patch following changes are made.
>>
>> - PATTERN variables do not have to exist in the DEFINE clause.  They are
>> - considered TRUE if not present.
> 
> Do you think we really need this? I found a criticism regarding this.
> 
> https://link.springer.com/article/10.1007/s13222-022-00404-3
> "3.2 Explicit Definition of All Row Pattern Variables"
> 
> What do you think?

I think that a large part of obeying the standard is to allow queries 
from other engines to run the same on ours.  The standard does not 
require the pattern variables to be defined and so there are certainly 
queries out there without them, and that hurts migrating to PostgreSQL.

>>> - I am working on making window aggregates RPR aware now. The
>>>     implementation is in progress and far from completeness. An example
>>>     is below. I think row 2, 3, 4 of "count" column should be NULL
>>>     instead of 3, 2, 0, 0. Same thing can be said to other
>>>     rows. Probably this is an effect of moving aggregate but I still
>>>     studying the window aggregation code.
>>
>> This tells me again that RPR is not being run in the right place.  All
>> windowed aggregates and frame-level window functions should Just Work
>> with no modification.
> 
> I am not touching each aggregate function. I am modifying
> eval_windowaggregates() in nodeWindowAgg.c, which calls each aggregate
> function. Do you think it's not the right place to make window
> aggregates RPR aware?

Oh, okay.

>>> SELECT company, tdate, first_value(price) OVER W, count(*) OVER w FROM
>>> stock
>>>    WINDOW w AS (
>>>    PARTITION BY company
>>>    ORDER BY tdate
>>>    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>>    AFTER MATCH SKIP PAST LAST ROW
>>>    INITIAL
>>>    PATTERN (START UP+ DOWN+)
>>>    DEFINE
>>>     START AS TRUE,
>>>     UP AS price > PREV(price),
>>>     DOWN AS price < PREV(price)
>>> );
>>>    company  |   tdate    | first_value | count
>>> ----------+------------+-------------+-------
>>>    company1 | 2023-07-01 |         100 |     4
>>>    company1 | 2023-07-02 |             |     3
>>>    company1 | 2023-07-03 |             |     2
>>>    company1 | 2023-07-04 |             |     0
>>>    company1 | 2023-07-05 |             |     0
>>>    company1 | 2023-07-06 |          90 |     4
>>>    company1 | 2023-07-07 |             |     3
>>>    company1 | 2023-07-08 |             |     2
>>>    company1 | 2023-07-09 |             |     0
>>>    company1 | 2023-07-10 |             |     0
>>>    company2 | 2023-07-01 |          50 |     4
>>>    company2 | 2023-07-02 |             |     3
>>>    company2 | 2023-07-03 |             |     2
>>>    company2 | 2023-07-04 |             |     0
>>>    company2 | 2023-07-05 |             |     0
>>>    company2 | 2023-07-06 |          60 |     4
>>>    company2 | 2023-07-07 |             |     3
>>>    company2 | 2023-07-08 |             |     2
>>>    company2 | 2023-07-09 |             |     0
>>>    company2 | 2023-07-10 |             |     0
>>
>> In this scenario, row 1's frame is the first 5 rows and specified SKIP
>> PAST LAST ROW, so rows 2-5 don't have *any* frame (because they are
>> skipped) and the result of the outer count should be 0 for all of them
>> because there are no rows in the frame.
> 
> Ok. Just I want to make sure. If it's other aggregates like sum or
> avg, the result of the outer aggregates should be NULL.

They all behave the same way as in a normal query when they receive no 
rows as input.

>> When we get to adding count in the MEASURES clause, there will be a
>> difference between no match and empty match, but that does not apply
>> here.
> 
> Can you elaborate more? I understand that "no match" and "empty match"
> are different things. But I do not understand how the difference
> affects the result of count.

This query:

SELECT v.a, wcnt OVER w, count(*) OVER w
FROM (VALUES ('A')) AS v (a)
WINDOW w AS (
   ORDER BY v.a
   MEASURES count(*) AS wcnt
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   PATTERN (B)
   DEFINE B AS B.a = 'B'
)

produces this result:

  a | wcnt | count
---+------+-------
  A |      |     0
(1 row)

Inside the window specification, *no match* was found and so all of the 
MEASURES are null.  The count(*) in the target list however, still 
exists and operates over zero rows.

This very similar query:

SELECT v.a, wcnt OVER w, count(*) OVER w
FROM (VALUES ('A')) AS v (a)
WINDOW w AS (
   ORDER BY v.a
   MEASURES count(*) AS wcnt
   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   PATTERN (B?)
   DEFINE B AS B.a = 'B'
)

produces this result:

  a | wcnt | count
---+------+-------
  A |    0 |     0
(1 row)

In this case, the pattern is B? instead of just B, which produces an 
*empty match* for the MEASURES to be applied over.
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-07-29T03:05:08Z

>>> - PATTERN variables do not have to exist in the DEFINE clause.  They are
>>> - considered TRUE if not present.
>> Do you think we really need this? I found a criticism regarding this.
>> https://link.springer.com/article/10.1007/s13222-022-00404-3
>> "3.2 Explicit Definition of All Row Pattern Variables"
>> What do you think?
> 
> I think that a large part of obeying the standard is to allow queries
> from other engines to run the same on ours.  The standard does not
> require the pattern variables to be defined and so there are certainly
> queries out there without them, and that hurts migrating to
> PostgreSQL.

Yeah, migration is good point. I agree we should have the feature.

>>> When we get to adding count in the MEASURES clause, there will be a
>>> difference between no match and empty match, but that does not apply
>>> here.
>> Can you elaborate more? I understand that "no match" and "empty match"
>> are different things. But I do not understand how the difference
>> affects the result of count.
> 
> This query:
> 
> SELECT v.a, wcnt OVER w, count(*) OVER w
> FROM (VALUES ('A')) AS v (a)
> WINDOW w AS (
>   ORDER BY v.a
>   MEASURES count(*) AS wcnt
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (B)
>   DEFINE B AS B.a = 'B'
> )
> 
> produces this result:
> 
>  a | wcnt | count
> ---+------+-------
>  A |      |     0
> (1 row)
> 
> Inside the window specification, *no match* was found and so all of
> the MEASURES are null.  The count(*) in the target list however, still
> exists and operates over zero rows.
> 
> This very similar query:
> 
> SELECT v.a, wcnt OVER w, count(*) OVER w
> FROM (VALUES ('A')) AS v (a)
> WINDOW w AS (
>   ORDER BY v.a
>   MEASURES count(*) AS wcnt
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (B?)
>   DEFINE B AS B.a = 'B'
> )
> 
> produces this result:
> 
>  a | wcnt | count
> ---+------+-------
>  A |    0 |     0
> (1 row)
> 
> In this case, the pattern is B? instead of just B, which produces an
> *empty match* for the MEASURES to be applied over.

Thank you for the detailed explanation. I think I understand now.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-08-09T08:41:12Z

Attached is the v4 patch. Differences from previous patch include:

> - PERMUTE is still misspelled as PREMUTE

Fixed.

> - PATTERN variables do not have to exist in the DEFINE clause.  They are
> - considered TRUE if not present.

Fixed. Moreover new regression test case is added.

- It was possible that tle nodes in DEFINE clause do not appear in the
  plan's target list. This makes impossible to evaluate expressions in
  the DEFINE because it does not appear in the outer plan's target
  list. To fix this, call findTargetlistEntrySQL99 (with resjunk is
  true) so that the missing TargetEntry is added to the outer plan
  later on.

- I eliminated some hacks in handling the Var node in DEFINE
  clause. Previously I replaced varattno of Var node in a plan tree by
  hand so that it refers to correct varattno in the outer plan
  node. In this patch I modified set_upper_references so that it calls
  fix_upper_expr for those Var nodes in the DEFINE clause. See v4-0003
  patch for more details.

- I found a bug with pattern matching code. It creates a string for
  subsequent regular expression matching. It uses the initial letter
  of each define variable name. For example, if the varname is "foo",
  then "f" is used. Obviously this makes trouble if we have two or
  more variables starting with same "f" (e.g. "food"). To fix this, I
  assign [a-z] to each variable instead of its initial letter. However
  this way limits us not to have more than 26 variables. I hope 26 is
  enough for most use cases.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-02T06:52:35Z

Attached is the v5 patch. Differences from previous patch include:

* Resolve complaint from "PostgreSQL Patch Tester"
  https://commitfest.postgresql.org/44/4460/

- Change gram.y to use PATTERN_P instead of PATTERN. Using PATTERN seems
  to make trouble with Visual Studio build.

:
:
[10:07:57.853] FAILED: src/backend/parser/parser.a.p/meson-generated_.._gram.c.obj 
[10:07:57.853] "cl" "-Isrc\backend\parser\parser.a.p" "-Isrc\backend\parser" "-I..\src\backend\parser" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "-DBUILDING_DLL" "/Fdsrc\backend\parser\parser.a.p\meson-generated_.._gram.c.pdb" /Fosrc/backend/parser/parser.a.p/meson-generated_.._gram.c.obj "/c" src/backend/parser/gram.c
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2365: 'PATTERN': redefinition; previous definition was 'typedef'
[10:07:57.860] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\um\wingdi.h(1375): note: see declaration of 'PATTERN'
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2086: 'yytokentype PATTERN': redefinition
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): note: see declaration of 'PATTERN'
[10:07:57.860] ninja: build stopped: subcommand failed.

* Resolve complaint from "make headerscheck"

- Change Windowapi.h and nodeWindowAgg.c to remove unecessary extern
  and public functions.
  
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Erik Rijkers <er@xs4all.nl> — 2023-09-02T18:04:02Z

Op 9/2/23 om 08:52 schreef Tatsuo Ishii:

> Attached is the v5 patch. Differences from previous patch include:
> 

Hi,

The patches compile & tests run fine but this statement from the 
documentation crashes an assert-enabled server:

SELECT company, tdate, price, max(price) OVER w FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (LOWPRICE UP+ DOWN+)
DEFINE
LOWPRICE AS price <= 100,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
connection to server was lost


Log file:

TRAP: failed Assert("aggregatedupto_nonrestarted <= 
winstate->aggregatedupto"), File: "nodeWindowAgg.c", Line: 1054, PID: 68975
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(ExceptionalCondition+0x54)[0x9b0824]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x71ae8d]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(standard_ExecutorRun+0x13a)[0x6def9a]
/home/aardvark/pg_stuff/pg_installations/pgsql.rpr/lib/pg_stat_statements.so(+0x55e5)[0x7ff3798b95e5]
/home/aardvark/pg_stuff/pg_installations/pgsql.rpr/lib/auto_explain.so(+0x2680)[0x7ff3798ab680]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x88a4ff]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PortalRun+0x240)[0x88bb50]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x887cca]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PostgresMain+0x14dc)[0x88958c]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x7fb0da]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PostmasterMain+0xd2d)[0x7fc01d]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(main+0x1e0)[0x5286d0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff378e9dd0a]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(_start+0x2a)[0x5289aa]
2023-09-02 19:59:05.329 CEST 46723 LOG:  server process (PID 68975) was 
terminated by signal 6: Aborted
2023-09-02 19:59:05.329 CEST 46723 DETAIL:  Failed process was running: 
SELECT company, tdate, price, max(price) OVER w FROM stock
         WINDOW w AS (
         PARTITION BY company
         ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
         AFTER MATCH SKIP PAST LAST ROW
         INITIAL
         PATTERN (LOWPRICE UP+ DOWN+)
         DEFINE
         LOWPRICE AS price <= 100,
         UP AS price > PREV(price),
         DOWN AS price < PREV(price)
         );
2023-09-02 19:59:05.329 CEST 46723 LOG:  terminating any other active 
server processes



Erik Rijkers

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-03T00:03:44Z

> Hi,
> 
> The patches compile & tests run fine but this statement from the
> documentation crashes an assert-enabled server:
> 
> SELECT company, tdate, price, max(price) OVER w FROM stock
> WINDOW w AS (
> PARTITION BY company
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (LOWPRICE UP+ DOWN+)
> DEFINE
> LOWPRICE AS price <= 100,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );
> server closed the connection unexpectedly
> 	This probably means the server terminated abnormally
> 	before or while processing the request.
> connection to server was lost

Thank you for the report. Currently the patch has an issue with
aggregate functions including max. I have been working on aggregations
in row pattern recognition but will take more time to complete the
part.

In the mean time if you want to play with RPR, you can try window
functions. Examples can be found in src/test/regress/sql/rpr.sql.
Here is one of this:

-- the first row start with less than or equal to 100
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
 FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 INITIAL
 PATTERN (LOWPRICE UP+ DOWN+)
 DEFINE
  LOWPRICE AS price <= 100,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);

-- second row raises 120%
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
 FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 INITIAL
 PATTERN (LOWPRICE UP+ DOWN+)
 DEFINE
  LOWPRICE AS price <= 100,
  UP AS price > PREV(price) * 1.2,
  DOWN AS price < PREV(price)
);

Sorry for inconvenience.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <jchampion@timescale.com> — 2023-09-08T00:00:07Z

Hello!

> (1) I completely changed the pattern matching engine so that it
> performs backtracking. Now the engine evaluates all pattern elements
> defined in PATTER against each row, saving matched pattern variables
> in a string per row. For example if the pattern element A and B
> evaluated to true, a string "AB" is created for current row.

If I understand correctly, this strategy assumes that one row's
membership in a pattern variable is independent of the other rows'
membership. But I don't think that's necessarily true:

    DEFINE
      A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
      ...

> See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
> in nodeWindowAggs.c for more details. For now I use a naive depth
> first search and probably there is a lot of rooms for optimization
> (for example rewriting it without using
> recursion).

The depth-first match is doing a lot of subtle work here. For example, with

    PATTERN ( A+ B+ )
    DEFINE A AS TRUE,
           B AS TRUE

(i.e. all rows match both variables), and three rows in the partition,
our candidates will be tried in the order

    aaa
    aab
    aba
    abb
    ...
    bbb

The only possible matches against our regex `^a+b+` are "aab" and "abb",
and that order matches the preferment order, so it's fine. But it's easy
to come up with a pattern where that's the wrong order, like

    PATTERN ( A+ (B|A)+ )

Now "aaa" will be considered before "aab", which isn't correct.

Similarly, the assumption that we want to match the longest string only
works because we don't allow alternation yet.

> Suggestions/patches are welcome.

Cool, I will give this piece some more thought. Do you mind if I try to
add some more complicated pattern quantifiers to stress the
architecture, or would you prefer to tackle that later? Just alternation
by itself will open up a world of corner cases.

> With the new engine, cases above do not fail anymore. See new
> regression test cases. Thanks for providing valuable test cases!

You're very welcome!

On 8/9/23 01:41, Tatsuo Ishii wrote:
> - I found a bug with pattern matching code. It creates a string for
>   subsequent regular expression matching. It uses the initial letter
>   of each define variable name. For example, if the varname is "foo",
>   then "f" is used. Obviously this makes trouble if we have two or
>   more variables starting with same "f" (e.g. "food"). To fix this, I
>   assign [a-z] to each variable instead of its initial letter. However
>   this way limits us not to have more than 26 variables. I hope 26 is
>   enough for most use cases.

There are still plenty of alphanumerics left that could be assigned...

But I'm wondering if we might want to just implement the NFA directly?
The current implementation's Cartesian explosion can probably be pruned
aggressively, but replaying the entire regex match once for every
backtracked step will still duplicate a lot of work.

--

I've attached another test case; it looks like last_value() is depending
on some sort of side effect from either first_value() or nth_value(). I
know the window frame itself is still under construction, so apologies
if this is an expected failure.

Thanks!
--Jacob

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-08T03:54:47Z

Hi,

> Hello!
> 
>> (1) I completely changed the pattern matching engine so that it
>> performs backtracking. Now the engine evaluates all pattern elements
>> defined in PATTER against each row, saving matched pattern variables
>> in a string per row. For example if the pattern element A and B
>> evaluated to true, a string "AB" is created for current row.
> 
> If I understand correctly, this strategy assumes that one row's
> membership in a pattern variable is independent of the other rows'
> membership. But I don't think that's necessarily true:
> 
>     DEFINE
>       A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
>       ...

But:

UP AS price > PREV(price)

also depends on previous row, no? Can you please elaborate how your
example could break current implementation? I cannot test it because
CLASSIFIER is not implemented yet.

>> See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
>> in nodeWindowAggs.c for more details. For now I use a naive depth
>> first search and probably there is a lot of rooms for optimization
>> (for example rewriting it without using
>> recursion).
> 
> The depth-first match is doing a lot of subtle work here. For example, with
> 
>     PATTERN ( A+ B+ )
>     DEFINE A AS TRUE,
>            B AS TRUE
> 
> (i.e. all rows match both variables), and three rows in the partition,
> our candidates will be tried in the order
> 
>     aaa
>     aab
>     aba
>     abb
>     ...
>     bbb
> 
> The only possible matches against our regex `^a+b+` are "aab" and "abb",
> and that order matches the preferment order, so it's fine. But it's easy
> to come up with a pattern where that's the wrong order, like
> 
>     PATTERN ( A+ (B|A)+ )
> 
> Now "aaa" will be considered before "aab", which isn't correct.

Can you explain a little bit more? I think 'aaa' matches a regular
expression 'a+(b|a)+' and should be no problem before "aab" is
considered.

> Similarly, the assumption that we want to match the longest string only
> works because we don't allow alternation yet.

Can you please clarify more on this?

> Cool, I will give this piece some more thought. Do you mind if I try to
> add some more complicated pattern quantifiers to stress the
> architecture, or would you prefer to tackle that later? Just alternation
> by itself will open up a world of corner cases.

Do you mean you want to provide a better patch for the pattern
matching part? That will be helpfull. Because I am currently working
on the aggregation part and have no time to do it. However, the
aggregation work affects the v5 patch: it needs a refactoring. So can
you wait until I release v6 patch? I hope it will be released in two
weeks or so.

> On 8/9/23 01:41, Tatsuo Ishii wrote:
>> - I found a bug with pattern matching code. It creates a string for
>>   subsequent regular expression matching. It uses the initial letter
>>   of each define variable name. For example, if the varname is "foo",
>>   then "f" is used. Obviously this makes trouble if we have two or
>>   more variables starting with same "f" (e.g. "food"). To fix this, I
>>   assign [a-z] to each variable instead of its initial letter. However
>>   this way limits us not to have more than 26 variables. I hope 26 is
>>   enough for most use cases.
> 
> There are still plenty of alphanumerics left that could be assigned...
> 
> But I'm wondering if we might want to just implement the NFA directly?
> The current implementation's Cartesian explosion can probably be pruned
> aggressively, but replaying the entire regex match once for every
> backtracked step will still duplicate a lot of work.

Not sure if you mean implementing new regular expression engine
besides src/backend/regexp. I am afraid it's not a trivial work. The
current regexp code consists of over 10k lines. What do you think?

> I've attached another test case; it looks like last_value() is depending
> on some sort of side effect from either first_value() or nth_value(). I
> know the window frame itself is still under construction, so apologies
> if this is an expected failure.

Thanks. Fortunately current code which I am working passes the new
test. I will include it in the next (v6) patch.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <jchampion@timescale.com> — 2023-09-08T19:27:05Z

On 9/7/23 20:54, Tatsuo Ishii wrote:
>>     DEFINE
>>       A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
>>       ...
> 
> But:
> 
> UP AS price > PREV(price)
> 
> also depends on previous row, no?

PREV(CLASSIFIER()) depends not on the value of the previous row but the
state of the match so far. To take an example from the patch:

> * Example:
> * str_set[0] = "AB";
> * str_set[1] = "AC";
> * In this case at row 0 A and B are true, and A and C are true in row 1.

With these str_sets and my example DEFINE, row[1] is only classifiable
as 'A' if row[0] is *not* classified as 'A' at this point in the match.
"AA" is not a valid candidate string, even if it matches the PATTERN.

So if we don't reevaluate the pattern variable condition for the row, we
at least have to prune the combinations that search_str_set() visits, so
that we don't generate a logically impossible combination. That seems
like it could be pretty fragile, and it may be difficult for us to prove
compliance.

>> But it's easy
>> to come up with a pattern where that's the wrong order, like
>>
>>     PATTERN ( A+ (B|A)+ )
>>
>> Now "aaa" will be considered before "aab", which isn't correct.
> 
> Can you explain a little bit more? I think 'aaa' matches a regular
> expression 'a+(b|a)+' and should be no problem before "aab" is
> considered.

Assuming I've understood the rules correctly, we're not allowed to
classify the last row as 'A' if it also matches 'B'. Lexicographic
ordering takes precedence, so we have to try "aab" first. Otherwise our
query could return different results compared to another implementation.

>> Similarly, the assumption that we want to match the longest string only
>> works because we don't allow alternation yet.
> 
> Can you please clarify more on this?

Sure: for the pattern

    PATTERN ( (A|B)+ )

we have to consider the candidate "a" over the candidate "ba", even
though the latter is longer. Like the prior example, lexicographic
ordering is considered more important than the greedy quantifier.
Quoting ISO/IEC 9075-2:2016:

    More precisely, with both reluctant and greedy quantifiers, the set
    of matches is ordered lexicographically, but when one match is an
    initial substring of another match, reluctant quantifiers prefer the
    shorter match (the substring), whereas greedy quantifiers prefer the
    longer match (the “superstring”).

Here, "ba" doesn't have "a" as a prefix, so "ba" doesn't get priority.
ISO/IEC 19075-5:2021 has a big section on this (7.2) with worked examples.

(The "lexicographic order matters more than greediness" concept was the
most mind-bending part for me so far, probably because I haven't figured
out how to translate the concept into POSIX EREs. It wouldn't make sense
to say "the letter 't' can match 'a', 'B', or '3' in this regex", but
that's what RPR is doing.)

>> Cool, I will give this piece some more thought. Do you mind if I try to
>> add some more complicated pattern quantifiers to stress the
>> architecture, or would you prefer to tackle that later? Just alternation
>> by itself will open up a world of corner cases.
> 
> Do you mean you want to provide a better patch for the pattern
> matching part? That will be helpfull.

No guarantees that I'll find a better patch :D But yes, I will give it a
try.

> Because I am currently working
> on the aggregation part and have no time to do it. However, the
> aggregation work affects the v5 patch: it needs a refactoring. So can
> you wait until I release v6 patch? I hope it will be released in two
> weeks or so.

Absolutely!

>> But I'm wondering if we might want to just implement the NFA directly?
>> The current implementation's Cartesian explosion can probably be pruned
>> aggressively, but replaying the entire regex match once for every
>> backtracked step will still duplicate a lot of work.
> 
> Not sure if you mean implementing new regular expression engine
> besides src/backend/regexp. I am afraid it's not a trivial work. The
> current regexp code consists of over 10k lines. What do you think?

Heh, I think it would be pretty foolish for me to code an NFA, from
scratch, and then try to convince the community to maintain it.

But:
- I think we have to implement a parallel parser regardless (RPR PATTERN
syntax looks incompatible with POSIX)
- I suspect we need more control over the backtracking than the current
pg_reg* API is going to give us, or else I'm worried performance is
going to fall off a cliff with usefully-large partitions
- there's a lot of stuff in POSIX EREs that we don't need, and of the
features we do need, the + quantifier is probably one of the easiest
- it seems easier to prove the correctness of a slow, naive,
row-at-a-time engine, because we can compare it directly to the spec

So what I'm thinking is: if I start by open-coding the + quantifier, and
slowly add more pieces in, then it might be easier to see the parts of
src/backend/regex that I've duplicated. We can try to expose those parts
directly from the internal API to replace my bad implementation. And if
there are parts that aren't duplicated, then it'll be easier to explain
why we need something different from the current engine.

Does that seem like a workable approach? (Worst-case, my code is just
horrible, and we throw it in the trash.)

>> I've attached another test case; it looks like last_value() is depending
>> on some sort of side effect from either first_value() or nth_value(). I
>> know the window frame itself is still under construction, so apologies
>> if this is an expected failure.
> 
> Thanks. Fortunately current code which I am working passes the new
> test. I will include it in the next (v6) patch.

Great!

Thanks,
--Jacob

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-09-08T21:43:10Z

On 9/8/23 21:27, Jacob Champion wrote:
> On 9/7/23 20:54, Tatsuo Ishii wrote:

>>> But it's easy
>>> to come up with a pattern where that's the wrong order, like
>>>
>>>      PATTERN ( A+ (B|A)+ )
>>>
>>> Now "aaa" will be considered before "aab", which isn't correct.
>>
>> Can you explain a little bit more? I think 'aaa' matches a regular
>> expression 'a+(b|a)+' and should be no problem before "aab" is
>> considered.
> 
> Assuming I've understood the rules correctly, we're not allowed to
> classify the last row as 'A' if it also matches 'B'. Lexicographic
> ordering takes precedence, so we have to try "aab" first. Otherwise our
> query could return different results compared to another implementation.


Your understanding is correct.
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-09T11:21:21Z

Hi,

>> But:
>> 
>> UP AS price > PREV(price)
>> 
>> also depends on previous row, no?
> 
> PREV(CLASSIFIER()) depends not on the value of the previous row but the
> state of the match so far. To take an example from the patch:
> 
>> * Example:
>> * str_set[0] = "AB";
>> * str_set[1] = "AC";
>> * In this case at row 0 A and B are true, and A and C are true in row 1.
> 
> With these str_sets and my example DEFINE, row[1] is only classifiable
> as 'A' if row[0] is *not* classified as 'A' at this point in the match.
> "AA" is not a valid candidate string, even if it matches the PATTERN.

Ok, Let me clarify my understanding. Suppose we have:

PATTER (A B)
DEFINE A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
B AS price > 100

and the target table has price column values:

row[0]: 110
row[1]: 110
row[2]: 110
row[3]: 110

Then we will get for str_set:
r0: B
r1: AB

Because r0 only has classifier B, r1 can have A and B.  Problem is,
r2. If we choose A at r1, then r2 = B. But if we choose B at t1, then
r2 = AB. I guess this is the issue you pointed out.

> So if we don't reevaluate the pattern variable condition for the row, we
> at least have to prune the combinations that search_str_set() visits, so
> that we don't generate a logically impossible combination. That seems
> like it could be pretty fragile, and it may be difficult for us to prove
> compliance.

Yeah, probably we have delay evaluation of such pattern variables like
A, then reevaluate A after the first scan.

What about leaving this (reevaluation) for now? Because:

1) we don't have CLASSIFIER
2) we don't allow to give CLASSIFIER to PREV as its arggument

so I think we don't need to worry about this for now.

>> Can you explain a little bit more? I think 'aaa' matches a regular
>> expression 'a+(b|a)+' and should be no problem before "aab" is
>> considered.
> 
> Assuming I've understood the rules correctly, we're not allowed to
> classify the last row as 'A' if it also matches 'B'. Lexicographic
> ordering takes precedence, so we have to try "aab" first. Otherwise our
> query could return different results compared to another implementation.
> 
>>> Similarly, the assumption that we want to match the longest string only
>>> works because we don't allow alternation yet.
>> 
>> Can you please clarify more on this?
> 
> Sure: for the pattern
> 
>     PATTERN ( (A|B)+ )
> 
> we have to consider the candidate "a" over the candidate "ba", even
> though the latter is longer. Like the prior example, lexicographic
> ordering is considered more important than the greedy quantifier.
> Quoting ISO/IEC 9075-2:2016:
> 
>     More precisely, with both reluctant and greedy quantifiers, the set
>     of matches is ordered lexicographically, but when one match is an
>     initial substring of another match, reluctant quantifiers prefer the
>     shorter match (the substring), whereas greedy quantifiers prefer the
>     longer match (the “superstring”).
> 
> Here, "ba" doesn't have "a" as a prefix, so "ba" doesn't get priority.
> ISO/IEC 19075-5:2021 has a big section on this (7.2) with worked examples.
> 
> (The "lexicographic order matters more than greediness" concept was the
> most mind-bending part for me so far, probably because I haven't figured
> out how to translate the concept into POSIX EREs. It wouldn't make sense
> to say "the letter 't' can match 'a', 'B', or '3' in this regex", but
> that's what RPR is doing.)

Thanks for the explanation.  Surprising concet of the standard:-)　Is
it different from SIMILAR TO REs too?

What if we don't follow the standard, instead we follow POSIX EREs?  I
think this is better for users unless RPR's REs has significant merit
for users.

>> Do you mean you want to provide a better patch for the pattern
>> matching part? That will be helpfull.
> 
> No guarantees that I'll find a better patch :D But yes, I will give it a
> try.

Ok.

>> Because I am currently working
>> on the aggregation part and have no time to do it. However, the
>> aggregation work affects the v5 patch: it needs a refactoring. So can
>> you wait until I release v6 patch? I hope it will be released in two
>> weeks or so.
> 
> Absolutely!

Thanks.

> Heh, I think it would be pretty foolish for me to code an NFA, from
> scratch, and then try to convince the community to maintain it.
> 
> But:
> - I think we have to implement a parallel parser regardless (RPR PATTERN
> syntax looks incompatible with POSIX)

I am not sure if we need to worry about this because of the reason I
mentioned above.

> - I suspect we need more control over the backtracking than the current
> pg_reg* API is going to give us, or else I'm worried performance is
> going to fall off a cliff with usefully-large partitions

Agreed.

> - there's a lot of stuff in POSIX EREs that we don't need, and of the
> features we do need, the + quantifier is probably one of the easiest
> - it seems easier to prove the correctness of a slow, naive,
> row-at-a-time engine, because we can compare it directly to the spec
> 
> So what I'm thinking is: if I start by open-coding the + quantifier, and
> slowly add more pieces in, then it might be easier to see the parts of
> src/backend/regex that I've duplicated. We can try to expose those parts
> directly from the internal API to replace my bad implementation. And if
> there are parts that aren't duplicated, then it'll be easier to explain
> why we need something different from the current engine.
> 
> Does that seem like a workable approach? (Worst-case, my code is just
> horrible, and we throw it in the trash.)

Yes, it seems workable. I think for the first cut of RPR needs at
least the +quantifier with reasonable performance. The current naive
implementation seems to have issue because of exhaustive search.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-09-09T13:32:41Z

On 9/9/23 13:21, Tatsuo Ishii wrote:
> Thanks for the explanation.  Surprising concet of the standard:-)

<quote from 19075-5:2023>

This leaves the choice between traditional NFA and Posix NFA. The 
difference between these is that a traditional NFA exits (declares a 
match) as soon as it finds the first possible match, whereas a Posix NFA 
is obliged to find all possible matches and then choose the “leftmost 
longest”. There are examples that show that, even for conventional 
regular expression matching on text strings and without back references, 
there are patterns for which a Posix NFA is orders of magnitude slower 
than a traditional NFA. In addition, reluctant quantifiers cannot be 
defined in a Posix NFA, because of the leftmost longest rule.

Therefore it was decided not to use the Posix NFA model, which leaves 
the traditional NFA as the model for row pattern matching. Among 
available tools that use traditional NFA engines, Perl is the most 
influential; therefore Perl was adopted as the design target for pattern 
matching rules.

</quote>

> Is it different from SIMILAR TO REs too?

Of course it is. :-)  SIMILAR TO uses its own language and rules.

> What if we don't follow the standard, instead we follow POSIX EREs?  I
> think this is better for users unless RPR's REs has significant merit
> for users.

This would get big pushback from me.
-- 
Vik Fearing

Re: Row pattern recognition

Jacob Champion <jchampion@timescale.com> — 2023-09-11T22:13:43Z

On Sat, Sep 9, 2023 at 4:21 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> Then we will get for str_set:
> r0: B
> r1: AB
>
> Because r0 only has classifier B, r1 can have A and B.  Problem is,
> r2. If we choose A at r1, then r2 = B. But if we choose B at t1, then
> r2 = AB. I guess this is the issue you pointed out.

Right.

> Yeah, probably we have delay evaluation of such pattern variables like
> A, then reevaluate A after the first scan.
>
> What about leaving this (reevaluation) for now? Because:
>
> 1) we don't have CLASSIFIER
> 2) we don't allow to give CLASSIFIER to PREV as its arggument
>
> so I think we don't need to worry about this for now.

Sure. I'm all for deferring features to make it easier to iterate; I
just want to make sure the architecture doesn't hit a dead end. Or at
least, not without being aware of it.

Also: is CLASSIFIER the only way to run into this issue?

> What if we don't follow the standard, instead we follow POSIX EREs?  I
> think this is better for users unless RPR's REs has significant merit
> for users.

Piggybacking off of what Vik wrote upthread, I think we would not be
doing ourselves any favors by introducing a non-compliant
implementation that performs worse than a traditional NFA. Those would
be some awful bug reports.

> > - I think we have to implement a parallel parser regardless (RPR PATTERN
> > syntax looks incompatible with POSIX)
>
> I am not sure if we need to worry about this because of the reason I
> mentioned above.

Even if we adopted POSIX NFA semantics, we'd still have to implement
our own parser for the PATTERN part of the query. I don't think
there's a good way for us to reuse the parser in src/backend/regex.

> > Does that seem like a workable approach? (Worst-case, my code is just
> > horrible, and we throw it in the trash.)
>
> Yes, it seems workable. I think for the first cut of RPR needs at
> least the +quantifier with reasonable performance. The current naive
> implementation seems to have issue because of exhaustive search.

+1

Thanks!
--Jacob

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-12T06:18:43Z

>> What about leaving this (reevaluation) for now? Because:
>>
>> 1) we don't have CLASSIFIER
>> 2) we don't allow to give CLASSIFIER to PREV as its arggument
>>
>> so I think we don't need to worry about this for now.
> 
> Sure. I'm all for deferring features to make it easier to iterate; I
> just want to make sure the architecture doesn't hit a dead end. Or at
> least, not without being aware of it.

Ok, let's defer this issue. Currently the patch already exceeds 3k
lines. I am afraid too big patch cannot be reviewed by anyone, which
means it will never be committed.

> Also: is CLASSIFIER the only way to run into this issue?

Good question. I would like to know.

>> What if we don't follow the standard, instead we follow POSIX EREs?  I
>> think this is better for users unless RPR's REs has significant merit
>> for users.
> 
> Piggybacking off of what Vik wrote upthread, I think we would not be
> doing ourselves any favors by introducing a non-compliant
> implementation that performs worse than a traditional NFA. Those would
> be some awful bug reports.

What I am not sure about is, you and Vik mentioned that the
traditional NFA is superior that POSIX NFA in terms of performance.
But how "lexicographic ordering" is related to performance?

>> I am not sure if we need to worry about this because of the reason I
>> mentioned above.
> 
> Even if we adopted POSIX NFA semantics, we'd still have to implement
> our own parser for the PATTERN part of the query. I don't think
> there's a good way for us to reuse the parser in src/backend/regex.

Ok.

>> > Does that seem like a workable approach? (Worst-case, my code is just
>> > horrible, and we throw it in the trash.)
>>
>> Yes, it seems workable. I think for the first cut of RPR needs at
>> least the +quantifier with reasonable performance. The current naive
>> implementation seems to have issue because of exhaustive search.
> 
> +1

BTW, attched is the v6 patch. The differences from v5 include:

- Now aggregates can be used with RPR. Below is an example from the
  regression test cases, which is added by v6 patch.

- Fix assersion error pointed out by Erik.

SELECT company, tdate, price,
 first_value(price) OVER w,
 last_value(price) OVER w,
 max(price) OVER w,
 min(price) OVER w,
 sum(price) OVER w,
 avg(price) OVER w,
 count(price) OVER w
FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+ DOWN+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
 company  |   tdate    | price | first_value | last_value | max  | min | sum  |          avg          | count 
----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
 company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000000000000000 |     4
 company1 | 07-02-2023 |   200 |             |            |      |     |      |                       |      
 company1 | 07-03-2023 |   150 |             |            |      |     |      |                       |      
 company1 | 07-04-2023 |   140 |             |            |      |     |      |                       |      
 company1 | 07-05-2023 |   150 |             |            |      |     |      |                       |      
 company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000000000000000 |     4
 company1 | 07-07-2023 |   110 |             |            |      |     |      |                       |      
 company1 | 07-08-2023 |   130 |             |            |      |     |      |                       |      
 company1 | 07-09-2023 |   120 |             |            |      |     |      |                       |      
 company1 | 07-10-2023 |   130 |             |            |      |     |      |                       |      
 company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000000000000000 |     4
 company2 | 07-02-2023 |  2000 |             |            |      |     |      |                       |      
 company2 | 07-03-2023 |  1500 |             |            |      |     |      |                       |      
 company2 | 07-04-2023 |  1400 |             |            |      |     |      |                       |      
 company2 | 07-05-2023 |  1500 |             |            |      |     |      |                       |      
 company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000000000000000 |     4
 company2 | 07-07-2023 |  1100 |             |            |      |     |      |                       |      
 company2 | 07-08-2023 |  1300 |             |            |      |     |      |                       |      
 company2 | 07-09-2023 |  1200 |             |            |      |     |      |                       |      
 company2 | 07-10-2023 |  1300 |             |            |      |     |      |                       |      
(20 rows)

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-12T08:44:57Z

Regarding v6 patch:

> SELECT company, tdate, price,
>  first_value(price) OVER w,
>  last_value(price) OVER w,
>  max(price) OVER w,
>  min(price) OVER w,
>  sum(price) OVER w,
>  avg(price) OVER w,
>  count(price) OVER w
> FROM stock
> WINDOW w AS (
> PARTITION BY company
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (START UP+ DOWN+)
> DEFINE
> START AS TRUE,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );
>  company  |   tdate    | price | first_value | last_value | max  | min | sum  |          avg          | count 
> ----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
>  company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000000000000000 |     4
>  company1 | 07-02-2023 |   200 |             |            |      |     |      |                       |      
>  company1 | 07-03-2023 |   150 |             |            |      |     |      |                       |      
>  company1 | 07-04-2023 |   140 |             |            |      |     |      |                       |      
>  company1 | 07-05-2023 |   150 |             |            |      |     |      |                       |      
>  company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000000000000000 |     4
>  company1 | 07-07-2023 |   110 |             |            |      |     |      |                       |      
>  company1 | 07-08-2023 |   130 |             |            |      |     |      |                       |      
>  company1 | 07-09-2023 |   120 |             |            |      |     |      |                       |      
>  company1 | 07-10-2023 |   130 |             |            |      |     |      |                       |      
>  company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000000000000000 |     4
>  company2 | 07-02-2023 |  2000 |             |            |      |     |      |                       |      
>  company2 | 07-03-2023 |  1500 |             |            |      |     |      |                       |      
>  company2 | 07-04-2023 |  1400 |             |            |      |     |      |                       |      
>  company2 | 07-05-2023 |  1500 |             |            |      |     |      |                       |      
>  company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000000000000000 |     4
>  company2 | 07-07-2023 |  1100 |             |            |      |     |      |                       |      
>  company2 | 07-08-2023 |  1300 |             |            |      |     |      |                       |      
>  company2 | 07-09-2023 |  1200 |             |            |      |     |      |                       |      
>  company2 | 07-10-2023 |  1300 |             |            |      |     |      |                       |      
> (20 rows)

count column for unmatched rows should have been 0, rather than
NULL. i.e.

 company  |   tdate    | price | first_value | last_value | max  | min | sum  |          avg          | count 
----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
 company1 | 07-01-2023 |   100 |         100 |        140 |  200 | 100 |  590 |  147.5000000000000000 |     4
 company1 | 07-02-2023 |   200 |             |            |      |     |      |                       |      
 company1 | 07-03-2023 |   150 |             |            |      |     |      |                       |      
 company1 | 07-04-2023 |   140 |             |            |      |     |      |                       |      
 company1 | 07-05-2023 |   150 |             |            |      |     |      |                       |     0
 company1 | 07-06-2023 |    90 |          90 |        120 |  130 |  90 |  450 |  112.5000000000000000 |     4
 company1 | 07-07-2023 |   110 |             |            |      |     |      |                       |      
 company1 | 07-08-2023 |   130 |             |            |      |     |      |                       |      
 company1 | 07-09-2023 |   120 |             |            |      |     |      |                       |      
 company1 | 07-10-2023 |   130 |             |            |      |     |      |                       |     0
 company2 | 07-01-2023 |    50 |          50 |       1400 | 2000 |  50 | 4950 | 1237.5000000000000000 |     4
 company2 | 07-02-2023 |  2000 |             |            |      |     |      |                       |      
 company2 | 07-03-2023 |  1500 |             |            |      |     |      |                       |      
 company2 | 07-04-2023 |  1400 |             |            |      |     |      |                       |      
 company2 | 07-05-2023 |  1500 |             |            |      |     |      |                       |     0
 company2 | 07-06-2023 |    60 |          60 |       1200 | 1300 |  60 | 3660 |  915.0000000000000000 |     4
 company2 | 07-07-2023 |  1100 |             |            |      |     |      |                       |      
 company2 | 07-08-2023 |  1300 |             |            |      |     |      |                       |      
 company2 | 07-09-2023 |  1200 |             |            |      |     |      |                       |      
 company2 | 07-10-2023 |  1300 |             |            |      |     |      |                       |     0
(20 rows)

Attached is the fix against v6 patch. I will include this in upcoming v7 patch.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <jchampion@timescale.com> — 2023-09-12T22:09:29Z

On Mon, Sep 11, 2023 at 11:18 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> What I am not sure about is, you and Vik mentioned that the
> traditional NFA is superior that POSIX NFA in terms of performance.
> But how "lexicographic ordering" is related to performance?

I think they're only tangentially related. POSIX NFAs have to fully
backtrack even after the first match is found, so that's where the
performance difference comes in. (We would be introducing new ways to
catastrophically backtrack if we used that approach.) But since you
don't visit every possible path through the graph with a traditional
NFA, it makes sense to define an order in which you visit the nodes,
so that you can reason about which string is actually going to be
matched in the end.

> BTW, attched is the v6 patch. The differences from v5 include:
>
> - Now aggregates can be used with RPR. Below is an example from the
>   regression test cases, which is added by v6 patch.

Great, thank you!

--Jacob

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-13T05:14:07Z

> <quote from 19075-5:2023>

I was looking for this but I only found ISO/IEC 19075-5:2021.
https://www.iso.org/standard/78936.html

Maybe 19075-5:2021 is the latest one?

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2023-09-13T11:28:45Z

On 9/13/23 07:14, Tatsuo Ishii wrote:
>> <quote from 19075-5:2023>
> 
> I was looking for this but I only found ISO/IEC 19075-5:2021.
> https://www.iso.org/standard/78936.html
> 
> Maybe 19075-5:2021 is the latest one?

Yes, probably.  Sorry.
-- 
Vik Fearing

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-13T12:35:53Z

> On 9/13/23 07:14, Tatsuo Ishii wrote:
>>> <quote from 19075-5:2023>
>> I was looking for this but I only found ISO/IEC 19075-5:2021.
>> https://www.iso.org/standard/78936.html
>> Maybe 19075-5:2021 is the latest one?
> 
> Yes, probably.  Sorry.

No problem. Thanks for confirmation.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-22T05:16:40Z

> Attached is the fix against v6 patch. I will include this in upcoming v7 patch.

Attached is the v7 patch. It includes the fix mentioned above.  Also
this time the pattern matching engine is enhanced: previously it
recursed to row direction, which means if the number of rows in a
frame is large, it could exceed the limit of stack depth.  The new
version recurses over matched pattern variables in a row, which is at
most 26 which should be small enough.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Erik Rijkers <er@xs4all.nl> — 2023-09-22T08:12:38Z

Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
>> Attached is the fix against v6 patch. I will include this in upcoming v7 patch.
> 
> Attached is the v7 patch. It includes the fix mentioned above.  Also

Hi,

In my hands, make check fails on the rpr test; see attached .diff file.
In these two statements:
-- using NEXT
-- using AFTER MATCH SKIP TO NEXT ROW
   result of first_value(price) and next_value(price) are empty.


Erik Rijkers


> this time the pattern matching engine is enhanced: previously it
> recursed to row direction, which means if the number of rows in a
> frame is large, it could exceed the limit of stack depth.  The new
> version recurses over matched pattern variables in a row, which is at
> most 26 which should be small enough.
> 
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Erik Rijkers <er@xs4all.nl> — 2023-09-22T08:23:11Z

Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
>> Attached is the fix against v6 patch. I will include this in upcoming v7 patch.
> 
> Attached is the v7 patch. It includes the fix mentioned above.  Also
(Champion's address bounced; removed)

Hi,

In my hands, make check fails on the rpr test; see attached .diff file.
In these two statements:
-- using NEXT
-- using AFTER MATCH SKIP TO NEXT ROW
   result of first_value(price) and next_value(price) are empty.

Erik Rijkers


> this time the pattern matching engine is enhanced: previously it
> recursed to row direction, which means if the number of rows in a
> frame is large, it could exceed the limit of stack depth.  The new
> version recurses over matched pattern variables in a row, which is at
> most 26 which should be small enough.
> 
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Erik Rijkers <er@xs4all.nl> — 2023-09-22T08:26:49Z

Op 9/22/23 om 10:23 schreef Erik Rijkers:
> Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
>>> Attached is the fix against v6 patch. I will include this in upcoming 
>>> v7 patch.
>>
>> Attached is the v7 patch. It includes the fix mentioned above.  Also
> (Champion's address bounced; removed)
> 

Sorry, I forgot to re-attach the regression.diffs with resend...

Erik

> Hi,
> 
> In my hands, make check fails on the rpr test; see attached .diff file.
> In these two statements:
> -- using NEXT
> -- using AFTER MATCH SKIP TO NEXT ROW
>    result of first_value(price) and next_value(price) are empty.
> 
> Erik Rijkers
> 
> 
>> this time the pattern matching engine is enhanced: previously it
>> recursed to row direction, which means if the number of rows in a
>> frame is large, it could exceed the limit of stack depth.  The new
>> version recurses over matched pattern variables in a row, which is at
>> most 26 which should be small enough.
>>
>> Best reagards,
>> -- 
>> Tatsuo Ishii
>> SRA OSS LLC
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
> 
>

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-22T10:12:50Z

> Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
>>> Attached is the fix against v6 patch. I will include this in upcoming
>>> v7 patch.
>> Attached is the v7 patch. It includes the fix mentioned above.  Also
> (Champion's address bounced; removed)

On my side his adress bounced too:-<

> Hi,
> 
> In my hands, make check fails on the rpr test; see attached .diff
> file.
> In these two statements:
> -- using NEXT
> -- using AFTER MATCH SKIP TO NEXT ROW
>   result of first_value(price) and next_value(price) are empty.

Strange. I have checked out fresh master branch and applied the v7
patches, then ran make check. All tests including the rpr test
passed. This is Ubuntu 20.04.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Erik Rijkers <er@xs4all.nl> — 2023-09-22T11:28:11Z

Op 9/22/23 om 12:12 schreef Tatsuo Ishii:
>> Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
>>>> Attached is the fix against v6 patch. I will include this in upcoming
>>>> v7 patch.
>>> Attached is the v7 patch. It includes the fix mentioned above.  Also
>> (Champion's address bounced; removed)
> 
> On my side his adress bounced too:-<
> 
>> Hi,
>>
>> In my hands, make check fails on the rpr test; see attached .diff
>> file.
>> In these two statements:
>> -- using NEXT
>> -- using AFTER MATCH SKIP TO NEXT ROW
>>    result of first_value(price) and next_value(price) are empty.
> 
> Strange. I have checked out fresh master branch and applied the v7
> patches, then ran make check. All tests including the rpr test
> passed. This is Ubuntu 20.04.

The curious thing is that the server otherwise builds ok, and if I 
explicitly run on that server 'CREATE TEMP TABLE stock' + the 20 INSERTS 
  (just to make sure to have known data), those two statements now both 
return the correct result.

So maybe the testing/timing is wonky (not necessarily the server).

Erik

> 
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <champion.p@gmail.com> — 2023-09-22T14:48:22Z

On Fri, Sep 22, 2023, 3:13 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> > Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
> >>> Attached is the fix against v6 patch. I will include this in upcoming
> >>> v7 patch.
> >> Attached is the v7 patch. It includes the fix mentioned above.  Also
> > (Champion's address bounced; removed)
>
> On my side his adress bounced too:-<
>

Yep. I'm still here, just lurking for now. It'll take a little time for me
to get back to this thread, as my schedule has changed significantly. :D

Thanks,
--Jacob

>

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-09-25T05:26:30Z

> On Fri, Sep 22, 2023, 3:13 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> 
>> > Op 9/22/23 om 07:16 schreef Tatsuo Ishii:
>> >>> Attached is the fix against v6 patch. I will include this in upcoming
>> >>> v7 patch.
>> >> Attached is the v7 patch. It includes the fix mentioned above.  Also
>> > (Champion's address bounced; removed)
>>
>> On my side his adress bounced too:-<
>>
> 
> Yep. I'm still here, just lurking for now. It'll take a little time for me
> to get back to this thread, as my schedule has changed significantly. :D

Hope you get back soon...

By the way, I was thinking about eliminating recusrive calls in
pattern matching. Attached is the first cut of the implementation. In
the attached v8 patch:

- No recursive calls anymore. search_str_set_recurse was removed.

- Instead it generates all possible pattern variable name initial
  strings before pattern matching. Suppose we have "ab" (row 0) and
  "ac" (row 1). "a" and "b" represents the pattern variable names
  which are evaluated to true.  In this case it will generate "aa",
  "ac", "ba" and "bc" and they are examined by do_pattern_match one by
  one, which performs pattern matching.

- To implement this, an infrastructure string_set* are created. They
  take care of set of StringInfo.

I found that the previous implementation did not search all possible
cases. I believe the bug is fixed in v8.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-10-04T06:03:28Z

> By the way, I was thinking about eliminating recusrive calls in
> pattern matching. Attached is the first cut of the implementation. In
> the attached v8 patch:
> 
> - No recursive calls anymore. search_str_set_recurse was removed.
> 
> - Instead it generates all possible pattern variable name initial
>   strings before pattern matching. Suppose we have "ab" (row 0) and
>   "ac" (row 1). "a" and "b" represents the pattern variable names
>   which are evaluated to true.  In this case it will generate "aa",
>   "ac", "ba" and "bc" and they are examined by do_pattern_match one by
>   one, which performs pattern matching.
> 
> - To implement this, an infrastructure string_set* are created. They
>   take care of set of StringInfo.
> 
> I found that the previous implementation did not search all possible
> cases. I believe the bug is fixed in v8.

The v8 patch does not apply anymore due to commit d060e921ea "Remove obsolete executor cleanup code".
So I rebased and created v9 patch. The differences from the v8 include:

- Fix bug with get_slots. It did not correctly detect the end of full frame.
- Add test case using "ROWS BETWEEN CURRENT ROW AND offset FOLLOWING".

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-10-22T02:39:20Z

Attached is the v10 patch. This version enhances the performance of
pattern matching.  Previously it generated all possible pattern string
candidates. This resulted in unnecessarily large number of
candidates. For example if you have 2 pattern variables and the target
frame includes 100 rows, the number of candidates can reach to 2^100
in the worst case. To avoid this, I do a pruning in the v10
patch. Suppose you have:

PATTERN (A B+ C+)

Candidates like "BAC" "CAB" cannot survive because they never satisfy
the search pattern. To judge this, I assign sequence numbers (0, 1, 2)
to (A B C).  If the pattern generator tries to generate BA, this is
not allowed because the sequence number for B is 1 and for A is 0, and
0 < 1: B cannot be followed by A. Note that this technique can be
applied when the quantifiers are "+" or "*". Maybe other quantifiers
such as '?'  or '{n, m}' can be applied too but I don't confirm yet
because I have not implemented them yet.

Besides this improvement, I fixed a bug in the previous and older
patches: when an expression in DEFINE uses text operators, it errors
out:

ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

This was fixed by adding assign_expr_collations() in
transformDefineClause().

Also I have updated documentation "3.5. Window Functions"

- It still mentioned about rpr(). It's not applied anymore.
- Enhance the description about DEFINE and PATTERN.
- Mention that quantifier '*' is supported.

Finally I have added more test cases to the regression test.
- same pattern variable appears twice
- case for quantifier '*'

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <champion.p@gmail.com> — 2023-10-24T18:51:19Z

On Sat, Oct 21, 2023 at 7:39 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> Attached is the v10 patch. This version enhances the performance of
> pattern matching.

Nice! I've attached a couple of more stressful tests (window
partitions of 1000 rows each). Beware that the second one runs my
desktop out of memory fairly quickly with the v10 implementation.

I was able to carve out some time this week to implement a very basic
recursive NFA, which handles both the + and * qualifiers (attached).
It's not production quality -- a frame on the call stack for every row
isn't going to work -- but with only those two features, it's pretty
tiny, and it's able to run the new stress tests with no issue. If I
continue to have time, I hope to keep updating this parallel
implementation as you add features to the StringSet implementation,
and we can see how it evolves. I expect that alternation and grouping
will ratchet up the complexity.

Thanks!
--Jacob

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-10-25T00:11:05Z

> On Sat, Oct 21, 2023 at 7:39 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> Attached is the v10 patch. This version enhances the performance of
>> pattern matching.
> 
> Nice! I've attached a couple of more stressful tests (window
> partitions of 1000 rows each). Beware that the second one runs my
> desktop out of memory fairly quickly with the v10 implementation.
> 
> I was able to carve out some time this week to implement a very basic
> recursive NFA, which handles both the + and * qualifiers (attached).

Great. I will look into this.

> It's not production quality -- a frame on the call stack for every row
> isn't going to work

Yeah.

> -- but with only those two features, it's pretty
> tiny, and it's able to run the new stress tests with no issue. If I
> continue to have time, I hope to keep updating this parallel
> implementation as you add features to the StringSet implementation,
> and we can see how it evolves. I expect that alternation and grouping
> will ratchet up the complexity.

Sounds like a plan.

By the way, I tested my patch (v10) to handle more large data set and
tried to following query with pgbench database. On my laptop it works
with 100k rows pgbench_accounts table but with beyond the number I got
OOM killer. I would like to enhance this in the next patch.

SELECT aid, first_value(aid) OVER w,
count(*) OVER w
FROM pgbench_accounts
WINDOW w AS (
PARTITION BY bid
ORDER BY aid
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+)
DEFINE
START AS TRUE,
UP AS aid > PREV(aid)
);

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-10-25T02:49:30Z

> Great. I will look into this.

I am impressed the simple NFA implementation.  It would be nicer if it
could be implemented without using recursion.

> By the way, I tested my patch (v10) to handle more large data set and
> tried to following query with pgbench database. On my laptop it works
> with 100k rows pgbench_accounts table but with beyond the number I got
       ~~~ I meant 10k.

> OOM killer. I would like to enhance this in the next patch.
> 
> SELECT aid, first_value(aid) OVER w,
> count(*) OVER w
> FROM pgbench_accounts
> WINDOW w AS (
> PARTITION BY bid
> ORDER BY aid
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (START UP+)
> DEFINE
> START AS TRUE,
> UP AS aid > PREV(aid)
> );

I ran this against your patch. It failed around > 60k rows.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <champion.p@gmail.com> — 2023-10-30T19:49:18Z

On Tue, Oct 24, 2023 at 7:49 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> I am impressed the simple NFA implementation.

Thanks!

> It would be nicer if it
> could be implemented without using recursion.

Yeah. If for some reason we end up going with a bespoke
implementation, I assume we'd just convert the algorithm to an
iterative one and optimize it heavily. But I didn't want to do that
too early, since it'd probably make it harder to add new features...
and anyway my goal is still to try to reuse src/backend/regex
eventually.

> > SELECT aid, first_value(aid) OVER w,
> > count(*) OVER w
> > FROM pgbench_accounts
> > WINDOW w AS (
> > PARTITION BY bid
> > ORDER BY aid
> > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> > AFTER MATCH SKIP PAST LAST ROW
> > INITIAL
> > PATTERN (START UP+)
> > DEFINE
> > START AS TRUE,
> > UP AS aid > PREV(aid)
> > );
>
> I ran this against your patch. It failed around > 60k rows.

Nice, that's actually more frames than I expected. Looks like I have
similar results here with my second test query (segfault at ~58k
rows).

Thanks,
--Jacob

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-11-08T07:37:05Z

>> It would be nicer if it
>> could be implemented without using recursion.
> 
> Yeah. If for some reason we end up going with a bespoke
> implementation, I assume we'd just convert the algorithm to an
> iterative one and optimize it heavily. But I didn't want to do that
> too early, since it'd probably make it harder to add new features...
> and anyway my goal is still to try to reuse src/backend/regex
> eventually.

Ok.

Attached is the v11 patch. Below are the summary of the changes from
previous version.

- rebase.

- Reduce memory allocation in pattern matching (search_str_set()). But
  still Champion's second stress test gives OOM killer.
    
  - While keeping an old set to next round, move the StringInfo to
    new_str_set, rather than copying from old_str_set. This allows to
    run pgbench.sql against up to 60k rows on my laptop (previously
    20k).
    
  - Use enlargeStringInfo to set the buffer size, rather than
    incrementally enlarge the buffer. This does not seem to give big
    enhancement but it should theoretically an enhancement.

- Fix "variable not found in subplan target list" error if WITH is
  used.
    
  - To fix this apply pullup_replace_vars() against DEFINE clause in
    planning phase (perform_pullup_replace_vars()).  Also add
    regression test cases for WITH that caused the error in the
    previous version.

- Fix the case when no greedy quantifiers ('+' or '*') are included in
  PATTERN.
    
  - Previously update_reduced_frame() did not consider the case and
    produced wrong results. Add another code path which is dedicated
    to none greedy PATTERN (at this point, it means there's no
    quantifier case). Also add a test case for this.

- Remove unnecessary check in transformPatternClause().

  - Previously it checked if all pattern variables are defined in
    DEFINE clause. But currently RPR allows to "auto define" such
    variables as "varname AS TRUE". So the check was not necessary.

- FYI here is the list to explain what was changed in each patch file.

0001-Row-pattern-recognition-patch-for-raw-parser.patch
- same

0002-Row-pattern-recognition-patch-parse-analysis.patch
- Add markTargetListOrigins() to transformFrameOffset().
- Change transformPatternClause().

0003-Row-pattern-recognition-patch-planner.patch
- Fix perform_pullup_replace_vars()

0004-Row-pattern-recognition-patch-executor.patch
- Fix update_reduced_frame()
- Fix search_str_set()

0005-Row-pattern-recognition-patch-docs.patch
- same

0006-Row-pattern-recognition-patch-tests.patch
- Add test case for non-greedy and WITH cases

0007-Allow-to-print-raw-parse-tree.patch
- same

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-12-08T01:16:13Z

Sorry for posting v12 patch again. It seems the previous post of v12
patch email lost mail threading information and was not recognized as
a part of the thread by CF application and CFbot.
https://www.postgresql.org/message-id/20231204.204048.1998548830490453126.t-ishii%40sranhm.sra.co.jp

Attached is the v12 patch. Below are the summary of the changes from
previous version.

- Rebase. CFbot says v11 patch needs rebase since Nov 30, 2023.
 
- Apply preprocess_expression() to DEFINE clause in the planning
  phase.  This is necessary to simply const expressions like:

    DEFINE A price < (99 + 1)
    to:
    DEFINE A price < 100

- Re-allow to use WinSetMarkPosition() in eval_windowaggregates().

- FYI here is the list to explain what were changed in each patch file.

0001-Row-pattern-recognition-patch-for-raw-parser.patch
- Fix conflict.

0002-Row-pattern-recognition-patch-parse-analysis.patch
- Same as before.

0003-Row-pattern-recognition-patch-planner.patch
- Call preprocess_expression() for DEFINE clause in subquery_planner().

0004-Row-pattern-recognition-patch-executor.patch
- Re-allow to use WinSetMarkPosition() in eval_windowaggregates().

0005-Row-pattern-recognition-patch-docs.patch
- Same as before.

0006-Row-pattern-recognition-patch-tests.patch
- Same as before.

0007-Allow-to-print-raw-parse-tree.patch
- Same as before.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2023-12-08T22:22:58Z

> On 04.12.23 12:40, Tatsuo Ishii wrote:
>> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
>> index d631ac89a9..5a77fca17f 100644
>> --- a/src/backend/parser/gram.y
>> +++ b/src/backend/parser/gram.y
>> @@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char
>> *relname, List *aliases, Node *query);
>>   	DefElem	   *defelt;
>>   	SortBy	   *sortby;
>>   	WindowDef  *windef;
>> +	RPCommonSyntax	*rpcom;
>> +	RPSubsetItem	*rpsubset;
>>   	JoinExpr   *jexpr;
>>   	IndexElem  *ielem;
>>   	StatsElem  *selem;
>> @@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char
>> *relname, List *aliases, Node *query);
>>   	MergeWhenClause *mergewhen;
>>   	struct KeyActions *keyactions;
>>   	struct KeyAction *keyaction;
>> +	RPSkipTo	skipto;
>>   }
>>     %type <node>	stmt toplevel_stmt schema_stmt routine_body_stmt
> 
> It is usually not the style to add an entry for every node type to the
> %union.  Otherwise, we'd have hundreds of entries in there.

Ok, I have removed the node types and used existing node types.  Also
I have moved RPR related %types to same place to make it easier to know
what are added by RPR.

>> @@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char
>> *relname, List *aliases, Node *query);
>>   %nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
>>   %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE
>>   %ROLLUP
>>   			SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
>> +%nonassoc	MEASURES AFTER INITIAL SEEK PATTERN_P
>>   %left Op OPERATOR /* multi-character ops and user-defined operators */
>>   %left		'+' '-'
>>   %left		'*' '/' '%'
> 
> It was recently discussed that these %nonassoc should ideally all have
> the same precedence.  Did you consider that here?

No, I didn't realize it. Thanks for pointing it out. I have removed
%nonassoc so that MEASURES etc. have the same precedence as IDENT etc.

Attached is the new diff of gram.y against master branch.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

NINGWEI CHEN <chen@sraoss.co.jp> — 2024-01-22T05:51:49Z

On Sat, 09 Dec 2023 07:22:58 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> > On 04.12.23 12:40, Tatsuo Ishii wrote:
> >> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> >> index d631ac89a9..5a77fca17f 100644
> >> --- a/src/backend/parser/gram.y
> >> +++ b/src/backend/parser/gram.y
> >> @@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>   	DefElem	   *defelt;
> >>   	SortBy	   *sortby;
> >>   	WindowDef  *windef;
> >> +	RPCommonSyntax	*rpcom;
> >> +	RPSubsetItem	*rpsubset;
> >>   	JoinExpr   *jexpr;
> >>   	IndexElem  *ielem;
> >>   	StatsElem  *selem;
> >> @@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>   	MergeWhenClause *mergewhen;
> >>   	struct KeyActions *keyactions;
> >>   	struct KeyAction *keyaction;
> >> +	RPSkipTo	skipto;
> >>   }
> >>     %type <node>	stmt toplevel_stmt schema_stmt routine_body_stmt
> > 
> > It is usually not the style to add an entry for every node type to the
> > %union.  Otherwise, we'd have hundreds of entries in there.
> 
> Ok, I have removed the node types and used existing node types.  Also
> I have moved RPR related %types to same place to make it easier to know
> what are added by RPR.
> 
> >> @@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>   %nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
> >>   %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE
> >>   %ROLLUP
> >>   			SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
> >> +%nonassoc	MEASURES AFTER INITIAL SEEK PATTERN_P
> >>   %left Op OPERATOR /* multi-character ops and user-defined operators */
> >>   %left		'+' '-'
> >>   %left		'*' '/' '%'
> > 
> > It was recently discussed that these %nonassoc should ideally all have
> > the same precedence.  Did you consider that here?
> 
> No, I didn't realize it. Thanks for pointing it out. I have removed
> %nonassoc so that MEASURES etc. have the same precedence as IDENT etc.
> 
> Attached is the new diff of gram.y against master branch.

Thank you very much for providing the patch for the RPR implementation.

After applying the v12-patches, I noticed an issue that
the rpr related parts in window clauses were not displayed in the
view definitions (the definition column of pg_views).

To address this, I have taken the liberty of adding an additional patch
that modifies the relevant rewriter source code.
I have attached the rewriter patch for your review and would greatly appreciate your feedback.

Thank you for your time and consideration.

-- 
SRA OSS LLC
Ningwei Chen <chen@sraoss.co.jp>
TEL: 03-5979-2701 FAX: 03-5979-2702

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-01-22T06:22:11Z

> Thank you very much for providing the patch for the RPR implementation.
> 
> After applying the v12-patches, I noticed an issue that
> the rpr related parts in window clauses were not displayed in the
> view definitions (the definition column of pg_views).
> 
> To address this, I have taken the liberty of adding an additional patch
> that modifies the relevant rewriter source code.
> I have attached the rewriter patch for your review and would greatly appreciate your feedback.
> 
> Thank you for your time and consideration.

Thank you so much for spotting the issue and creating the patch. I
confirmed that your patch applies cleanly and solve the issue. I will
include the patches into upcoming v13 patches.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-01-22T10:26:18Z

Attached is the v13 patch. Below are the summary of the changes from
previous version (besides rebase).

0001-Row-pattern-recognition-patch-for-raw-parser.patch
- Fix raw paser per Peter Eisentraut's review. Remove the new node
  types and use existing ones. Also remove %nonassoc so that
  MEASURES etc. have the same precedence as IDENT etc.

Peter's comment:
> It is usually not the style to add an entry for every node type to the
> %union.  Otherwise, we'd have hundreds of entries in there.

> It was recently discussed that these %nonassoc should ideally all have
> the same precedence.  Did you consider that here?

0002-Row-pattern-recognition-patch-parse-analysis.patch
- Fix transformRPR so that SKIP variable name in the AFTER MATCH SKIP
  TO clause is tracked. This is added by Ningwei Chen.

0003-Row-pattern-recognition-patch-rewriter.patch
This is a new patch for rewriter. Contributed by Ningwei Chen.

Chen's comment:
> After applying the v12-patches, I noticed an issue that
> the rpr related parts in window clauses were not displayed in the
> view definitions (the definition column of pg_views).

0004-Row-pattern-recognition-patch-planner.patch
- same as before (previously it was 0003-Row-pattern-recognition-patch-planner.patch)

0005-Row-pattern-recognition-patch-executor.patch
- same as before (previously it was 0004-Row-pattern-recognition-patch-executor.patch)

0006-Row-pattern-recognition-patch-docs.patch
- Same as before. (previously it was 0005-Row-pattern-recognition-patch-docs.patch)

0007-Row-pattern-recognition-patch-tests.patch
- Same as before. (previously it was 0006-Row-pattern-recognition-patch-tests.patch)

0008-Allow-to-print-raw-parse-tree.patch
- Same as before. (previously it was 0007-Allow-to-print-raw-parse-tree.patch).
  Note that patch is not intended to be incorporated into main
  tree. This is just for debugging purpose. With this patch, raw parse
  tree is printed if debug_print_parse is enabled.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-02-29T00:19:54Z

Attached is the v14 patch. Below are the summary of the changes from
previous version (besides rebase).
V14 patches are mainly for coding style fixes.

0001-Row-pattern-recognition-patch-for-raw-parser.patch
- Fold too long lines and run pgindent.

0002-Row-pattern-recognition-patch-parse-analysis.patch
- Fold too long lines and run pgindent.

0003-Row-pattern-recognition-patch-rewriter.patch
- Fold too long lines and run pgindent.

0004-Row-pattern-recognition-patch-planner.patch
- Fold too long lines and run pgindent.

0005-Row-pattern-recognition-patch-executor.patch
- Fold too long lines and run pgindent.

- Surround debug lines using "ifdef RPR_DEBUG" so that logs are not
  contaminated by RPR debug logs when log_min_messages are set to
  DEBUG1 or higher.

0006-Row-pattern-recognition-patch-docs.patch
- Same as before. (previously it was 0005-Row-pattern-recognition-patch-docs.patch)

0007-Row-pattern-recognition-patch-tests.patch
- Same as before. (previously it was 0006-Row-pattern-recognition-patch-tests.patch)

0008-Allow-to-print-raw-parse-tree.patch
- Same as before.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-03-28T10:59:25Z

Attached is the v15 patch. No changes are made except rebasing due to
recent grammar changes.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-04-12T07:09:08Z

Attached is the v16 patch. No changes are made except rebasing due to
recent grammar changes.

Also I removed chen@sraoss.co.jp from the Cc: list. The email address
is no longer valid.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-04-24T03:12:44Z

Hi Vik and Champion,

I think the current RPR patch is not quite correct in handling
count(*).

(using slightly modified version of Vik's example query)

SELECT v.a, count(*) OVER w
FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
WINDOW w AS (
  ORDER BY v.a
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  PATTERN (B+)
  DEFINE B AS a = 'B'
)
 a | count 
---+-------
 A |     0
 B |     2
 B |      
 C |     0
(4 rows)

Here row 3 is skipped because the pattern B matches row 2 and 3. In
this case I think cont(*) should return 0 rathern than NULL for row 3.

What do you think?

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <jacob.champion@enterprisedb.com> — 2024-04-24T17:55:29Z

On Tue, Apr 23, 2024 at 8:13 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> SELECT v.a, count(*) OVER w
> FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
> WINDOW w AS (
>   ORDER BY v.a
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (B+)
>   DEFINE B AS a = 'B'
> )
>  a | count
> ---+-------
>  A |     0
>  B |     2
>  B |
>  C |     0
> (4 rows)
>
> Here row 3 is skipped because the pattern B matches row 2 and 3. In
> this case I think cont(*) should return 0 rathern than NULL for row 3.

I think returning zero would match Vik's explanation upthread [1],
yes. Unfortunately I don't have a spec handy to double-check for
myself right now.

--Jacob

[1] https://www.postgresql.org/message-id/c9ebc3d0-c3d1-e8eb-4a57-0ec099cbda17%40postgresfriends.org

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-04-26T06:09:32Z

> On Tue, Apr 23, 2024 at 8:13 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> SELECT v.a, count(*) OVER w
>> FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
>> WINDOW w AS (
>>   ORDER BY v.a
>>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>   PATTERN (B+)
>>   DEFINE B AS a = 'B'
>> )
>>  a | count
>> ---+-------
>>  A |     0
>>  B |     2
>>  B |
>>  C |     0
>> (4 rows)
>>
>> Here row 3 is skipped because the pattern B matches row 2 and 3. In
>> this case I think cont(*) should return 0 rathern than NULL for row 3.
> 
> I think returning zero would match Vik's explanation upthread [1],
> yes. Unfortunately I don't have a spec handy to double-check for
> myself right now.

Ok. I believe you and Vik are correct.
I am modifying the patch in this direction.
Attached is the regression diff after modifying the patch.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-04-28T11:28:26Z

>> I think returning zero would match Vik's explanation upthread [1],
>> yes. Unfortunately I don't have a spec handy to double-check for
>> myself right now.
> 
> Ok. I believe you and Vik are correct.
> I am modifying the patch in this direction.

Attached are the v17 patches in the direction. Differences from v16
include:

- In 0005 executor patch, aggregation in RPR always restarts for each
  row. This is necessary to run aggregates on no matching (due to
  skipping) or empty matching (due to no pattern variables matches)
  rows to produce NULL (most aggregates) or 0 (count) properly. In v16
  I had a hack using a flag to force the aggregation results to be
  NULL in case of no match or empty match in
  finalize_windowaggregate(). v17 eliminates the dirty hack.

- 0006 docs and 0007 test patches are adjusted to reflect the RPR
  output chages in 0005.
  
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-05-11T07:23:07Z

Attached are the v18 patches. To fix conflicts due to recent commit:

7d2c7f08d9 Fix query pullup issue with WindowClause runCondition

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-05-15T00:02:03Z

Attached are the v19 patches. Changes from v18 include:

0002:
- add a check whether DEFINE clause includes subqueries. If so, error out.
0007:
- fix wrong test (row pattern definition variable name must not appear
  more than once)
- remove unnessary test (undefined define variable is not allowed).
  We have already allowed the undefined variables.
- add tests: subqueries and aggregates in DEFINE clause are not
  supported. The standard allows them but I have not implemented them
  yet.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-05-24T02:39:19Z

Attached are the v20 patches. Just rebased.
(The conflict was in 0001 patch.)

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@sraoss.co.jp> — 2024-06-13T00:25:01Z

I gave a talk on RPR in PGConf.dev 2024.
https://www.pgevents.ca/events/pgconfdev2024/schedule/session/114-implementing-row-pattern-recognition/
(Slides are available from the link).

Vik Faring and Jacob Champion were one of the audiences and we had a
small discussion after the talk. We continued the discussion off list
on how to move forward the RPR implementation project. One of the
ideas is, to summarize what are in the patch and what are not from the
SQL standard specification's point of view. This should help us to
reach the consensus regarding "minimum viable" feature set if we want
to bring the patch in upcoming PostgreSQL v18.

Here is the first cut of the document. Comments/feedback are welcome.

-------------------------------------------------------------------------
This memo describes the current status of implementation of SQL/RPR
(Row Pattern Recognition), as of June 13, 2024 (the latest patch is v20).

- RPR in FROM clause and WINDOW clause

The SQL standard defines two features regarding SQL/RPR - R010 (RPR in
FROM clause) and R020 (RPR in WINDOW clause). Only R020 is
implemented. From now on, we discuss on R020.

- Overview of R020 syntax

WINDOW window_name AS (
[ PARTITION BY ... ]
[ ORDER BY... ]
[ MEASURES ... ]
ROWS BETWEEN CURRENT ROW AND ...
[ AFTER MATCH SKIP ... ]
[ INITIAL|SEEK ]
PATTERN (...)
[ SUBSET ... ]
DEFINE ...
)

-- PARTITION BY and ORDER BY are not specific to RPR and has been
  already there in current PostgreSQL.

-- What are (partially) implemented:

AFTER MATCH SKIP
INITIAL|SEEK
PATTERN
DEFINE

-- What are not implemented at all:
MEASURES
SUBSET

Followings are detailed status of the each clause.

- AFTER MATCH SKIP

-- Implemented:
AFTER MATCH SKIP TO NEXT ROW
AFTER MATCH SKIP PAST LAST ROW

-- Not implemented:
AFTER MATCH SKIP TO FIRST|LAST pattern_variable

- INITIAL|SEEK

--Implemented:
INITIAL

-- Not implemented:
SEEK

- DEFINE

-- Partially implemented row pattern navigation operations are PREV and
   NEXT. FIRST and LAST are not implemented.

-- The standard says PREV and NEXT accepts optional argument "offset"
   but it's not implemented.

-- The standard says the row pattern navigation operations can be
   nested but it's not implemented.

-- CLLASSIFIER, use of aggregate functions and subqueries in DEFINE
   clause are not implemented.

- PATTERN

-- Followings are implemented:
+: 1 or more rows
*: 0 or more rows

-- Followings are not implemented:
?: 0 or 1 row
A | B: OR condition
(A B): grouping
{n}: n rows
{n,}: n or more rows
{n,m}: greater or equal to n rows and less than or equal to m rows
{,m}: more than 0 and less than or equal to m rows
-------------------------------------------------------------------------

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-08-26T04:39:47Z

Attached are the v21 patches. Just rebased.
(The conflict was in 0001 patch.)

The 0008 patch is just for debugging purpose. You can ignore it.
This hasn't been changed, but I would like to notice just in case.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-09-19T04:59:47Z

Attached are the v22 patches. Just rebased.  The conflict was in 0001
patch due to commit 89f908a6d0 "Add temporal FOREIGN KEY contraints".

The 0008 patch is just for debugging purpose. You can ignore it.
This hasn't been changed, but I would like to notice just in case.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <jacob.champion@enterprisedb.com> — 2024-09-27T22:27:07Z

On Wed, Sep 18, 2024 at 10:00 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
>
> Attached are the v22 patches. Just rebased.

Thanks!

With some bigger partitions, I hit an `ERROR:  wrong pos: 1024`. A
test that reproduces it is attached.

While playing with the feature, I've been trying to identify runs of
matched rows by eye. But it's pretty difficult -- the best I can do is
manually count rows using a `COUNT(*) OVER ...`. So I'd like to
suggest that MEASURES be part of the eventual v1 feature, if there's
no other way to determine whether a row was skipped by a previous
match. (That was less obvious to me before the fix in v17.)

--

I've been working on an implementation [1] of SQL/RPR's "parenthesized
language" and preferment order. (These are defined in SQL/Foundation
2023, section 9.41.) The tool gives you a way to figure out, for a
given pattern, what matches are supposed to be attempted and in what
order:

    $ ./src/test/modules/rpr/rpr_prefer "a b? a"
    ( ( a ( b ) ) a )
    ( ( a ( ) ) a )

Many simple patterns result in an infinite set of possible matches. So
if you use an unbounded quantifiers, you have to also use --max-rows
to limit the size of the hypothetical window frame:

    $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
    ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
    ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
    ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
    ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
    ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
    ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
    ( ( ^ ( ) ) $ )

I've found this useful to check my personal understanding of the spec
and the match behavior, but it could also potentially be used to
generate test cases, or to help users debug their own patterns. For
example, a pattern that has a bunch of duplicate sequences in its PL
is probably not very well optimized:

    $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
    ( ( a a a ) ( a ) )
    ( ( a a ) ( a a ) )
    ( ( a a ) ( a ) )
    ( ( a ) ( a a a ) )
    ( ( a ) ( a a ) )
    ( ( a ) ( a ) )

And patterns with catastrophic backtracking behavior tend to show a
"sawtooth" pattern in the output, with a huge number of potential
matches being generated relative to the number of rows in the frame.

My implementation is really messy -- it leaks memory like a sieve, and
I cannibalized the parser from ECPG, which just ended up as an
exercise in teaching myself flex/bison. But if there's interest in
having this kind of tool in the tree, I can work on making it
reviewable. Either way, I should be able to use it to double-check
more complicated test cases.

A while back [2], you were wondering whether our Bison implementation
would be able to parse the PATTERN grammar directly. I think this tool
proves that the answer is "yes", but PERMUTE in particular causes a
shift/reduce conflict. To fix it, I applied the same precedence
workaround that we use for CUBE and ROLLUP.

Thanks again!
--Jacob

[1] https://github.com/jchampio/postgres/tree/dev/rpr
[2] https://www.postgresql.org/message-id/20230721.151648.412762379013769790.t-ishii%40sranhm.sra.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-09-28T10:43:59Z

> With some bigger partitions, I hit an `ERROR:  wrong pos: 1024`. A
> test that reproduces it is attached.

Thanks for the report. Attached is a patch on top of v22 patches to
fix the bug. We keep info in an array
(WindowAggState.reduced_frame_map) to track the rpr pattern match
result status for each row in a frame. If pattern match succeeds, the
first row in the reduced frame has status RF_FRAME_HEAD and rest of
rows have RF_SKIPPED state. A row with pattern match failure state has
RF_UNMATCHED state. Any row which is never tested has state
RF_NOT_DETERMINED. At begining the map is initialized with 1024
entries with all RF_NOT_DETERMINED state. Eventually they are replaced
with other than RF_NOT_DETERMINED state. In the error case rpr engine
tries to find 1024 th row's state in the map and failed because the
row's state has not been tested yet. I think we should treat it as
RF_NOT_DETERMINED rather than an error. Attached patch does it.

> While playing with the feature, I've been trying to identify runs of
> matched rows by eye. But it's pretty difficult -- the best I can do is
> manually count rows using a `COUNT(*) OVER ...`. So I'd like to
> suggest that MEASURES be part of the eventual v1 feature, if there's
> no other way to determine whether a row was skipped by a previous
> match. (That was less obvious to me before the fix in v17.)

I think implementing MEASURES is challenging. Especially we need to
find how our parser accepts "colname OVER
window_definition". Currently PostgreSQL's parser only accepts "func()
OVER window_definition" Even it is technically possible, I think the
v1 patch size will become much larger than now due to this.

How about inventing new window function that returns row state instead?

- match found (yes/no)
- skipped due to AFTER MATCH SKIP PAST LAST ROW (no match)

For the rest of the mail I need more time to understand. I will reply
back after studying it. For now, I just want to thank you for the
valuable information!

> --
> 
> I've been working on an implementation [1] of SQL/RPR's "parenthesized
> language" and preferment order. (These are defined in SQL/Foundation
> 2023, section 9.41.) The tool gives you a way to figure out, for a
> given pattern, what matches are supposed to be attempted and in what
> order:
> 
>     $ ./src/test/modules/rpr/rpr_prefer "a b? a"
>     ( ( a ( b ) ) a )
>     ( ( a ( ) ) a )
> 
> Many simple patterns result in an infinite set of possible matches. So
> if you use an unbounded quantifiers, you have to also use --max-rows
> to limit the size of the hypothetical window frame:
> 
>     $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
>     ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
>     ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
>     ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
>     ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
>     ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
>     ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
>     ( ( ^ ( ) ) $ )
> 
> I've found this useful to check my personal understanding of the spec
> and the match behavior, but it could also potentially be used to
> generate test cases, or to help users debug their own patterns. For
> example, a pattern that has a bunch of duplicate sequences in its PL
> is probably not very well optimized:
> 
>     $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
>     ( ( a a a ) ( a ) )
>     ( ( a a ) ( a a ) )
>     ( ( a a ) ( a ) )
>     ( ( a ) ( a a a ) )
>     ( ( a ) ( a a ) )
>     ( ( a ) ( a ) )
> 
> And patterns with catastrophic backtracking behavior tend to show a
> "sawtooth" pattern in the output, with a huge number of potential
> matches being generated relative to the number of rows in the frame.
> 
> My implementation is really messy -- it leaks memory like a sieve, and
> I cannibalized the parser from ECPG, which just ended up as an
> exercise in teaching myself flex/bison. But if there's interest in
> having this kind of tool in the tree, I can work on making it
> reviewable. Either way, I should be able to use it to double-check
> more complicated test cases.
> 
> A while back [2], you were wondering whether our Bison implementation
> would be able to parse the PATTERN grammar directly. I think this tool
> proves that the answer is "yes", but PERMUTE in particular causes a
> shift/reduce conflict. To fix it, I applied the same precedence
> workaround that we use for CUBE and ROLLUP.
> 
> Thanks again!
> --Jacob
> 
> [1] https://github.com/jchampio/postgres/tree/dev/rpr
> [2] https://www.postgresql.org/message-id/20230721.151648.412762379013769790.t-ishii%40sranhm.sra.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-09-30T00:07:51Z

>> While playing with the feature, I've been trying to identify runs of
>> matched rows by eye. But it's pretty difficult -- the best I can do is
>> manually count rows using a `COUNT(*) OVER ...`. So I'd like to
>> suggest that MEASURES be part of the eventual v1 feature, if there's
>> no other way to determine whether a row was skipped by a previous
>> match. (That was less obvious to me before the fix in v17.)
> 
> I think implementing MEASURES is challenging. Especially we need to
> find how our parser accepts "colname OVER
> window_definition". Currently PostgreSQL's parser only accepts "func()
> OVER window_definition" Even it is technically possible, I think the
> v1 patch size will become much larger than now due to this.
> 
> How about inventing new window function that returns row state instead?
> 
> - match found (yes/no)
> - skipped due to AFTER MATCH SKIP PAST LAST ROW (no match)

Please disregard my proposal. Even if we make such a function, it
would always return NULL for unmatched rows or skipped rows, and I
think the function does not solve your problem.

However, I wonder if supporting MEASURES solves the problem either
because any columns defined by MEASURES will return NULL except the
first row in a reduced frame. Can you please show an example how to
identify runs of matched rows using MEASURES?

> For the rest of the mail I need more time to understand. I will reply
> back after studying it. For now, I just want to thank you for the
> valuable information!
> 
>> --
>> 
>> I've been working on an implementation [1] of SQL/RPR's "parenthesized
>> language" and preferment order. (These are defined in SQL/Foundation
>> 2023, section 9.41.) The tool gives you a way to figure out, for a
>> given pattern, what matches are supposed to be attempted and in what
>> order:
>> 
>>     $ ./src/test/modules/rpr/rpr_prefer "a b? a"
>>     ( ( a ( b ) ) a )
>>     ( ( a ( ) ) a )
>> 
>> Many simple patterns result in an infinite set of possible matches. So
>> if you use an unbounded quantifiers, you have to also use --max-rows
>> to limit the size of the hypothetical window frame:
>> 
>>     $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $"
>>     ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ )
>>     ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ )
>>     ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ )
>>     ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ )
>>     ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ )
>>     ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ )
>>     ( ( ^ ( ) ) $ )

I wonder how Oracle solves the problem (an infinite set of possible
matches) without using "--max-rows" or something like that because in
my understanding Oracle supports the regular expressions and PERMUTE.

>> I've found this useful to check my personal understanding of the spec
>> and the match behavior, but it could also potentially be used to
>> generate test cases, or to help users debug their own patterns. For
>> example, a pattern that has a bunch of duplicate sequences in its PL
>> is probably not very well optimized:
>> 
>>     $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+"
>>     ( ( a a a ) ( a ) )
>>     ( ( a a ) ( a a ) )
>>     ( ( a a ) ( a ) )
>>     ( ( a ) ( a a a ) )
>>     ( ( a ) ( a a ) )
>>     ( ( a ) ( a ) )
>> 
>> And patterns with catastrophic backtracking behavior tend to show a
>> "sawtooth" pattern in the output, with a huge number of potential
>> matches being generated relative to the number of rows in the frame.
>> 
>> My implementation is really messy -- it leaks memory like a sieve, and
>> I cannibalized the parser from ECPG, which just ended up as an
>> exercise in teaching myself flex/bison. But if there's interest in
>> having this kind of tool in the tree, I can work on making it
>> reviewable. Either way, I should be able to use it to double-check
>> more complicated test cases.

I definitely am interested in the tool!

>> A while back [2], you were wondering whether our Bison implementation
>> would be able to parse the PATTERN grammar directly. I think this tool
>> proves that the answer is "yes", but PERMUTE in particular causes a
>> shift/reduce conflict. To fix it, I applied the same precedence
>> workaround that we use for CUBE and ROLLUP.

That's a good news!

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Jacob Champion <jacob.champion@enterprisedb.com> — 2024-10-01T12:48:02Z

On Sun, Sep 29, 2024 at 5:08 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
> > I think implementing MEASURES is challenging. Especially we need to
> > find how our parser accepts "colname OVER
> > window_definition". Currently PostgreSQL's parser only accepts "func()
> > OVER window_definition" Even it is technically possible, I think the
> > v1 patch size will become much larger than now due to this.

[resending, to the whole list this time]

Yeah. In any case, I'm not the right person to bolt MEASURES onto the
existing grammar... my misadventures in the PATTERN parser have
highlighted how little I know about Bison. :D

> Please disregard my proposal. Even if we make such a function, it
> would always return NULL for unmatched rows or skipped rows, and I
> think the function does not solve your problem.
>
> However, I wonder if supporting MEASURES solves the problem either
> because any columns defined by MEASURES will return NULL except the
> first row in a reduced frame. Can you please show an example how to
> identify runs of matched rows using MEASURES?

I think you're probably right; my suggestion can't distinguish between
skipped (but previously matched) rows and entirely-unmatched rows. The
test case I'd been working with returned an empty match as a fallback,
so it wouldn't have had that problem in practice. I was hoping that
one of the existing whole-partition window functions would allow me to
cobble something together based on the COUNT(*) measure, but after
searching for a while I haven't been able to come up with a solution.

Maybe it's just too niche for the window-function version of this --
after all, it only makes sense when using both INITIAL and AFTER MATCH
SKIP PAST LAST ROW. A more general solution could identify the
row_number of the first and last rows of the window frame, perhaps?
But a frame isn't guaranteed to be contiguous, so maybe that doesn't
make sense either. Ugh.

> I wonder how Oracle solves the problem (an infinite set of possible
> matches) without using "--max-rows" or something like that because in
> my understanding Oracle supports the regular expressions and PERMUTE.

I chose a confusing way to describe it, sorry. The parenthesized
language for a pattern can be an infinite set, because A+ could match
"( A )" or "( A A )" or "( A A A )" and so on forever. But that
doesn't apply to our regex engine in practice; our tables have a
finite number of rows, and I *think* the PL for a finite number of
rows is also finite, due to the complicated rules on where empty
matches are allowed to appear in the language. (In any case, my tool
doesn't guard against infinite recursion...)

> >> My implementation is really messy -- it leaks memory like a sieve, and
> >> I cannibalized the parser from ECPG, which just ended up as an
> >> exercise in teaching myself flex/bison. But if there's interest in
> >> having this kind of tool in the tree, I can work on making it
> >> reviewable. Either way, I should be able to use it to double-check
> >> more complicated test cases.
>
> I definitely am interested in the tool!

Okay, good to know! I will need to clean it up considerably, and
figure out whether I've duplicated more code than I should have.

> >> A while back [2], you were wondering whether our Bison implementation
> >> would be able to parse the PATTERN grammar directly. I think this tool
> >> proves that the answer is "yes", but PERMUTE in particular causes a
> >> shift/reduce conflict. To fix it, I applied the same precedence
> >> workaround that we use for CUBE and ROLLUP.
>
> That's a good news!

+1

Thanks,
--Jacob

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-10-22T04:53:43Z

Hi,

I wonder how "PREV(col + 1)" is different from "PREV(col) + 1".
Currently my RPR implementation does not allow PREV(col + 1). If
"PREV(col + 1)" is different from "PREV(col) + 1", it maybe worthwhile
to implement "PREV(col + 1)".

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Row pattern recognition

David G. Johnston <david.g.johnston@gmail.com> — 2024-10-22T05:29:34Z

On Monday, October 21, 2024, Tatsuo Ishii <ishii@postgresql.org> wrote:
>
> I wonder how "PREV(col + 1)" is different from "PREV(col) + 1".
> Currently my RPR implementation does not allow PREV(col + 1). If
> "PREV(col + 1)" is different from "PREV(col) + 1", it maybe worthwhile
> to implement "PREV(col + 1)".
>

Interesting feature that I’m now just seeing.

The expression PREV(column_name) produces a value output taken from the
given named column in the preceding frame row.  It doesn’t make any sense
to me to attempt to add the integer 1 to an identifier that is being used
as a value input to a “function”.  It would also seem quite odd if “+ 1”
had something to do with row selection as opposed to simply being an
operator “+(column_name%type, integer)” expression.

Maybe RPR is defining something special here I haven't yet picked up on, in
which case just ignore this.  But if I read: “UP as price > prev(price +
1)” in the opening example it would be quite non-intuitive to reason out
the meaning.  “Price > prev(price) + 1” would mean my current row is at
least one (e.g. dollar per share) more than the value of the previous
period.

David J.

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-10-22T10:19:09Z

> On Monday, October 21, 2024, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>
>> I wonder how "PREV(col + 1)" is different from "PREV(col) + 1".
>> Currently my RPR implementation does not allow PREV(col + 1). If
>> "PREV(col + 1)" is different from "PREV(col) + 1", it maybe worthwhile
>> to implement "PREV(col + 1)".
>>
> 
> Interesting feature that I’m now just seeing.
> 
> The expression PREV(column_name) produces a value output taken from the
> given named column in the preceding frame row.  It doesn’t make any sense
> to me to attempt to add the integer 1 to an identifier that is being used
> as a value input to a “function”.  It would also seem quite odd if “+ 1”
> had something to do with row selection as opposed to simply being an
> operator “+(column_name%type, integer)” expression.

According to the ISO/IEC 9075-2:2016, 6.26 <row pattern navigation
operation> (I don't have access to SQL 2023) PREV (and NEXT) is
defined as:

<row pattern navigation: physical> ::=
 <prev or next> <left paren> <value expression> [ <comma> <physical offset> ] <right paren>

(Besides <row pattern navigation: physical>, there are <row pattern
navigation: logical> and <row pattern navigation: compound> but I
ignore them here).

So PREV's first argument is a value expression (VE). VE shall contain
at least one row pattern column reference. <set function
specification>, <window function specification> or <row patter
navigation operation> are not permitted.

From this, I don't see any reason PREV(column_name + 1) is prohibited
unless I miss something.

I think even PREV(column_name1 + column_name2) is possible. I see
similar example in ISO/IEC 19075-5:2021, 5.6.2 "PREV and NEXT".

> Maybe RPR is defining something special here I haven't yet picked up on, in
> which case just ignore this.  But if I read: “UP as price > prev(price +
> 1)” in the opening example it would be quite non-intuitive to reason out
> the meaning.  “Price > prev(price) + 1” would mean my current row is at
> least one (e.g. dollar per share) more than the value of the previous
> period.

Acording to ISO/IEC 9075-2:2016 "4.21.2 Row pattern navigation operations",

  <row pattern navigation operation> evaluates a <value expression> VE
  in a row NR, which may be different than current row CR.

From this I think PREV(col + 1) should be interpreted as:

1. go to the previous row.
2. evaluate "col + 1" at the current row (that was previous row).
3. return the result.

If my understanding is correct, prev(price + 1) has the same meaning
as prev(price) + 1.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2024-10-22T13:12:43Z

On 22/10/2024 12:19, Tatsuo Ishii wrote:
> Acording to ISO/IEC 9075-2:2016 "4.21.2 Row pattern navigation operations",
>
>    <row pattern navigation operation> evaluates a <value expression> VE
>    in a row NR, which may be different than current row CR.
>
>  From this I think PREV(col + 1) should be interpreted as:
>
> 1. go to the previous row.
> 2. evaluate "col + 1" at the current row (that was previous row).
> 3. return the result.
>
> If my understanding is correct, prev(price + 1) has the same meaning
> as prev(price) + 1.



This is how I read the specification also.

-- 

Vik Fearing

Re: Row pattern recognition

David G. Johnston <david.g.johnston@gmail.com> — 2024-10-22T14:19:41Z

On Tue, Oct 22, 2024 at 6:12 AM Vik Fearing <vik@postgresfriends.org> wrote:

>
> On 22/10/2024 12:19, Tatsuo Ishii wrote:
>
> Acording to ISO/IEC 9075-2:2016 "4.21.2 Row pattern navigation operations",
>
>   <row pattern navigation operation> evaluates a <value expression> VE
>   in a row NR, which may be different than current row CR.
>
> From this I think PREV(col + 1) should be interpreted as:
>
> 1. go to the previous row.
> 2. evaluate "col + 1" at the current row (that was previous row).
> 3. return the result.
>
> If my understanding is correct, prev(price + 1) has the same meaning
> as prev(price) + 1.
>
>
>
> This is how I read the specification also.
>
>
>
That makes sense.  Definitely much nicer to only have to write PREV once if
the expression you are evaluating involves multiple columns.  And is also
consistent with window function "value" behavior.

Thanks!

David J.

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-10-25T04:04:53Z

> On Tue, Oct 22, 2024 at 6:12 AM Vik Fearing <vik@postgresfriends.org> wrote:
> 
>>
>> On 22/10/2024 12:19, Tatsuo Ishii wrote:
>>
>> Acording to ISO/IEC 9075-2:2016 "4.21.2 Row pattern navigation operations",
>>
>>   <row pattern navigation operation> evaluates a <value expression> VE
>>   in a row NR, which may be different than current row CR.
>>
>> From this I think PREV(col + 1) should be interpreted as:
>>
>> 1. go to the previous row.
>> 2. evaluate "col + 1" at the current row (that was previous row).
>> 3. return the result.
>>
>> If my understanding is correct, prev(price + 1) has the same meaning
>> as prev(price) + 1.
>>
>>
>>
>> This is how I read the specification also.
>>
>>
>>
> That makes sense.  Definitely much nicer to only have to write PREV once if
> the expression you are evaluating involves multiple columns.  And is also
> consistent with window function "value" behavior.

Thanks to all who joined the discussion. I decided to support PREV and
NEXT in my RPR patches to allow to have multiple columns and other
expressions in their argument. e.g.

CREATE TEMP TABLE rpr1 (id INTEGER, i SERIAL, j INTEGER);
INSERT INTO rpr1(id, j) SELECT 1, g*2 FROM generate_series(1, 10) AS g;
SELECT id, i, j, count(*) OVER w
 FROM rpr1
 WINDOW w AS (
 PARTITION BY id
 ORDER BY i
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 AFTER MATCH SKIP PAST LAST ROW
 INITIAL
 PATTERN (START COND+)
 DEFINE
  START AS TRUE,
  COND AS PREV(i + j + 1) < 10
);
 id | i  | j  | count 
----+----+----+-------
  1 |  1 |  2 |     3
  1 |  2 |  4 |     0
  1 |  3 |  6 |     0
  1 |  4 |  8 |     0
  1 |  5 | 10 |     0
  1 |  6 | 12 |     0
  1 |  7 | 14 |     0
  1 |  8 | 16 |     0
  1 |  9 | 18 |     0
  1 | 10 | 20 |     0
(10 rows)

Attached are the v23 patches. V23 also includes the fix for the problem
pointed out by Jacob Champion and test cases from him. Thank you, Jacob.
https://www.postgresql.org/message-id/CAOYmi%2Bns3kHjC83ap_BCfJCL0wfO5BJ_sEByOEpgNOrsPhqQTg%40mail.gmail.com

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-12-19T06:19:50Z

I have looked into the performance of current RPR implementation,
especially when the number of rows in a reduced frame is large (like
over 10k). Below is a simple benchmark against pgbench database. The
SQL will produce a reduced frame having 10k rows.

EXPLAIN (ANALYZE)
SELECT aid, bid, count(*) OVER w
FROM pgbench_accounts WHERE aid <= 10000
WINDOW w AS (
PARTITION BY bid
ORDER BY aid
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+)
DEFINE
START AS TRUE,
UP AS aid > PREV(aid)
);

This took 722 ms on my laptop. It's not very quick. Moreover, if I
expand the reduced frame size to 100k (aid <= 100000), OOM killer
triggered. I looked into the code and found that do_pattern_match in
nodeWindowAgg.c is one of the major problems. It calls regexp_instr to
know whether the regular expression derived from a PATTERN clause
(e.g. "ab+c+") matches an encoded row pattern variable string
(e.g. "abbcc"). The latter string could be quite long: the length
could be as same as the number of rows in the reduced frame. Thus, The
length could become 100k if the frame size is 100k. Unfortunately
regexp_instr needs to allocate and convert the input string to wchar
(it's 4-byte long for each character), which uses 4x space bigger than
the original input string. In RPR case the input string is always
ASCII and does not need to be converted to wchar. So I decided to
switch to the standard regular expression engine coming with OS. With
this change, I got 2x speed up in the 10k case.

v23 patch: 722.618 ms (average of 3 runs)
new patch: 322.5913 ms (average of 3 runs)

Also I tried the 100k rows reduced frame case. It was slow (took 26
seconds) but it completed without OOM killer.  Attached is the
patch. The change was in 0005 only. Other patches were not changed
from v23.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-12-21T09:20:04Z

> I have looked into the performance of current RPR implementation,
> especially when the number of rows in a reduced frame is large (like
> over 10k). Below is a simple benchmark against pgbench database. The
> SQL will produce a reduced frame having 10k rows.
> 
> EXPLAIN (ANALYZE)
> SELECT aid, bid, count(*) OVER w
> FROM pgbench_accounts WHERE aid <= 10000
> WINDOW w AS (
> PARTITION BY bid
> ORDER BY aid
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (START UP+)
> DEFINE
> START AS TRUE,
> UP AS aid > PREV(aid)
> );
> 
> This took 722 ms on my laptop. It's not very quick. Moreover, if I
> expand the reduced frame size to 100k (aid <= 100000), OOM killer
> triggered. I looked into the code and found that do_pattern_match in
> nodeWindowAgg.c is one of the major problems. It calls regexp_instr to
> know whether the regular expression derived from a PATTERN clause
> (e.g. "ab+c+") matches an encoded row pattern variable string
> (e.g. "abbcc"). The latter string could be quite long: the length
> could be as same as the number of rows in the reduced frame. Thus, The
> length could become 100k if the frame size is 100k. Unfortunately
> regexp_instr needs to allocate and convert the input string to wchar
> (it's 4-byte long for each character), which uses 4x space bigger than
> the original input string. In RPR case the input string is always
> ASCII and does not need to be converted to wchar. So I decided to
> switch to the standard regular expression engine coming with OS. With
> this change, I got 2x speed up in the 10k case.
> 
> v23 patch: 722.618 ms (average of 3 runs)
> new patch: 322.5913 ms (average of 3 runs)
> 
> Also I tried the 100k rows reduced frame case. It was slow (took 26
> seconds) but it completed without OOM killer.  Attached is the
> patch. The change was in 0005 only. Other patches were not changed
> from v23.

The CFBot starts complaining about the patch. It fails in Windows
environment test because regex.h does not exist. I concluded that on
Windows it's not a good idea to use the standard regexp library (I am
not familiar with Windows. If my thought is not correct, please let me
know). So I switched to the PostgreSQL's builtin core regexp
library. Although the interface requires to use pg_wchar which spends
4x memory comparing with the standard regexp (that's why I wanted to
avoid using it), the result seems to be not so bad. It consumes only
10MB or so more memory when processing 100k rows in a frame. Good news
is, it runs slightly faster than the standard regexp (19 vs. 26
seconds).

Attached is the v25 patch to use the PostgreSQL's regexp library.
Most changes are in nodeWindowAgg.c, which is in the 5th patch.

In the patches I also fixed some memory leaks and run pgindent with
updated typedefs.list. Now the patch includes a patch for
typedefs.list (the 8th patch).

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-12-30T13:37:18Z

I have added further optimization to the v25 patch.

While generating possible input strings that may satisfy the pattern
string, it is possible to omit to run regexp in some cases. Since
regexp matching is heavy operation, especially if it is applied to
long string, it is beneficial for RPR to reduce the number of regexp
runs.

If the tail pattern variable has '+' quantifier and previously the
input string was confirmed to be matched the pattern string, and the
same character as the tail pattern string is added, we don't need run
regexp match again because the new input string surely matches the
pattern string. Suppose a pattern string is "ab+" and the current
input string is "ab" (this satisfies "ab+"). If the new input string
is "abb", then "abb" surely matches "ab+" too and we don't need to run
regexp again.

In v26 patch, by using the technique above I get performance
improvement.

>> EXPLAIN (ANALYZE)
>> SELECT aid, bid, count(*) OVER w
>> FROM pgbench_accounts WHERE aid <= 10000
>> WINDOW w AS (
>> PARTITION BY bid
>> ORDER BY aid
>> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>> AFTER MATCH SKIP PAST LAST ROW
>> INITIAL
>> PATTERN (START UP+)
>> DEFINE
>> START AS TRUE,
>> UP AS aid > PREV(aid)
>> );

This SQL took 322.5913 ms (average in 3 runs) in v24. With v26 patch,
it takes 41.84 ms, which is over 7 times improvement. Also I run the
SQL in 100k row case. v23 took 26 seconds. With the v26 patch it takes
1195.603 ms, which is over 21 times improvement.

I think a pattern string ended up with '+' is one of common use cases
of RPR, and I believe the improvement is useful for many RPR
applications.

I also add following changes to v25.

- Fix do_pattern_match to use the top memory context to store compiled
  re cache. Before it was in per query memory context. This causes a
  trouble because do_pattern_match checks the cache existence using
  a static variable.

- Refactor search_str_set, which is a workhorse of pattern matching,
  into multiple functions to understand the logic easier.
  
Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2024-12-30T23:57:07Z

> I have added further optimization to the v25 patch.
> 
> While generating possible input strings that may satisfy the pattern
> string, it is possible to omit to run regexp in some cases. Since
> regexp matching is heavy operation, especially if it is applied to
> long string, it is beneficial for RPR to reduce the number of regexp
> runs.
> 
> If the tail pattern variable has '+' quantifier and previously the
> input string was confirmed to be matched the pattern string, and the
> same character as the tail pattern string is added, we don't need run
> regexp match again because the new input string surely matches the
> pattern string. Suppose a pattern string is "ab+" and the current
> input string is "ab" (this satisfies "ab+"). If the new input string
> is "abb", then "abb" surely matches "ab+" too and we don't need to run
> regexp again.
> 
> In v26 patch, by using the technique above I get performance
> improvement.
> 
>>> EXPLAIN (ANALYZE)
>>> SELECT aid, bid, count(*) OVER w
>>> FROM pgbench_accounts WHERE aid <= 10000
>>> WINDOW w AS (
>>> PARTITION BY bid
>>> ORDER BY aid
>>> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>> AFTER MATCH SKIP PAST LAST ROW
>>> INITIAL
>>> PATTERN (START UP+)
>>> DEFINE
>>> START AS TRUE,
>>> UP AS aid > PREV(aid)
>>> );
> 
> This SQL took 322.5913 ms (average in 3 runs) in v24. With v26 patch,
> it takes 41.84 ms, which is over 7 times improvement. Also I run the
> SQL in 100k row case. v23 took 26 seconds. With the v26 patch it takes
> 1195.603 ms, which is over 21 times improvement.
> 
> I think a pattern string ended up with '+' is one of common use cases
> of RPR, and I believe the improvement is useful for many RPR
> applications.
> 
> I also add following changes to v25.
> 
> - Fix do_pattern_match to use the top memory context to store compiled
>   re cache. Before it was in per query memory context. This causes a
>   trouble because do_pattern_match checks the cache existence using
>   a static variable.
> 
> - Refactor search_str_set, which is a workhorse of pattern matching,
>   into multiple functions to understand the logic easier.

CFBot complains a compiler error in the v26 patch.
Attached is v27 patch to fix this. Also some typo in comment are fixed.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-01-11T05:46:11Z

Attached are the v28 patches to implement a subset of Row Pattern
Recognition feature defined in the SQL standard.

In this patch set:

- Reduce the patch size. Comparing with v27, the patch size is trimmed
  from 5296 lines down to 5073 lines. This is mainly achieved in the
  raw parser patch so that non implemented features are removed from
  the grammar file.

- Use newly introduced makeStringInfoExt() instead of makeStringInfo()
  in nodeWindowAgg.c, which removes a few unnecessary codes and should
  give slightly better performance.

- Fix bugs in the doc and add it more description regarding RPR
  syntax.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-03-13T12:51:29Z

Attached are the v29 patch sets to implement the subset of row pattern
recognition feature defined in the SQL standard.

In this patch set:

- Adjust 0003 and 0004 to deal with the recent changes made by 8b1b342.
 
- Add tests to 0007 for CREATE VIEW/pg_get_viewdef.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-04-22T23:55:33Z

Attached are the v30 patches, just adding a patch to change the
default io_method parameter to sync, expecting this affects something
to the recent CFbot failure.
https://commitfest.postgresql.org/patch/4460/
https://cirrus-ci.com/task/6078653164945408
which is similar to:
https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org

(Once the issue resolved, the patch should be removed, of course)

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-04-23T01:07:29Z

> Attached are the v30 patches, just adding a patch to change the
> default io_method parameter to sync, expecting this affects something
> to the recent CFbot failure.
> https://commitfest.postgresql.org/patch/4460/
> https://cirrus-ci.com/task/6078653164945408
> which is similar to:
> https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org

CFBot has just run tests against v30 patches and now it turns to green
again!  I guess io_method = sync definitely did the trick.  Note that
previously only the Windows Server 2019, VS 2019 - Neason & ninja test
had failed.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-05-06T00:53:51Z

> Attached are the v30 patches, just adding a patch to change the
> default io_method parameter to sync, expecting this affects something
> to the recent CFbot failure.
> https://commitfest.postgresql.org/patch/4460/
> https://cirrus-ci.com/task/6078653164945408
> which is similar to:
> https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org
> 
> (Once the issue resolved, the patch should be removed, of course)

It seems this has turned to green since May 2, 2025.
https://commitfest.postgresql.org/patch/5679/.

The last time it turned to red was April 16, 2025. Maybe something
committed between the period solved the cause of red, but I don't know
exactly which commit.

Anyway v31 patches now remove the change to default io_method
parameter to see if it passes Windows Server 2019, VS 2019 - Meson &
ninja test.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-05-06T07:28:42Z

>> Attached are the v30 patches, just adding a patch to change the
>> default io_method parameter to sync, expecting this affects something
>> to the recent CFbot failure.
>> https://commitfest.postgresql.org/patch/4460/
>> https://cirrus-ci.com/task/6078653164945408
>> which is similar to:
>> https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org
>> 
>> (Once the issue resolved, the patch should be removed, of course)
> 
> It seems this has turned to green since May 2, 2025.
> https://commitfest.postgresql.org/patch/5679/.
> 
> The last time it turned to red was April 16, 2025. Maybe something
> committed between the period solved the cause of red, but I don't know
> exactly which commit.
> 
> Anyway v31 patches now remove the change to default io_method
> parameter to see if it passes Windows Server 2019, VS 2019 - Meson &
> ninja test.

Now I see it passes the test.
https://cirrus-ci.com/build/5671612202090496

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-08-16T08:45:52Z

Attached are the v32 patches for Row pattern recognition.  Recent
changes to doc/src/sgml/func.sgml required v31 to be rebased. Other
than that, nothing has been changed.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-09-24T10:35:23Z

Attached are the v33 patches for Row pattern recognition.  The
difference from v32 is that the raw parse tree printing patch is not
included in v33. This is because now that we have
debug_print_raw_parse.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-17T06:57:15Z

Attached are the v34 patches for Row pattern recognition.
Notihing has been changed. Just rebased.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-18T02:33:20Z

Attached are the v35 patches for Row pattern recognition.  In v34-0001
gram.y patch, %type for RPR were misplaced. v35-0001 fixes this. Other
patches are not changed.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Chao Li <li.evan.chao@gmail.com> — 2025-11-18T05:03:08Z

Hi Tatsuo-san,

I just reviewed 0006 docs changes and 0001. I plan to slowly review the patch, maybe one commit a day. 

> On Nov 18, 2025, at 10:33, Tatsuo Ishii <ishii@postgresql.org> wrote:
> 
> Attached are the v35 patches for Row pattern recognition.  In v34-0001
> gram.y patch, %type for RPR were misplaced. v35-0001 fixes this. Other
> patches are not changed.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
> <v35-0001-Row-pattern-recognition-patch-for-raw-parser.patch><v35-0002-Row-pattern-recognition-patch-parse-analysis.patch><v35-0003-Row-pattern-recognition-patch-rewriter.patch><v35-0004-Row-pattern-recognition-patch-planner.patch><v35-0005-Row-pattern-recognition-patch-executor.patch><v35-0006-Row-pattern-recognition-patch-docs.patch><v35-0007-Row-pattern-recognition-patch-tests.patch><v35-0008-Row-pattern-recognition-patch-typedefs.list.patch>

I got a few comments, maybe just questions:

1 - 0001 - kwlist.h
```
+PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
```

Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
```
<programlisting>
SELECT company, tdate, price,
 first_value(price) OVER w,
 max(price) OVER w,
 count(price) OVER w
FROM stock
 WINDOW w AS (
 PARTITION BY company
 ORDER BY tdate
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 AFTER MATCH SKIP PAST LAST ROW
 INITIAL
 PATTERN (LOWPRICE UP+ DOWN+)
 DEFINE
  LOWPRICE AS price &lt;= 100,
  UP AS price &gt; PREV(price),
  DOWN AS price &lt; PREV(price)
);
</programlisting>
```

PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
```
PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
```

Even in this patch,”initial”,”past”, “pattern” and “seek” are defined as unreserved, why?  

So I just want to clarify.

2 - 0001 - gram.y
```
opt_row_pattern_initial_or_seek:
			INITIAL		{ $$ = true; }
			| SEEK
				{
					ereport(ERROR,
							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
							 errmsg("SEEK is not supported"),
							 errhint("Use INITIAL instead."),
							 parser_errposition(@1)));
				}
			| /*EMPTY*/		{ $$ = true; }
		;
```

As SEEK is specially listed here, I guess it might be supported in future. If that is true, would it be better to defer the semantic check to later parse phase, which would future work easier.

3 - 0001 - parsenodes.h
```
+/*
+ * RowPatternCommonSyntax - raw representation of row pattern common syntax
+ *
+ */
+typedef struct RPCommonSyntax
+{
+	NodeTag		type;
+	RPSkipTo	rpSkipTo;		/* Row Pattern AFTER MATCH SKIP type */
+	bool		initial;		/* true if <row pattern initial or seek> is
+								 * initial */
+	List	   *rpPatterns;		/* PATTERN variables (list of A_Expr) */
+	List	   *rpDefs;			/* row pattern definitions clause (list of
+								 * ResTarget) */
+} RPCommonSyntax;
+
 /*
  * WindowDef - raw representation of WINDOW and OVER clauses
  *
@@ -593,6 +618,7 @@ typedef struct WindowDef
 	char	   *refname;		/* referenced window name, if any */
 	List	   *partitionClause;	/* PARTITION BY expression list */
 	List	   *orderClause;	/* ORDER BY (list of SortBy) */
+	RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
```

RP fields are directly defined in WindowClause, then why do we need a wrapper of RPCommonSyntax? Can we directly define RP fields in WindowRef as well?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Row pattern recognition

Chao Li <li.evan.chao@gmail.com> — 2025-11-18T05:05:52Z


> On Nov 18, 2025, at 13:03, Chao Li <li.evan.chao@gmail.com> wrote:
> 
> Hi Tatsuo-san,
> 
> I just reviewed 0006 docs changes and 0001. I plan to slowly review the patch, maybe one commit a day. 
> 
>> On Nov 18, 2025, at 10:33, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> 
>> Attached are the v35 patches for Row pattern recognition.  In v34-0001
>> gram.y patch, %type for RPR were misplaced. v35-0001 fixes this. Other
>> patches are not changed.
>> 
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>> <v35-0001-Row-pattern-recognition-patch-for-raw-parser.patch><v35-0002-Row-pattern-recognition-patch-parse-analysis.patch><v35-0003-Row-pattern-recognition-patch-rewriter.patch><v35-0004-Row-pattern-recognition-patch-planner.patch><v35-0005-Row-pattern-recognition-patch-executor.patch><v35-0006-Row-pattern-recognition-patch-docs.patch><v35-0007-Row-pattern-recognition-patch-tests.patch><v35-0008-Row-pattern-recognition-patch-typedefs.list.patch>
> 
> I got a few comments, maybe just questions:
> 
> 1 - 0001 - kwlist.h
> ```
> +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
> ```
> 
> Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
> ```
> <programlisting>
> SELECT company, tdate, price,
> first_value(price) OVER w,
> max(price) OVER w,
> count(price) OVER w
> FROM stock
> WINDOW w AS (
> PARTITION BY company
> ORDER BY tdate
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (LOWPRICE UP+ DOWN+)
> DEFINE
>  LOWPRICE AS price &lt;= 100,
>  UP AS price &gt; PREV(price),
>  DOWN AS price &lt; PREV(price)
> );
> </programlisting>
> ```
> 
> PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
> ```
> PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
> ```
> 
> Even in this patch,”initial”,”past”, “pattern” and “seek” are defined as unreserved, why?  
> 
> So I just want to clarify.
> 
> 2 - 0001 - gram.y
> ```
> opt_row_pattern_initial_or_seek:
> INITIAL { $$ = true; }
> | SEEK
> {
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("SEEK is not supported"),
> errhint("Use INITIAL instead."),
> parser_errposition(@1)));
> }
> | /*EMPTY*/ { $$ = true; }
> ;
> ```
> 
> As SEEK is specially listed here, I guess it might be supported in future. If that is true, would it be better to defer the semantic check to later parse phase, which would future work easier.
> 
> 3 - 0001 - parsenodes.h
> ```
> +/*
> + * RowPatternCommonSyntax - raw representation of row pattern common syntax
> + *
> + */
> +typedef struct RPCommonSyntax
> +{
> + NodeTag type;
> + RPSkipTo rpSkipTo; /* Row Pattern AFTER MATCH SKIP type */
> + bool initial; /* true if <row pattern initial or seek> is
> + * initial */
> + List   *rpPatterns; /* PATTERN variables (list of A_Expr) */
> + List   *rpDefs; /* row pattern definitions clause (list of
> + * ResTarget) */
> +} RPCommonSyntax;
> +
> /*
>  * WindowDef - raw representation of WINDOW and OVER clauses
>  *
> @@ -593,6 +618,7 @@ typedef struct WindowDef
> char   *refname; /* referenced window name, if any */
> List   *partitionClause; /* PARTITION BY expression list */
> List   *orderClause; /* ORDER BY (list of SortBy) */
> + RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
> ```
> 
> RP fields are directly defined in WindowClause, then why do we need a wrapper of RPCommonSyntax? Can we directly define RP fields in WindowRef as well?
> 

4 - 0001 - parsenodes.h
```
+	/* Row Pattern AFTER MACH SKIP clause */
```

Typo: MACH -> MATCH

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Row pattern recognition

Vik Fearing <vik@postgresfriends.org> — 2025-11-18T11:19:46Z

On 18/11/2025 06:03, Chao Li wrote:
> 1 - 0001 - kwlist.h
> ```
> +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
> ```
>
> Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
> ```
> <programlisting>
> SELECT company, tdate, price,
>   first_value(price) OVER w,
>   max(price) OVER w,
>   count(price) OVER w
> FROM stock
>   WINDOW w AS (
>   PARTITION BY company
>   ORDER BY tdate
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   AFTER MATCH SKIP PAST LAST ROW
>   INITIAL
>   PATTERN (LOWPRICE UP+ DOWN+)
>   DEFINE
>    LOWPRICE AS price &lt;= 100,
>    UP AS price &gt; PREV(price),
>    DOWN AS price &lt; PREV(price)
> );
> </programlisting>
> ```
>
> PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
> ```
> PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
> ```
>
> Even in this patch,”initial”,”past”, “pattern” and “seek” are defined as unreserved, why?
>
> So I just want to clarify.


Because of position. Without making DEFINE a reserved keyword, how do 
you know that it isn't another variable in the PATTERN clause?

-- 

Vik Fearing

Re: Row pattern recognition

Chao Li <li.evan.chao@gmail.com> — 2025-11-19T04:14:03Z


> On Nov 18, 2025, at 19:19, Vik Fearing <vik@postgresfriends.org> wrote:
> 
> 
> Because of position. Without making DEFINE a reserved keyword, how do you know that it isn't another variable in the PATTERN clause?
> 

Ah, thanks for the clarification. Now I got it.

I’m continue to review 0002.

5 - 0002 - parse_clause.c
```
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("FRAME must start at current row when row patttern recognition is used")));
```

Nit typo: patttern -> pattern

6 - 0002 - parse_clause.c
```
+	/* DEFINE variable name initials */
+	static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz”;
```

This string can also be const, so “static const char *”

7 - 0002 - parse_clause.c
```
+	/*
+	 * Create list of row pattern DEFINE variable name's initial. We assign
+	 * [a-z] to them (up to 26 variable names are allowed).
+	 */
+	restargets = NIL;
+	i = 0;
+	initialLen = strlen(defineVariableInitials);
+
+	foreach(lc, windef->rpCommonSyntax->rpDefs)
+	{
+		char		initial[2];
+
+		restarget = (ResTarget *) lfirst(lc);
+		name = restarget->name;
```

Looks like “name” is not used inside “foreach”.

8 - 0002 - parse_clause.c
```
+	/*
+	 * Create list of row pattern DEFINE variable name's initial. We assign
+	 * [a-z] to them (up to 26 variable names are allowed).
+	 */
+	restargets = NIL;
+	i = 0;
+	initialLen = strlen(defineVariableInitials);
+
+	foreach(lc, windef->rpCommonSyntax->rpDefs)
+	{
+		char		initial[2];
```

I guess this “foreach” for build initial list can be combined into the previous foreach loop of checking duplication. If an def has no dup, then assign an initial to it.

9 - 0002 - parse_clause.c
```
+static void
+transformPatternClause(ParseState *pstate, WindowClause *wc,
+					   WindowDef *windef)
+{
+	ListCell   *lc;
+
+	/*
+	 * Row Pattern Common Syntax clause exists?
+	 */
+	if (windef->rpCommonSyntax == NULL)
+		return;
```

Checking  (windef->rpCommonSyntax == NULL) seems a duplicate, because transformPatternClause() is only called by transformRPR(), and transformRPR() has done the check.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-19T06:51:02Z

> On 18/11/2025 06:03, Chao Li wrote:
>> 1 - 0001 - kwlist.h
>> ```
>> +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
>> ```
>>
>> Why do we add “define” as a reserved keyword? From the SQL example
>> you put in 0006:
>> ```
>> <programlisting>
>> SELECT company, tdate, price,
>>   first_value(price) OVER w,
>>   max(price) OVER w,
>>   count(price) OVER w
>> FROM stock
>>   WINDOW w AS (
>>   PARTITION BY company
>>   ORDER BY tdate
>>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>   AFTER MATCH SKIP PAST LAST ROW
>>   INITIAL
>>   PATTERN (LOWPRICE UP+ DOWN+)
>>   DEFINE
>>    LOWPRICE AS price &lt;= 100,
>>    UP AS price &gt; PREV(price),
>>    DOWN AS price &lt; PREV(price)
>> );
>> </programlisting>
>> ```
>>
>> PARTITION is at the same level as DEFINE, but it’s not defined as a
>> reserved keyword:
>> ```
>> PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
>> ```
>>
>> Even in this patch,”initial”,”past”, “pattern” and “seek” are
>> defined as unreserved, why?
>>
>> So I just want to clarify.
> 
> 
> Because of position. Without making DEFINE a reserved keyword, how do
> you know that it isn't another variable in the PATTERN clause?

I think we don't need to worry about this because PATTERN_P is in the
$nonassoc list in the patch, which gives PATTERN different precedence
from DEFINE.

@@ -888,6 +896,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %nonassoc	UNBOUNDED NESTED /* ideally would have same precedence as IDENT */
 %nonassoc	IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
 			SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT PATH
+			AFTER INITIAL SEEK PATTERN_P

And I think we could change DEFINE to an unreserved keyword.  Attached
is a patch to do that, on top of v35-0001.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-19T07:19:03Z

> 2 - 0001 - gram.y
> ```
> opt_row_pattern_initial_or_seek:
> 			INITIAL		{ $$ = true; }
> 			| SEEK
> 				{
> 					ereport(ERROR,
> 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> 							 errmsg("SEEK is not supported"),
> 							 errhint("Use INITIAL instead."),
> 							 parser_errposition(@1)));
> 				}
> 			| /*EMPTY*/		{ $$ = true; }
> 		;
> ```
> 
> As SEEK is specially listed here, I guess it might be supported in future. If that is true, would it be better to defer the semantic check to later parse phase, which would future work easier.

From a comment in gram.y:

/*
 * If we see PARTITION, RANGE, ROWS, GROUPS, AFTER, INITIAL, SEEK or PATTERN_P
 * as the first token after the '(' of a window_specification, we want the
 * assumption to be that there is no existing_window_name; but those keywords
 * are unreserved and so could be ColIds.  We fix this by making them have the

For this purpose, we want INITIAL and SEEK to be unreserved keywords.

> 3 - 0001 - parsenodes.h
> ```
> +/*
> + * RowPatternCommonSyntax - raw representation of row pattern common syntax
> + *
> + */
> +typedef struct RPCommonSyntax
> +{
> +	NodeTag		type;
> +	RPSkipTo	rpSkipTo;		/* Row Pattern AFTER MATCH SKIP type */
> +	bool		initial;		/* true if <row pattern initial or seek> is
> +								 * initial */
> +	List	   *rpPatterns;		/* PATTERN variables (list of A_Expr) */
> +	List	   *rpDefs;			/* row pattern definitions clause (list of
> +								 * ResTarget) */
> +} RPCommonSyntax;
> +
>  /*
>   * WindowDef - raw representation of WINDOW and OVER clauses
>   *
> @@ -593,6 +618,7 @@ typedef struct WindowDef
>  	char	   *refname;		/* referenced window name, if any */
>  	List	   *partitionClause;	/* PARTITION BY expression list */
>  	List	   *orderClause;	/* ORDER BY (list of SortBy) */
> +	RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
> ```
> 
> RP fields are directly defined in WindowClause, then why do we need a wrapper of RPCommonSyntax? Can we directly define RP fields in WindowRef as well?

The row pattern common syntax defined in the standard is not only used
in the WINDOW clause, but in the FROM clause. If we would support RPR
in FROM clause in the future, it would be better to use the same code
of row pattern common syntax in WINDOW clause as much as
possible. That's the reason I created RPCommonSyntax. In the
parse/analysis phase, I am not sure how the parse/analysis code would
be in FROM clause at this point. So I did not define yet another
struct for it.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-19T07:20:46Z

> 4 - 0001 - parsenodes.h
> ```
> +	/* Row Pattern AFTER MACH SKIP clause */
> ```
> 
> Typo: MACH -> MATCH

Thanks for pointing it out. Will fix.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Chao Li <li.evan.chao@gmail.com> — 2025-11-20T07:33:48Z


> On Nov 19, 2025, at 12:14, Chao Li <li.evan.chao@gmail.com> wrote:
> 
> 
> 9 - 0002 - parse_clause.c

I am continuing to review 0003

10 - 0003
```
+	Assert(list_length(patternVariables) == list_length(defineClause));
```

Is this assert true? From what I learned, pattern length doesn’t have to equal to define length. For example, I got an example from [1]:

```
Example 4
-- This example has three different patterns.
-- Comment two of them, to get error-free query.
SELECT company, price_date, price
  FROM stock_price
    MATCH_RECOGNIZE (
       partition by company
       order by price_date
       all rows per match
       pattern ( limit_50  up   up* down  down*  )
       define
           limit_50 as price <= 50.00,
           up   as price > prev(price),
           down as price < prev(price)
    )
   WHERE company = 'B'
   ORDER BY price_date;
```

In this example, pattern has 5 elements and define has only 3 elements.

11 - 0004 - plannodes.h
```
+	/* Row Pattern Recognition AFTER MACH SKIP clause */
+	RPSkipTo	rpSkipTo;		/* Row Pattern Skip To type */
```

Typo: MACH -> MATCH

I’d stop here, and continue 0005 tomorrow.


[1] https://link.springer.com/article/10.1007/s13222-022-00404-3

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Row pattern recognition

Chao Li <li.evan.chao@gmail.com> — 2025-11-21T05:25:50Z

> On Nov 20, 2025, at 15:33, Chao Li <li.evan.chao@gmail.com> wrote:
> 
> 
> I’d stop here, and continue 0005 tomorrow.
> 

I reviewed 0005 today. I'm still not very familiar with the executor code, so going through 0005 has also been a valuable learning process for me.

12 - 0005 - nodeWindowAgg.c
```
+	if (string_set_get_size(str_set) == 0)
+	{
+		/* no match found in the first row */
+		register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED);
+		destroyStringInfo(encoded_str);
+		return;
+	}
```

encoded_str should also be destroyed if not entering the “if” clause.

13 - 0005 - nodeWindowAgg.c
```
static
int
do_pattern_match(char *pattern, char *encoded_str, int len)
{
static regex_t *regcache = NULL;
static regex_t preg;
static char patbuf[1024]; /* most recent 'pattern' is cached here */
```

Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.

14 - 0005 - nodeWindowAgg.c
```
+		MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
+
+		if (regcache != NULL)
+			pg_regfree(regcache);	/* free previous re */
+
+		/* we need to convert to char to pg_wchar */
+		plen = strlen(pattern);
+		data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
+		data_len = pg_mb2wchar_with_len(pattern, data, plen);
+		/* compile re */
+		sts = pg_regcomp(&preg, /* compiled re */
+						 data,	/* target pattern */
+						 data_len,	/* length of pattern */
+						 cflags,	/* compile option */
+						 C_COLLATION_OID	/* collation */
+			);
+		pfree(data);
+
+		MemoryContextSwitchTo(oldContext);
```

Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
```
/*
* pg_regcomp - compile regular expression
*
* Note: on failure, no resources remain allocated, so pg_regfree()
* need not be applied to re.
*/
int
pg_regcomp(regex_t *re,
const chr *string,
size_t len,
int flags,
Oid collation)
```

“preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.

Okay, I’d stop here and continue to review 0006 next week.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-21T05:57:08Z

Hi Chao,

>> On Nov 18, 2025, at 19:19, Vik Fearing <vik@postgresfriends.org> wrote:
>> 
>> 
>> Because of position. Without making DEFINE a reserved keyword, how do you know that it isn't another variable in the PATTERN clause?
>> 
> 
> Ah, thanks for the clarification. Now I got it.
> 
> I’m continue to review 0002.

Thanks for the review!

> 5 - 0002 - parse_clause.c
> ```
> +		ereport(ERROR,
> +				(errcode(ERRCODE_SYNTAX_ERROR),
> +				 errmsg("FRAME must start at current row when row patttern recognition is used")));
> ```
> 
> Nit typo: patttern -> pattern

Will fix (this involves regression test change because this changes
the ERROR messages in the expected file).

> 6 - 0002 - parse_clause.c
> ```
> +	/* DEFINE variable name initials */
> +	static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz”;
> ```
> 
> This string can also be const, so “static const char *”

Agreed. Will fix.

> 7 - 0002 - parse_clause.c
> ```
> +	/*
> +	 * Create list of row pattern DEFINE variable name's initial. We assign
> +	 * [a-z] to them (up to 26 variable names are allowed).
> +	 */
> +	restargets = NIL;
> +	i = 0;
> +	initialLen = strlen(defineVariableInitials);
> +
> +	foreach(lc, windef->rpCommonSyntax->rpDefs)
> +	{
> +		char		initial[2];
> +
> +		restarget = (ResTarget *) lfirst(lc);
> +		name = restarget->name;
> ```
> 
> Looks like “name” is not used inside “foreach”.

Oops. Will fix.

> 8 - 0002 - parse_clause.c
> ```
> +	/*
> +	 * Create list of row pattern DEFINE variable name's initial. We assign
> +	 * [a-z] to them (up to 26 variable names are allowed).
> +	 */
> +	restargets = NIL;
> +	i = 0;
> +	initialLen = strlen(defineVariableInitials);
> +
> +	foreach(lc, windef->rpCommonSyntax->rpDefs)
> +	{
> +		char		initial[2];
> ```
> 
> I guess this “foreach” for build initial list can be combined into the previous foreach loop of checking duplication. If an def has no dup, then assign an initial to it.

You are right. Will change.

> 9 - 0002 - parse_clause.c
> ```
> +static void
> +transformPatternClause(ParseState *pstate, WindowClause *wc,
> +					   WindowDef *windef)
> +{
> +	ListCell   *lc;
> +
> +	/*
> +	 * Row Pattern Common Syntax clause exists?
> +	 */
> +	if (windef->rpCommonSyntax == NULL)
> +		return;
> ```
> 
> Checking  (windef->rpCommonSyntax == NULL) seems a duplicate, because transformPatternClause() is only called by transformRPR(), and transformRPR() has done the check.

Yeah. transformDefineClause() already does the similar check using
Assert. What about using Assert in transPatternClause() as well?

	Assert(windef->rpCommonSyntax != NULL);

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Chao Li <li.evan.chao@gmail.com> — 2025-11-24T00:59:57Z

> On Nov 21, 2025, at 13:25, Chao Li <li.evan.chao@gmail.com> wrote:
> 
> 
> Okay, I’d stop here and continue to review 0006 next week.
> 

I just finished reviewing 0006, and see some problems:

15 - 0006 - select.sgml
```
+[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
```

row_pattern_common_syntax doesn’t look like a good name. I searched over the docs, and couldn't find a name suffixed by “_syntax”. I think we can just name it as “row_pattern_recognition_clause” or a shorter name just “row_pattern”.

16 - 0006 - select.sgml
```
+ <synopsis>
+ [ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
``

I think the two values are mutually exclusive, so curly braces should added surround them:

[ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]

[] means optional, and {} means choose one from all alternatives.

17 - 0006 - select.sgml
```
PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
```

PATTERN definition should be placed inside (), here you missed ()

18 - 0006 - select.sgml
The same code as 17, <replaceable class="parameter">pattern_variable_name</replaceable>[+], do you only put “+” here, I think SQL allows a lot of pattern quantifiers. From 0001, gram.y, looks like +, * and ? Are supported in this patch. So, maybe we can do:

```
PATTERN ( pattern_element, [pattern_element …] )

and pattern_element = variable name optionally followed by quantifier (reference to somewhere introducing pattern quantifier).
```

19 - 0006 - select.sgml

I don’t see INITIAL and SEEK are described. Even if SEEK is not supported currently, it’s worthy mentioning that.

20 - 0006 - select.sgml
```
+    after a match found. With <literal>AFTER MATCH SKIP PAST LAST
+    ROW</literal> (the default) next row position is next to the last row of
```

21 - 0006 - select.sgml
```
 [ <replaceable class="parameter">frame_clause</replaceable> ]
+[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
```

Looks like row_pattern_common_syntax belongs to frame_clause, and you have lately added row_pattern_common_syntax to frame_clause:
```
 <synopsis>
-{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
-{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
+{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
+{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
 </synopsis>
```

So I guess row_pattern_common_syntax after frame_clause should not be added.

22 - 0006 - advance.sgml
```
<programlisting>
DEFINE
 LOWPRICE AS price &lt;= 100,
 UP AS price &gt; PREV(price),
 DOWN AS price &lt; PREV(price)
</programlisting>

    Note that <function>PREV</function> returns the price column in the
```

In the “Note” line, “price” refers to a column from above example, so I think it should be tagged by <literal>. This comment applies to multiple places.

I will proceed 0007 tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-24T04:57:25Z

Hi Chao,

Thank you for the review!

>> On Nov 20, 2025, at 15:33, Chao Li <li.evan.chao@gmail.com> wrote:
>> 
>> 
>> I’d stop here, and continue 0005 tomorrow.
>> 
> 
> I reviewed 0005 today. I'm still not very familiar with the executor code, so going through 0005 has also been a valuable learning process for me.
> 
> 12 - 0005 - nodeWindowAgg.c
> ```
> +	if (string_set_get_size(str_set) == 0)
> +	{
> +		/* no match found in the first row */
> +		register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED);
> +		destroyStringInfo(encoded_str);
> +		return;
> +	}
> ```
> 
> encoded_str should also be destroyed if not entering the “if” clause.

Subsequent string_set_discard() will free encode_str since encoded_str
is part of str_set. So no need to call destroyStringInfo(encoded_str)
in not entering "if" clause.

	string_set_discard(str_set);

So I think this is Ok.

> 13 - 0005 - nodeWindowAgg.c
> ```
> static
> int
> do_pattern_match(char *pattern, char *encoded_str, int len)
> {
> static regex_t *regcache = NULL;
> static regex_t preg;
> static char patbuf[1024]; /* most recent 'pattern' is cached here */
> ```
> 
> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.

Yeah, such a usage maybe rare. But I hesitate to store the data
(regex_t) in WindowAggState because:

(1) it bloats WindowAggState which we want to avoid if possible
(2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)

> 14 - 0005 - nodeWindowAgg.c
> ```
> +		MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
> +
> +		if (regcache != NULL)
> +			pg_regfree(regcache);	/* free previous re */
> +
> +		/* we need to convert to char to pg_wchar */
> +		plen = strlen(pattern);
> +		data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
> +		data_len = pg_mb2wchar_with_len(pattern, data, plen);
> +		/* compile re */
> +		sts = pg_regcomp(&preg, /* compiled re */
> +						 data,	/* target pattern */
> +						 data_len,	/* length of pattern */
> +						 cflags,	/* compile option */
> +						 C_COLLATION_OID	/* collation */
> +			);
> +		pfree(data);
> +
> +		MemoryContextSwitchTo(oldContext);
> ```
> 
> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
> ```
> /*
> * pg_regcomp - compile regular expression
> *
> * Note: on failure, no resources remain allocated, so pg_regfree()
> * need not be applied to re.
> */
> int
> pg_regcomp(regex_t *re,
> const chr *string,
> size_t len,
> int flags,
> Oid collation)
> ```
> 
> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.

I admit current patch leaves the memory unfreed even after a query
finishes. What about adding something like:

static void do_pattern_match_end(void)
{
	if (regcache != NULL)
		pg_regfree(regcache);
}

and let ExecEndWindowAgg() call it?

> Okay, I’d stop here and continue to review 0006 next week.
> 
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
> 
> 
> 
>

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-24T06:47:03Z

Hi Chao,

>> On Nov 21, 2025, at 13:25, Chao Li <li.evan.chao@gmail.com> wrote:
>> 
>> 
>> Okay, I’d stop here and continue to review 0006 next week.
>> 
> 
> I just finished reviewing 0006, and see some problems:
> 
> 15 - 0006 - select.sgml
> ```
> +[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
> ```
> 
> row_pattern_common_syntax doesn’t look like a good name. I searched over the docs, and couldn't find a name suffixed by “_syntax”. I think we can just name it as “row_pattern_recognition_clause” or a shorter name just “row_pattern”.

I believe these names are based on the SQL standard. All syntax
components do not necessary be suffixed by "clause". For example
in select.sgml,

[ WITH [ RECURSIVE ] <replaceable class="parameter">with_query</replaceable> [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replaceable> [, ...] ) ] ]
    [ { * | <replaceable class="parameter">expression</replaceable> [ [ AS ] <replaceable class="parameter">output_name</replaceable> ] } [, ...] ]
    [ FROM <replaceable class="parameter">from_item</replaceable> [, ...] ]
    [ WHERE <replaceable class="parameter">condition</replaceable> ]
    [ GROUP BY { ALL | [ ALL | DISTINCT ] <replaceable class="parameter">grouping_element</replaceable> [, ...] } ]
    [ HAVING <replaceable class="parameter">condition</replaceable> ]
    [ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceable class="parameter">window_definition</replaceable> ) [, ...] ]

"window_definition" comes from the standard and does not suffixed by
"clause". If you look into the window clause definition in the
standard, you will see:

<window clause> ::=
WINDOW <window definition list>
<window definition list> ::=
<window definition> [ { <comma> <window definition> }... ]

So the usage of terms in our document matches the standard. Likewise,
we see the definition of row pattern common syntax in the stabdard:

<window clause> ::=
WINDOW <window definition list>
<window definition list> ::=
<window definition> [ { <comma> <window definition> }... ]
<window definition> ::=
<new window name> AS <window specification>
<new window name> ::=
<window name>
<window specification> ::=
<left paren> <window specification details> <right paren>
<window specification details> ::=
[ <existing window name> ]
[ <window partition clause> ]
[ <window order clause> ]
[ <window frame clause> ]
:
:
<window frame clause> ::=
[ <row pattern measures> ]
<window frame units> <window frame extent>
[ <window frame exclusion> ]
[ <row pattern common syntax> ]

So I think "row pattern common syntax" is perfectly appropriate
name. If we change it to “row_pattern_recognition_clause”, it will
just bring confusion.  In the standard “row pattern recognition
clause” is defined RPR in FROM clause.

SELECT ... FROM table MATCH RECOGNIZE (....);

Here MATCH RECOGNIZE (....) is the “row pattern recognition clause”.

> 16 - 0006 - select.sgml
> ```
> + <synopsis>
> + [ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
> ``
> 
> I think the two values are mutually exclusive, so curly braces should added surround them:
> 
> [ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]
> 
> [] means optional, and {} means choose one from all alternatives.

Agreed. Will fix.

> 17 - 0006 - select.sgml
> ```
> PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
> ```
> 
> PATTERN definition should be placed inside (), here you missed ()

Good catch. Will fix.

> 18 - 0006 - select.sgml
> The same code as 17, <replaceable class="parameter">pattern_variable_name</replaceable>[+], do you only put “+” here, I think SQL allows a lot of pattern quantifiers. From 0001, gram.y, looks like +, * and ? Are supported in this patch. So, maybe we can do:
> 
> ```
> PATTERN ( pattern_element, [pattern_element …] )
> 
> and pattern_element = variable name optionally followed by quantifier (reference to somewhere introducing pattern quantifier).
> ```

Currently only *, + and ? are supported and I don't think it's worth
to invent "pattern_element".  (Actually the standard defines much more
complex syntax for PATTERN. I think we can revisit this once the basic
support for quantifier *,+ and ? are brought in the core PostgreSQL
code.

> 19 - 0006 - select.sgml
> 
> I don’t see INITIAL and SEEK are described. Even if SEEK is not supported currently, it’s worthy mentioning that.

Agreed. Will fix.

> 20 - 0006 - select.sgml
> ```
> +    after a match found. With <literal>AFTER MATCH SKIP PAST LAST
> +    ROW</literal> (the default) next row position is next to the last row of
> ```
> 
> 21 - 0006 - select.sgml
> ```
>  [ <replaceable class="parameter">frame_clause</replaceable> ]
> +[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
> ```
> 
> Looks like row_pattern_common_syntax belongs to frame_clause, and you have lately added row_pattern_common_syntax to frame_clause:
> ```
>  <synopsis>
> -{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
> -{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
> +{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
> +{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax]
>  </synopsis>
> ```
> 
> So I guess row_pattern_common_syntax after frame_clause should not be added.

Yes, row_pattern_common_syntax belongs to frame_clause. Will fix.

> 22 - 0006 - advance.sgml
> ```
> <programlisting>
> DEFINE
>  LOWPRICE AS price &lt;= 100,
>  UP AS price &gt; PREV(price),
>  DOWN AS price &lt; PREV(price)
> </programlisting>
> 
>     Note that <function>PREV</function> returns the price column in the
> ```
> 
> In the “Note” line, “price” refers to a column from above example, so I think it should be tagged by <literal>. This comment applies to multiple places.

You are right. Will fix.

> I will proceed 0007 tomorrow.

Thanks!

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Chao Li <li.evan.chao@gmail.com> — 2025-11-25T06:16:10Z

> On Nov 24, 2025, at 08:59, Chao Li <li.evan.chao@gmail.com> wrote:
> 
> I will proceed 0007 tomorrow.

I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL feature, from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot from it. Thanks!

23 - 0007

As you mentioned earlier, pattern regular expression support *, + and ?, but I don’t see ? is tested.

24 - 0007

I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).

25 - 0007
```
+-- basic test with none greedy pattern
```

Typo: “none greedy” -> non-greedy

No comment for 0008.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-27T01:16:42Z

> I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL feature, from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot from it. Thanks!

You are welcome!

> 23 - 0007
> 
> As you mentioned earlier, pattern regular expression support *, + and ?, but I don’t see ? is tested.

Good catch. I will add the test case. In the mean time I found a bug
with gram.y part which handles '?' quantifier.  gram.y can be
successfully compiled but there's no way to put '?' quantifier in a
SQL statement.  We cannot write directly "ColId '?'" like other
quantifier '*' and '+' in the grammer because '?' is not a "self"
token.  So we let '?' matches Op first then check if it's '?'  or
not. 

			| ColId Op
				{
					if (strcmp("?", $2))
						ereport(ERROR,
								errcode(ERRCODE_SYNTAX_ERROR),
								errmsg("unsupported quantifier"),
								parser_errposition(@2));

					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
				}

Another bug was with executor (nodeWindowAgg.c). The code to check
greedy quantifiers missed the case '?'.

> 24 - 0007
> 
> I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).

I will add such test cases.

> 25 - 0007
> ```
> +-- basic test with none greedy pattern
> ```
> 
> Typo: “none greedy” -> non-greedy

Will fix.

> No comment for 0008.

Thanks again for the review. I will post v35 patch set soon.  Attached
is the diff from v34.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-11-27T03:10:08Z

Hi Chao,

Any comment on this?

>> 13 - 0005 - nodeWindowAgg.c
>> ```
>> static
>> int
>> do_pattern_match(char *pattern, char *encoded_str, int len)
>> {
>> static regex_t *regcache = NULL;
>> static regex_t preg;
>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
>> ```
>> 
>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.
>
>Yeah, such a usage maybe rare. But I hesitate to store the data
>(regex_t) in WindowAggState because:
>
>(1) it bloats WindowAggState which we want to avoid if possible
>(2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
>
>> 14 - 0005 - nodeWindowAgg.c
>> ```
>> +		MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
>> +
>> +		if (regcache != NULL)
>> +			pg_regfree(regcache);	/* free previous re */
>> +
>> +		/* we need to convert to char to pg_wchar */
>> +		plen = strlen(pattern);
>> +		data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
>> +		data_len = pg_mb2wchar_with_len(pattern, data, plen);
>> +		/* compile re */
>> +		sts = pg_regcomp(&preg, /* compiled re */
>> +						 data,	/* target pattern */
>> +						 data_len,	/* length of pattern */
>> +						 cflags,	/* compile option */
>> +						 C_COLLATION_OID	/* collation */
>> +			);
>> +		pfree(data);
>> +
>> +		MemoryContextSwitchTo(oldContext);
>> ```
>> 
>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
>> ```
>> /*
>> * pg_regcomp - compile regular expression
>> *
>> * Note: on failure, no resources remain allocated, so pg_regfree()
>> * need not be applied to re.
>> */
>> int
>> pg_regcomp(regex_t *re,
>> const chr *string,
>> size_t len,
>> int flags,
>> Oid collation)
>> ```
>> 
>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.
>
>I admit current patch leaves the memory unfreed even after a query
>finishes. What about adding something like:
>
>static void do_pattern_match_end(void)
>{
>	if (regcache != NULL)
>		pg_regfree(regcache);
>}
>
>and let ExecEndWindowAgg() call it?

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Chao Li <li.evan.chao@gmail.com> — 2025-11-27T06:56:02Z


> On Nov 27, 2025, at 11:10, Tatsuo Ishii <ishii@postgresql.org> wrote:
> 
> Hi Chao,
> 
> Any comment on this?
> 
>>> 13 - 0005 - nodeWindowAgg.c
>>> ```
>>> static
>>> int
>>> do_pattern_match(char *pattern, char *encoded_str, int len)
>>> {
>>> static regex_t *regcache = NULL;
>>> static regex_t preg;
>>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
>>> ```
>>> 
>>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.
>> 
>> Yeah, such a usage maybe rare. But I hesitate to store the data
>> (regex_t) in WindowAggState because:
>> 
>> (1) it bloats WindowAggState which we want to avoid if possible
>> (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)

With the static function-scope variables, those data persist across queries, which is error prone. I am afraid that even if I let this pass, other reviewers might have the same concern.

Maybe define a sub structure, say WindowAggCache, and put a pointer of WindowAggCache in WindowAggState, and only allocate memory when a query needs to.

>> 
>>> 14 - 0005 - nodeWindowAgg.c
>>> ```
>>> + MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
>>> +
>>> + if (regcache != NULL)
>>> + pg_regfree(regcache); /* free previous re */
>>> +
>>> + /* we need to convert to char to pg_wchar */
>>> + plen = strlen(pattern);
>>> + data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
>>> + data_len = pg_mb2wchar_with_len(pattern, data, plen);
>>> + /* compile re */
>>> + sts = pg_regcomp(&preg, /* compiled re */
>>> + data, /* target pattern */
>>> + data_len, /* length of pattern */
>>> + cflags, /* compile option */
>>> + C_COLLATION_OID /* collation */
>>> + );
>>> + pfree(data);
>>> +
>>> + MemoryContextSwitchTo(oldContext);
>>> ```
>>> 
>>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
>>> ```
>>> /*
>>> * pg_regcomp - compile regular expression
>>> *
>>> * Note: on failure, no resources remain allocated, so pg_regfree()
>>> * need not be applied to re.
>>> */
>>> int
>>> pg_regcomp(regex_t *re,
>>> const chr *string,
>>> size_t len,
>>> int flags,
>>> Oid collation)
>>> ```
>>> 
>>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.
>> 
>> I admit current patch leaves the memory unfreed even after a query
>> finishes. What about adding something like:
>> 
>> static void do_pattern_match_end(void)
>> {
>> if (regcache != NULL)
>> pg_regfree(regcache);
>> }
>> 
>> and let ExecEndWindowAgg() call it?
> 

I’m not sure. But I think if we move the data into WindowAggState, then I guess we don’t have to use TopmemoryContext here.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-12-01T06:57:02Z

Hi Chao,

Sorry, I missed this email.

>> 9 - 0002 - parse_clause.c
> 
> I am continuing to review 0003
> 
> 10 - 0003
> ```
> +	Assert(list_length(patternVariables) == list_length(defineClause));
> ```
> 
> Is this assert true? From what I learned, pattern length doesn’t have to equal to define length. For example, I got an example from [1]:

You are right. If PATTERN clause uses the same pattern variable more
than once (which is perfectly valid), the assertion fails. I will
remove the assersion and fix the subsequent forboth loop.

> ```
> Example 4
> -- This example has three different patterns.
> -- Comment two of them, to get error-free query.
> SELECT company, price_date, price
>   FROM stock_price
>     MATCH_RECOGNIZE (
>        partition by company
>        order by price_date
>        all rows per match
>        pattern ( limit_50  up   up* down  down*  )
>        define
>            limit_50 as price <= 50.00,
>            up   as price > prev(price),
>            down as price < prev(price)
>     )
>    WHERE company = 'B'
>    ORDER BY price_date;
> ```
> 
> In this example, pattern has 5 elements and define has only 3 elements.

Yes.

> 11 - 0004 - plannodes.h
> ```
> +	/* Row Pattern Recognition AFTER MACH SKIP clause */
> +	RPSkipTo	rpSkipTo;		/* Row Pattern Skip To type */
> ```
> 
> Typo: MACH -> MATCH

Will fix/

> I’d stop here, and continue 0005 tomorrow.

Thanks!

> [1] https://link.springer.com/article/10.1007/s13222-022-00404-3
> 
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
> 
> 
> 
>

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-12-01T12:21:38Z

>> On Nov 27, 2025, at 11:10, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> 
>> Hi Chao,
>> 
>> Any comment on this?
>> 
>>>> 13 - 0005 - nodeWindowAgg.c
>>>> ```
>>>> static
>>>> int
>>>> do_pattern_match(char *pattern, char *encoded_str, int len)
>>>> {
>>>> static regex_t *regcache = NULL;
>>>> static regex_t preg;
>>>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
>>>> ```
>>>> 
>>>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.
>>> 
>>> Yeah, such a usage maybe rare. But I hesitate to store the data
>>> (regex_t) in WindowAggState because:
>>> 
>>> (1) it bloats WindowAggState which we want to avoid if possible
>>> (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
> 
> With the static function-scope variables, those data persist across queries, which is error prone. I am afraid that even if I let this pass, other reviewers might have the same concern.
> 
> Maybe define a sub structure, say WindowAggCache, and put a pointer of WindowAggCache in WindowAggState, and only allocate memory when a query needs to.
> 
>>> 
>>>> 14 - 0005 - nodeWindowAgg.c
>>>> ```
>>>> + MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
>>>> +
>>>> + if (regcache != NULL)
>>>> + pg_regfree(regcache); /* free previous re */
>>>> +
>>>> + /* we need to convert to char to pg_wchar */
>>>> + plen = strlen(pattern);
>>>> + data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
>>>> + data_len = pg_mb2wchar_with_len(pattern, data, plen);
>>>> + /* compile re */
>>>> + sts = pg_regcomp(&preg, /* compiled re */
>>>> + data, /* target pattern */
>>>> + data_len, /* length of pattern */
>>>> + cflags, /* compile option */
>>>> + C_COLLATION_OID /* collation */
>>>> + );
>>>> + pfree(data);
>>>> +
>>>> + MemoryContextSwitchTo(oldContext);
>>>> ```
>>>> 
>>>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
>>>> ```
>>>> /*
>>>> * pg_regcomp - compile regular expression
>>>> *
>>>> * Note: on failure, no resources remain allocated, so pg_regfree()
>>>> * need not be applied to re.
>>>> */
>>>> int
>>>> pg_regcomp(regex_t *re,
>>>> const chr *string,
>>>> size_t len,
>>>> int flags,
>>>> Oid collation)
>>>> ```
>>>> 
>>>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.
>>> 
>>> I admit current patch leaves the memory unfreed even after a query
>>> finishes. What about adding something like:
>>> 
>>> static void do_pattern_match_end(void)
>>> {
>>> if (regcache != NULL)
>>> pg_regfree(regcache);
>>> }
>>> 
>>> and let ExecEndWindowAgg() call it?
>> 
> 
> I’m not sure. But I think if we move the data into WindowAggState, then I guess we don’t have to use TopmemoryContext here.

I decided to add new fields to WindowAggState:

	/* regular expression compiled result cache. Used for RPR. */
	char	   *patbuf;			/* pattern to be compiled */
	regex_t		preg;			/* compiled re pattern */

Then allocate the memory for them using
winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory;

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-12-01T12:42:18Z

>> I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL feature, from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot from it. Thanks!
> 
> You are welcome!
> 
>> 23 - 0007
>> 
>> As you mentioned earlier, pattern regular expression support *, + and ?, but I don’t see ? is tested.
> 
> Good catch. I will add the test case. In the mean time I found a bug
> with gram.y part which handles '?' quantifier.  gram.y can be
> successfully compiled but there's no way to put '?' quantifier in a
> SQL statement.  We cannot write directly "ColId '?'" like other
> quantifier '*' and '+' in the grammer because '?' is not a "self"
> token.  So we let '?' matches Op first then check if it's '?'  or
> not. 
> 
> 			| ColId Op
> 				{
> 					if (strcmp("?", $2))
> 						ereport(ERROR,
> 								errcode(ERRCODE_SYNTAX_ERROR),
> 								errmsg("unsupported quantifier"),
> 								parser_errposition(@2));
> 
> 					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
> 				}
> 
> Another bug was with executor (nodeWindowAgg.c). The code to check
> greedy quantifiers missed the case '?'.
> 
>> 24 - 0007
>> 
>> I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).
> 
> I will add such test cases.
> 
>> 25 - 0007
>> ```
>> +-- basic test with none greedy pattern
>> ```
>> 
>> Typo: “none greedy” -> non-greedy
> 
> Will fix.
> 
>> No comment for 0008.
> 
> Thanks again for the review. I will post v35 patch set soon.  Attached
> is the diff from v34.

Attached are the v35 patches for Row pattern recognition.  Chao Li
reviewed v34 thoroughly. Thank you! v35 reflects the review
comments. Major differences from v34 include:

- Make "DEFINE" an unreserved keyword. Previously it was a reserved keyword.
- Refactor transformDefineClause() to make two foreach loops into single foreach loop.
- Make '?' quantifier in PATTERN work as advertised. Test for '?' quantifier is added too.
- Unsupported quantifier test added.
- Fix get_rule_define().
- Fix memory leak related to regcomp.
- Move regcomp compiled result cache from static data to WindowAggState.
- Fix several typos.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Tatsuo Ishii <ishii@postgresql.org> — 2025-12-23T13:26:54Z

Attached are the v36 patches, just for rebasing.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

Henson Choi <assam258@gmail.com> — 2025-12-31T12:10:53Z

SQL/RPR Patch Review Report
============================

Patch: Row Pattern Recognition (SQL:2016)
Commitfest: https://commitfest.postgresql.org/patch/4460

Review Methodology:
  This review focused on quality assessment, not line-by-line code audit.
  Key code paths and quality issues were examined with surrounding context
  when concerns arose. Documentation files were reviewed with AI-assisted
  grammar and typo checking. Code coverage was measured using gcov and
  custom analysis tools.

Limitations:
  - Not a security audit or formal verification
  - Parser and executor internals reviewed at module level, not exhaustively
  - Performance characteristics not benchmarked


TABLE OF CONTENTS
-----------------

1. Executive Summary
2. Issues Found
   2.1 Critical / Major
   2.2 Minor Issues (Review Needed)
   2.3 Minor Issues (Style)
   2.4 Suggestions for Discussion
3. Test Coverage
   3.1 Covered Areas
   3.2 Untested Items
   3.3 Unimplemented Features (No Test Needed)
4. Code Coverage Analysis
   4.1 Overall Coverage
   4.2 Coverage by File
   4.3 Uncovered Code Risk Assessment
5. Commit Analysis
6. Recommendations
7. Conclusion


1. EXECUTIVE SUMMARY
--------------------

Overall assessment: GOOD

The SQL/RPR patch demonstrates solid implementation quality within its
defined scope. Code follows PostgreSQL coding standards (with minor style
issues), test coverage is comprehensive at 96.4%, and documentation is
thorough with only minor typos.

Rating by Area:
- Code Quality:     Good (PostgreSQL style compliant, 26 static style fixes
recommended)
- Test Coverage:    Good (96.4% line coverage, 28 test cases)
- Documentation:    Good (Complete, 1 minor typo)
- Build/Regress:    Pass (make check-world, 753 tests passed)


2. ISSUES FOUND
---------------

2.1 Critical / Major

#1 [Code] Greedy pattern combinatorial explosion
   Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE
   Impact: Memory exhaustion or infinite wait
   Recommendation: Add work_mem-based memory limit (error on exceed)

   Evidence - No memory limit in current code:
   - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
   - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc()
unlimited
   - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit

   Only work_mem usage in RPR (nodeWindowAgg.c:1297):
     winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
   -> For tuple buffer, unrelated to pattern combination memory (StringSet)

2.2 Minor Issues (Review Needed)

#1 [Code] nodeWindowAgg.c:5849,5909,5912
   pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN
elements

   Reproduction:
   - PATTERN (A B C ... Z A) (27 elements) -> OK
   - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid
char: b"

2.3 Minor Issues (Style)

#1 [Code] nodeWindowAgg.c (25 blocks)
   #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove

#2 [Code] src/backend/executor/nodeWindowAgg.c
   static keyword on separate line (26 functions)

#3 [Code] src/backend/utils/adt/ruleutils.c
   Whitespace typo: "regexp !=NULL" -> "regexp != NULL"

#4 [Code] nodeWindowAgg.c:4619
   Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style)

#5 [Doc] doc/src/sgml/ref/select.sgml:1128
   Typo: "maximu" -> "maximum"

2.4 Suggestions for Discussion

#1 Incremental matching with streaming NFA redesign
   Benefits:
   - O(n) time complexity per row (current: exponential in worst case)
   - Bounded memory via state merging and context absorption
   - Natural extension for OR patterns, {n,m} quantifiers, nested groups
   - Enables MEASURES clause with incremental aggregates
   - Proven approach in CEP engines (Flink, Esper)


3. TEST COVERAGE
----------------

3.1 Covered Areas

- PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
- DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133)
- PREV/NEXT functions: Single argument (line 44, 173)
- AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
- Aggregate integration: sum, avg, count, max, min (line 258-277)
- Window function integration: first_value, last_value, nth_value (line
34-35)
- JOIN/CTE: JOIN (line 313), WITH (line 324)
- View: VIEW creation, pg_get_viewdef (line 390-406)
- Error cases: 7 error scenarios (line 409-532)
- Large partition: 5000 row smoke test (line 360-387)
- ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)

3.2 Untested Items

Feature                  Priority   Recommendation
-------------------------------------------------------------------------------
26 variable limit        Medium     Test 26 variables success, 27th
variable error
NULL value handling      Low        Test PREV(col) where col or previous
row is NULL

Sample test for 26 variable limit:

    -- Should fail with 27th variable (parser error, no table needed)
    SELECT * FROM (SELECT 1 AS x) t
    WINDOW w AS (
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
      PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
      DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS
TRUE,
             g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS
TRUE,
             m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS
TRUE,
             s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS
TRUE,
             y AS TRUE, z AS TRUE, aa AS TRUE
    );
    -- ERROR: number of row pattern definition variable names exceeds 26

Sample test for NULL handling:

    CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER);
    INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
    INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL);  -- NULL in
middle
    INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
    INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);

    SELECT company, tdate, price, count(*) OVER w AS match_count
    FROM stock_null
    WINDOW w AS (
      PARTITION BY company
      ORDER BY tdate
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
      PATTERN (START UP DOWN)
      DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price <
PREV(price)
    );
    -- Result: all rows show match_count = 0 (NULL breaks pattern matching)

3.3 Unimplemented Features (No Test Needed)

Per documentation (select.sgml:1133-1136):
- MEASURES clause: Not implemented
- SUBSET clause: Not implemented
- AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported

Not in documentation, verified by testing:
- {n,m} quantifier: Parser error ("syntax error at or near {")
- (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
- (A B)+ compound repetition: Parser accepts, but execution returns 0
matches


4. CODE COVERAGE ANALYSIS
-------------------------

4.1 Overall Coverage

96.4% (732 / 759 lines)

4.2 Coverage by File (major RPR-modified files)

nodeWindowAgg.c:    96.6% (560/580) - Pattern matching execution engine
parse_clause.c:     96.7% (88/91) - PATTERN/DEFINE analysis
ruleutils.c:        93.1% (54/58) - pg_get_viewdef output

4.3 Uncovered Code Risk Assessment

Total: 27 lines uncovered

Medium Risk (2 items) - Test addition recommended (see section 3.2):
- parse_clause.c:4093: transformDefineClause - 27th variable error
- nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first
row

Low Risk (25 items) - Defensive code / Unreachable:
- nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
- nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
- nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
- nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
- nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
- nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
- nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
- nodeWindowAgg.c:5007: search_str_set - NULL return continue
- nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
- nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
- nodeWindowAgg.c:5700: pattern_initial - Variable not found
- nodeWindowAgg.c:5776: string_set_get - Index range check
- nodeWindowAgg.c:5850: variable_pos_register - a-z range check
- nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
- nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
- parse_clause.c:3989: transformDefineClause - A_Expr type check
- parse_clause.c:4145: transformPatternClause - A_Expr type check
- ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output

Conclusion: Most uncovered code consists of defensive error handling or
code unreachable due to parser pre-validation. No security or functional
risk.


5. COMMIT ANALYSIS
------------------

8 sequential commits:

Commit  Area            Files   +/-         Key Content
-------------------------------------------------------------------------------
1       Raw Parser      4       +174/-16    gram.y grammar (PATTERN/DEFINE)
2       Parse/Analysis  4       +277/-1     parse_clause.c analysis logic
3       Rewriter        1       +109/-0     pg_get_viewdef extension
4       Planner         5       +73/-3      WindowAgg plan node extension
5       Executor        4       +1,942/-11  CORE: Pattern matching engine
(+1,850)
6       Docs            3       +192/-7     advanced.sgml, func-window.sgml
7       Tests           3       +1,585/-1   rpr.sql (532), rpr.out (1,052)
8       typedefs        1       +6/-0       pgindent support

Code Change Statistics:
- Total files: 25
- Lines added: 4,358
- Lines deleted: 39
- Net increase: +4,319 lines


6. RECOMMENDATIONS
------------------

6.1 Combinatorial Explosion Prevention (Major, Required)

Add work_mem-based memory limit for StringSet allocation.
Location: string_set_add() in nodeWindowAgg.c:5741-5750
Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)

6.2 Code Review Required (Minor, Decision Needed)

Location: nodeWindowAgg.c:5849,5909,5912
Issue: pos > NUM_ALPHABETS check intent unclear
Current: PATTERN with 28 elements causes error
Question: Is 27 element limit intentional?

6.3 Code Style Fixes (Minor)

- #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
- static keyword on separate line: 26 functions to fix
- Whitespace: "regexp !=NULL" -> "regexp != NULL"
- Error message case: "Unrecognized" -> "unrecognized"

6.4 Documentation Fixes (Minor)

- select.sgml: "maximu" -> "maximum"

6.5 Test Additions (Optional)

Black-box Tests (Functional):

Feature              Test Case                                    Priority
-------------------------------------------------------------------------------
Variable limit       26 variables success, 27 error               Medium
NULL boundary        PREV at partition first row                  Medium

White-box Tests (Coverage) - covered by above black-box tests:

File:Line                          Target Branch
-------------------------------------------------------------------------------
parse_clause.c:4093                Limit error branch (Variable limit test)
nodeWindowAgg.c:5623               null_slot usage (NULL boundary test)


7. CONCLUSION
-------------

Test Quality: GOOD

Core functionality is thoroughly tested with comprehensive error case
coverage.

The patch is well-implemented within its defined scope. Identified issues
include
one major concern (combinatorial explosion) and minor style/documentation
items.

Recommended actions before commit:
1. Add work_mem-based memory limit for pattern combinations (required)
2. Clarify pos > NUM_ALPHABETS check intent (review needed)
3. Fix code style issues (optional but recommended)
4. Fix documentation typo (trivial)
5. Add tests for variable limit and NULL handling (optional)

Points for discussion (optional):
6. Incremental matching with streaming NFA redesign


Attachment:
- coverage.tgz (gcov HTML report, RPR-modified code only)


---
End of Report

2025년 9월 24일 (수) PM 7:36, Tatsuo Ishii <ishii@postgresql.org>님이 작성:

> Attached are the v33 patches for Row pattern recognition.  The
> difference from v32 is that the raw parse tree printing patch is not
> included in v33. This is because now that we have
> debug_print_raw_parse.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>